Spherical gaussien approximation for Blinn-Phong, Phong and Fresnel
June 3, 2012 1 Comment
Spherical Gaussian approximation for lighting calculation is not new to the graphic community[4][5][6][7][8] but start only to be adopted by game developer community [1][2][3].
Spherical Gaussian (SG) are a type of spherical radial basis function (SRBF) which was introduced in [8] and can be used to approximate spherical lobe with Gaussian like function.
This post will describe how SG can be use to approximate Blinn-Phong lighting, Phong lighting and Fresnel.
Why care about SG approximation ?
In the context of realtime rendering for game, SG approximation allow to save few instructions when performing lighting calculation. Obviously for modern GPU graphic card, saving few ALU in a shader is a non sense, but for low hardware like the one found in the PS3, every GPU cycle count. This is less true for XBOX360 GPU which perform better with arithmetic instruction. It can also be use to schedule instruction in a different (low loaded) pipe on SPU for PS3 to increase performance [2].
Part of the work presented here credits to Matthew Jones (from Criterion Games) [9].
Different spherical radial basis function
The post talk about SG, but it is good to know that there is several different type of SRBF in graphic paper.
For completeness, I will talk quickly about SG and von Mises-Fisher (vMF) as this can be confusing.
SG definition can be found in [4]:
where p ∈ 𝕊2 is the lobe axis, λ ∈ (0,+∞) is the lobe sharpness,
and μ ∈ ℝ is the lobe amplitude (μ ∈ ℝ3 for RGB color).
vMF is detailed in [8]:
with inverse width k and central direction μ. vMFs are normalized to integrate
to 1. Note, I take notation of the paper which don’t match the notation above.
The paper give an approximation of this formula for k > 2 (which is almost always the case) :
As you can see the formulation are equal for a lobe sharpness > 2, the difference lie only in the normalization constant for vMF that we will omit. So in the following I will only refer to SG function.
Approximate Blinn-Phong with SG
An approximation of the Blinn-Phong model with SG is given in supplemental material of [4]:
Here is comparison of the accuracy of the approximation for low specular power (<10), medium (25) and high( > 50)



As can be seen, the approximation is accurate for medium and high specular. From my test, above 15 is accurate.
For really low specular power ( < 8) however, the result is a little poor but it still acceptable for a game.
Let’s analysis the performance. Translate to code the above expression result to
float dotHN = saturate(dot(H,N)); pow(dotHN, K) = exp(-K*(1-dotHN))
A power function is generally implemented on GPU as
pow(a, b) = exp(log(a) * b)
with exp the base-e exponential and log the base-e logarithm.
exp/log instruction are often not present in GPU instructions set and are replaced by base-2 exponential/base-2 logarithm
exp(a) = exp2(a/log(2)) log(a) = log2(a*log(2))
So
pow(a,b) = exp2(log(2)/log(2) * log2(a)*b) = exp2(log2(a)*b)
which is 3 ALU instructions: LOG2, MUL, EXP2
Come back to our SG approximation
pow(dotHN, K) = exp(-K*(1-dotHN) = exp2(-(K/log(2))(1-dotHN))
Refactor to be more GPU friendly
float A=K/log(2) pow(dotHN, K)= exp2(A * dotHN - A)
which is 3 ALU instructions : MUL, MAD, EXP2
Actually we have no gain on instruction count. The goal is to be able to process the K/log(2) offline (on the CPU) or in a previous calcul to save the MUL instruction.
A good entry point to factor the constant multiplication is at decompression time of the glossiness sampled in a gloss map. For sample in our game we store glossiness in a texture as [0..1] and decompress it to a range of [2..2048] (see Adopting a physically based shading model)
SpecularPower = exp2(10 * gloss + 1)
Which is a MAD and a EXP2. With SG approximation we will have
ModifiedSpecularPower = 1/log(2) * exp2(10 * gloss + 1) = exp2(10 * gloss + 1 + log2(1/log(2)))
By precomputing the constant (the GPU compiler will do it anyway), we still have a MAD and a EXP2
#define Log2Of1OnLn2_Plus1 1.528766 ModifiedSpecularPower = exp2(10 * gloss + Log2Of1OnLn2_Plus1); SpecularLighting = exp2(ModifiedSpecularPower * dotHN - ModifiedSpecularPower);
If you use a normalization term for energy conservation (see Adopting a physically based shading model), it need to take in count the ModifiedSpecularPower
(SpecularPower + 2)/8 = SpecularPower * 0.125 + 0.25 = ModifiedSpecularPower * Log(2) * 0.125 + 0.25 = ModifiedSpecularPower * 0.08664 + 0.25
Which still a single MAD. A complete code could be
#define LN2DIV8 0.08664
#define Log2Of1OnLn2_Plus1 1.528766
float SphericalGaussianApprox(float CosX, float ModifiedSpecularPower)
{
return exp2(ModifiedSpecularPower* CosX - ModifiedSpecularPower);
}
ModifiedSpecularPower = exp2(10 * gloss + Log2Of1OnLn2_Plus1);
SpecularLighting = (LN2DIV8 * ModifiedSpecularPower + 0.25) * SphericalGaussianApprox(DotNH, ModifiedSpecularPower);
So compare to standard power implementation we effectively save 1 instruction.
Better approximation
SG approximation is rather poor for low specular power. Matthew Jones [9] offers to add a constant of one in the equation for better accuracy base on the standard Fisher concentration factor given in the literature for vMF function.
But he advertise that optimal solution lie between 0 and 1.
I do some Mathematica research to find a best fit for x.
Here is two graph showing best fit result for x on a small range and for a high range of SpecularPower on horizontal axis (Mathematica file is provided at the end of the post)
Graphs show us that the best fit solution converge with increasing SpecularPower to a value of x=0.75.
However, test show that for x=0 or x = 1, we already get accurate result for medium and high specular. So we can focus on best approximating x only for low SpecularPower range as this is this range which cause trouble.
Specular power of 1 and 2 perform badly (you can see figure in Mathematica file) and will not be taken in count in the process. For SG approximation, it is best to only approximate SpecularPower > 2.
Taking the mean of x solution for SpecularPower ranging from 3 to 15 we get x=0.775
So the following expression could be a good candidate for more accurate SG approximation
Let’s compare

The graph show that the new formulation with x=0.775 better match the curve of power but the accuracy is not distributed equally and x = 0 perform better for high peak. For SpecularPower 10 and above the difference start to be subtle.
From a performance point of view
pow(dotHN, K) = exp((K+0.775)*(dotHN-1)) = exp2((K+0.775)/Log(2)*(dotHN-1)
The problem here compare to previous formulation is the added 0.775/log(2). It will add a ADD instruction which can’t be hide in previous calcul (but it can be precomputed offline if no texture are used).
So we get ADD, MAD, EXP2 which is 3 instruction like our original power function. However, the LOG2 has been replaced by a ADD which benefit from better scheduling facility on PS3, so this still an improvement…
Added note:
- A exponential function is non-zero everywhere compare to a power function which can be 0
exp(1000 * 0 - 1000) != 0 pow(0, 1000) == 0
But in practice this is not a problem because you (should) multiply your specular BRDF by saturate(dot(N,L)) allowing to reach zero at the terminator.
- With a power function implemented as exp2(log2(a) * b), for a = 0 we get exp2(-inf * 0) = exp2(NaN) = NaN.
With a SG approximation we always have a valid value.
- Other BRDF can be represented with SG like Ward or Beckmann distribution [4]
Approximate Phong with SG
In previous section we approximation a cosine lobe for the Blinn-Phong shading model. But the approximation can be reuse for a Phong shading model.
On this model and with multiple light, we can get really significant improvement.
The Phong lighting model require the angle between R and V where R is the reflected light around normal or for more efficiency, R and L with R the reflected view vector around normal.
pow(dot(R,L) = exp2(ModifiedSpecularPower * dot(R,L) - ModifiedSpecularPower);
We refactor this code as
float4 R2 = float4(ModifiedSpecularPower * R, -ModifiedSpecularPower); SpecularLighting = exp2(dot(R2, float4(L, 1)));
Each added light will only be a DOT4 and EXP2 instruction.
float4 R2 = float4(ModifiedSpecularPower * R, -ModifiedSpecularPower);
for (int LightIdx = 0, LightIdx < NumLight; ++LightIdx)
{
SpecularLighting = exp2(dot( R2, float4(L[LightIdx], 1) ) );
(...)
}
Approximate Fresnel with SG
SG can be used to approximate Fresnel term.
float dotEH = saturate(dot(E, H)) FresnelTerm = SpecularColor + (1.0f - SpecularColor) * pow(1 - dotEH, 5);
Let’s focus on the second part
pow(1 - dotEH, 5)
Which is a SUB, LOG2, MUL, EXP2
We will apply SG approximation to this expression with the added constant of the first section because in this case we have a know SpecularPower value and this will not add any extra instruction.
More, we can choose constant x as the best fit solution for SpecularPower 5 which is x = 0.788 (see Mathematica file at end of the post for details of this result).
pow(1 - dotEH, 5) = exp2((5 + 0.788)/Log(2) * ((1-dotEH) - 1)) = exp2(-8.35 * dotEH)
Which is MUL, EXP2. We save 50% instructions.
Here is a comparison graph
It looks pretty good for 50% gain. However in practice, the subtle difference between curve are visible, particularly on small angle. Taking an angle of 0, with the original Fresnel we get 0 for Fresnel value, but for SG approximation we get a value of exp2(-8.35) = 0,003 which in the context of HDR light and linear space lighting can become a visible effect. It is possible to get a better approximation of Fresnel by using a polynomial of degree 2 instead of a simple constant for the value inside the exp2 expression.
pow(1 - dotEH, 5) = exp2(x * (dotEH * dotEH) + y * dotEH))
Fitting x and y in Mathematica (see file) we get
pow(1 - dotEH, 5) = exp2(-5.55473 * (dotEH * dotEH)- 6.98316 * dotEH)) = exp2((-5.55473 * EdotH - 6.98316) * EdotH)
We refactor the expression to Horner’s form to be GPU friendly in the last line resulting in 3 instruction MAD, MUL, EXP2. Still one instruction save compare to original.
Graph comparison
This solution seems accurate enough. At angle 0 we get exp2(-12.53789)^=0.000168 which is 17 time smaller than previous approximation. We will still see problem in extreme case, but the behavior still better.
The choice of an approximation or the other depends on your tolerance to error.
Conclusion
This post provides some application of SG approximation in the goal to save few ALU instructions. For modern GPU, this is useless, but for PS3 hardware, it is a tools in your pocket.
The Mathematica file SphericalGaussianApproximation_blog contain all the figure and result of this post if you want to play with it. Any feedback are welcome.
Reference
[1] Kaplanyan, “CryENGINE 3: Reaching the Speed of Light” http://advances.realtimerendering.com/s2010/index.html
[2] Coffin, “SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3″ http://www.slideshare.net/DICEStudio/spubased-deferred-shading-in-battlefield-3-for-playstation-3
[3] Barré-Brisebois, “Approximating Translucency Revisited – With “Simplified” Spherical Gaussian Exponentiation”, http://colinbarrebrisebois.com/2012/04/09/approximating-translucency-revisited-with-simplified-spherical-gaussian/
[4] Wang, Ren, Gong, Snyder, Guo, “All-Frequency Rendering with Dynamic, Spatially-Varying Reflectance” http://research.microsoft.com/en-us/um/people/johnsny/papers/sg.pdf and http://research.microsoft.com/en-us/um/people/jpwang/paper_stuffs/sg_supp.pdf
[5] Iwasaki, Furuya, Dobashi, Nishita, “Real-time Rendering of Dynamic Scenes under All-frequency Lighting using Integral Spherical Gaussian” http://www.wakayama-u.ac.jp/~iwasaki/project/ibl/eg2012.pdf
[6] de Rousiers, Bousseau, Subr, Holzschuch, Ramamoorthi, “Real-Time Rough Refraction” http://graphics.berkeley.edu/papers/DeRousiers-RRR-2011-02/DeRousiers-RRR-2011-02.pdf
[7] Han, Sun, Ramamoorthi, Grinspun, “Frequency Domain Normal Map Filtering” http://www.cs.columbia.edu/cg/normalmap/index.html
[8] Tsai, Shih, “All-frequency precomputed radiance transfer using spherical radial basis functions and clustered tensor approximation” http://www.yzugraphics.cse.yzu.edu.tw/research/papers/TOG_2006/Paper.pdf
[9] Personal communication with Matthew Jones of Criterion games






