Spherical Gaussian approximation for Blinn-Phong, Phong and Fresnel

Spherical Gaussian approximation for lighting calculations is not new to the graphics community [4][5][6][7][8], but it is only starting to be adopted by the game developer community [1][2][3].
Spherical Gaussians (SG) are a type of spherical radial basis function (SRBF), introduced in [8], that can be used to approximate a spherical lobe with a Gaussian-like function.
This post describes how SG can be used to approximate Blinn-Phong lighting, Phong lighting and Fresnel.

Why care about SG approximation?
In the context of real-time rendering for games, SG approximation saves a few instructions when performing lighting calculations. For a modern GPU, saving a few ALU instructions in a shader makes little difference, but on low-end hardware like that found in the PS3, every GPU cycle counts. This is less true for the Xbox 360 GPU, which performs better with arithmetic instructions. On PS3 it can also be used to schedule instructions in a different (lightly loaded) pipe on the SPU to increase performance [2].

Part of the work presented here is credited to Matthew Jones (from Criterion Games) [9].

Different spherical radial basis functions

This post talks about SG, but it is good to know that there are several different types of SRBF in graphics papers.
For completeness, I will quickly cover SG and von Mises-Fisher (vMF), as the two can be confused.

The SG definition can be found in [4]:
G(v; p,\lambda,\mu)=\mu e^{\lambda(v.p -1)} where p ∈ 𝕊² is the lobe axis, λ ∈ (0,+∞) is the lobe sharpness,
and μ ∈ ℝ is the lobe amplitude (μ ∈ ℝ³ for RGB color).

vMF is detailed in [8]:
\gamma(n.\mu;k) = \frac{k}{4\pi \sinh(k)} e^{k(n.\mu)} with inverse width k and central direction μ. vMFs are normalized to integrate
to 1. Note that I use the notation of the paper, which doesn’t match the notation above.
The paper gives an approximation of this formula for k > 2 (which is almost always the case): \gamma(n.\mu;k)\approx \frac{k}{2\pi} e^{-k(1-n.\mu)}

As you can see, the two formulations have the same form for a lobe sharpness > 2; the only difference is the normalization constant of the vMF, which we will omit. So in the following I will only refer to the SG function.
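
The claim that the two lobes differ only by a constant factor can be checked numerically. The following is a minimal Python sketch, used here only for offline verification; the function names are mine and do not come from [4] or [8]. It evaluates the exact vMF and its k > 2 approximation and shows that their ratio does not depend on the angle.

```python
import math

def vmf(cos_angle, k):
    """Exact normalized von Mises-Fisher lobe with inverse width k."""
    return k / (4.0 * math.pi * math.sinh(k)) * math.exp(k * cos_angle)

def vmf_approx(cos_angle, k):
    """Approximation for k > 2: an SG of sharpness k and amplitude k / (2*pi)."""
    return k / (2.0 * math.pi) * math.exp(-k * (1.0 - cos_angle))

def vmf_ratio(cos_angle, k):
    """Exact / approximate. Analytically this is 1 / (1 - exp(-2k))."""
    return vmf(cos_angle, k) / vmf_approx(cos_angle, k)

if __name__ == "__main__":
    for k in (2.0, 5.0, 20.0):
        # The ratio is the same at the lobe center and at 90 degrees.
        print(k, vmf_ratio(1.0, k), vmf_ratio(0.0, k))
```

At k = 2 the ratio is below 1.02, and it approaches 1 quickly as k grows, which is why the approximation is used for k > 2.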

Approximate Blinn-Phong with SG

An approximation of the Blinn-Phong model with SG is given in the supplemental material of [4]:
D(h)=(h.n)^k \approx e^{-k(1-(h.n))}

Here is a comparison of the accuracy of the approximation for low (< 10), medium (25) and high (> 50) specular power.

As can be seen, the approximation is accurate for medium and high specular power. From my tests, it is accurate above 15.
For really low specular power (< 8), however, the result is a little poor, but it is still acceptable for a game.
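
For readers without the graphs, the trend can be reproduced with a small numerical check. This is a sketch in Python with hypothetical helper names; it samples the cosine uniformly in [0, 1] and reports the largest absolute gap between the power lobe and its SG approximation. Since ln(c) <= c - 1, the SG never falls below the power lobe.

```python
import math

def blinn_phong(cos_hn, k):
    """Reference cosine power lobe."""
    return cos_hn ** k

def sg_lobe(cos_hn, k):
    """Spherical Gaussian approximation of the same lobe."""
    return math.exp(-k * (1.0 - cos_hn))

def max_abs_error(k, samples=1000):
    """Largest |SG - Blinn-Phong| over cos_hn sampled uniformly in [0, 1]."""
    return max(abs(sg_lobe(i / samples, k) - blinn_phong(i / samples, k))
               for i in range(samples + 1))

if __name__ == "__main__":
    for k in (5.0, 25.0, 50.0):
        print(k, max_abs_error(k))
```

The maximum error shrinks as the specular power grows, which matches the observation above.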

Let’s analyze the performance. Translating the above expression to code gives

float dotHN = saturate(dot(H,N));
pow(dotHN, K) = exp(-K*(1-dotHN))

A power function is generally implemented on GPU as

pow(a, b) = exp(log(a) * b)

with exp the base-e exponential and log the base-e logarithm.
The exp/log instructions are often not present in GPU instruction sets and are replaced by the base-2 exponential and base-2 logarithm

exp(a) = exp2(a/log(2))
log(a) = log2(a)*log(2)

So

pow(a,b) = exp2(log(2)/log(2) * log2(a)*b) = exp2(log2(a)*b)

which is 3 ALU instructions: LOG2, MUL, EXP2
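
The base-2 identities used in this derivation are easy to verify offline. The sketch below is in Python rather than shader code, and the helper names are mine. It checks the exp/log rewrites and the resulting LOG2, MUL, EXP2 form of pow for a positive base.

```python
import math

LN2 = math.log(2.0)

def exp_via_exp2(a):
    """exp(a) = exp2(a / ln 2)."""
    return 2.0 ** (a / LN2)

def log_via_log2(a):
    """log(a) = log2(a) * ln 2."""
    return math.log2(a) * LN2

def pow_via_exp2(a, b):
    """pow(a, b) as LOG2, MUL, EXP2 (valid for a > 0)."""
    return 2.0 ** (math.log2(a) * b)

if __name__ == "__main__":
    print(pow_via_exp2(0.8, 32.0), 0.8 ** 32.0)
```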

Coming back to our SG approximation

pow(dotHN, K) = exp(-K*(1-dotHN)) = exp2(-(K/log(2))*(1-dotHN))

Refactoring to be more GPU friendly

float A=K/log(2)
pow(dotHN, K)=  exp2(A * dotHN - A)

which is 3 ALU instructions: MUL, MAD, EXP2
So far we have no gain in instruction count. The goal is to compute K/log(2) offline (on the CPU) or fold it into a previous calculation to save the MUL instruction.

A good place to fold in the constant multiplication is the decompression of the glossiness sampled from a gloss map. For example, in our game we store glossiness in a texture as [0..1] and decompress it to a range of [2..2048] (see Adopting a physically based shading model)

SpecularPower = exp2(10 * gloss + 1)

Which is a MAD and an EXP2. With the SG approximation we will have

ModifiedSpecularPower = 1/log(2) * exp2(10 * gloss + 1)
= exp2(10 * gloss + 1 + log2(1/log(2)))

By precomputing the constant (the GPU compiler will do it anyway), we still have a MAD and an EXP2

#define Log2Of1OnLn2_Plus1    1.528766
ModifiedSpecularPower = exp2(10 * gloss + Log2Of1OnLn2_Plus1);
SpecularLighting = exp2(ModifiedSpecularPower * dotHN - ModifiedSpecularPower);
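
The constant folding above can be sanity-checked outside the shader. This is a small Python sketch with names of my own choosing that mirror the snippet. It recomputes the constant 1.528766 and confirms that the folded form gives the same value as exp(-K*(1-dotHN)).

```python
import math

# log2(1 / ln 2) + 1, the constant folded into the gloss decompression.
LOG2_OF_1_ON_LN2_PLUS_1 = math.log2(1.0 / math.log(2.0)) + 1.0

def specular_power(gloss):
    """Gloss in [0, 1] decompressed to a specular power in [2, 2048]."""
    return 2.0 ** (10.0 * gloss + 1.0)

def modified_specular_power(gloss):
    """Specular power already divided by ln 2, still one MAD and one EXP2."""
    return 2.0 ** (10.0 * gloss + LOG2_OF_1_ON_LN2_PLUS_1)

def sg_specular(dot_hn, modified_power):
    """SG lobe evaluated with one MAD and one EXP2."""
    return 2.0 ** (modified_power * dot_hn - modified_power)

if __name__ == "__main__":
    gloss, dot_hn = 0.5, 0.9
    k = specular_power(gloss)
    print(sg_specular(dot_hn, modified_specular_power(gloss)),
          math.exp(-k * (1.0 - dot_hn)))
```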

If you use a normalization term for energy conservation (see Adopting a physically based shading model), it needs to take the ModifiedSpecularPower into account

(SpecularPower + 2)/8 = SpecularPower * 0.125 + 0.25
= ModifiedSpecularPower * log(2) * 0.125 + 0.25
= ModifiedSpecularPower  * 0.08664 + 0.25

Which is still a single MAD. Complete code could be

#define LN2DIV8               0.08664
#define Log2Of1OnLn2_Plus1    1.528766
float SphericalGaussianApprox(float CosX, float ModifiedSpecularPower)
{
    return exp2(ModifiedSpecularPower* CosX - ModifiedSpecularPower);
}
ModifiedSpecularPower = exp2(10 * gloss + Log2Of1OnLn2_Plus1);
SpecularLighting = (LN2DIV8 * ModifiedSpecularPower + 0.25) * SphericalGaussianApprox(DotNH, ModifiedSpecularPower);

So compared to the standard power implementation we effectively save 1 instruction.
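
As a cross-check of the complete code, the following Python sketch compares the normalized SG version against a reference normalized Blinn-Phong term, (K + 2)/8 * pow(dotNH, K). The function names are hypothetical and the sample points are arbitrary.

```python
import math

LN2 = math.log(2.0)
LN2_DIV_8 = LN2 / 8.0  # ~0.08664

def reference_specular(dot_nh, gloss):
    """Normalized Blinn-Phong: (K + 2) / 8 * dot_nh^K."""
    k = 2.0 ** (10.0 * gloss + 1.0)
    return (k + 2.0) / 8.0 * dot_nh ** k

def normalized_sg_specular(dot_nh, gloss):
    """Same term with the SG lobe and the ln 2 factor folded into the power."""
    m = 2.0 ** (10.0 * gloss + 1.0 + math.log2(1.0 / LN2))
    return (LN2_DIV_8 * m + 0.25) * 2.0 ** (m * dot_nh - m)

if __name__ == "__main__":
    for gloss, dot_nh in ((1.0, 0.999), (0.5, 0.95), (0.2, 0.8)):
        print(gloss, dot_nh,
              reference_specular(dot_nh, gloss),
              normalized_sg_specular(dot_nh, gloss))
```

Both versions agree exactly at dotNH = 1, where only the normalization term remains.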

Better approximation

The SG approximation is rather poor for low specular power. Matthew Jones [9] suggests adding a constant of one in the equation for better accuracy, based on the standard Fisher concentration factor given in the literature for the vMF function.
But he notes that the optimal solution lies between 0 and 1.

D(h)=(h.n)^k \approx e^{(k+x)((h.n)-1)}

I did some research in Mathematica to find a best fit for x.
Here are two graphs showing the best-fit result for x over a small range and over a large range of SpecularPower on the horizontal axis (the Mathematica file is provided at the end of the post)

The graphs show that the best-fit solution converges to x=0.75 as SpecularPower increases.
However, tests show that for x=0 or x=1 we already get accurate results for medium and high specular power. So we can focus on finding the best x for the low SpecularPower range only, since this is the range that causes trouble.
Specular powers of 1 and 2 perform badly (see the figure in the Mathematica file) and are not taken into account in the process. For SG approximation, it is best to only approximate SpecularPower > 2.

Taking the mean of the x solutions for SpecularPower ranging from 3 to 15 we get x=0.775

So the following expression could be a good candidate for a more accurate SG approximation

D(h)=(h.n)^k \approx e^{(k+0.775)((h.n)-1)}
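
The effect of the offset can be checked without Mathematica. This is a rough Python sketch, not the fitting procedure used for the post. It measures the largest absolute error over uniformly sampled cosines for a given offset x, so x = 0 and x = 0.775 can be compared at low specular powers.

```python
import math

def max_abs_error(k, x, samples=1000):
    """Largest |exp((k + x)(c - 1)) - c^k| over c sampled uniformly in [0, 1]."""
    worst = 0.0
    for i in range(samples + 1):
        c = i / samples
        worst = max(worst, abs(math.exp((k + x) * (c - 1.0)) - c ** k))
    return worst

if __name__ == "__main__":
    for k in (3.0, 5.0, 10.0):
        print(k, max_abs_error(k, 0.0), max_abs_error(k, 0.775))
```

This only looks at the worst-case error; it says nothing about how the error is distributed along the lobe.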

Let’s compare

The graph shows that the new formulation with x=0.775 better matches the power curve, but the accuracy is not distributed equally and x=0 performs better at the peak of the lobe. For SpecularPower 10 and above the difference becomes subtle.
From a performance point of view

pow(dotHN, K) = exp((K+0.775)*(dotHN-1)) = exp2((K+0.775)/log(2)*(dotHN-1))

The problem here compared to the previous formulation is the added 0.775/log(2). It adds an ADD instruction which can’t be hidden in a previous calculation (but it can be precomputed offline if no textures are used).
So we get ADD, MAD, EXP2, which is 3 instructions like our original power function. However, the LOG2 has been replaced by an ADD, which benefits from better scheduling on PS3, so this is still an improvement…

Added notes:

- An exponential function is non-zero everywhere, whereas a power function can be 0

exp(1000 * 0 - 1000) != 0 
pow(0, 1000) == 0

But in practice this is not a problem because you (should) multiply your specular BRDF by saturate(dot(N,L)), which reaches zero at the terminator.

- With a power function implemented as exp2(log2(a) * b), for a = 0 and b = 0 we get exp2(-inf * 0) = exp2(NaN) = NaN.
With an SG approximation we always have a valid value.

- Other BRDFs can be represented with SG, like the Ward or Beckmann distributions [4]
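
The first two notes can be illustrated with IEEE floating point outside a shader. The Python sketch below assumes that log2(0) returns -inf, as stated above, and that IEEE rules apply to the multiply; actual GPU behavior may differ by hardware and compiler flags. Python's math.log2 raises on 0, so the -inf is substituted by hand.

```python
import math

def gpu_style_pow(a, b):
    """pow(a, b) as exp2(log2(a) * b), taking log2(0) = -inf (assumption)."""
    log2_a = -math.inf if a == 0.0 else math.log2(a)
    return 2.0 ** (log2_a * b)

def sg_lobe(cos_angle, k):
    """SG lobe: finite and strictly positive in exact arithmetic."""
    return math.exp(k * cos_angle - k)

if __name__ == "__main__":
    print(gpu_style_pow(0.0, 16.0))  # power lobe reaches zero
    print(sg_lobe(0.0, 16.0))        # SG lobe stays slightly above zero
    print(gpu_style_pow(0.0, 0.0))   # -inf * 0 produces NaN
    print(sg_lobe(0.0, 0.0))         # SG stays valid
```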

Approximate Phong with SG

In the previous section we approximated a cosine lobe for the Blinn-Phong shading model. But the approximation can be reused for the Phong shading model.
With this model and multiple lights, we can get a really significant improvement.

The Phong lighting model requires the angle between R and V, where R is the light vector reflected around the normal, or, for more efficiency, between R and L, with R the view vector reflected around the normal.

pow(dot(R,L), SpecularPower) = exp2(ModifiedSpecularPower * dot(R,L) - ModifiedSpecularPower);

We refactor this code as

float4 R2 = float4(ModifiedSpecularPower * R, -ModifiedSpecularPower);
SpecularLighting = exp2(dot(R2, float4(L, 1)));

Each added light will only cost a DOT4 and an EXP2 instruction.

float4 R2 = float4(ModifiedSpecularPower * R, -ModifiedSpecularPower);
for (int LightIdx = 0; LightIdx < NumLight; ++LightIdx)
{
    SpecularLighting = exp2(dot( R2, float4(L[LightIdx], 1) ) );
    (...)
}
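
To see that the packed form computes the same value as the direct expression, here is a small Python sketch with plain lists standing in for float3/float4. The helper names are mine.

```python
def dot(a, b):
    """Dot product of two equally sized sequences."""
    return sum(x * y for x, y in zip(a, b))

def phong_sg(r, l, modified_power):
    """Direct form: exp2(M * dot(R, L) - M)."""
    return 2.0 ** (modified_power * dot(r, l) - modified_power)

def phong_sg_packed(r, lights, modified_power):
    """Packed form: R2 is built once, then one DOT4 and one EXP2 per light."""
    r2 = [modified_power * c for c in r] + [-modified_power]
    return [2.0 ** dot(r2, list(l) + [1.0]) for l in lights]

if __name__ == "__main__":
    r = (0.0, 0.0, 1.0)
    lights = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8)]
    print(phong_sg_packed(r, lights, 10.0))
    print([phong_sg(r, l, 10.0) for l in lights])
```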

Approximate Fresnel with SG

SG can be used to approximate the Fresnel term.

float dotEH = saturate(dot(E, H));
FresnelTerm = SpecularColor + (1.0f - SpecularColor) * pow(1 - dotEH, 5);

Let’s focus on the second part

pow(1 - dotEH, 5)

Which is a SUB, LOG2, MUL, EXP2
We will apply the SG approximation to this expression with the added constant from the Better approximation section, because in this case we have a known SpecularPower value and the constant will not add any extra instruction.
Moreover, we can choose the constant x as the best-fit solution for SpecularPower 5, which is x = 0.788 (see the Mathematica file at the end of the post for details of this result).

pow(1 - dotEH, 5) = exp2((5 + 0.788)/log(2) * ((1-dotEH) - 1)) = exp2(-8.35 * dotEH)

Which is MUL, EXP2. We save 50% of the instructions.
Here is a comparison graph
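
The constant -8.35 and the end-point values of the approximation can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only, with hypothetical function names.

```python
import math

# (5 + 0.788) / ln 2, the scale folded into the single MUL.
SG_FRESNEL_SCALE = (5.0 + 0.788) / math.log(2.0)

def fresnel_pow5(dot_eh):
    """Reference Schlick factor (1 - dotEH)^5."""
    return (1.0 - dot_eh) ** 5

def fresnel_sg(dot_eh):
    """SG approximation: exp2(-scale * dotEH)."""
    return 2.0 ** (-SG_FRESNEL_SCALE * dot_eh)

if __name__ == "__main__":
    print(SG_FRESNEL_SCALE)
    for d in (0.0, 0.25, 0.5, 1.0):
        print(d, fresnel_pow5(d), fresnel_sg(d))
```

The two curves agree at dotEH = 0, while at dotEH = 1 the reference is exactly 0 and the SG version is small but non-zero.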
Here is a comparison graph

It looks pretty good for a 50% gain. However, in practice the subtle differences between the curves are visible, particularly at small angles. At an angle of 0, the original Fresnel term gives 0, but the SG approximation gives exp2(-8.35) = 0.003, which in the context of HDR and linear-space lighting can become a visible effect. It is possible to get a better approximation of Fresnel by using a polynomial of degree 2 instead of a simple constant for the value inside the exp2 expression.

pow(1 - dotEH, 5) = exp2(x * (dotEH * dotEH) + y * dotEH)

Fitting x and y in Mathematica (see file) we get

pow(1 - dotEH, 5) = exp2(-5.55473 * (dotEH * dotEH) - 6.98316 * dotEH)
= exp2((-5.55473 * dotEH - 6.98316) * dotEH)

In the last line we refactor the expression to Horner’s form to be GPU friendly, resulting in 3 instructions: MAD, MUL, EXP2. That is still one instruction saved compared to the original.

Graph comparison

This solution seems accurate enough. At angle 0 we get exp2(-12.53789) ≈ 0.000168, which is about 18 times smaller than the previous approximation. We will still see problems in extreme cases, but the behavior is still better.
The choice of one approximation or the other depends on your tolerance for error.
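
To help with that choice, the two approximations can be compared numerically. The Python sketch below uses the constants quoted in this post and hypothetical helper names. It reports the worst absolute error of each variant over uniformly sampled dotEH values, and the residual at dotEH = 1.

```python
def fresnel_pow5(dot_eh):
    """Reference Schlick factor (1 - dotEH)^5."""
    return (1.0 - dot_eh) ** 5

def fresnel_sg(dot_eh):
    """Linear exponent: MUL, EXP2."""
    return 2.0 ** (-8.35 * dot_eh)

def fresnel_sg_poly(dot_eh):
    """Quadratic exponent in Horner form: MAD, MUL, EXP2."""
    return 2.0 ** ((-5.55473 * dot_eh - 6.98316) * dot_eh)

def max_abs_error(approx, samples=1000):
    """Largest |approx - reference| over dotEH sampled uniformly in [0, 1]."""
    return max(abs(approx(i / samples) - fresnel_pow5(i / samples))
               for i in range(samples + 1))

if __name__ == "__main__":
    print(max_abs_error(fresnel_sg), max_abs_error(fresnel_sg_poly))
    print(fresnel_sg(1.0), fresnel_sg_poly(1.0))
```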

Conclusion

This post presents some applications of SG approximation with the goal of saving a few ALU instructions. For modern GPUs the gain is negligible, but for PS3 hardware it is a useful tool to have in your pocket.
The Mathematica file SphericalGaussianApproximation_blog contains all the figures and results of this post if you want to play with them. Any feedback is welcome.

Reference

[1] Kaplanyan, “CryENGINE 3: Reaching the Speed of Light” http://advances.realtimerendering.com/s2010/index.html
[2] Coffin, “SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3” http://www.slideshare.net/DICEStudio/spubased-deferred-shading-in-battlefield-3-for-playstation-3
[3] Barré-Brisebois, “Approximating Translucency Revisited – With “Simplified” Spherical Gaussian Exponentiation”, http://colinbarrebrisebois.com/2012/04/09/approximating-translucency-revisited-with-simplified-spherical-gaussian/
[4] Wang, Ren, Gong, Snyder, Guo, “All-Frequency Rendering with Dynamic, Spatially-Varying Reflectance” http://research.microsoft.com/en-us/um/people/johnsny/papers/sg.pdf and http://research.microsoft.com/en-us/um/people/jpwang/paper_stuffs/sg_supp.pdf
[5] Iwasaki, Furuya, Dobashi, Nishita, “Real-time Rendering of Dynamic Scenes under All-frequency Lighting using Integral Spherical Gaussian” http://www.wakayama-u.ac.jp/~iwasaki/project/ibl/eg2012.pdf
[6] de Rousiers, Bousseau, Subr, Holzschuch, Ramamoorthi, “Real-Time Rough Refraction” http://graphics.berkeley.edu/papers/DeRousiers-RRR-2011-02/DeRousiers-RRR-2011-02.pdf
[7] Han, Sun, Ramamoorthi, Grinspun, “Frequency Domain Normal Map Filtering” http://www.cs.columbia.edu/cg/normalmap/index.html
[8] Tsai, Shih, “All-frequency precomputed radiance transfer using spherical radial basis functions and clustered tensor approximation” http://www.yzugraphics.cse.yzu.edu.tw/research/papers/TOG_2006/Paper.pdf
[9] Personal communication with Matthew Jones of Criterion games

DONTNOD specular and glossiness chart

With permission of my company: Dontnod Entertainment http://www.dont-nod.com/

I got permission from my company to release the graphic chart I made for our artists to help them with our new physically based rendering (PBR) workflow.
In the past I provided a blurred and reduced version of it in my blog post: feeding a physical based shading model.

Any feedback is welcome, and if this chart is useful to you, tell us.

I generated the chart programmatically with Paint.net at high resolution (2048×2048). I strongly advise against resizing or compressing (jpeg, png) the image because the values may be incorrect after filtering.

Caution: The full-resolution version of the chart is a tga file in a .zip that I renamed to “.pdf”, as WordPress doesn’t support zip files. So just right-click on the image below, save the pdf file, then change the extension to “.zip”, decompress it and you get the tga file.

(Real world pictures courtesy of Andrea Weidlich from “Exploring the potential of layered BRDF models”, SIGGRAPH Asia 2009)

The chart has been designed for my team and follows our conventions for textures. It is used as a set of predefined values that the artists color pick and use when creating textures.

The chart is divided into two parts.
The upper part is for the specular color of materials, the lower part is for glossiness (roughness on the image).

Specular color

We use an RGB DXT1 texture to store colored specular with sRGB encoding. All displayed values are sRGB in this part. To be clear: artists color pick values, create their texture, save the texture as usual, and at runtime the shader converts the sRGB value to a linear RGB value.

To force artists to use the right range of values for specular color, we define two color gradients displayed on the left. One for dielectric (non-metallic) materials and one for metallic materials.
The range of values for non-metallic materials goes from 40 to 75, which means 0.017-0.067 in linear space and overlaps the range 0.02-0.05 of common dielectric materials.
The range of values for metallic materials goes from 155 to 255, which means 0.33-1.0 in linear space and overlaps the range 0.5-1 of common metallic materials.
For both ranges we show with red lines the values for common materials. A “U” shaped red line means that the range of values inside the “U” can represent the material. The right value depends on its properties.

On the right we provide some common samples with exact values, or a range of values (indicated by the “<->”).
Note that even if your eyes can’t tell the difference, there are different values in a range.
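
The linear values quoted for both ranges can be reproduced with a pure gamma 2.2 decode. That the chart used this curve rather than the piecewise sRGB transfer function is my assumption, based only on the numbers matching; the post does not say so. A minimal Python sketch:

```python
def gamma22_to_linear(value_8bit):
    """Decode an 8-bit encoded value with a pure 2.2 gamma curve (assumption)."""
    return (value_8bit / 255.0) ** 2.2

if __name__ == "__main__":
    for v in (40, 75, 155, 255):
        print(v, gamma22_to_linear(v))
```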

Glossiness

We use the alpha channel of the DXT5 normal map to store the grey-level value of glossiness (roughness on the image). All values displayed are in linear RGB. To be clear: artists color pick values, create the texture, save the texture as usual, and at runtime the shader uses the value directly.
The gradient displays glossiness from 0 for rough materials (left) to 1 for smooth materials (right).
The grey gradient goes from 0 to 255 and red segments are displayed every 1/10, with a sphere below to show the in-game result of the designated value.
The first row of real-world images above represents non-metallic objects, the second row represents metallic objects. The goal is to give artists a better feeling for what glossiness is.

Note: The glossiness chart is strongly coupled with the glossiness range chosen in your game engine. These values are for our game engine’s glossiness range of 2-2048.

Relationship between Phong and Blinn lighting model

version : 1.1 – Living blog – First version was 29 March 2012

Phong and Blinn are the two most used lighting models in game development. The properties of the two have been debated many times and I won’t discuss them here. See [1] to know why you should use Blinn rather than Phong when you do physically based rendering. The point of interest I want to discuss is the relationship between the Phong lobe shape and the Blinn lobe shape. This relationship matters when you use image based lighting (IBL). In this case you generally have only one cubemap sample and can only emulate a Phong-shaped highlight (see AMD Cubemapgen for physically based rendering). The problem comes when you want to use IBL and try to match it with the analytic Blinn lighting model used for direct lighting. The highlight shapes don’t match. Here is a comparison between an analytic Phong (left) and Blinn (right) highlight with the same specular power.

This post will study what we can do to better match Blinn and Phong highlights. There are really few papers on this subject [2][3]. About the Phong and Blinn relationship, it can be shown that the Phong angle is twice the Blinn angle [3]:

cos^{-1}(R.E) = 2 cos^{-1}(N.H)

This relation allows us to write

cos^\rho(\theta) = cos^{x \rho} (\frac{\theta}{2})

Where \rho is the specular power, \theta is the Phong angle (cos^{-1}(R.E)) and x is our unknown parameter. Finding the value of x which best fits this equation will provide our relationship between Phong and Blinn.

Yoshiharu Gotanda gives 4.2 for the value of x in [2]. Frederick Fisher and Andrew Woo give 4 in [3]. The following sections will discuss these results.
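
A quick way to get a feeling for why x lands near 4 is to evaluate both sides of the equation at a small angle. For small \theta, cos^\rho(\theta) behaves like exp(-\rho \theta^2 / 2) and cos^{x\rho}(\theta/2) like exp(-x \rho \theta^2 / 8), which match when x = 4. The Python sketch below uses names of my own and is not the fitting procedure of [2] or [3].

```python
import math

def phong_lobe(theta, rho):
    """cos^rho(theta), with theta the Phong angle."""
    return math.cos(theta) ** rho

def blinn_lobe(theta, rho, x):
    """cos^(x * rho)(theta / 2), the Blinn lobe at the half angle."""
    return math.cos(theta / 2.0) ** (x * rho)

if __name__ == "__main__":
    theta, rho = 0.1, 64.0
    for x in (1.0, 4.0, 4.2):
        print(x, phong_lobe(theta, rho), blinn_lobe(theta, rho, x))
```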


AMD Cubemapgen for physically based rendering

Version : 1.65 – Living blog – First version was 4 September 2011

AMD Cubemapgen is a useful tool which allows cubemap filtering and mipchain generation. Sadly, AMD decided to stop supporting this tool. However, it has been made open source [1] and has been uploaded to Google Code [2] to be improved by the community. With some modifications, this tool is really useful for physically based rendering because it can generate an irradiance environment map (IEM) or a prefiltered mipmapped radiance environment map (PMREM). A PMREM is an environment map (in our case a cubemap) where each mipmap has been filtered by a cosine power lobe of decreasing cosine power value. This post describes the improvements I made to Cubemapgen.

The latest version of Cubemapgen (which includes the modifications described in this post) is available in the download section of the Google Code repository. Direct link: ModifiedCubeMapGen-1_65 (requires the VS2008 runtime and DX).

This post will first describe the new features added to Cubemapgen; then, for interested (and advanced) readers, I will talk about the theory behind the modifications and go into some implementation details.

The modified Cubemapgen

The current improvements take the form of new options accessible in the interface:


- Use Multithread: Uses all hardware threads available on the computer. If unchecked, the default behavior of Cubemapgen is used. However, the new features are unsupported with the default behavior.
- Irradiance Cubemap: Allows fast computation of an irradiance cubemap. When checked, no other filter or option is taken into account. An irradiance cubemap can be obtained without this option by setting a cosine filter with a Base angle filter of 180, which is a really slow process. Only the base cubemap is affected by this option; the following mipmaps use a cosine filter with some default values, but these mipmaps should not be used.
- Cosine power filter: Selects a cosine power lobe filter as the current filter, so that the cubemap is filtered with a cosine power lobe. You must select this filter to generate a PMREM.
- Cosine power edit box: Specifies the cosine power to use with the cosine power filter.
- Power drop on mip: Used to generate a PMREM. The first mipmap will use the cosine power edit box value as the cosine power for the cosine power lobe filter. Then the cosine power is scaled by power drop on mip to process the next mipmap, and this new cosine power is scaled again for the next mipmap, until all mipmaps are generated. For example, setting 2048 as cosine power edit box and 0.25 as power drop on mip generates a PMREM with each mipmap respectively filtered by a cosine power lobe of 2048, 512, 128, 32, 8, 2…
- Lighting model: This option should be used only with the cosine power filter. The choice of lighting model depends on your game lighting equation. The goal is for the filtering to better match your in-game lighting.
- Warp edge fixup: New edge fixup method which does not use Width, based on NVTT from Ignacio Castaño.
- Bent edge fixup: New edge fixup method which does not use Width, based on the TriAce CEDEC 2011 presentation.
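
The sequence of cosine powers produced by the Power drop on mip option can be written as a one-line rule. The following Python sketch only illustrates the arithmetic described in the option list; the function name is hypothetical and is not part of Cubemapgen.

```python
def pmrem_cosine_powers(base_power, power_drop, mip_count):
    """Cosine power used for each mip level: base_power * power_drop^mip."""
    return [base_power * power_drop ** mip for mip in range(mip_count)]

if __name__ == "__main__":
    # Cosine power 2048 with a power drop of 0.25, first six mip levels.
    print(pmrem_cosine_powers(2048, 0.25, 6))
```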

All modifications are available on the command line (print usage for details with “Cubemapgen.exe – help”).

Irradiance cubemap
Here is a comparison between an irradiance map generated with a cosine filter of 180 and with the irradiance cubemap option (which uses spherical harmonics (SH) for fast processing):

PI or not to PI in game lighting equation

Version : 3.1 – Living blog – First version was 4 January 2012

With physically based rendering being the current trend in photo-realistic games, I feel the need to make my lighting equations more physically correct. A good start is to try to understand how our current game lighting equations behave. For a long time, the presence or absence of a \pi or \frac{1}{\pi} term in my game lighting equations has been based more on trial and error than on physics. The origin of these terms in game lighting equations has already been discussed by others [1][3][7][12]. But as I found this subject confusing, I dedicated this post to that topic alone. This post is not about code; it is about understanding what we do under the hood and why it is correct. I care about this because more correct game lighting equations mean consistency under different lighting conditions, and so less artist time spent tweaking values. I would like to thank Stephen Hill for all the help he provided me on this topic.

This post is written as a memo for myself and as a help to anyone who was as confused as I was. Any feedback is welcome.

I will begin this post by talking about Lambertian surfaces and the specifics of game light intensity, then talk about diffuse shading, and conclude with specular shading. I will not define common terms of the lighting field like BRDF, Lambertian surface… See [1] for all these definitions and for the notation of this post.

Origin of the \pi term confusion

The true origin of the confusing \pi term is the Lambertian BRDF, which is the most used BRDF in computer graphics. The Lambertian BRDF is a constant defined as:

f(l_c,v)=\frac{c_{diff}}{\pi}

The notation f(l_c,v) means the BRDF parameterized by light vector l_c and view vector v. The view vector is not used in the case of the Lambertian BRDF. c_{diff} is what we commonly call diffuse color.
The first confusing \frac{1}{\pi} term appears in this formula. It comes from a constraint a BRDF should respect, which is called conservation of energy. It means that the outgoing energy cannot be greater than the incoming energy. Or in other words, that you can’t create light. The derivation of the \frac{1}{\pi} can be found in [3].

As you may note, the game Lambertian BRDF does not have this \frac{1}{\pi} term. Let’s look at a light affecting a Lambertian surface in a game:

FinalColor = c_diff * c_light * dot(n, l)

To understand where the \frac{1}{\pi} disappeared, see how game light intensity is defined. Games don’t use a radiometric measure for the light’s intensity but a more artist-friendly measure [1]:

For artist convenience, c_{light} does not correspond to a direct radiometric measure of the light’s intensity; it is specified as the color a white Lambertian surface would have
when illuminated by the light from a direction parallel to the surface normal (l_c=n)

Which means that if you set up a light in a game with color c_{light} and point it directly at a diffuse-only quad mapped with a white diffuse texture, you get the color c_{light}.
Another way to see this definition is by taking the definition of a diffuse texture [2]:

How bright a surface is when lit by a 100% bright white light

Which means that if you set up a white light in a game with brightness 1 and point it directly at a diffuse-only quad mapped with a diffuse texture, you get the color of the diffuse texture.
This is very convenient for artists, who don’t need to care about the physical meaning of the light’s intensity unit.

These definitions allow us to define the punctual light equation used in games. A punctual light is an infinitely small light, like the directional, point or spot lights common in games.

L_o(v)=\pi f(l_c,v)\bigotimes c_{light}\underline{(n\cdot l_c)}

The derivation and the notation of this equation are given in [1]. L_o(v) is the resulting exit radiance in the direction of the view vector v, which is what you will use as the color of your screen pixel. v is not used for the Lambertian BRDF.

Using the punctual light equation with a Lambertian BRDF gives us:

L_o=\pi \frac{c_{diff}}{\pi} \bigotimes c_{light}\underline{(n\cdot l_c)}

Which I will rewrite more simply by switching to a monochrome light (\bigotimes is for RGB) :

L_o=\frac{c_{diff}}{\pi} \pi c_{light} \underline{(n\cdot l_c)}

This shading equation looks familiar except for the \pi term. In fact, after simplification we get:

L_o=c_{diff} c_{light} \underline{(n\cdot l_c)}

Which is our common game diffuse lighting equation.
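
The cancellation can also be checked numerically for the monochrome case. In this Python sketch the function names are mine, and reading the underlined dot product as a dot product clamped to zero is my interpretation of the notation of [1].

```python
import math

def lambert_brdf(c_diff):
    """Energy-conserving Lambertian BRDF: c_diff / pi."""
    return c_diff / math.pi

def punctual_light(brdf_value, c_light, n_dot_l):
    """Punctual light equation: pi * f * c_light * clamped (n . l)."""
    return math.pi * brdf_value * c_light * max(n_dot_l, 0.0)

def game_diffuse(c_diff, c_light, n_dot_l):
    """Common game diffuse equation: c_diff * c_light * clamped (n . l)."""
    return c_diff * c_light * max(n_dot_l, 0.0)

if __name__ == "__main__":
    print(punctual_light(lambert_brdf(0.8), 2.0, 0.3),
          game_diffuse(0.8, 2.0, 0.3))
```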

This means that, for artists’ convenience, the value they enter as brightness in the light’s settings is in fact the light brightness multiplied by \frac{1}{\pi} (the energy-conserving constant of the Lambertian BRDF). When artists put 1 in brightness, in reality they set a brightness of \pi. This is represented in the punctual lighting equation by \pi c_{light}. In this post, I will call game light’s unit the convention of multiplying the light brightness by \pi, and game Lambert BRDF the associated c_{diff} term.

In the following I will describe the consequences of the game light’s unit on common diffuse lighting techniques used in games, then on common specular lighting techniques.

Diffuse lighting in game

Lambert lighting


Tips, tricks and guidelines for specular cubemap

Version : 1.1 – Living blog – First version was 2 December 2011

It is common in today’s games to use cubemaps to simulate ambient specular lighting, and with physically based rendering (PBR) starting to be widely adopted by the game industry this will grow (but this post is not related to PBR). There are a lot of different ways of using specular cubemaps in games and there is a lot to care about. With this post, I would like to share several methods and tips for working with specular cubemaps. I use the term specular cubemap to refer both to the classic specular cubemap and to the prefiltered mipmapped radiance environment map. In the context of this post, there is no need for the distinction and this simplifies the notation. I will not talk about diffuse lighting at all (which uses an irradiance cubemap), so when I use the term cubemap alone, I mean specular cubemap. As always, any feedback or comments are welcome.

Strategies for applying specular cubemaps

In this section I will discuss several strategies of cubemap usage for ambient specular lighting. The choice of the right method depends on the game context and engine architecture. For clarity, I need to introduce some definitions:

I will divide cubemaps into two categories:
- Infinite cubemaps: These cubemaps are used as a representation of infinitely distant lighting; they have no location. They can be generated with the game engine or authored by hand. They are perfect for representing low-frequency lighting scenes like outdoor lighting (i.e. the light is rather smooth across the level).
- Local cubemaps: These cubemaps have a location and represent finite environment lighting. They are mostly generated with the game engine based on a sample location in the level. The generated lighting is only right at the location where the cubemap was generated; all other locations must be approximated. Moreover, as a cubemap represents an infinite box by definition, there are parallax issues (reflected objects are not at the right position) which require tricks to compensate. They are used for middle- and high-frequency lighting scenes like indoor lighting. The number of local cubemaps required to match the lighting conditions of a scene increases with the lighting complexity (i.e. if you have a lot of different lights affecting a scene, you need to sample the lighting at several locations to be able to simulate the original lighting conditions).

And as we often need to blend multiple cubemaps, I will define different cubemap blending methods:
- Sampling K cubemaps in the main shader and doing a weighted sum. Expensive.
- Blending cubemaps on the CPU and using the resulting cubemap in the shader. Expensive depending on the resolution, and requires double-buffered resources to avoid GPU stalls.
- Blending cubemaps on the GPU and using the resulting cubemap in the shader. Fast.
- Only with a deferred or light-prepass engine: applying K cubemaps by weighted additive blending. Each cubemap bounding volume is rendered to the screen and normal+roughness from the G-Buffer are used to sample the cubemap.

In all strategies described below, I won’t talk about visual fidelity but rather about problems and advantages.

Object based cubemap
Each object is linked to a local cubemap. Objects take the nearest cubemap placed in the level and use it as a specular ambient light source. This is the approach adopted by Half-Life 2 [1] for its world specular lighting.
Background objects are linked to their nearest cubemaps offline and dynamic objects perform dynamic queries at runtime. Cubemaps can have a range so as not to affect objects outside their boundaries; they can affect background objects only, dynamic objects only, or both.

The main problem with object based cubemaps is the lighting seams between adjacent objects using different cubemaps.
Here is a screenshot

On this screenshot you can see two cubemaps linked offline to several objects (the red lines). Lighting seams here occur at the door between the two rooms. You can see that walls are another problem: they should be divided into two parts, one for each room, so that the right cubemap can be assigned. Note that the nearest cubemap is not always the right choice. Visibility should be tested in some cases.

Some advice for cubemap placement with this method is given in [2]:

If a cubemap is intended for NPCs or the player, the env_cubemap should be placed at head-height above the ground. This way, the cubemap will most accurately represent the world from the perspective of a standing creature.
If a cubemap is intended for static world geometry, the env_cubemap should be a fair distance away from all brush surfaces.
A different cubemap should be taken in each area of distinct of visual contrast. A hallway with bright yellow light will need its own env_cubemap, especially if it is next to a room with low blue light. Without two env_cubemap entities, reflections and specular highlights will seem incorrect on entities and world geometry in one of the areas.
(…)
Because surfaces must approximate their surroundings via cubemaps, using too many cubemaps in a small area can cause noticeable visual discontinuities when moving around. For areas of high reflectivity, it is generally more correct to place one cubemap in the center of the surface and no more. This avoids seams or popping as the view changes.

The other problem is that you need to track which cubemap affects each dynamic object. As dynamic objects can swap cubemaps, there is popping. Popping can be reduced by blending the K nearest cubemaps of the dynamic object. This is a rather expensive solution: blending the K cubemaps in the shader adds a lot of instructions and texture fetches. And even blending the K nearest cubemaps will not prevent the popping induced by switching the least contributing cubemap when more than K cubemaps are present.

Zone based cubemap
An infinite cubemap is assigned to each zone of the level. When the camera enters a zone, the infinite cubemap of that zone is applied to all objects. Killzone 2/3 used this method [3]. Prey 2 uses this approach too (see the comment of Brian Karis in this post and the comment in [7]).

Dive in SH buffer idea

Edit: I rewrote some parts of this post to make it easier to understand. First version was 30 September 2011.

Deferred lighting/shading is common nowadays. Deferred lighting buffers store the accumulated diffuse and specular lighting for a given view, to be composited with material properties later. But is there another way to achieve the same goal, that is, to store light information in a buffer and light objects later, in order to have more flexibility in the lighting process?
Obviously, this path has already been explored. One idea is to store the light information itself in a buffer [7]; another is to store the lighting environment in a compact form to be decompressed later, as Steve Anichini and Jon Greenberg do with a spherical harmonic buffer [1][2]. I thought about using a spherical harmonic buffer (SH buffer) myself a long time ago and finally decided to give it a try. Like them, I will share my experience here in order to feed the discussion about this algorithm. Any feedback is welcome.

SH Lighting : Same approach different goal

When starting a new approach, the first step is to fix the context. From my understanding, in [1][2][3] the approach was to produce a low-resolution buffer and then upsample it to composite it with the scene. The low-resolution buffer minimizes the bandwidth used by the SH buffers, and the upsampling phase requires a “smart” filter, such as a bilateral filter or an ID buffer as in Inferred lighting [5]. The authors seem to be trying to replace the classic deferred lighting approach with this SH deferred approach. My context is different. I am in a forward rendering context, with one pass per light and a really heavy constraint on interpolators. Any extra light requires sending the geometry again. The main purpose of the SH buffer will be fill lights (secondary lights), so it can tolerate more approximation.

The SH buffer is an appealing approach:
- No need for normals (and I don’t have them)
- Can be offloaded to the SPU/CPU on console
- Decoupled from geometry
- The SH buffer can be composited in the main geometry pass and gets access to all other material properties. This means that complex BRDFs are handled correctly (like the one I describe in my post Adopting a physically based shading model).

The requirements of SH buffers are frightening: using quadratic SH (9 coefficients) is impractical in terms of performance and memory, but linear SH (4 coefficients) gives good results for simulating indirect light. So we used only 4 SH coefficients.
We need to store 4 coefficients for each of the R, G and B channels. This results in 3 * 4 float16 per pixel, which means 21 MB at 1280×720! And the composite in the main pass requires sampling 3 float16 buffers and adds several instructions to the already heavy main shader.
Note that in my context, I don’t want to do smart upsampling of the SH buffer. First because of the artifacts it introduces, second because I want to composite the SH buffers in the main pass to be able to apply complex BRDFs, and the shader is heavy enough. I am ALU bound and will suffer less from the high resolution than from added instructions.
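The memory figure above can be reproduced with a few lines of arithmetic. This is only a sketch in Python; the function name sh_buffer_bytes is made up for the illustration, and the quadratic case is added purely for comparison.

```python
BYTES_PER_FLOAT16 = 2
COLOR_CHANNELS = 3  # R, G and B each get their own set of coefficients

def sh_buffer_bytes(width, height, coefficients_per_channel):
    """Total size of full-resolution float16 SH buffers, in bytes."""
    bytes_per_pixel = COLOR_CHANNELS * coefficients_per_channel * BYTES_PER_FLOAT16
    return width * height * bytes_per_pixel

linear_bytes = sh_buffer_bytes(1280, 720, 4)      # linear SH, 4 coefficients
quadratic_bytes = sh_buffer_bytes(1280, 720, 9)   # quadratic SH, 9 coefficients

linear_mb = linear_bytes / (1024 * 1024)
quadratic_mb = quadratic_bytes / (1024 * 1024)
print(f"linear SH: {linear_mb:.1f} MB, quadratic SH: {quadratic_mb:.1f} MB")
```

Linear SH costs 24 bytes per pixel, which gives the roughly 21 MB quoted above; quadratic SH would be more than twice that.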

Obviously we can’t afford this on console. In the following section, I will describe the methods I tried in order to relax these constraints.

Experiment

In this section we deal only with diffuse lighting.
Here is my reference scene with classic dynamic lights:

On the left there are three lights: green, red and blue. In the middle there is a yellowish light with high brightness, and on the right three other overlapping lights: purple, green and blue.
The stairs on the left and right highlight the problem of light directionality. On the left stairs, the green light affects the top of the steps and the blue light the sides. On the right stairs, the three lights affect the top of the steps and the purple light the sides.

Here are the steps and details of my test.

The first step is to render the lights into the SH buffer by accumulating them in linear SH. To be practical, we need to use RGBA 8-bit SH buffers instead of float16 buffers. This highlights one of the problems of SH coefficients: they are signed floats and require a range greater than one to get HDR lighting.
To avoid precision issues and the sign constraint, I draw every light affecting a pixel in a single draw call (so no additive blending). I generate several shader combinations for different numbers of lights and use the tiled deferred lighting approach (as described in [8] [9] or [10]). In a deferred rendering context, the tiled deferred approach is only a gain when there are many lights on screen; with few lights it is not an optimization. However, in my case this is the only approach available, as all the accumulated lights must be available in the shader. I limit myself to 8 lights for this test. As you can see, all the optimizations of deferred lighting/shading apply.

I use the Z buffer to recover the world position in the SH buffer pass, apply attenuation and project each light into SH.

// Evaluate the 4 linear SH basis functions (bands 0 and 1) for a normalized direction.
float4 SHEvalDirection(float3 Dir)
{
    float4 Result;
    Result.x = 0.282095;           // band 0 (constant term)
    Result.y = -0.488603 * Dir.y;  // band 1
    Result.z = 0.488603 * Dir.z;
    Result.w = -0.488603 * Dir.x;
    return Result;
}

// Project one point light into the per-channel SH accumulators.
void AddLightToSH(float4 InLightPosition, float4 InLightColor, float3 PixelPosition,
                  inout float4 SHLightr, inout float4 SHLightg, inout float4 SHLightb)
{
    // No shadow available for SH buffer lighting
    float3   WorldLightVectorUnormalized       = InLightPosition.xyz - PixelPosition;
    // Attenuation() is defined elsewhere in the shader (not shown here)
    float    DistanceAttenuation               = Attenuation(WorldLightVectorUnormalized);
    // Explicit .rgb swizzle avoids an implicit float4 to float3 truncation
    float3   LightColorWithDistanceAttenuation = DistanceAttenuation * InLightColor.rgb;

    float4 SHLightResult                        = SHEvalDirection(normalize(WorldLightVectorUnormalized));

    // Accumulate the light, one set of 4 coefficients per color channel
    SHLightr += SHLightResult * LightColorWithDistanceAttenuation.r;
    SHLightg += SHLightResult * LightColorWithDistanceAttenuation.g;
    SHLightb += SHLightResult * LightColorWithDistanceAttenuation.b;
}
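To get a feel for what the projection above produces, here is a small Python port that can be run on the CPU. It is only a sketch: the function names are mine, the attenuation callback is a hypothetical stand-in for the shader's Attenuation function, and the reconstruction step uses the standard clamped-cosine convolution factors for bands 0 and 1 (π and 2π/3), which are an assumption and not taken from the shader code shown here.

```python
import math

def sh_eval_direction(direction):
    """Linear SH basis (4 coefficients), same ordering as the shader above."""
    x, y, z = direction
    return (0.282095, -0.488603 * y, 0.488603 * z, -0.488603 * x)

def add_light_to_sh(sh_rgb, light_position, light_color, pixel_position, attenuation):
    """Accumulate one point light into sh_rgb (3 channels of 4 coefficients)."""
    to_light = [l - p for l, p in zip(light_position, pixel_position)]
    length = math.sqrt(sum(c * c for c in to_light))
    basis = sh_eval_direction([c / length for c in to_light])
    falloff = attenuation(to_light)  # hypothetical attenuation callback
    for channel in range(3):
        for i in range(4):
            sh_rgb[channel][i] += basis[i] * light_color[channel] * falloff

# Assumed clamped-cosine convolution factors: pi for band 0, 2*pi/3 for band 1.
COSINE_LOBE = (math.pi, 2.0 * math.pi / 3.0, 2.0 * math.pi / 3.0, 2.0 * math.pi / 3.0)

def sh_diffuse(sh_channel, normal):
    """Reconstruct the cosine-convolved lighting of one channel for a unit normal."""
    basis = sh_eval_direction(normal)
    return sum(a * c * b for a, c, b in zip(COSINE_LOBE, sh_channel, basis))

sh_rgb = [[0.0] * 4 for _ in range(3)]
add_light_to_sh(sh_rgb, (0.0, 0.0, 5.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0),
                attenuation=lambda v: 1.0)

facing = sh_diffuse(sh_rgb[0], (0.0, 0.0, 1.0))   # normal pointing at the light
side = sh_diffuse(sh_rgb[0], (1.0, 0.0, 0.0))     # normal perpendicular to the light
```

Under these assumptions, a surface facing a unit white light gets about 0.75 instead of the 1.0 that a regular n·l term would give, and a perpendicular surface gets about 0.25 instead of 0. This illustrates the softer, less directional result expected from only 4 coefficients.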

cosine convolution of the SH method

SH is only an approximation of the lighting environment. The more coefficients you have, the more precision you get. With only 4 coefficients we lose a little intensity (9 coefficients would get closer to the original intensity), but still get good results. The SH coefficients for each channel are stored in 3 RGBA8 buffers, which requires the use of multiple render targets (MRT). My way to compress them is as follows:


Adopting a physically based shading model

Version : 1.31 – Living blog – First version was 2 August 2011

With permission of my company: Dontnod Entertainment http://www.dont-nod.com/

The past year has seen a growing interest in physically based rendering. Physically based shading simplifies parameter control for artists, allows a more consistent look under different lighting conditions and gives a more realistic look. Like many game developers, I decided to introduce a physically based shading model at my company. I started this blog to share what we learn. The blog post is divided into two parts.

I will first present the physically based shading model we chose and what we added to our engine to support it: this is the subject of this post. Then I will describe the process of making good data to feed this lighting model: Feeding a physically based shading model. I hope you will enjoy it and share your own way of working with physically based shading models. Feedback is welcome!

The notation used in this post can be found in Naty Hoffman’s paper from the SIGGRAPH 2010 course Physically-Based Shading Models in Film and Game Production [2].

Working with a physically based shading model implies some changes in a game engine to fully support it. I will present here the physically based rendering (PBR) approach we chose for our game engine.

When talking about PBR, we talk about BRDF, Fresnel, energy conservation, microfacet theory, the punctual light source equation… All these concepts are very well described in [2] and will not be explained again here.

Our main lighting model is composed of two parts: ambient lighting and direct lighting. But before digging into these subjects, I will talk about some magic numbers.

Normalization factor

I would like to clarify the constants we find in various lighting models. The energy conservation constraint (the outgoing energy cannot be greater than the incoming energy) requires the BRDF to be normalized. There are two different approaches to normalizing a BRDF.

Normalize the entire BRDF

Normalizing a BRDF means that the directional-hemispherical reflectance (the reflectance of a surface under direct illumination) must always be between 0 and 1 : R(l)=\int_\Omega f(l,v) \cos{\theta_o} \mathrm{d}\omega_o\leq 1 . This is an integral over the hemisphere. In games, R(l) corresponds to the diffuse color c_{diff} .

For a Lambertian BRDF, f(l,v) is constant. This means that R(l)=\pi f(l,v) and we can write f(l,v)=\frac{R(l)}{\pi}
As a result, the normalization factor of a Lambertian BRDF is \frac{1}{\pi}

For original Phong (the Phong model most game programmers use) \underline{(r\cdot v)}^{\alpha_p}c_{spec} the normalization factor is \frac{\alpha_p+1}{2\pi}
For the Phong BRDF (just multiply Phong by \cos{\theta_i}, see [1][8]) \underline{(r\cdot v)}^{\alpha_p}c_{spec}\underline{(n\cdot l)} the normalization factor becomes \frac{\alpha_p+2}{2\pi}
For Blinn-Phong \underline{(n\cdot h)}^{\alpha_p}c_{spec} the normalization factor is \frac{(\alpha_p+2)}{4\pi(2-2^\frac{-\alpha_p}{2})}
For the Blinn-Phong BRDF \underline{(n\cdot h)}^{\alpha_p}c_{spec}\underline{(n\cdot l)} the normalization factor is \frac{(\alpha_p+2)(\alpha_p+4)}{8\pi(2^\frac{-\alpha_p}{2}+\alpha_p)}
Derivations of these constants can be found in [3] and [13]. Another good summary is provided in [27].
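The Lambert and Phong constants above can be sanity-checked numerically. The Python sketch below integrates a cosine lobe over the hemisphere with a simple midpoint rule; it assumes the reflection vector is aligned with the normal (light at normal incidence), so that the whole Phong lobe lies inside the hemisphere. The function name hemisphere_integral is mine.

```python
import math

def hemisphere_integral(cos_power, steps=20000):
    """Integrate cos(theta)**cos_power over the hemisphere (midpoint rule in theta)."""
    d_theta = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        # sin(theta) comes from the solid angle element d_omega = sin(theta) d_theta d_phi
        total += math.cos(theta) ** cos_power * math.sin(theta) * d_theta
    return 2.0 * math.pi * total  # the integrand does not depend on phi

alpha_p = 32.0

# Lambert: constant BRDF times the cosine factor.
lambert = (1.0 / math.pi) * hemisphere_integral(1.0)
# Original Phong: lobe only, no cosine factor.
phong = (alpha_p + 1.0) / (2.0 * math.pi) * hemisphere_integral(alpha_p)
# Phong BRDF: lobe times the cosine factor.
phong_brdf = (alpha_p + 2.0) / (2.0 * math.pi) * hemisphere_integral(alpha_p + 1.0)

print(lambert, phong, phong_brdf)
```

Each product should come out very close to 1, which is what the normalization factor is meant to guarantee in this configuration.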

Note that for the Blinn-Phong BRDF, a cheap approximation is given in [1]: \frac{\alpha_p+8}{8\pi}
There is a discussion about this constant in [4], and here is the interesting comment from Naty Hoffman:

About the approximation we chose, we were not trying to be strictly conservative (that is important for multi-bounce GI solutions to converge, but not for rasterization).
We were trying to choose a cheap approximation which is close to 1, and we thought it more important to be close for low specular powers.
Low specular powers have highlights that cover a lot of pixels and are unlikely to be saturating past 1.
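To see how close the cheap constant is to the Blinn-Phong BRDF normalization factor given earlier, the two can simply be tabulated. This is a small Python sketch with function names of my own; the list of specular powers is arbitrary.

```python
import math

def blinn_phong_brdf_norm(alpha_p):
    """Normalization factor of the Blinn-Phong BRDF, as listed above."""
    return ((alpha_p + 2.0) * (alpha_p + 4.0)) / (
        8.0 * math.pi * (2.0 ** (-alpha_p / 2.0) + alpha_p))

def blinn_phong_brdf_norm_cheap(alpha_p):
    """Cheap approximation (alpha_p + 8) / (8 * pi)."""
    return (alpha_p + 8.0) / (8.0 * math.pi)

specular_powers = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)
ratios = [blinn_phong_brdf_norm_cheap(a) / blinn_phong_brdf_norm(a)
          for a in specular_powers]

for alpha_p, ratio in zip(specular_powers, ratios):
    print(f"alpha_p = {alpha_p:5d}   cheap / exact = {ratio:.4f}")
```

Over these powers the cheap constant slightly overestimates the exact one, by less than 10%, and the gap closes again for very high specular powers.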

When working with microfacet BRDFs, normalize only microfacet normal distribution function (NDF)

A microfacet distribution requires that the (signed) projected area of the microsurface is the same as the projected area of the macrosurface for any direction v [6]. In the special case v = n:
\int_\theta D(m)(n\cdot m)\mathrm{d}\omega_m=1
The integral is over the sphere and the cosine factor is not clamped.

For the Phong distribution (or Blinn distribution: two names, same distribution) the NDF normalization constant is \frac{\alpha_p+2}{2\pi}
The derivation can be found in [7]

Direct Lighting

Our direct lighting model is composed of two parts: direct diffuse + direct specular
Direct diffuse is the usual Lambertian BRDF : \frac{c_{diff}}{\pi}
Direct specular is the microfacet BRDF described by Naty Hoffman in [2] : F_{schlick}(c_{spec},l_c,h)\frac{\alpha_p+2}{8\pi}\underline{(n\cdot h)}^{\alpha_p}
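As an illustration, the two BRDF terms above can be evaluated on the CPU with a few lines of Python. This is a sketch under assumptions: the Fresnel term uses the usual Schlick form c_spec + (1 - c_spec)(1 - l·h)^5, which this excerpt does not spell out, the underlined dot products are read as clamped to zero, and all function names are mine.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def schlick_fresnel(c_spec, l_dot_h):
    """Assumed Schlick approximation, evaluated per color channel."""
    return tuple(c + (1.0 - c) * (1.0 - l_dot_h) ** 5 for c in c_spec)

def direct_diffuse(c_diff):
    """Lambertian BRDF: c_diff / pi."""
    return tuple(c / math.pi for c in c_diff)

def direct_specular(c_spec, alpha_p, n, l, v):
    """F_schlick * (alpha_p + 2) / (8 * pi) * clamp(n.h)^alpha_p."""
    h = normalize(tuple(a + b for a, b in zip(l, v)))  # half vector
    n_dot_h = max(dot(n, h), 0.0)
    l_dot_h = max(dot(l, h), 0.0)
    fresnel = schlick_fresnel(c_spec, l_dot_h)
    scale = (alpha_p + 2.0) / (8.0 * math.pi) * n_dot_h ** alpha_p
    return tuple(f * scale for f in fresnel)

n = (0.0, 0.0, 1.0)
head_on = direct_specular((0.04, 0.04, 0.04), 32.0, n, l=n, v=n)
grazing = direct_specular((0.04, 0.04, 0.04), 32.0, n,
                          l=normalize((1.0, 0.0, 0.1)), v=normalize((-1.0, 0.0, 0.1)))
```

With the light and view both along the normal, the Fresnel term reduces to c_spec and the result is c_spec * (alpha_p + 2) / (8π). In the grazing configuration the half vector is still the normal, but the Fresnel term rises well above c_spec.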


Feeding a physically based shading model

Version : 1.0 – Living blog – First version was 17 August 2011

With permission of my company: Dontnod Entertainment http://www.dont-nod.com/

Adopting a physically based shading model is just a first step. Physically based rendering (PBR) requires a physical lighting setup and good spatially varying BRDF inputs (a.k.a. textures) to get the best results.
Feeding the shading model with physically plausible data is in the hands of artists.

There are many texture creation tutorials available on the web. But too often, artists forget to link their work with the lighting model for which the textures are created. With a traditional lighting model, there is often an RGB diffuse texture, an RGB specular texture, a specular mask texture, a constant specular power and a normal map. For advanced materials you can add a specular power texture, Fresnel intensity texture, Fresnel scale texture, reflection mask texture…
A physically based shading model is simpler and will provide a consistent look under different lighting conditions. However, artists must be trained, because the right values are not always trivial to find and they have to accept not fully controlling the specular response.

Our physically based shading model requires four inputs:

  • Diffuse color RGB (named diffuse albedo or diffuse reflectance or directional-hemispherical reflectance)
  • Specular color RGB (named specular albedo or specular reflectance)
  • Normal and gloss monochrome

The authoring time of these textures is not equal. I will present the advice and material references to provide to artists to help them author these textures. The better the artists' workflow, the better the shading model will look. Normal and gloss are tightly coupled, so they will be treated together.

When talking about textures, we talk about sRGB/RGB color space, linear/gamma space… All these concepts are well described in [2] and will not be explained here.

Before digging into the subject in more detail, here is some advice for the texture workflow:

  • Artists must calibrate their screens. Or better, all your team’s screens should be calibrated in the same way [6].
  • Make sure Colour Management is set to use sRGB in Photoshop [5].
  • Artists will trust their eyes, but eyes can be fooled. Adjusting grey level textures can be annoying [7]. Provide reference material and work with a neutral grey background.
  • When working in sRGB color space, as is the case for most textures authored with Photoshop, remember that middle grey is not 128,128,128 but 187,187,187. See John Hable's post [22] for a comparison between 128 and 187 middle grey.
  • The game engine should implement debug view modes to display texture density, mipmap resolution, lighting only, diffuse only, specular only, gloss only, normal only… This is a valuable tool for tracking texture authoring problems.
  • Textures should be uniform in the scene. Even if all other textures are amazing, a single poor texture on the screen will attract the eye, like a dead pixel on a screen. The resulting visual feeling will be bad. The same scene with uniform density and medium quality will look better.
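The middle grey value mentioned in the list above follows from the sRGB transfer function. The Python sketch below uses the standard piecewise sRGB encoding; treating "middle grey" as a linear value of 0.5 is my reading of that bullet, and the function names are mine.

```python
def linear_to_srgb(x):
    """Standard sRGB encoding of a linear value in [0, 1]."""
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1.0 / 2.4) - 0.055

def srgb_to_linear(s):
    """Standard sRGB decoding of an encoded value in [0, 1]."""
    if s <= 0.04045:
        return s / 12.92
    return ((s + 0.055) / 1.055) ** 2.4

# 8-bit sRGB value that encodes a linear intensity of 0.5.
middle_grey_8bit = linear_to_srgb(0.5) * 255.0
# Linear intensity actually stored by the 8-bit sRGB value 128.
linear_of_128 = srgb_to_linear(128.0 / 255.0)

print(middle_grey_8bit, linear_of_128)
```

A linear 0.5 lands at about 187.5 on the 8-bit scale, consistent with the 187 quoted above, while the 8-bit value 128 decodes to a linear intensity of only about 0.22.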

Dielectric and metallic material

There are different types of substances in the real world. They can be classified in three main groups: insulators, semi-conductors and conductors.
In games we are only interested in two of them: insulators (dielectric materials) and conductors (metallic materials).
Artists should understand which category a material belongs to. This will influence the diffuse and specular values to assign to this material.

I already talked about these two categories in the post Adopting a Physically based shading model.

Dielectric materials are the most common materials. Their optical properties rarely vary much over the visible spectrum: water, glass, skin, wood, hair, leather, plastic, stone, concrete, ruby, diamond…
Metals. Their optical properties vary over the visible spectrum: iron, aluminium, copper, gold, cobalt,  nickel, silver…
See [8].

Diffuse color

Diffuse textures require some time to author.

In the past, it was usual to bake everything in a “diffuse” texture to fake lighting effects like shadow, reflection, specular… With newer engines, all these effects are simulated and must not be baked.
The best definition for diffuse color in our engine is: how bright a surface is when lit by a 100% bright white light [4]. This definition is related to the definition of the light unit in the punctual light equation (see Adopting a physically based shading model).