Dive in SH buffer idea

Edit: I rewrote some parts of this post to make it more understandable. The first version was published on 30 September 2011.

Deferred lighting/shading is common nowadays. Deferred lighting buffers store the result of accumulated diffuse and specular lighting for a given view, to be composited with material properties later. But is there another way to achieve the same goal? We could store light information itself in a buffer and light objects later, to get more flexibility in the lighting process.
Obviously, this path has already been explored. One idea is to store the lights themselves in a buffer [7]; another is to store the lighting environment in a compact form to be decompressed later, as Steve Anichini and Jon Greenberg do with a spherical harmonic buffer [1][2]. I had thought about the idea of a spherical harmonic buffer (SH buffer) a long time ago and finally decided to give it a try. Like them, I will share my experience here in order to grow the discussion around this algorithm. Any feedback is welcome.

SH lighting: same approach, different goal

When starting a new approach, the first step is to fix the context. From my understanding, in [1][2][3] the approach is to produce a low-resolution buffer, then upsample it to composite with the scene. The low-resolution buffer minimizes the bandwidth used by the SH buffers, and the upsampling phase requires a “smart” filter, like a bilateral filter or an ID buffer as in inferred lighting [5]. The authors seem to try to replace the classic deferred lighting approach with this deferred SH approach. My context is different: I am in a forward rendering context, one pass per light, with really heavy constraints on interpolators. Any extra light requires sending the geometry again. The main purpose of the SH buffer will be fill lights (secondary lights), so it can tolerate more approximation.

An SH buffer is an appealing approach:
– No need for normals (and I don’t have them)
– Can be offloaded to SPU/CPU on console
– Decoupled from geometry
– The SH buffer can be composited in the main geometry pass, with access to all other material properties. This means complex BRDFs are handled correctly (like the one I describe in my post Adopting a physically based shading model).

The requirements of SH buffers are frightening: using quadratic SH (9 coefficients) is impractical in terms of performance and memory, but linear SH (4 coefficients) gives good results for simulating indirect light. So we use only 4 SH coefficients.
We need to store 4 coefficients for each channel R, G and B. This results in 3 × 4 float16 per pixel, i.e. 24 bytes per pixel, which means about 21 MB at 1280×720! And the composite in the main pass requires sampling 3 float16 buffers and adds several instructions inside the already heavy main shader.
Note that in my context, I don’t want to do smart upsampling of the SH buffer: first because of the artifacts it introduces, second because I want to composite the SH buffers in the main pass to be able to apply complex BRDFs, and the shader is heavy enough already. I am ALU bound and will not suffer much from the full resolution compared to adding instructions.

Obviously we can’t afford this on console. I will describe the methods I tried to minimize these constraints in the following sections.


SH buffer: diffuse lighting

In this section we deal only with diffuse lighting.
Here is my reference scene with classic dynamic lights (click on the images to see them full size):

On the left there are 3 lights: green, red and blue. A yellowish light with high brightness is in the middle, and three overlapping lights (purple, green, blue) are on the right.
The stairs on the left and right highlight the problem of light directionality. On the left stair, the green light affects the top of the steps and the blue light the side. On the right stair, the three lights affect the top of the steps and the purple light the side.

Here are the steps and details of my test.

The first step is to render lights into the SH buffer by accumulating them in linear SH. To be practical, we need to use RGBA8 SH buffers instead of float16 buffers. This highlights one of the problems of SH coefficients: they are signed floats and require a range higher than one to get HDR lighting.
To avoid precision issues and the sign constraint, I draw every light affecting a pixel in a single draw call (so no additive blending). I generate several shader combinations for different numbers of lights and use a tiled deferred lighting approach (as described in [8], [9] or [10]). In a deferred rendering context, the tiled deferred approach is only a gain with many lights on screen; with few lights it is not an optimization. However, in my case it is the only approach available, as you need all the accumulated lights in the shader. I limit myself to 8 lights for this test. As you can see, all optimizations of deferred lighting/shading apply.

I use the Z buffer to recover the world position in the SH buffer pass, then apply attenuation and project each light into SH.
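For reference, a minimal sketch of the world position reconstruction (assuming a hypothetical DNEDepthBuffer sampler and InvViewProjection constant; the NDC y convention depends on the platform):

float3 GetWorldPosition(float2 ScreenUV, float4x4 InvViewProjection)
{
    float  Depth    = tex2D(DNEDepthBuffer, ScreenUV).r;  // hardware depth in [0..1]
    // Screen UV to clip space (y flipped for D3D-style conventions)
    float4 ClipPos  = float4(ScreenUV * float2(2.0, -2.0) + float2(-1.0, 1.0), Depth, 1.0);
    float4 WorldPos = mul(ClipPos, InvViewProjection);
    return WorldPos.xyz / WorldPos.w;                     // perspective divide
}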

float4 SHEvalDirection(float3 Dir)
{
    // Linear SH basis (band 0 and band 1) evaluated in direction Dir
    float4 Result;
    Result.x = 0.282095;
    Result.y = -0.488603 * Dir.y;
    Result.z = 0.488603 * Dir.z;
    Result.w = -0.488603 * Dir.x;
    return Result;
}

void AddLightToSH(float4 InLightPosition, float4 InLightColor, float3 PixelPosition,
                  inout float4 SHLightr, inout float4 SHLightg, inout float4 SHLightb)
{
    // No shadow available for SH buffer lighting
    float3 WorldLightVectorUnormalized       = InLightPosition.xyz - PixelPosition;
    float  DistanceAttenuation               = Attenuation(WorldLightVectorUnormalized);
    float3 LightColorWithDistanceAttenuation = DistanceAttenuation * InLightColor.rgb;

    // Project the attenuated light color onto the linear SH basis
    float4 SHLightResult = SHEvalDirection(normalize(WorldLightVectorUnormalized));

    SHLightr += SHLightResult * LightColorWithDistanceAttenuation.r;
    SHLightg += SHLightResult * LightColorWithDistanceAttenuation.g;
    SHLightb += SHLightResult * LightColorWithDistanceAttenuation.b;
}
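In practice I generate one shader combination per light count; for illustration, the per-pixel accumulation could look like this (a sketch: LightPositions, LightColors, NUM_LIGHTS and PixelWorldPosition are assumed to come from the tile’s shader constants):

float4 SHLightr = 0.0;
float4 SHLightg = 0.0;
float4 SHLightb = 0.0;
for (int i = 0; i < NUM_LIGHTS; ++i) // NUM_LIGHTS is fixed per shader combination (up to 8)
{
    AddLightToSH(LightPositions[i], LightColors[i], PixelWorldPosition,
                 SHLightr, SHLightg, SHLightb);
}
// SHLightr/g/b are then compressed and written to the MRTs (see below)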

(Note that the cosine convolution of the SH is not applied here at accumulation time; it is folded into the composite in the main pass, see the ConstantCoefficient below.)

SH is only an approximation of the lighting environment: the more coefficients you have, the more precision you get. With only 4 coefficients we lose a little intensity (9 coefficients would get closer to the original intensity), but we still get good results. The SH coefficients for each channel are stored in 3 RGBA8 buffers, which requires multiple render targets (MRT). My way to compress them is as follows:

OutColor0 = saturate(sqrt(abs(SHLightr * 0.5))) * sign(SHLightr); // [-2..2] -> [-1..1], sqrt for precision in low values
OutColor0 = OutColor0 * 0.5 + 0.5;                                // [-1..1] -> [0..1]
// Same encode for OutColor1 (SHLightg) and OutColor2 (SHLightb)

What I do is:
– Reduce the range of the SH in linear lighting space by dividing by 2
– Take the absolute value (because the floats are signed) and compress with a square root to get more precision in the low values (like sRGB for a color)
– Reassign the sign
– Standard signed-to-unsigned encode: [-1..1] to [0..1]

Why do I divide by 2? I decided to fix my accumulation limit to [-2..2]. This represents a small dynamic range of 4 for all accumulated lights.
The highest coefficient applied in the SH projection is 0.488603. Adding a light of brightness 1 produces at most a value of 0.488603, so I can accumulate 4 lights of brightness 1, or one light of brightness 4, with the worst case resulting in 1.95.
In my tests, dividing by 4 (so having a dynamic range of 8) resulted in small banding.

To decompress in the main pass:

float4 SHr = tex2D(DNESHBufferR, UV) * 2.0 - 1.0; // [0..1] -> [-1..1]
SHr = (SHr * SHr) * sign(SHr) * 2.0;              // undo the sqrt encode and the /2 range reduction

The decompression is not free: decompressing 12 floats this way is heavy because of the “sign”. A “sign” intrinsic costs around 3 instructions depending on the platform.
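Wrapped in a helper for the three buffers (a sketch using the sampler names from the rest of the post):

float4 DecodeSH(sampler2D SHBuffer, float2 UV)
{
    float4 SH = tex2D(SHBuffer, UV) * 2.0 - 1.0; // [0..1] -> [-1..1]
    return (SH * SH) * sign(SH) * 2.0;           // undo the sqrt encode and the /2 range reduction
}
// float4 SHr = DecodeSH(DNESHBufferR, UV);
// float4 SHg = DecodeSH(DNESHBufferG, UV);
// float4 SHb = DecodeSH(DNESHBufferB, UV);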
Then, once decompressed, just apply the coefficients in the right order:

float4 ConstantCoefficient = float4(0.282095 * PI, -0.488603 * (2.0 * PI/3.0), 0.488603 * (2.0 * PI/3.0), -0.488603 * (2.0 * PI/3.0));
float4 DiffuseCoefficent = float4(1.0, WorldNormal.yzx) * ConstantCoefficient;
float3 Color = DiffuseColor * float3(dot(DiffuseCoefficent, SHr), dot(DiffuseCoefficent, SHg), dot(DiffuseCoefficent, SHb));
Color = max(Color, 0.0); // Color can be negative


The result is good. There is a little loss of intensity due to the low number of coefficients. The most significant loss is for the middle light, but the overall picture is OK, and if you can’t compare side by side with a reference image there is no problem.
The problem with this method is the decompression step, which adds a lot of instructions.

Added note:
You can optimize here by premultiplying the ConstantCoefficient above into the SH coefficients during SH generation, as sketched below.
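A sketch of what this folding could look like (my interpretation; note that it scales the stored values, so the [-2..2] compression range above would need to be adjusted accordingly):

float4 SHEvalDirectionPremul(float3 Dir)
{
    // Fold the cosine-lobe constants into the projection at generation time, so
    // the main pass composite becomes dot(float4(1.0, WorldNormal.yzx), SHr).
    return SHEvalDirection(Dir) * float4(0.282095 * PI,
                                         -0.488603 * (2.0 * PI / 3.0),
                                          0.488603 * (2.0 * PI / 3.0),
                                         -0.488603 * (2.0 * PI / 3.0));
}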

As a side note, if you have only one light to project into SH and premultiply the coefficients, you get:

float4(0.282095 * 0.282095 * PI, -0.488603 * -0.488603 * (2.0 * PI/3.0), 0.488603 * 0.488603 * (2.0 * PI/3.0), -0.488603 * -0.488603 * (2.0 * PI/3.0))
=> float4(0.25, 0.5, 0.5, 0.5)

Yes, SH is rather simple math in the end: for a single light of unit brightness, the diffuse term reduces to 0.25 + 0.5 * dot(N, L). Some compare it to wrap lighting (like half Lambert).

Dominant light + ambient method

SH compression is a problem, and we do not want to add another buffer to store some helper compression factor. It is possible to approximate the SH lighting result by extracting a directional light and an ambient color from the SH coefficients. The method to do that is described in [11] and [12].

// Extract the dominant light direction from the linear SH terms (luminance weights)
float3 DominantDir = (SHLightr.yzw * 0.3 + SHLightg.yzw * 0.59 + SHLightb.yzw * 0.11);
// Optimal linear direction: map the SH coefficient order (y, z, x) and its signs
// back to an (x, y, z) direction
DominantDir            = normalize(float3(-DominantDir.zx, DominantDir.y));
float4 SHDominant      = SHEvalDirection(DominantDir);
float Denom            = dot(SHDominant, SHDominant);

// find the color of the dominant light
float3 DominantColor;
DominantColor.r = (dot(SHLightr, SHDominant)) / Denom;
DominantColor.g = (dot(SHLightg, SHDominant)) / Denom;
DominantColor.b = (dot(SHLightb, SHDominant)) / Denom;
DominantColor   = max(DominantColor, 0.0); // DominantColor can be negative after calculation, so max it.
// subtract dominant light from original lighting environment
SHLightr.x = SHLightr.x - SHDominant.x * DominantColor.r;
SHLightg.x = SHLightg.x - SHDominant.x * DominantColor.g;
SHLightb.x = SHLightb.x - SHDominant.x * DominantColor.b;

// with the remaining light, fit an ambient light
float SHAmbient = 0.282095;
Denom = (SHAmbient * SHAmbient);

// find the color of the ambient light
float3 AmbientColor;
AmbientColor.r = (SHLightr.x * SHAmbient) / Denom;
AmbientColor.g = (SHLightg.x * SHAmbient) / Denom;
AmbientColor.b = (SHLightb.x * SHAmbient) / Denom;

The dominant direction, the dominant color and the ambient color are then stored in 3 RGBA8 buffers. The direction needs the classic * 0.5 + 0.5 encode because it is in [-1..1]. The colors can be stored in any RGBM format.
But a good replacement for RGBM is to divide by 4 (or whatever range your game requires) and encode with a square root (similar to sRGB); this produces fewer instructions. As an optimization, you can pack the DominantDir into the two alpha channels, using only 2 buffers instead of 3 (see the sketch after the code below).

OutColor0 = saturate(float4(DominantDir, 0.0) * 0.5 + 0.5);
OutColor1 = saturate(float4(sqrt(DominantColor / 4.0), 0.0));  
OutColor2 = saturate(float4(sqrt(AmbientColor  / 4.0), 0.0));
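For the two-buffer variant mentioned above, a possible packing (a sketch of my interpretation: the direction z component is rebuilt at decode time, which loses its sign; a spheremap-style encode would be needed to keep it):

// Hypothetical 2-buffer packing: colors in rgb, direction x and y in the alphas
OutColor0 = saturate(float4(sqrt(DominantColor / 4.0), DominantDir.x * 0.5 + 0.5));
OutColor1 = saturate(float4(sqrt(AmbientColor  / 4.0), DominantDir.y * 0.5 + 0.5));
// Decode side:
// float2 DirXY = float2(Color0.a, Color1.a) * 2.0 - 1.0;
// float3 Dir   = float3(DirXY, sqrt(saturate(1.0 - dot(DirXY, DirXY)))); // sign of z lost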

For the AmbientColor, I tried another way than the code above, which I prefer because I found it matches the reference better. Here is the replacement I made:

// with the remaining light, fit an ambient light
float3 AmbientColor;
AmbientColor.r = SHLightr.x * 0.282095 * PI;
AmbientColor.g = SHLightg.x * 0.282095 * PI;
AmbientColor.b = SHLightb.x * 0.282095 * PI;

In the main pass, the code is simple:

float3 DominantDir   = tex2D(DNESHBufferR, ScreenUV).rgb * 2.0 - 1.0;
float3 DominantColor = tex2D(DNESHBufferG, ScreenUV).rgb;
DominantColor        = DominantColor * DominantColor * 4.0; // undo the sqrt and /4 encode
float3 AmbientColor  = tex2D(DNESHBufferB, ScreenUV).rgb;
AmbientColor         = AmbientColor * AmbientColor * 4.0;
float3 SHColor       = saturate(dot(WorldNormal, DominantDir)) * DominantColor + AmbientColor;


The result is rather good but has some minor artifacts. The boundaries of the green and yellowish lights are more pronounced, the stair on the right loses its purple color on the side of the steps in favor of white, and the stair on the left is a little brighter and has less blue on the side of the steps.
The ambient color helped to get the blue color on the side of the steps of the left stair, but it was not sufficient for the right stair.

Dominant light + ambient method from PPS

In Appendix 5 of [11], Peter-Pike Sloan provides formulas to get the dominant and ambient colors directly from the SH coefficients.
The document says that the lighting vectors are all turned into irradiance environment maps by convolving with the normalized clamped cosine kernel.

// Convolve with the normalized clamped cosine kernel
SHLightr.x   *= PI;
SHLightg.x   *= PI;
SHLightb.x   *= PI;
SHLightr.yzw *= 2.0 * PI / 3.0;
SHLightg.yzw *= 2.0 * PI / 3.0;
SHLightb.yzw *= 2.0 * PI / 3.0;
// SHDominant and DominantDir come from the previous snippet.
// The V0 (DC) term is intentionally not included here.
float3 DominantColor;
DominantColor.r = (867.0 / (316.0 * PI)) * (dot(SHLightr.yzw, SHDominant.yzw));
DominantColor.g = (867.0 / (316.0 * PI)) * (dot(SHLightg.yzw, SHDominant.yzw));
DominantColor.b = (867.0 / (316.0 * PI)) * (dot(SHLightb.yzw, SHDominant.yzw));
DominantColor   = max(DominantColor, 0.0); // DominantColor can be negative after calculation, so max it.
float3 AmbientColor;
AmbientColor.r = (SHLightr.x - DominantColor.r * (8.0 * sqrt(PI) / 17.0)) * (sqrt(PI) / 2.0);
AmbientColor.g = (SHLightg.x - DominantColor.g * (8.0 * sqrt(PI) / 17.0)) * (sqrt(PI) / 2.0);
AmbientColor.b = (SHLightb.x - DominantColor.b * (8.0 * sqrt(PI) / 17.0)) * (sqrt(PI) / 2.0);
OutColor0 = float4(DominantDir, 0.0);
OutColor1 = float4(DominantColor, 0.0);
OutColor2 = float4(max(AmbientColor, 0.0), 0.0); // AmbientColor can become negative...

Convolving with the normalized clamped cosine kernel means scaling the SH by the band factors: PI for the DC term and 2*PI/3 for the linear terms. In this case, I don’t know whether these scale factors should be divided by PI or not (dividing by PI converts irradiance to radiance, but I am not sure of the term definitions in this case and hope to learn them one day), so I tested both cases.

Without dividing the band factors by PI:

With dividing the band factors by PI:
In either case the result is bad: dividing by PI gives too low an intensity, not dividing gives too high an intensity. And you can see that the ambient color on the sides of the stairs is wrong. I wonder whether I am doing something wrong or this is the expected result.

Dominant only method

Is the ambient color really needed? I wanted to test this case, so I tried the Dominant + Ambient color method again (not the PPS method) but without the ambient color. This requires only two RGBA8 buffers.

The result is OK. We get almost the same result as with the ambient color, but we lose the blue on the steps of the left stair.
The problem here is that we extract the dominant light from the light environment of a pixel. This dominant light can be in the opposite direction of the pixel normal, resulting in zero diffuse lighting (the N.L), whereas if the diffuse lighting were extracted from the SH coefficients, it would result in a color. Ideally, we would like to get the dominant light from the hemisphere of the environment light on the side of the normal at the pixel. As we don’t have the normal (and won’t have it) we can’t do this…

Added note:
The method to get the hemisphere environment light would be similar to the light approximation used in God of War 3 [6], which uses the vertex normal to discard occluded lights.

Another method of packing the light environment of a pixel could be to apply what God of War 3 does in [6] to each pixel, and save the world position, direction and color of the average light. I can see two problems here: the world position would be almost impossible to compress, and the God of War method requires the normal at the vertex to remove back-face lighting. Straight SH seems a better solution.

Luminance dominant light method: another method I tried, which was totally unsuccessful. I tried to extract 2 dominant lights from the SH. As I can’t store two directions and two colors, I decided to store the luminance of the SH and the dominant color. The SH luminance allows recovering the dominant direction with the optimal linear method, and for the other direction I just take the opposite. I then use the SH luminance to get the light intensity in both directions and multiply by the dominant color. This loses the color of the opposite directional light. I mention this method just for completeness, as it looks similar to dominant only.
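For completeness, one possible reading of that method as a sketch (a reconstruction for illustration, not the exact code I used; DominantColor is stored alongside the luminance SH):

// Store: LuminanceSH (4 coefficients) + a single DominantColor
float4 LuminanceSH = SHLightr * 0.3 + SHLightg * 0.59 + SHLightb * 0.11;
float3 DominantDir = normalize(float3(-LuminanceSH.wy, LuminanceSH.z)); // optimal linear
// At composite time, evaluate the luminance SH in both directions and tint:
float Intensity0 = max(dot(SHEvalDirection( DominantDir), LuminanceSH), 0.0);
float Intensity1 = max(dot(SHEvalDirection(-DominantDir), LuminanceSH), 0.0);
float3 Color = DominantColor * (Intensity0 * saturate(dot(WorldNormal,  DominantDir))
                              + Intensity1 * saturate(dot(WorldNormal, -DominantDir)));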

What about specular?

Simple method
SH allows approximating distant diffuse lighting, and doing SH at each pixel gives some local diffuse lighting. But what can be done for specular? We know that we can’t approximate specular lighting with so few SH coefficients; the goal here is just to get a pleasant visual result. The simple method is to extract a light direction and color from the light environment and use them for specular lighting. The methods above describe how to extract a dominant light direction and color from SH, so we have everything we need to process specular lighting in the main shader with our BRDF (see the sketch below). The problem is that we have only one direction for the specular highlight, even if we approximate multiple lights. Obviously, there will be artifacts: intuitively, the specular highlight bends depending on the influence of the lights on the pixel. Here are two results of this bending (pixels are affected by two lights):
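As a sketch, the whole simple method fits in a few lines in the main pass (reusing the decoded DominantDir and DominantColor from above; BlinnPhong is the BRDF helper that also appears in the more complex method below):

float3 WorldHalfVector  = normalize(WorldCameraVector + DominantDir);
float3 SpecularLighting = BlinnPhong(saturate(dot(WorldNormal, WorldHalfVector)), Roughness)
                        * saturate(dot(WorldNormal, DominantDir)) * DominantColor;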

The specular lighting is not visually pleasant. Some curved objects are OK, but most are not, and flat surfaces are particularly bad (as on the screenshot).

A more complex method
To try to get fewer artifacts, I tried passing the SH coefficients into the main shader and starting from there. I extract the lighting on the opposite side of the pixel normal from the SH (like the God of War 3 method mentioned in the added note above) and subtract it, then extract the dominant light direction and color from the remaining SH and apply the lighting formula. Here is the full code:

float4 SHr = tex2D(DNESHBufferR, ScreenUV) * 2.0 - 1.0;
SHr = (SHr * SHr) * sign(SHr) * 2.0;
float4 SHg = tex2D(DNESHBufferG, ScreenUV) * 2.0 - 1.0;
SHg = (SHg * SHg) * sign(SHg) * 2.0;
float4 SHb = tex2D(DNESHBufferB, ScreenUV) * 2.0 - 1.0;
SHb = (SHb * SHb) * sign(SHb) * 2.0;

float4 ConstantCoefficient = float4(0.282095 * PI, -0.488603 * (2.0 * PI/3.0), 0.488603 * (2.0 * PI/3.0), -0.488603 * (2.0 * PI/3.0));
float4 DiffuseCoefficent = float4(1.0, WorldNormal.yzx) * ConstantCoefficient;
half3 IntermediateResult = DiffuseColor * float3(dot(DiffuseCoefficent, SHr), dot(DiffuseCoefficent, SHg), dot(DiffuseCoefficent, SHb));
IntermediateResult = max(IntermediateResult, 0.0);
// remove opposite side of normal
DiffuseCoefficent = half4(1.0, -WorldNormal.yzx) * ConstantCoefficient;
// Get Light color
half3 OppositeResult = half3(dot(DiffuseCoefficent, SHr), dot(DiffuseCoefficent, SHg), dot(DiffuseCoefficent, SHb));
OppositeResult = max(OppositeResult, 0.0);

SHr -= OppositeResult.r * half4(0.282095, -WorldNormal.y * -0.488603, -WorldNormal.z * 0.488603, -WorldNormal.x * -0.488603);
SHg -= OppositeResult.g * half4(0.282095, -WorldNormal.y * -0.488603, -WorldNormal.z * 0.488603, -WorldNormal.x * -0.488603);
SHb -= OppositeResult.b * half4(0.282095, -WorldNormal.y * -0.488603, -WorldNormal.z * 0.488603, -WorldNormal.x * -0.488603);

// Now get dominant light from remaining SH
float4 LuminanceSH   = (SHr * 0.3 + SHg * 0.59 + SHb * 0.11);
float3 DominantDir   = normalize(float3(-LuminanceSH.wy, LuminanceSH.z));

// find the color of the dominant light
float4 SHDominant    = half4(0.282095, DominantDir.y * -0.488603, DominantDir.z * 0.488603, DominantDir.x * -0.488603);
float Denom          = dot(SHDominant, SHDominant);

float3 DominantColor;
DominantColor.r = (dot(SHr, SHDominant)) / Denom;
DominantColor.g = (dot(SHg, SHDominant)) / Denom;
DominantColor.b = (dot(SHb, SHDominant)) / Denom;

float3 WorldHalfVector = normalize(WorldCameraVector + DominantDir);

float3 SpecularLighting = BlinnPhong(saturate(dot(WorldNormal, WorldHalfVector)), Roughness) * dot(WorldNormal, DominantDir);
Color += IntermediateResult + SpecularLighting * DominantColor;

This solution helps a little with the specular intensity artifacts but still produces a bent specular highlight. Moreover, it is too costly.


Conclusion

I hope people will find this post useful and will discuss their own methods. My feeling is that the handling of specular is too poor for the deferred SH buffer to be a good general solution. For diffuse lighting, it works nicely, but you should ask whether you really need such a method rather than a classic deferred engine. [1] and [2] discuss several other uses for an SH buffer. A final remark: if you have no overlapping lights in your scene (meaning you place hundreds of small lights which affect no more than one pixel at a time), then you can get a perfect result. And in this case, you can directly store direction and color in the buffer, which is then no longer an SH buffer.

I am not sure that this technique can evolve in the future, but with usage constraints it can do its job.
I did some performance measurements on PS3. With the dominant-light-only method (so 2 RGBA8 buffers at 1280×720), depending on the scene my main shader takes an extra 0.8 ms to 4 ms, with a 2 ms average, when applying the SH buffer with the full physically based lighting model (this adds a Fresnel term). Note that I am ALU bound most of the time in my scene; replacing the 2 RGBA8 buffers with a 1×1 texture gives a gain of only around 0.1 ms. I found the other methods too costly in my case (memory or performance). You must add the cost of generating the SH buffer, which varies with the number of lights and the method (tiled…). A full-screen light costs around 1.2 ms; 8 full-screen lights cost around 7.5 ms on RSX.
One of the benefits of requiring only the Z buffer is that you can offload the light projection to the CPU/SPU just after the Z prepass, while you render shadows, reflections, etc. But the memory requirements can be huge: for example, if you target SPUs, a 720p Z buffer + 2 RGBA8 720p buffers mean 10.5 MB, and the memory transfer can add extra time.

As an improvement, it should be possible to generate SH-based SSDO (see [4]) and combine it into the SH buffers at the same time as the lights are rendered. As the lights injected into the SH buffer are often fill lights, this would correctly apply ambient occlusion to the ambient lighting.


[1] Anichini, “Screen Space Spherical Harmonic Lighting” http://solid-angle.blogspot.com/2009/12/screen-space-spherical-harmonic.html
[2] Greenberg, “Has someone tried this before? ” http://deadvoxels.blogspot.com/2009/08/has-someone-tried-this-before.html
[3] http://www.gamedev.net/topic/571695-screen-space-sh-lighing-upsampling/
[4] O’Donnell, “Deferred Screen Space Directional Occlusion” http://kayru.org/articles/dssdo/
[5] Flavin, “Lighting the apocalypse” http://gdcvault.com/play/1014526/Lighting-the-Apocalypse-Rendering-Techniques
[6] Filipov, “Dynamic Lighting in God of War 3” http://advances.realtimerendering.com/s2011/Filipov%20-%20Dynamic%20Lights%20in%20GOW3%20(Siggraph%202011%20Advances%20in%20Real-Time%20Rendering%20Course).pptx
[7] Trebilco, “Light Indexed Deferred Lighting” http://code.google.com/p/lightindexed-deferredrender/
[8] Tovey, “Parallelized Light Pre-Pass Rendering with the Cell Broadband Engine” http://www.spuify.co.uk/?p=645
[9] Balestra, Engstad, “The Technology of Uncharted: Drake’s Fortune” http://www.naughtydog.com/docs/Naughty-Dog-GDC08-UNCHARTED-Tech.pdf
[10] Coffin, “SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3” http://www.slideshare.net/DICEStudio/spubased-deferred-shading-in-battlefield-3-for-playstation-3
[11] Sloan, “Stupid Spherical Harmonics (SH) Tricks” http://www.ppsloan.org/publications/StupidSH36.pdf
[12] Oat, Tatarchuk, Shopf, “March of the Froblins” (the code is in the paper; I was unable to find it elsewhere) http://developer.amd.com/documentation/presentations/legacy/Oat-Tatarchuk-Froblins-Siggraph2008.pdf

4 Responses to Dive in SH buffer idea

  1. fun4jimmy says:

    The link to the naughty dog paper is broken, it should be http://www.naughtydog.com/docs/Naughty-Dog-GDC08-UNCHARTED-Tech.pdf.

    Nice article.

  2. seblagarde says:

    Fixed. Thank you.

  3. tommak says:

Have you tried extracting the color of the dominant light, its direction and the intensity of the ambient light? Both colors should be similar.
It would still require 2 RGBA8 buffers, just like the modified encoding scheme you propose (storing the normal in the alpha channels), but might be a bit cheaper ALU-wise (no need to decode the normal)

  4. seblagarde says:

No, I did not test this Dominant + Ambient intensity method, but thanks for bringing it up.

This is a nice solution to sit alongside Dominant + Ambient color, and it helps with performance as you said. It should improve quality over Dominant only, but still be a little lower than Dominant + Ambient color (think about two opposite lights with two different colors, as in my test).

    Thank you
