SIGGRAPH 2014: Moving Frostbite to Physically based rendering V3

Here are the slides, course notes and Mathematica files of the talk "Moving Frostbite to Physically based rendering" by my co-worker Charles de Rousiers and me (the course notes and the Mathematica files have been updated to v3):

Course notes: course_notes_moving_frostbite_to_pbr_v3
Pdf Slides: s2014_pbs_frostbite_slides
PowerPoint Slides: s2014_pbs_frostbite_slides
Mathematica Notebooks: movingfrostbitetopbr-mathematicanotebook_v3
Mathematica Notebooks exported as PDF (readable without Mathematica): movingfrostbitetopbr-mathematicapdf_v3

Caution: Both Mathematica files are .zip archives that I renamed to ".pdf", because WordPress does not support zip files. So just right-click the download link, save the .pdf file, then change its extension back to ".zip".


Alternatively, the files are (or were) available at other locations (kept here in case the links are updated):
They are also on the official PBR course website (only the slides for now; to be updated):

The talk is a survey of current PBR techniques and of the small improvements we have made for the Frostbite engine. It covers many topics. Here is the table of contents of the course notes (available on the linked website):

1 Introduction
2 Reference
2.1 Validating models and hypothesis
2.2 Validating in-engine approximations
2.3 Validating in-engine reference mode
3 Material
3.1 Material models
3.2 Material system
3.3 PBR and decals
4 Lighting
4.1 General
4.2 Analytical light parameters
4.3 Light unit
4.4 Punctual lights
4.5 Photometric lights
4.6 Sun
4.7 Area lights
4.8 Emissive surfaces
4.9 Image based lights
4.10 Shadow and occlusion
4.11 Deferred / Forward rendering
5 Image
5.1 A Physically Based Camera
5.2 Manipulation of high values
5.3 Antialiasing
6 Transition to PBR

v2 Update:
Over the past year, we have received feedback on our document from various people (sorry, we forgot to keep a list of all of them). There were several mistakes, typos and unclear statements. We have updated the course notes to fix all the reported errors and clarified some parts. The v2 course notes contain the following corrections (also listed on page 98 of the new course notes PDF):

– Section 3.2.1 – Corrected a wrong statement describing the micro-specular occlusion of the Reflectance parameter: "The lower part of this attribute defines a micro-specular occlusion term used for both dielectric and metal materials." The descriptions of the BaseColor and Reflectance parameters have been updated.
– Section 3.2.1 – Removed the reference to Alex Fry's work on normal encoding, as it has not been done.
– Section 4.2 – Updated the description of color temperature for artificial light sources, including the concept of correlated color temperature (CCT).
– Section 4.4 – Clarified what lightColor is in Listing 4.
– Section 4.5 – Clarified what lightColor is in Listing 5.
– Section 4.6 – Updated and explained the computation of the Sun solid angle and of the estimated illuminance at the Earth's surface.
– Section – Added a comment in Listing 7: the FormFactor equation includes an invPi that needs to be canceled out (with Pi) in the sphere and disk area light evaluation.
– Section – Clarified in which case the diffuse sphere area formula is exact above the horizon.
– Section – Clarified in which case the diffuse disk area formula is exact above the horizon.
– Section 4.7.4 – Corrected Listing 15: the getDiffuseDominantDir parameter N is a float3.
– Section 4.7.5 – Corrected Listing 16: the getSpecularDominantDirArea parameters N and R are float3.
– Section 4.9.2 – Corrected the PDF of the specular BRDF and equations 48 to 60, which had missing components or mistakes. The code was correct.
– Section 4.9.3 – Corrected Listings 21/22/23: the getSpecularDominantDir parameters N and R are float3, and the getDiffuseDominantDir parameters N and V are float3.
– Section 4.9.5 – Added and updated a comment about reflection composition: the composition weight computation for medium-range reflections caused darkening when several local light probes overlapped. The previous algorithm considered that the visibility of each local light probe covered a different part of the BRDF lobe (10 overlapping local light probes of 0.1 visibility resulted in 1.0). The new algorithm considers that they cover the same part of the BRDF lobe (10 overlapping local light probes of 0.1 visibility result in 0.1).
– Section 4.10.2 – Corrected Listing 26: roughness and smoothness were inverted. The listing has been updated and an improved formula has been provided. Figure 65 has been updated accordingly.
– Section 4.10.2 – Added a reference to the paper "Is Accurate Occlusion of Glossy Reflections Necessary".
– Section 5.2 – Table (tab:SmallFloat): Fixed the wrong largest value for the 14-bit float format. The 16-bit float format is a standard floating-point format with an implied leading 1 on the mantissa. The max exponent for 16-bit float is 15 (not 16, because 16 is reserved for INF). Its largest value is (1 + m) × 2^maxExp = (1 + 1023/1024) × 2^15 = 65504. The 14-bit float format, on the other hand, has no implied leading 1 but a max exponent of 16. Its largest value is m × 2^maxExp = (511/512) × 2^16 = 65408. The 10-bit and 11-bit float formats follow the same rules as the 16-bit float format.
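The largest-value arithmetic of the Section 5.2 fix can be checked with a few lines of C++ (used here as a CPU-side stand-in for shader code). This is a minimal sketch: the helper names are hypothetical, and the mantissa widths of the 11-bit (6 bits) and 10-bit (5 bits) formats are our assumption based on the usual packed-float layouts, since the note only gives the 16-bit (1023/1024) and 14-bit (511/512) mantissas.

```cpp
#include <cassert>
#include <cmath>

// Largest fraction m that an N-bit explicit mantissa can hold: (2^N - 1) / 2^N.
double maxMantissa(int mantissaBits) {
    assert(mantissaBits > 0 && mantissaBits < 53);
    const double scale = std::ldexp(1.0, mantissaBits);  // 2^N
    return (scale - 1.0) / scale;
}

// Formats WITH an implied leading 1 (16-bit, 11-bit and 10-bit floats):
// largest finite value = (1 + m) * 2^maxExp.
double largestWithImpliedOne(int mantissaBits, int maxExp) {
    return (1.0 + maxMantissa(mantissaBits)) * std::ldexp(1.0, maxExp);
}

// Formats WITHOUT an implied leading 1 (the 14-bit format):
// largest value = m * 2^maxExp.
double largestWithoutImpliedOne(int mantissaBits, int maxExp) {
    return maxMantissa(mantissaBits) * std::ldexp(1.0, maxExp);
}
```

With these helpers, largestWithImpliedOne(10, 15) gives 65504 and largestWithoutImpliedOne(9, 16) gives 65408; under the assumed widths, largestWithImpliedOne(6, 15) and largestWithImpliedOne(5, 15) give 65024 and 64512 for the 11-bit and 10-bit formats.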

v3 Update:
– Section 5.1.1 – Fixed equation 67.

A few notes

Last year I gave a talk about Remember Me at GDC Europe 2013: "The art and rendering of Remember Me". And I was saying: Read more of this post

Inverse trigonometric functions GPU optimization for AMD GCN architecture

Version: 2.0 – Living blog – First version was 01 December 2014

The first advice anybody gives regarding inverse trigonometric functions (acos, asin, atan) is "do not use them". And this is good advice. It is often possible to get rid of all the trigonometric functions with trigonometric identities [1][2]. However, with the growing complexity of lighting models, I see them used more and more. One of the main use cases I have met is solid angle calculation and area lights. So if we need to manipulate such functions, it is better to be aware of their cost. This post is about knowing the cost of GPU inverse trigonometric functions and providing optimized versions for the AMD GCN architecture (PS4, Xbox One, PC – AMD). For this post I have used the PS4 shader compiler (v2.00) for the analysis.
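As an illustration of removing inverse trigonometric functions with identities, here is a small C++ sketch. The identities below are standard textbook ones that we picked as examples (they are not necessarily the ones discussed in [1][2]), and the function names are hypothetical. When only sin(acos(x)) or cos(atan(x)) is needed, no inverse trigonometric call is required at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// sin(acos(x)) == sqrt(1 - x^2) for x in [-1, 1]:
// one sqrt instead of an acos macro followed by a sin.
float sinOfAcos(float x) {
    assert(x >= -1.0f && x <= 1.0f);
    // Clamp to 0 to guard against tiny negative values from rounding.
    return std::sqrt(std::max(0.0f, 1.0f - x * x));
}

// cos(atan(x)) == 1 / sqrt(1 + x^2) for any finite x:
// on a GPU this maps naturally to a single rsqrt.
float cosOfAtan(float x) {
    return 1.0f / std::sqrt(1.0f + x * x);
}
```

The same idea applies to a cosine coming out of a dot product: as long as the angle itself is never needed, the result can stay in the cosine domain.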

AMD GCN architecture basics

There is already plenty of good information about the AMD GCN architecture available on the web, so I will not repeat it here [3][4][5]. I recommend reading the excellent talk by Michal Drobot, "Low level optimization for GCN" [3], as I will mainly follow its vocabulary.

Main basics:
– instructions are classified into vector instructions (v_) and scalar instructions (s_). Scalar instructions can coarsely be considered free, as they are executed in parallel.
– instructions are full rate or quarter rate, i.e. some instructions are 4x slower than others. Full rate (FR): mul, mad, add, sub, and, or, bit shift… Quarter rate (QR): transcendental instructions like rcp, sqrt, rsqrt, cos, sin, log, exp…
– macro instructions can expand to several instructions: tan, acos, asin, atan, pow, sign, length…
– there are free modifiers: saturate, abs, negate, mul2, mul4, mul8, div2, div4…
– dynamic branching can be considered to cost >= 16 FR.
– VGPR count is more important than instruction count, because VGPR usage limits occupancy (the number of wavefronts in flight).
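The cost convention used in the rest of this post can be written down as a tiny C++ helper: a quarter-rate instruction counts as 4 full-rate ones, and scalar instructions are treated as free. This is a sketch of the bookkeeping only; the function names are hypothetical, and counting each dynamic branch as 16 FR is just the lower bound from the list above, not a measured value.

```cpp
#include <cassert>

// One quarter-rate instruction costs as much as four full-rate ones.
constexpr int kQuarterRateCost = 4;
// Lower-bound cost of one dynamic branch, expressed in full-rate instructions.
constexpr int kDynamicBranchMinCost = 16;

// Equivalent full-rate cost of a vector instruction mix (scalar ones are free).
int equivalentFullRate(int fullRate, int quarterRate) {
    assert(fullRate >= 0 && quarterRate >= 0);
    return fullRate + kQuarterRateCost * quarterRate;
}

// Same cost, plus the minimum cost of the dynamic branches it contains.
int equivalentFullRateLowerBound(int fullRate, int quarterRate, int dynamicBranches) {
    assert(dynamicBranches >= 0);
    return equivalentFullRate(fullRate, quarterRate) +
           kDynamicBranchMinCost * dynamicBranches;
}
```

For example, the compiler's acos below (40 FR and 2 QR) gives equivalentFullRate(40, 2) = 48, and atan (19 FR and 1 QR) gives 23.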

Cost of inverse trigonometric function

How expensive is an inverse trigonometric function?

On PS4 I got these numbers using the following code and isolating the instructions related to the function itself:

float val; // input value, read from a shader constant

float4 main() : S_TARGET_OUTPUT0
{
    float res = acos(val); // the only instructions we want to measure
    return float4(res, res, res, res);
}

acos: 48 FR (40 FR, 2 QR), 2 DB, 12 VGPR
asin: 48 FR (40 FR, 2 QR), 2 DB, 1 scalar instruction, 12 VGPR
atan: 23 FR (19 FR, 1 QR), 2 scalar, 8 VGPR

I do not report the asm listings of these functions, to avoid overloading the post with useless code.
The number 48 for acos is the equivalent full-rate cost of the full-rate and quarter-rate instructions combined, a QR instruction counting as 4 FR (40 + 2 × 4 = 48). DB stands for dynamic branch.
To be fair, the PS4 implementation of acos/asin uses a dynamic if to select between negative and positive values, but the code in both branches is identical and differs only by the sign. So in reality the runtime cost is rather half of this, around 24 FR + 1 DB. Still, it bloats the shader code and increases VGPR usage.
GPU compilers have a generic and accurate implementation of inverse trigonometric functions. But we are free to use whatever we want and are not forced to use the GPU compiler version, as these functions are only macros and not hardware instructions. I have decided to also provide the cost of the Cg reference implementation of these functions [6]. So the following costs come from an explicit implementation of each function with the code provided in the Cg documentation:

acos: 19 FR (14 FR, 1 QR), 4 VGPR
asin: 18 FR (13 FR, 1 QR), 4 VGPR, 1 scalar instruction
atan: 23 FR (19 FR, 1 QR), 2 scalar, 8 VGPR
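For reference, here is a C++ transcription of the Cg reference acos and asin, as we recall them from the Cg standard library documentation [6]; they use the Abramowitz and Stegun polynomial 4.4.45 (coefficients 1.5707288, -0.2121144, 0.0742610, -0.0187293). It is a CPU-side sketch for checking accuracy, not the shader code used for the measurements above. Note how the sign of x is handled with arithmetic rather than a dynamic if, which is where the difference with the compiler macros comes from.

```cpp
#include <cassert>
#include <cmath>

// Cg-style acos: acos(|x|) ~ sqrt(1 - |x|) * P(|x|), then fix up the sign.
float cgAcos(float x) {
    assert(x >= -1.0f && x <= 1.0f);
    float negate = float(x < 0.0f);  // 1 when x is negative, else 0
    x = std::fabs(x);
    float ret = -0.0187293f;
    ret = ret * x + 0.0742610f;
    ret = ret * x - 0.2121144f;
    ret = ret * x + 1.5707288f;
    ret = ret * std::sqrt(1.0f - x);           // acos(|x|)
    ret = ret - 2.0f * negate * ret;           // -acos(|x|) when x < 0
    return negate * 3.14159265358979f + ret;   // pi - acos(|x|) when x < 0
}

// Cg-style asin: asin(|x|) = pi/2 - acos(|x|), and asin is odd.
float cgAsin(float x) {
    assert(x >= -1.0f && x <= 1.0f);
    float negate = float(x < 0.0f);  // 1 when x is negative, else 0
    x = std::fabs(x);
    float ret = -0.0187293f;
    ret = ret * x + 0.0742610f;
    ret = ret * x - 0.2121144f;
    ret = ret * x + 1.5707288f;
    ret = 3.14159265358979f * 0.5f - std::sqrt(1.0f - x) * ret;  // asin(|x|)
    return ret - 2.0f * negate * ret;  // asin(-x) = -asin(x)
}
```

Both functions evaluate one polynomial and one sqrt on |x|, then apply the sign with a multiply-add, so no code is duplicated across branches.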

The difference between the Cg version and the macro version comes from the "dynamic if", which is no longer present in the Cg version. I also suppose that the Cg version is less accurate (the documentation states an absolute error <= 6.7e-5). Still, these numbers show that inverse trigonometric functions are expensive. As game developers, we know which accuracy and which range of values we need to support, so we can tune the functions to fit our needs and reduce the cost.
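To give an idea of what such tuning can look like, here is a hypothetical C++ sketch (our own example, not taken from this post or from [6]) that keeps the Cg structure acos(|x|) ~ sqrt(1 - |x|) * P(|x|) but uses a degree-1 P. The two coefficients are chosen so that P(0) = pi/2 (exact at x = 0) and P(1) = sqrt(2), because acos(x) behaves like sqrt(2 (1 - x)) as x approaches 1. In a quick CPU check the absolute error stays below about 0.01 radians, much worse than the Cg version's 6.7e-5, so it only makes sense where that accuracy is acceptable.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical cheap acos: one sqrt (QR) plus a few full-rate operations.
float acosFastSketch(float x) {
    assert(x >= -1.0f && x <= 1.0f);
    const float kHalfPi = 1.5707963f;
    const float kPi = 3.1415927f;
    const float kSlope = 1.4142136f - kHalfPi;  // sqrt(2) - pi/2, about -0.156583
    float ax = std::fabs(x);
    float res = (kSlope * ax + kHalfPi) * std::sqrt(1.0f - ax);  // ~acos(|x|)
    // acos(-x) = pi - acos(x); in a shader this should map to a select.
    return (x >= 0.0f) ? res : kPi - res;
}
```

The design trade-off is explicit: dropping three polynomial terms saves a handful of full-rate instructions while the single quarter-rate sqrt remains.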

Read more of this post