Inverse trigonometric functions GPU optimization for AMD GCN architecture

Version : 2.0 – Living blog – First version was 01 December 2014

The first advice anybody has regarding inverse trigonometric functions (acos, asin, atan) is “do not use them”. And this is good advice. It is often possible to get rid of all the trigonometric functions with trigonometric identities [1][2]. However, with the growing complexity of lighting models I see more and more usage of them. One of the main use cases I have met is solid angle calculation and area lights. So if we need to manipulate such functions, it is better to be aware of their cost. This post is about knowing the cost of GPU inverse trigonometric functions and providing optimized versions for the AMD GCN architecture (PS4, XBone, PC – AMD). For this post I have used the PS4 shader compiler (v2.00) for the analysis.

AMD GCN architecture basics

There is already plenty of good information available on the web about the AMD GCN architecture, so I will not repeat it here [3][4][5]. I recommend reading the excellent talk by Michal Drobot about “Low level optimization for GCN” [3] as I will mainly follow its vocabulary.

Main basics:
– instructions are classified into vector instructions (v_) and scalar instructions (s_). The scalar instructions can be coarsely considered free as they are executed in parallel.
– instructions are full rate or quarter rate, i.e. some instructions are 4x slower than others. Full rate (FR): mul, mad, add, sub, and, or, bit shift… Quarter rate (QR): transcendental instructions like rcp, sqrt, rsqrt, cos, sin, log, exp…
– macro instructions can expand to several instructions: tan, acos, asin, atan, pow, sign, length…
– there are free modifiers: saturate, abs, negate, mul2, mul4, mul8, div2, div4…
– dynamic branching (DB) can be considered to cost >= 16 FR.
– VGPR count is more important than instruction count

Cost of inverse trigonometric function

How expensive is an inverse trigonometric function?

On PS4 I get these numbers using the following code and isolating the instructions related to the function itself:

float val;

float4 main() : S_TARGET_OUTPUT0
{
    float res = acos(val);
    return float4(res, res, res, res);
}

acos: 48 FR (40 FR, 2 QR), 2 DB, 12 VGPR
asin: 48 FR (40 FR, 2 QR), 2 DB, 1 scalar instruction, 12 VGPR
atan: 23 FR (19 FR, 1 QR), 2 scalar, 8 VGPR

I do not include the asm listings of these functions to avoid overloading the post with useless code.
The number 48 for acos is the equivalent full rate cost: the full rate instructions plus four times the quarter rate instructions (40 + 4 × 2).
To be fair, the PS4 implementation of acos/asin uses a dynamic if to select between negative and positive values, but the code in both branches is identical and only differs by the sign. So in reality the runtime cost is rather half of this, around 24 FR + 1 DB. Still, it bloats the shader code and increases the VGPR count.
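As a small worked example of how the equivalent full rate numbers above are obtained, here is a sketch in Python (rather than shader code, so it can be run anywhere). The function name and the constant are only illustrative; the factor of 4 comes from the quarter rate definition given in the basics.

```python
# A quarter rate (QR) instruction costs as much as 4 full rate (FR) instructions.
QUARTER_RATE_FACTOR = 4


def equivalent_full_rate(full_rate: int, quarter_rate: int) -> int:
    """Equivalent FR cost of a mix of FR and QR instructions."""
    return full_rate + QUARTER_RATE_FACTOR * quarter_rate


# Macro versions measured above: acos is 40 FR + 2 QR, atan is 19 FR + 1 QR.
acos_cost = equivalent_full_rate(40, 2)
atan_cost = equivalent_full_rate(19, 1)
print(acos_cost, atan_cost)  # 48 23
```

Dynamic branches, scalar instructions and VGPR count are not part of this number, which is why they are reported separately.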
GPU compilers have a generic and accurate implementation of inverse trigonometric functions. But we are free to use what we want and are not forced to use the GPU compiler version, as these functions are only macros and not hardware instructions. I have decided to also provide the cost of the Cg reference implementation of these functions [6]. So the following costs are from explicit implementations of the functions with the code provided in the Cg documentation:

acos: 19 FR (14 FR, 1 QR), 4 VGPR
asin: 18 FR (13 FR, 1 QR), 4 VGPR, 1 scalar instruction
atan: 23 FR (19 FR, 1 QR), 2 scalar, 8 VGPR

The difference between the Cg version and the macro version comes from the “dynamic if”, which is no longer present in the Cg version. I also suppose that the Cg version is less accurate (the documentation states an absolute error <= 6.7e-5). Still, these numbers show that inverse trigonometric functions are expensive. As game developers we know which accuracy and which range of values we need to support, and we can tune the functions to fit our needs and reduce the cost.
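To give an idea of what such an explicit implementation looks like, here is a Python transliteration of the branch-free polynomial acos as I recall it from the Cg documentation [6]. Treat it as a sketch: the coefficients are the ones from Abramowitz and Stegun (formula 4.4.45), and the names acos_cg and max_abs_error are mine, not part of any library.

```python
import math


def acos_cg(x: float) -> float:
    """Polynomial acos approximation on [-1, 1], without any branch in shader terms."""
    negate = 1.0 if x < 0.0 else 0.0   # in a shader: float(x < 0)
    x = abs(x)
    ret = -0.0187293
    ret = ret * x + 0.0742610
    ret = ret * x - 0.2121144
    ret = ret * x + 1.5707288
    ret = ret * math.sqrt(1.0 - x)     # the single quarter rate instruction
    ret = ret - 2.0 * negate * ret     # flip the sign for negative inputs
    return negate * math.pi + ret


def max_abs_error(samples: int = 2001) -> float:
    """Largest absolute difference with math.acos over evenly spaced samples."""
    worst = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        worst = max(worst, abs(acos_cg(x) - math.acos(x)))
    return worst


print(f"max abs error: {max_abs_error():.1e}")
```

The sign selection is done with arithmetic instead of a dynamic if, which is consistent with the absence of DB in the Cg costs above.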


IES light format: Specification and reader

During my investigation on supporting IES lights for the Frostbite engine (with the help of Rodney Huff!) I met several hurdles. The IES format is rather badly specified, and when implementing a parser many questions come up without answers (and handling the different symmetries is insane). This post is about helping people find their way to supporting IES lights in their engine. I will add that it is preferable to support the EULUMDAT format, but sadly there are fewer resources about it on the web.

Photometric lights

There are few references on the web about implementing photometric lights for real-time rendering. I have already written about photometric lights, for both the IES and EULUMDAT formats, in the Siggraph 2014 PBR course notes of the talk Moving Frostbite to PBR, so I will not repeat it here. Please refer to section 4.5 of the course notes. Here are some useful links:

IES and EULUMDAT viewer:
IES unofficial specification :
EULUMDAT unofficial file format:

The iesna.txt document describing the IES format is written by Ian Ashdown, who is a member of the IES Computer Committee responsible for LM-63-02, and he also maintains the EULUMDAT file format specification on the Web. His website contains the latest information about these formats.

There used to be companion files with code for the iesna.txt document, but they are no longer accessible. I got permission from Ian Ashdown to host them here, and this is the only purpose of this blog post. The files below contain the IES parser with the C source code and some example files mentioned in the iesna.txt document (WordPress doesn’t handle zip files, so right-click then save target, then rename the extension to “.zip”):


And here is some advice from Ian Ashdown:

The code was last updated in 1998. The code is still used in Ian Ashdown's commercial products, so he cannot release the latest version. However, here is the (edited) change log for IES_READ.C:

// 98/09/12 – Fixed vertical axis symmetry determination error in ReadFile.
// 99/05/21 – Modified ReadFile to initialize horz_dist data member for luminaires with vertical axis rotational symmetry and to initialize vsymm_flag data member.
// 99/06/04 – Modified GetLine to check for buffer overflow.
// 99/08/15 – Modified ReadFile to call strncmp rather than strcmp to support lines with trailing whitespace.
// 01/08/07 – Modified ReadFile to initialize lum_dim data structure.
// 02/04/26 – Modified ReadFile to accept LM-63-2002 format files.
// 05/03/21 – Modified ReadFile to flush photometric data if error.

IESNA Type A and Type B photometric data files can be ignored – he has never encountered them in 20 years of architectural and roadway lighting work.

LM-63 specifies a maximum of 132 characters per line (inherited from the days of IBM Hollerith punch cards), but he has seen LM-63 files with up to 4,000 characters per line in the candela fields.
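One way to be robust to such lines is to never rely on a fixed size line buffer for the numeric part of the file, and to read the values as one stream of whitespace separated tokens instead. The sketch below is a minimal illustration in Python, not the companion C code; read_numeric_tokens is a hypothetical helper and it assumes the numeric block has already been isolated from the header.

```python
def read_numeric_tokens(text: str) -> list:
    """Split a block of numeric data into floats, ignoring where the lines break."""
    return [float(token) for token in text.split()]


# Values split over short lines...
short_lines = "0.0 10.5 20.25\n30 40\n"
print(read_numeric_tokens(short_lines))  # [0.0, 10.5, 20.25, 30.0, 40.0]

# ...or packed in one line far longer than 132 characters parse the same way.
long_line = " ".join(["1.5"] * 2000)
print(len(long_line), len(read_numeric_tokens(long_line)))
```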

And lastly if you are generating a photometric web from the data for ray tracing or radiosity, you should interpolate the horizontal angles using a cubic spline curve (open or closed depending on whether the full 360 range of horizontal angles is specified).
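As an illustration of the open versus closed distinction, here is a sketch using a Catmull-Rom spline, which is only one kind of cubic spline and not necessarily the one Ian Ashdown has in mind. It assumes uniformly spaced horizontal angles starting at 0 degrees; the function names and parameters are hypothetical.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Catmull-Rom cubic going through p1 at t=0 and p2 at t=1."""
    return 0.5 * (
        2.0 * p1
        + (p2 - p0) * t
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
        + (3.0 * p1 - p0 - 3.0 * p2 + p3) * t * t * t
    )


def interpolate_horizontal(candela, step_deg, angle_deg, closed):
    """Interpolate candela values sampled every step_deg degrees.

    closed=True: the samples cover the full 360 degrees and the curve wraps around.
    closed=False: the samples cover a partial range and both ends are clamped.
    """
    n = len(candela)
    position = angle_deg / step_deg
    if closed:
        position %= n
        i = min(int(position), n - 1)
    else:
        position = min(max(position, 0.0), n - 1)
        i = min(int(position), n - 2)
    t = position - i

    def sample(k):
        # Wrap indices for a closed curve, clamp them for an open one.
        return candela[k % n] if closed else candela[min(max(k, 0), n - 1)]

    return catmull_rom(sample(i - 1), sample(i), sample(i + 1), sample(i + 2), t)


full_circle = [100.0, 200.0, 100.0, 50.0]  # samples at 0, 90, 180 and 270 degrees
print(interpolate_horizontal(full_circle, 90.0, 315.0, closed=True))
```

With the closed curve, the value at 315 degrees uses the samples at 180, 270, 0 and 90 degrees, so there is no discontinuity when crossing 360 degrees.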

Weathering and aging effects in the hand of artists

Version : 1.1 – Living blog – First version was 04 May 2014

With permission of Dontnod entertainment

It is now frequent in games to have dynamic weathering and aging effects: rain, snow, dirt, rust, dust, pollution… Most of the time these effects are driven by programmers because they require specific code, like access to a custom shadow map, or are performed by a full screen space post process. Before leaving Dontnod I was working on a feature to allow artists to handle these kinds of effects themselves. At the time we required a lot of effects and I wanted to allow artists to prototype as many ideas as they could. This post explains this feature and gives implementation details for Unreal Engine 4. It aims to work with a deferred rendering engine. This weathering and aging effects feature has been used at Dontnod, but there are some pitfalls which may prevent using it effectively in a shipping game. It all depends on the scale of the game and the performance expected. I hope that by exposing this idea others can improve the system 🙂

I thank Dontnod for allowing me to talk about it.

 The weathering and aging effects

We know from the graphics literature that a lot of weathering and aging effects can be done through surface property modification [1]. Of course, using a complex multi-layered lighting model is the right way to handle it, but it is far less flexible, requires a programmer to code every effect, and is difficult to get right for all supported light types (like image based lighting or area lights).

As an example, the wet surfaces appearing when it is raining can be simulated with material property modification: roughness, diffuse albedo, normal… I talk a lot about this subject in other posts on this blog (see the water drops series). However, these modifications should be done only where it matters. When it is raining, you don’t want your interior surfaces to be wet. In Remember Me, we handled this by adding some extra code in the shaders to modify the surface properties, then artists vertex painted the parts of the surfaces required to be dry. But this was not sufficient to handle all cases of wet surfaces. For example, a player walking in a wet street will, once he gets into a dry interior, leave wet footprints on the ground. This could have been simulated with decals.

Taking another example, in Killzone: Shadow Fall [2] they perform a full screen space pass to modify the material attributes where normals are pointing up, adding dust where it matters (after an explosion for example).

GBuffer modification

Thinking with a deferred renderer in mind, it is possible to identify a set of desired controls for artists, allowing them to modify material properties and simulate a weathering or aging effect. These controls are performed through GBuffer modification with:
– Deferred decals
– Full screen space quad. I call it material postprocess.
– Deferred effect lights: These are the same as lights, with or without shadow map, except they behave like deferred decals. The amount of light is used as opacity for the blending operation. Soft shadow maps also allow smooth transitions between effects.
– Object shaders: Properties are modified directly at GBuffer generation time in the shader of the object, so this requires specific code in each shader. Users can use vertex painting to bring information for an effect.

Every tool performs some material property modification to simulate an effect, like darkening the diffuse and boosting the smoothness for wet surfaces. All the GBuffer modifications must be done before the lighting pass to be taken into account.

Applying the material property modification for the GBuffer modification controls can be done in two ways. The most common is to use hardware blending, but it can be too restrictive. The other is to read and write the same textures (a.k.a. programmable blending). Sadly this ability is not widely supported. The PS4 and Mantle support it for example, DX11 doesn’t (I am not talking about Intel’s pixel synchronization but simply reading/writing the same pixel).

Delaying the GBuffer modification

To support every platform and to be able to do any customization of material properties, I perform the modification of material properties in an extra full screen pass, effectively delaying the GBuffer modification. Rather than modifying the GBuffer, the different tools simply output an effect weight inside the GBuffer. This effect weight is read later in the extra pass to apply the effect. There are multiple benefits to doing that:
– Applying the effect only once can save performance for heavy effects
– Accumulating effect weights allows clamping them in the delayed pass in order to limit the strength of an effect
– Centralized place to deal with the effect. Easier to author.
Sadly it requires storing the effect weight in the GBuffer.
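The idea can be sketched on the CPU for a single pixel, in Python rather than shader code. Everything here is illustrative: the function names, the darkening factor and the wet smoothness value are assumptions of mine and not the formula used at Dontnod.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def lerp(a, b, t):
    return a + (b - a) * t


def accumulate_weight(stored_weight, tool_weight):
    """What a decal, an effect light or an object shader does: add its weight."""
    return stored_weight + tool_weight


def apply_wet_effect(diffuse, smoothness, stored_weight, max_strength=1.0):
    """Delayed full screen pass: clamp the accumulated weight, apply the effect once."""
    w = min(saturate(stored_weight), max_strength)
    wet_diffuse = tuple(lerp(c, c * 0.5, w) for c in diffuse)  # hypothetical darkening
    wet_smoothness = lerp(smoothness, 0.9, w)                  # hypothetical wet smoothness
    return wet_diffuse, wet_smoothness


# Three overlapping tools write their weight for the same pixel.
weight = 0.0
for tool_weight in (0.4, 0.5, 0.6):
    weight = accumulate_weight(weight, tool_weight)

# The effect itself is evaluated only once, with the weight clamped to 1.
print(apply_wet_effect((0.4, 0.4, 0.4), 0.3, weight))
```

However many tools overlap on a pixel, the material modification code runs once, and max_strength gives a single place to limit the effect.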

DONTNOD Physically based rendering chart for Unreal Engine 4

With permission of Dontnod entertainment

I got permission from Dontnod entertainment to release the graphic chart that I, Sophie Van de Velde (Lead environment) and Laurent Harduin (Lighter) made for the artists to help them with the physically based rendering (PBR) workflow present in the Unreal engine 4. This chart is an updated version of the previous chart I posted on this blog, DONTNOD specular and glossiness chart.

To save the chart, just right click and choose “save link target as…”. The chart is a PNG file.


(Real world pictures courtesy of Andrea Weidlich from “Exploring the potential of layered BRDF models”, Siggraph Asia 2009)

The chart has been designed for the Dontnod team with the Unreal engine 4 conventions for textures, based on the Disney “principled” BRDF used in the Unreal engine 4 [1]. The chart is used as a set of predefined values that the artists color pick and use when creating textures. For the meaning of the different parameters, except Porosity, you can refer to the Unreal engine 4 documentation:

For the details of some values, refer to the other articles from this blog like Feeding a physically based shading model or the GDC Europe 2013 talk: The art and rendering of Remember Me.


The color is provided in sRGB 0-255.

The diffuse part of the base color (the one used by non-metallic materials) must be in the range of the first gradient, 50-243. There are some sample values of real world materials in sRGB below the gradient. Some of these values are based on real world measured materials (from misc sources, not done by us) and others have been generated by Laurent Harduin. He takes a calibrated raw picture of a representative material, takes the luminance histogram in Photoshop and uses the value of the medium axis for the luminance. Then he blurs the picture, takes one pixel inside the blurred region and uses that as the color value. This explains why in a few cases, like the clean cement, the color and the luminance don’t match perfectly. We also lower the value a bit to take into account the inevitable specular present during the capture.

The reflectance part (the one used by metallic materials) must be in the range 186-255 (not present in the chart). Some examples are provided below the grey square. Most of the time the metallic color of a material matches what the eye sees.
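Since the chart values are sRGB 0-255 while shading happens in linear space, it can help to see what these ranges mean once decoded. The sketch below uses the standard sRGB decoding formula; the function name is mine, and an engine will normally do this conversion for you when sampling an sRGB texture.

```python
def srgb8_to_linear(value: int) -> float:
    """Decode an 8-bit sRGB value (0-255) to a linear value in [0, 1]."""
    c = value / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


# Bounds of the non-metallic range and lower bound of the metallic range.
for value in (50, 243, 186):
    print(value, round(srgb8_to_linear(value), 3))
```

So the 50-243 sRGB range for non-metallic base color corresponds roughly to a linear albedo between 0.03 and 0.9, and the 186 lower bound for metals to a linear reflectance of about 0.49.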


Simple monochrome linear parameter. Range is 0-1 but the gradient is from 0-255. The yellow shapes below are in-engine captures of a sphere and a cube.


The specular part of the Disney “principled” BRDF is a GGX BRDF. It uses a roughness parameter. This roughness is the “Disney roughness”, not the real GGX roughness: Disney roughness = sqrt(GGX roughness). At runtime the Disney roughness is transformed to the GGX roughness with GGX roughness = Disney roughness * Disney roughness.
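The two conversions, together with the remapping of a value picked on the 0-255 gradient, can be written as follows. This is only a sketch of the relations stated above, with function names of my own.

```python
import math


def chart_value_to_disney_roughness(value_0_255: int) -> float:
    """Value picked on the grey gradient (0-255) to Disney roughness (0-1)."""
    return value_0_255 / 255.0


def disney_to_ggx_roughness(disney_roughness: float) -> float:
    """Runtime conversion: the GGX roughness is the squared Disney roughness."""
    return disney_roughness * disney_roughness


def ggx_to_disney_roughness(ggx_roughness: float) -> float:
    """Inverse conversion, useful when authoring from a known GGX roughness."""
    return math.sqrt(ggx_roughness)


print(disney_to_ggx_roughness(0.5))   # 0.25
print(ggx_to_disney_roughness(0.25))  # 0.5
```

The square makes the low end of the artist-facing range much finer: half of the Disney range only covers the first quarter of the GGX roughness.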

The gradient displays roughness from 0 for smooth material (left) to 1 for rough material (right).
The grey gradient goes from 0 to 255 and red segments are displayed every 1/10, with a sphere-like object below to show the in-game result of the designated value.
The first row of real world images above represents non-metallic objects, the second row represents metallic objects. The goal is to give artists a better feeling of what roughness is.
The first row of sphere-like objects represents metallic objects, the second row represents non-metallic objects.

Note: The roughness here is coupled with the BRDF used by the Unreal engine 4; it may not be compatible with other engines or offline renderers.


The Dontnod chart includes an unusual parameter named Porosity. This parameter is the “open porosity” of a material. It can be used for driving weathering and aging effects (pollution, rain, aging…). More details on its usage can be found in previous blog posts: Water drop 3a – Physically based wet surfaces and Water drop 3b – Physically based wet surfaces. In practice Dontnod uses it mainly with the dynamic wet formula provided in those posts.

The range is remapped from 0-1 to 0-70% of open porosity. There are real world images to try to give a feeling of what the value means. An extremely porous material is clay (70%), but open porosity can vary a lot for the same material; clay could also be only 50%.


From Kevin Hnat's thesis [2]. Translated from French, from top to bottom: clay, chalk, sand, limestone, granite.

Note: The porosity parameter is not used in the BRDF formulation. More research needs to be done on this topic, and currently there is only one porosity BRDF paper available, which doesn’t fit the current need.


There is no chart for the reflectance value of the Unreal engine 4; just leave the value at its default and apply a cavity map on it.

Difference with the old chart

The previous chart was based on a modified Unreal engine 3, where I introduced a new physically based renderer based on a Blinn-Phong BRDF. This new chart is based on the Disney “principled” BRDF used in the Unreal engine 4, which is based on the GGX BRDF. The glossiness of the previous chart and the roughness of this one do not match. Moreover, the range is inverted. Previous specular values for metallic materials can be reused for the Unreal engine 4 reflectance parameter. UE4 remaps the 0-1 range to 0-0.08, so you need to do the conversion. But in practice there is so little difference that it is not necessary to use the precise values.
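The conversion mentioned above is a simple scale. Here is a sketch, assuming the remapping is linear as described (0-1 mapped to 0-0.08); the function names are mine.

```python
def ue4_specular_to_reflectance(specular: float) -> float:
    """UE4 specular parameter (0-1) to reflectance at normal incidence (0-0.08)."""
    return 0.08 * specular


def reflectance_to_ue4_specular(reflectance: float) -> float:
    """Reflectance at normal incidence to the UE4 specular parameter, clamped to 0-1."""
    return min(max(reflectance / 0.08, 0.0), 1.0)


# The default specular value of 0.5 corresponds to a reflectance of 4%.
print(ue4_specular_to_reflectance(0.5))
print(reflectance_to_ue4_specular(0.04))
```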


[1] Burley, “Physically based shading at Disney”,
[2] Hnat,  thesis “Influence of surfaces micro-geometry on realistic rendering” in French,

Water drop 4a – Reflecting wet world

Version : 1.0 – Living blog – First version was 08 September 2013

This is the fourth post of a series about simulating rain and its effects on the world in games. It can be read without reading the previous posts. The subject is “the reflection”. The post is split in two parts, A and B:

Water drop 1 – Observe rainy world
Water drop 2a – Dynamic rain and its effects
Water drop 2b – Dynamic rain and its effects
Water drop 3a – Physically based wet surfaces
Water drop 3b – Physically based wet surfaces
Water drop 4a – Reflecting wet world
Water drop 4b – Reflecting wet world

When a world scene is totally wet, the most striking visual cue is the reflected environment. Of course all surfaces permanently reflect their surroundings, but this is more visible on a rainy day. The topic of this post is “reflection”. Reflections as we see them in the real world include all the surrounding lighting. When we talk about reflection in games, too often we restrict this to water or smooth surface reflections. But “reflection” is just a convenient word to designate the normal lighting process. In games we separate lighting into direct, indirect and emissive. If you handle direct and indirect lighting on any kind of surface, from smooth to rough, you have your reflections. There is no need for a particular process.
For Remember Me we decided to go this way. To get a good rainy mood, we wanted reflections everywhere, on every surface. For example, we use the same process to get reflections on rocks as well as in puddles.

Reflection – Theory

The observation post already presents many pictures illustrating reflection. But I will present some others here to highlight some characteristics of reflections.

Reflection with smooth surfaces

Let’s consider a perfectly smooth surface. Most people think that the reflection of a scene in a surface like calm water or a mirror is the scene itself upside down.


But this is a really wrong assumption. The reflection depends on the distance of the reflected objects and the viewer’s position.


The differences become smaller the closer we bring our eyes to the reflecting surface and the farther away the objects are. In the pictures below, see how Mickey Mouse is hidden by the blue cow until you reach a grazing angle with the mirror.
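This can be checked with a bit of geometry. The sketch below is my own simplified setup: a 2D vertical slice with a horizontal mirror at height 0, an eye at a given height and a point on an object at a given height and horizontal distance. If the reflection were just the scene upside down, the point would appear as far below the horizon as it appears above it when seen directly. The function names are hypothetical.

```python
import math


def view_angles(eye_height, object_height, distance):
    """Return (direct, reflected) angles in radians for a horizontal mirror at height 0.

    direct: elevation of the point above the horizon when seen directly.
    reflected: depression of its reflection below the horizon.
    """
    direct = math.atan2(object_height - eye_height, distance)
    # The reflection is seen as if the point were mirrored below the plane.
    reflected = math.atan2(object_height + eye_height, distance)
    return direct, reflected


def mismatch(eye_height, object_height, distance):
    """How much the reflection deviates from a simple upside down copy."""
    direct, reflected = view_angles(eye_height, object_height, distance)
    return reflected - direct


print(mismatch(1.7, 3.0, 5.0))    # standing viewer, near object
print(mismatch(0.1, 3.0, 5.0))    # eye close to the surface
print(mismatch(1.7, 3.0, 500.0))  # far away object
```

The mismatch is largest for the standing viewer looking at a near object, and shrinks both when the eye gets close to the surface and when the object is far away, which matches the observation above.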

