LogLuv Encoding for HDR

Okay, so this marks the third time I’ve posted this blog entry somewhere. What can I say…I like it! I also think it’s something useful for just about anyone trying to do HDR on the 360 through XNA, and I’m hoping some people will stumble upon it.

Designing an effective and performant HDR implementation for my game’s engine was complicated a bit by a few of the quirks of running XNA on the Xbox 360. As a quick refresher for those who aren’t experts on the subject, HDR is most commonly implemented by rendering the scene to a floating-point buffer and then performing a tone-mapping pass to bring the colors back into the visible range (there’s a small sketch of such a pass after the lists below). Floating-point formats (like A16B16G16R16F, AKA HalfVector4) are used because their added precision and floating-point nature allow them to comfortably store linear RGB values in ranges beyond the [0,1] typically used for shader output to the backbuffer, which is crucial since HDR requires data with a wide dynamic range. They’re also convenient, as they let values be stored in the same format the shaders manipulate them in. Newer GPUs also support full texture filtering and alpha-blending with fp surfaces, which removes the need for special-case handling of things like non-opaque geometry. However, as with most things, what’s convenient is not always the best option. During planning, I came up with the following list of pros and cons for various types of HDR implementations:

Standard HDR, fp16 buffer
+ Very easy to integrate (no special work needed for the shaders)
+ Good precision
+ Support for blending on SM3.0+ PC GPUs
+ Allows for HDR bloom effects
- Double the bandwidth and storage requirements of R8G8B8A8
- Weak support for multi-sampling on SM3.0 GPUs (Nvidia NV40 and G70/G71 can’t do it)
- Hardware filtering not available on ATI SM2.0 and SM3.0 GPUs
- No blending on the Xbox 360
- Requires double the space in the framebuffer on the 360, which increases the number of tiles needed

HDR with tone-mapping applied directly in the pixel shader (Valve-style)
+ Doesn’t require output to an HDR format; no floating-point surfaces or encoding required
+ Multi-sampling and blending are supported, even on old hardware
- Can’t do HDR bloom, since only an LDR image is available for post-processing
- Luminance can’t be calculated directly; fancier techniques are needed to estimate it
- Increases shader complexity and the number of shader combinations

HDR using an encoded format
+ Allows for a standard tone-mapping chain
+ Allows for HDR bloom effects
+ Most formats offer a very wide dynamic range
+ Same bandwidth and storage as LDR rendering
+ Certain formats allow for multi-sampling and/or linear filtering with reasonable quality
- Alpha-blending usually isn’t an option, since the alpha channel is used by most formats
- Linear filtering and multisampling usually aren’t mathematically correct, although the results are often “good enough”
- Additional shader math needed for format conversions
- Adds complexity to shaders
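For reference, here’s a minimal sketch of the kind of full-screen tone-mapping pass mentioned above, using a simple Reinhard operator. The names (SceneTexture, ExposureKey, AvgLuminance) are just illustrative; in a real chain the average luminance would come from a downsampling/reduction pass. Note that option #2 applies this same sort of operator at the end of every material pixel shader instead, which is why no HDR image is left over for bloom.

sampler2D SceneTexture;    // the HDR scene render target
float ExposureKey;         // scene "key", e.g. 0.18
float AvgLuminance;        // from a luminance measurement chain

float4 ToneMapPS(float2 vTexCoord : TEXCOORD0) : COLOR0
{
    float3 vHDRColor = tex2D(SceneTexture, vTexCoord).rgb;

    // Scale by exposure, then apply the Reinhard operator x/(1 + x)
    // to compress the result into [0, 1] for the backbuffer
    float3 vScaled = vHDRColor * (ExposureKey / AvgLuminance);
    float3 vLDRColor = vScaled / (1.0f + vScaled);

    return float4(vLDRColor, 1.0f);
}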

My early prototyping used a standard tone-mapping chain and I didn’t want to ditch that, nor did I want to move away from what I was comfortable with. This pretty much eliminated the second option for me right off the bat…although I was unlikely to choose it anyway due to its other drawbacks (having nice HDR bloom was something I felt was an important part of the look I wanted for my game, and in my opinion Valve’s method doesn’t do a great job of determining average luminance). When I tried out the first method I found that it worked as well as it always has on the PC (I’ve used it before), but on the 360 it was another story. I’m not sure why exactly, but for some reason the 360 simply does not like the HalfVector4 format. Performance was terrible, I couldn’t blend, I got all kinds of strange rendering artifacts (entire lines of pixels missing), and I’d get bizarre exceptions if I enabled multisampling. Loads of fun, let me tell you.

This left me with option #3. I wasn’t a fan of this approach initially, as my original design plan called for things to be simple and straightforward whenever possible. I didn’t really want to have two versions of my material shaders to support encoding, nor did I want to integrate decoding into the other parts of the pipeline that needed it. But unfortunately, I wasn’t left with any other options after I found out there were no plans to bring support for the 360’s special fp10 backbuffer format to XNA (which would have conveniently solved my problems on the 360). So, I started doing my research. Naturally, the first place I looked was actual released commercial games. Why? Because when a technique is used in a shipped game, it usually means it’s gone through the paces and has been shown to be feasible and practical in a real game environment. Which of course led me to consider NAO32.

NAO32 is a format that gained some fame in the dev community when ex-Ninja Theory programmer Marco Salvi shared some details on the technique over on the Beyond3D forums. Used in the game Heavenly Sword, it allowed multisampling to be used in conjunction with HDR on a platform (PS3) whose GPU didn’t support multisampling of floating-point surfaces (the RSX is heavily based on Nvidia’s G70). In this technique, color is stored in the LogLuv format using a standard R8G8B8A8 surface: two of the components are used to store the chromaticity at 8-bit precision, while the other two together store the log of the luminance at 16-bit precision. Having 16 bits for luminance allows a wide dynamic range to be stored in this format, and storing the log of the luminance allows for linear filtering in multisampling or texture sampling. Since he first explained it, other games have used it as well, such as Naughty Dog’s Uncharted, and it’s likely been used in many other PS3 games too.

My actual shader implementation was helped along quite a bit by Christer Ericson’s blog post, which described how to derive optimized shader code for encoding RGB into the LogLuv format.  Using his code as a starting point, I came up with the following HLSL code for encoding and decoding:

// M matrix, for encoding
const static float3x3 M = float3x3(
    0.2209, 0.3390, 0.4184,
    0.1138, 0.6780, 0.7319,
    0.0102, 0.1130, 0.2969);

// Inverse M matrix, for decoding
const static float3x3 InverseM = float3x3(
     6.0013, -2.7000, -1.7995,
    -1.3320,  3.1029, -5.7720,
     0.3007, -1.0880,  5.6268);

float4 LogLuvEncode(in float3 vRGB)
{
    float4 vResult;

    // Transform to the modified CIE XYZ space from Christer's derivation,
    // clamping to avoid log2(0) below
    float3 Xp_Y_XYZp = mul(vRGB, M);
    Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));

    // The two chromaticity values go in the first two 8-bit channels
    vResult.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;

    // Map the log of luminance into [0, 255], then split it across the
    // remaining two channels for 16-bit precision: the fractional part
    // goes in w, the integer part (divided by 255) goes in z
    float Le = 2 * log2(Xp_Y_XYZp.y) + 127;
    vResult.w = frac(Le);
    vResult.z = (Le - (floor(vResult.w * 255.0f)) / 255.0f) / 255.0f;

    return vResult;
}

float3 LogLuvDecode(in float4 vLogLuv)
{
    // Recombine the two 8-bit channels into the log-luminance value
    float Le = vLogLuv.z * 255 + vLogLuv.w;

    // Invert the encoding: recover Y from the log, then rebuild the
    // other two components from the stored chromaticity
    float3 Xp_Y_XYZp;
    Xp_Y_XYZp.y = exp2((Le - 127) / 2);
    Xp_Y_XYZp.z = Xp_Y_XYZp.y / vLogLuv.y;
    Xp_Y_XYZp.x = vLogLuv.x * Xp_Y_XYZp.z;

    // Transform back to linear RGB, clamping away small negative values
    float3 vRGB = mul(Xp_Y_XYZp, InverseM);
    return max(vRGB, 0);
}
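To give an idea of how the pair slots into a rendering pipeline, here’s a rough usage sketch (the shader names and inputs are illustrative, not from my actual implementation). Material shaders do their lighting math in linear HDR as usual and encode just before output, so the R8G8B8A8 target can hold the result; any pass that reads the scene afterwards decodes immediately after sampling:

sampler2D SceneTexture;    // the LogLuv-encoded scene render target

// Material shaders encode on output...
float4 MaterialPS(float3 vLitColor : TEXCOORD0) : COLOR0
{
    return LogLuvEncode(vLitColor);
}

// ...and consumers (tone mapping, bloom threshold, luminance measurement)
// decode right after sampling, then work in linear RGB as normal
float4 ReadScenePS(float2 vTexCoord : TEXCOORD0) : COLOR0
{
    float3 vHDRColor = LogLuvDecode(tex2D(SceneTexture, vTexCoord));
    return float4(vHDRColor, 1.0f);
}

Since the alpha channel is carrying luminance bits, this layout is also exactly why alpha-blending is off the table with this format.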

Once I had this implemented and had worked through a few small glitches, results were much improved in the 360 version of my game. Performance was much, much better, I could multi-sample again, and the results looked great. So while things didn’t exactly work out in an ideal way, I’m pleased enough with the results.

If you’re interested in this, be sure to check out my sample.


Comments:

#### Takuan -

> I also think it’s something useful for just about anyone trying to do HDR on the 360 through XNA, and I’m hoping some people will stumble upon it.

I didn’t exactly stumble on it, I found your blog through GameDev.net, but it is indeed useful!


#### [matt77hias]( "matthias.moulin@gmail.com") -

Thanks!


#### [matt77hias]( "matthias.moulin@gmail.com") -

Two small questions: 1) What is the reasoning behind your “Xp_Y_XYZp” local variable name? 2) What do the matrices M and InverseM convert to/from? It’s not just linear RGB to LogLuv and vice versa, since the conversion isn’t finished yet at that point?


#### [MJP](http://mynameismjp.wordpress.com/ "mpettineo@gmail.com") -
1) The code was taken from Christer’s blog post, so he’s the one who came up with the variable names. If you read through the linked post, he explains how he simplified the math into what’s represented in the code. He calls it that because one of the intermediate steps is converting to CIE XYZ, and he folds a dot product into the matrix multiply to produce X, Y, and the dot product result (that he calls ‘XYZ’). He also folds in the multiplication of two other constants, to produce what he calls X′ and XYZ′, hence the ‘p’ in the variable name (short for ‘prime’, I assume).

2) See above. It’s a combination of converting from RGB -> XYZ, and some of the related math for converting to LogLuv.
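For the curious, the chromaticity math these matrices are built around comes from the standard CIE 1976 (u′, v′) coordinates that Greg Ward’s LogLuv format uses (textbook formulas, not anything specific to Christer’s post):

// Standard CIE 1976 (u', v') chromaticity, computed from CIE XYZ:
//   u' = 4X / (X + 15Y + 3Z)
//   v' = 9Y / (X + 15Y + 3Z)
//
// Christer's M folds an RGB->XYZ conversion together with constants from
// these formulas, so the matrix multiply plus the divide in LogLuvEncode
// produces the two chromaticity channels.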

#### [matt77hias]( "matthias.moulin@gmail.com") -

Sorry for bothering you with this again, but I read Christer’s article, which starts with a linear RGB to XYZ conversion and then sequentially folds extra operations into the conversion matrix M. I am not very familiar with display formats, so feel free to correct me. The CCIR became the ITU-R, so I presume that “CCIR 709” is the same as “ITU-R Recommendation BT.709” (a.k.a. “Rec. 709” or “BT.709”)? If this is the case, the RGB-to-XYZ conversion used (cf. the paper by Gregory Ward) is different from the one used in Real-Time Rendering (3rd edition), pbrt-v2, pbrt-v3 and Mitsuba (which all use the same coefficients)? Thanks in advance.