Feb 18, 2015

Shadow Sample Update

Update 1/24/2016: one of the authors of the Moment Shadow Mapping paper contacted me to let me know that there was an issue in my implementation of the 16-bit variant of EVSM. My sample app was clamping the maximum exponential warp factor to 10.0, which can cause overflow in a 16-bit float. This has the effect of reducing light bleeding, but it also causes edge quality to suffer during filtering. This made the light bleeding comparison with MSM16 unfair, particularly since my comparisons did not use high filtering settings. The version of the sample now uploaded to GitHub corrects this issue, and I’ve generated a new set of comparison images. I’ve also updated my commentary on these comparisons to better reflect MSM’s improvements over EVSM.
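To see why a warp factor of 10.0 is too large for 16-bit storage: EVSM stores the warped depth exp(c·d) together with its square exp(2c·d). With depth normalized to [0, 1], the largest stored value is exp(2c), and that has to stay below the largest finite half-precision value, 65504. The Python sketch below assumes [0, 1] depth and only looks at the positive warp (the negative warp stays within [-1, 0] for non-negative depth); the helper names are hypothetical and not taken from the sample.

```python
import math

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

def max_safe_exponent(max_value: float = FP16_MAX) -> float:
    # The second moment exp(2*c*d) peaks at d = 1, so require exp(2*c) <= max_value.
    return math.log(max_value) / 2.0

def evsm_positive_moments(depth: float, c: float) -> tuple[float, float]:
    # Positive exponential warp and its square, as stored in an EVSM texture.
    warped = math.exp(c * depth)
    return warped, warped * warped

print(f"max safe exponent for fp16: {max_safe_exponent():.2f}")  # 5.54
print(evsm_positive_moments(1.0, 10.0)[1] > FP16_MAX)  # True: exp(20) overflows fp16
```

Anything above roughly 5.54 therefore risks producing infinities in the second moment once it is written to a 16-bit target.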

This past week a paper entitled “Moment Shadow Mapping” was released in advance of its planned publication at I3D in San Francisco. If you haven’t seen it yet, it presents a new method for achieving filterable shadow maps, with a promised improvement in quality compared to Variance Shadow Maps. Many others and I were naturally intrigued, since filterable shadow maps are highly desirable for reducing various forms of aliasing. The paper primarily suggests two variants of the technique: one that directly stores the 1st, 2nd, 3rd, and 4th moments in an RGBA32_FLOAT texture, and another that uses an optimized quantization scheme (which essentially boils down to a 4x4 matrix transform) in order to use an RGBA16_UNORM texture instead. The first variant probably isn’t immediately interesting for people working on games, since 128 bits per texel costs quite a bit of memory and bandwidth. It also has the same storage cost as the highest-quality variant of EVSM (VSM with an exponential warp), which already provides high-quality filterable shadows with minimal light bleeding. That really leaves us with the quantized 16-bit variant. Using 16-bit storage for EVSM results in more artifacts and increased light bleeding compared to the 32-bit version, so if MSM can provide better results, then it could potentially be useful for games.
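What makes a moment shadow map filterable is that the stored data is linear: averaging per-texel moment vectors (which is what mipmapping, anisotropic filtering, or a blur does) yields the moments of the depth distribution under the filter footprint. A minimal Python sketch of the unquantized 32-bit layout, assuming depth normalized to [0, 1]; the 16-bit quantization transform is deliberately left out, and the function names are hypothetical.

```python
def depth_to_moments(depth: float) -> tuple[float, float, float, float]:
    # 1st through 4th moments of a single depth sample (RGBA32_FLOAT layout).
    return (depth, depth ** 2, depth ** 3, depth ** 4)

def box_filter(moment_vectors):
    # Averaging moment vectors is linear, so the result is the set of moments
    # of the depth distribution covered by the filter footprint.
    n = len(moment_vectors)
    return tuple(sum(m[i] for m in moment_vectors) / n for i in range(4))

# Two texels: half the footprint at depth 0.3, half at depth 0.7.
filtered = box_filter([depth_to_moments(0.3), depth_to_moments(0.7)])
print(filtered)
```

Ordinary depth values cannot be averaged this way, since the mean of two depths says nothing about what fraction of the footprint occludes a given receiver.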

I was eager to see the results myself, so I downloaded the sample app that the authors were kind enough to provide. Unfortunately their sample didn’t implement EVSM, so I wasn’t able to perform any comparisons. However, the implementation of MSM is very straightforward, so I decided to just integrate it into my old shadows sample. I updated the corresponding blog post and re-uploaded the binary + source, so if you’d like to check it out for yourself, feel free to download it from GitHub:

https://github.com/TheRealMJP/Shadows/releases

The MSM techniques can be found under the “Shadow Mode” setting. I implemented both the Hamburger and Hausdorff methods, which are available as two separate shadow modes. If you change the VSM/MSM format from 32-bit to 16-bit, then the optimized quantization scheme will be used when converting from a depth map to a moment shadow map.

The 32-bit variant of MSM seems to provide quality that’s pretty close to the 32-bit variant of EVSM, with slightly worse performance. Both techniques are mostly free of light bleeding, but still exhibit bleeding artifacts in the more extreme cases. The 16-bit variant initially has some of the same issues as the 16-bit version of EVSM, in that it will typically produce artifacts due to a lack of precision in the storage format. Both EVSM16 and MSM16 can reduce or eliminate these artifacts by increasing their respective bias values, but raising them too high can cause a loss of shadow detail in areas where the receiver is very close to the occluder. As for light bleeding, there’s bad news and there’s good news. The bad news is that MSM16 does suffer from increased bleeding compared to MSM32, which is unfortunate. The good news is that it ultimately fares better than EVSM16, which makes for a solid improvement with similar performance and storage cost. In fact, I’ve found that bleeding can be almost entirely avoided in my test scenes by using the “standard” VSM bleeding reduction technique, which essentially just remaps the output range of the shadow visibility term. This technique also works for EVSM16, but in my test scenes it is not enough to remove the bleeding entirely in all cases. I gathered some images so that you can compare for yourself:
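The “standard” VSM bleeding reduction mentioned above treats any visibility below a threshold as fully shadowed and stretches the remaining range back to [0, 1]. A minimal Python sketch of that remap; the function names are hypothetical, and a shader would typically express the same thing with `saturate`.

```python
def linstep(lo: float, hi: float, x: float) -> float:
    # Linear ramp from 0 at lo to 1 at hi, clamped to [0, 1].
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)

def reduce_light_bleeding(visibility: float, amount: float) -> float:
    # Visibility below `amount` (typically faint leaks from bleeding) becomes 0;
    # the rest of the range is rescaled so fully lit stays fully lit.
    return linstep(amount, 1.0, visibility)

print(reduce_light_bleeding(0.2, 0.25))    # 0.0: faint bleeding is removed
print(reduce_light_bleeding(1.0, 0.25))    # 1.0: fully lit stays lit
print(reduce_light_bleeding(0.625, 0.25))  # 0.5
```

The cost is that penumbrae become darker and narrower as the factor grows, so a technique that bleeds less can get away with a smaller factor.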

The EVSM images all use the 4-component variant, while the MSM images all use the 4-moment Hamburger variant. The images with the bleeding “fix” use a reduction factor of 0.25. In all cases the shadow map resolution is 2048x2048, with 4x MSAA, 16x anisotropic filtering, and mipmaps enabled for both EVSM and MSM. The first MSM16 image shows what happens if you don’t use a moment bias, while the next one shows what it looks like with a moment bias of 0.030. The last image shows MSM16 with light bleeding reduction applied, and as you can see the artifacts are almost completely gone. Compare it with the EVSM16 image, where some of the bleeding remains even with the reduction factor applied.
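For readers who want to see where the moment bias and depth bias enter, here is a Python transcription following the structure of the 4-moment Hamburger reconstruction from the MSM paper. It is a sketch for unquantized moments with depth in [0, 1]: the blend target of 0.5 for the moment bias, the parameter names, and the bias values in the usage lines are assumptions for illustration, not the sample’s settings.

```python
import math

def msm_hamburger_visibility(b, receiver_depth, depth_bias=0.0, moment_bias=1e-3):
    """Visibility in [0, 1] from filtered moments b = (E[z], E[z^2], E[z^3], E[z^4])."""
    # Moment bias: blend toward a constant vector (0.5 here, an assumption for
    # [0, 1] depth) so the Hankel matrix stays safely invertible.
    b = [(1.0 - moment_bias) * m + moment_bias * 0.5 for m in b]
    # Depth bias: offset the receiver depth directly.
    z0 = receiver_depth - depth_bias

    # Cholesky-style factorization of the 3x3 Hankel matrix of the moments.
    l32_d22 = b[2] - b[0] * b[1]
    d22 = b[1] - b[0] * b[0]
    squared_depth_variance = b[3] - b[1] * b[1]
    d33_d22 = squared_depth_variance * d22 - l32_d22 * l32_d22
    inv_d22 = 1.0 / d22
    l32 = l32_d22 * inv_d22

    # Solve B * c = (1, z0, z0^2): forward substitution, scaling, back substitution.
    c0, c1, c2 = 1.0, z0, z0 * z0
    c1 -= b[0]
    c2 -= b[1] + l32 * c1
    c1 *= inv_d22
    c2 *= d22 / d33_d22
    c1 -= l32 * c2
    c0 -= c1 * b[0] + c2 * b[1]

    # The roots z1 <= z2 of c0 + c1*z + c2*z^2 are the other two support points.
    p = c1 / c2
    q = c0 / c2
    r = math.sqrt(max(p * p * 0.25 - q, 0.0))  # clamp guards tiny negative rounding
    z1 = -p * 0.5 - r
    z2 = -p * 0.5 + r

    # Sum the weights of the support points in front of the receiver.
    if z2 < z0:
        s0, s1, s2 = z1, z0, 1.0
    elif z1 < z0:
        s0, s1, s2 = z0, z1, 0.0
    else:
        return 1.0  # every support point is behind the receiver: fully lit
    quotient = (s0 * z2 - b[0] * (s0 + z2) + b[1]) / ((z2 - s1) * (z0 - z1))
    shadow = s2 + quotient
    return 1.0 - min(max(shadow, 0.0), 1.0)

occluder = (0.5, 0.25, 0.125, 0.0625)  # moments of a single occluder at depth 0.5
print(msm_hamburger_visibility(occluder, 0.2, moment_bias=0.01))  # 1.0: in front, lit
print(msm_hamburger_visibility(occluder, 0.8, moment_bias=0.01))  # small: mostly shadowed
```

Without any moment bias, a footprint containing only one or two distinct depths produces a singular Hankel matrix (`d22` or `d33_d22` becomes zero). This is one reason the unbiased MSM16 image shows artifacts.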

Here are a few more images from another area with light bleeding:

EVSM16 and MSM16 both show bleeding artifacts by default, but the bleeding is reduced quite a bit with MSM16. The bleeding reduction works very well for both techniques for this particular viewpoint, but with MSM16 we can probably get away with a lower bleeding reduction factor.

To finish up, here are some images from another particularly troublesome case: a close up on a character, with heavy filtering enabled.

This is a pretty tough situation for all shadow techniques. The character is very close, and has quite a few small details that need shadows in order to read properly. However, there are still plenty of things in the background that need shadowing, so the cascades need to cover a range beyond just the character. SDSM helps quite a bit here, since it lets us create a logarithmic partitioning of the visible area for a close-to-ideal distribution of our shadow map resolution. Even with that in place, our 9x9 PCF kernel still has some trouble. In the first image, the issues mostly stem from incorrect biasing. The bias factor in that image is computed “automatically” by determining the slope of the receiver using pixel shader derivatives, which is often quite effective. However, it assumes that the receiver is planar, and this assumption is violated when the filter footprint covers an edge between two triangles that aren’t coplanar. Switching to a “fixed” bias factor makes the artifacts go away, but we then lose shadow detail in areas like the nose. EVSM and MSM also have issues due to using a wider filter kernel, which requires increasing their bias factors in order to avoid artifacts at triangle boundaries. Even with an increased bias, EVSM16 still shows some precision artifacts in places where the occluder is rather close to the receiver (check out the nose, or the shadow under the collar). MSM16 also has some precision issues, at least initially. To fix this, I increased the “depth bias” parameter to 0.4, which directly biases the computed pixel depth. After doing that and applying a little bit of bleeding reduction, we end up with some very nice results!
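The “automatic” bias described above is commonly called receiver plane depth bias. The screen-space derivatives of the shadow-map coordinates are used to estimate how depth changes per unit of shadow-map UV, so each PCF tap can be compared against a depth extrapolated along the receiver’s plane. The Python sketch below assumes the derivatives of (u, v, depth) along screen x and y are already available, and the function names are hypothetical. Because the sketch fits a single plane over the whole kernel, it also shows why the method fails when the footprint straddles non-coplanar triangles.

```python
def receiver_plane_depth_gradient(ddx, ddy):
    """Return (dz/du, dz/dv) from screen-space derivatives of (u, v, z).

    Solves the 2x2 system
        [du/dx dv/dx] [dz/du]   [dz/dx]
        [du/dy dv/dy] [dz/dv] = [dz/dy]
    which is only meaningful if the receiver is locally planar.
    """
    det = ddx[0] * ddy[1] - ddx[1] * ddy[0]
    dz_du = (ddx[2] * ddy[1] - ddx[1] * ddy[2]) / det
    dz_dv = (ddx[0] * ddy[2] - ddy[0] * ddx[2]) / det
    return dz_du, dz_dv

def biased_tap_depth(receiver_depth, gradient, offset_uv):
    # Extrapolate the receiver depth along its plane to a PCF tap offset (UV units).
    return receiver_depth + gradient[0] * offset_uv[0] + gradient[1] * offset_uv[1]

# A receiver plane z = 0.3*u + 0.7*v + const, seen with arbitrary UV derivatives.
ddx = (0.01, 0.002, 0.3 * 0.01 + 0.7 * 0.002)
ddy = (0.001, 0.02, 0.3 * 0.001 + 0.7 * 0.02)
grad = receiver_plane_depth_gradient(ddx, ddy)
print(grad)  # approximately (0.3, 0.7)
print(biased_tap_depth(0.5, grad, (1.0 / 2048, 0.0)))  # one texel over in a 2048 map
```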

If you’d like to look at the full captures that I used to generate these comparison images, I uploaded the source .PSD files here: https://mynameismjp.files.wordpress.com/2016/01/msm-comparisons.zip

Comments:

#### Feb 3, 2015

Hi, thank you for this great update, and many thanks for the NVIDIA driver bug fix! Unfortunately my 770 GPU has a problem with the 16-bit and 32-bit Hausdorff methods. http://postimg.org/image/ou0341izb/

#### [Michael]( "segura.mikael@orange.fr") - Feb 6, 2015

I have the same problem, and when using 16-bit MSM I get a lot of dots in the shadows. Another thing I noticed is that you expose parameters for the cascaded shadow map splits, while the original SDSM computes the splits automatically. Is it safer to set them manually, or did you just do that to save time on the demo?

#### [MJP](http://mynameismjp.wordpress.com/ "mpettineo@gmail.com") - Feb 0, 2015

Hi Michael, If you’re referring to the parameters for selecting the min and max depth ranges for the cascades as well as the cascade split depths, for all of my comparisons I was using the “Auto Compute Depth Bounds” option which automatically computes the min and max visible depth from the depth buffer in a manner similar to SDSM. I also used the logarithmic partitioning scheme, which also automatically computes the split depths.
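The logarithmic partitioning scheme places the splits so that consecutive cascades cover a constant depth ratio: split_i = near · (far / near)^(i / N). When near and far come from the automatically computed depth bounds, no manual tuning is needed. A minimal Python sketch with a hypothetical function name; the sample may clamp or blend the splits differently.

```python
def log_cascade_splits(near: float, far: float, num_cascades: int) -> list[float]:
    # Far depth of each cascade; cascade i covers [split[i-1], split[i]].
    if not (0.0 < near < far):
        raise ValueError("expected 0 < near < far")
    ratio = far / near
    return [near * ratio ** (i / num_cascades) for i in range(1, num_cascades + 1)]

# Depth bounds as they might come from an SDSM-style reduction of the depth buffer.
print(log_cascade_splits(1.0, 1000.0, 3))  # approximately [10, 100, 1000]
```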

#### [Michael]( "segura.mikael@orange.fr") - Feb 0, 2015

Hi MJP, thanks for explaining that, so “Auto Compute Depth Bounds” + “logarithmic partitioning scheme” is the fully automatic setup. I have a question about the “Depth Bounds Readback Latency”: is the latency really needed for performance, given that it can cause visual errors at low frame rates? Screenshot of the 16-bit dot issue: http://zupimages.net/up/15/09/1s75.png

#### [Michael]( "segura.mikael@orange.fr") - Feb 0, 2015

About the dots: just using a bias of 0.022 removes them. Apparently only a few remain visible at intersections, which is probably because of the mesh; can you confirm?

#### [Update For My Shadow Sample Update – The Danger Zone](https://mynameismjp.wordpress.com/2016/01/24/update-for-my-shadow-sample-update/ "") - Jan 24, 2016

[…] I was contacted by Christoph Peters (one of the authors of Moment Shadow Mapping) regarding a blog post where I compared EVSM to MSM using my sample app. He noticed that I was incorrectly clamping the […]

#### [unbird]( "ugello@gmx.net") - May 1, 2016

First, thanks for another great read and source. Again :) I hope you don’t mind going off topic, but I noticed something peculiar in GPUBatch.hlsl. There’s an SV_GroupIndex for the DrawArgsBuffer. My google-fu is failing me here, so… is this another undocumented feature (like SV_DepthGreaterEqual)? What does it improve? On what hardware?

#### [MJP](http://mynameismjp.wordpress.com/ "mpettineo@gmail.com") - May 1, 2016

The docs for SV_GroupIndex are here: https://msdn.microsoft.com/en-us/library/windows/desktop/ff471569(v=vs.85).aspx It’s basically just a shortcut for flattening the group thread ID into a 1D value. I don’t think it’s actually any faster; in fact, the driver’s compiler might even insert some math at the beginning of the shader in order to generate it. It’s also not at all necessary in that particular shader, since the group size is 1D. I think I just used it out of habit.
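For reference, the flattening that SV_GroupIndex performs is documented as z · dimX · dimY + y · dimX + x, where (x, y, z) is SV_GroupThreadID and (dimX, dimY, dimZ) is the thread group size. A small Python sketch that mirrors the formula (the function name is hypothetical) shows why a 1D group gains nothing from it:

```python
def flattened_group_index(group_thread_id, group_dims):
    # Mirrors the documented SV_GroupIndex definition: z*dimX*dimY + y*dimX + x.
    x, y, z = group_thread_id
    dim_x, dim_y, _ = group_dims
    return z * dim_x * dim_y + y * dim_x + x

print(flattened_group_index((3, 2, 1), (8, 8, 4)))   # 83 = 1*64 + 2*8 + 3
print(flattened_group_index((5, 0, 0), (64, 1, 1)))  # 5: a 1D group just gives x
```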

#### [unbird]( "ugello@gmx.net") - May 1, 2016

:D I know what it does as a shader input. Sorry, maybe I wasn’t quite clear. I meant this line: `RWByteAddressBuffer DrawArgsBuffer : SV_GroupIndex : register(u0);` It’s legal to put semantics on globals; I’d just never seen an SV semantic used this way and thought it might give the driver a hint or something. Other than showing up in the shader reflection, I don’t think they have any other impact. At least as far as I know ;) Thanks for the fast reply.

#### [MJP](http://mynameismjp.wordpress.com/ "mpettineo@gmail.com") - May 1, 2016

Oh, I didn’t even see that! I’m pretty sure that’s just some kind of search/replace gone badly, and it definitely wasn’t intentional. I don’t think it will have any effect, but I’m definitely going to remove it when I get a chance. Thank you for pointing that out!