Blender Git Commits

Blender Git "temp-cycles-denoising" branch commits.


September 5, 2016, 02:03 (GMT)
Merge branch 'master' into soc-2016-cycles_denoising
September 3, 2016, 15:09 (GMT)
Cycles: Tweak the reduced feature space calculation
August 23, 2016, 21:01 (GMT)
Fix two compilation issues
August 23, 2016, 18:56 (GMT)
Cycles: Get rid of tile border artifacts when denoising after rendering or standalone denoising

The issue was that although all of the image is available, the prefiltering system didn't use the area outside of the
current tile, which caused visible seams.
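
A minimal sketch of the idea behind the fix (hypothetical names, not the actual Cycles code): when the whole frame is in memory, the prefilter's read window should be clamped to the image bounds rather than the tile bounds, so neighbouring tiles see consistent data.

```python
# Hypothetical illustration of the tile-seam fix, not the actual Cycles code.
def prefilter_window(x, y, radius, tile_rect, image_rect, full_frame_available):
    """Each rect is (x_min, x_max, y_min, y_max); returns the window the
    prefilter may read around (x, y), clamped to the right bounds."""
    # Clamping to the tile causes visible seams at tile borders; when the
    # whole frame is available, clamp to the image instead.
    x_min, x_max, y_min, y_max = image_rect if full_frame_available else tile_rect
    return (max(x - radius, x_min), min(x + radius, x_max),
            max(y - radius, y_min), min(y + radius, y_max))
```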
August 23, 2016, 15:47 (GMT)
Cycles: Remove another two useless files
August 23, 2016, 15:37 (GMT)
Merge remote-tracking branch 'origin/master' into soc-2016-cycles_denoising
August 23, 2016, 15:34 (GMT)
Cycles: Fix a memory leak in the CUDA denoising code
August 23, 2016, 15:34 (GMT)
Cycles: Use separate struct for CUDA denoising storage to avoid allocating the transform memory twice
August 23, 2016, 15:33 (GMT)
Cycles: Support cross-frame denoising on CUDA
August 23, 2016, 15:32 (GMT)
Cycles: Fix Shadow prefiltering for cross-frame filtering
August 23, 2016, 15:31 (GMT)
Cycles: Fix building on Windows
August 21, 2016, 15:38 (GMT)
Cycles: Revert design_row redesign

This commit reverts fba2b77c2a12950802491c3112b3922f5805f98a since it turned out that it actually doesn't help with speed at all - I screwed up the original benchmarking...
Considering that there is no real performance difference, the increased complexity isn't worth it.
August 21, 2016, 04:07 (GMT)
Merge remote-tracking branch 'origin/master' into soc-2016-cycles_denoising
August 21, 2016, 04:06 (GMT)
Cycles: Use the correct bias and variance models for the least-squares fit and global bandwidth optimization

The approach that is used to find the global bandwidth is:
- Run the reconstruction filter for different bandwidths and estimate bias and variance
- Fit analytic bias and variance models to these bandwidth-bias/variance pairs using least-squares
- Minimize the MSE term (Bias^2 + Variance) analytically using the fitted models

The models used in the LWR paper are:
- Bias(h) = a + b*h^2
- Variance(h) = (c + d*h^(-k))/n
where (a, b, c, d) are the parameters to be fitted, h is the global bandwidth, k is the rank, and n is the number of samples.

Classic linear least squares is used to find a, b, c and d.
Then, the paper states that MSE(h) = (Bias(h)^2 + Variance(h)) is minimal for h = (k*d / (4*b^2*n))^(1/(k+4)).
Now, what is suspicious about this term is that a and c don't appear.
c makes sense - after all, its contribution to the variance is independent of h.
a, however, does not - after all, the Bias term is squared, so expanding (a + b*h^2)^2 produces the cross term 2*a*b*h^2, which depends on both h and a.

It turns out that this minimization term is wrong for these models, but instead correct when using Bias(h) = b*h^2 (without constant offset).
That model also makes intuitive sense, since the bias goes to zero as the filter strength (bandwidth) does.
Similarly, the variance model should go to zero as h goes towards infinity, since infinite filter strength would eliminate all possible noise.
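
As a quick sanity check (standard calculus, not part of the original message), differentiating the MSE under the offset-free models reproduces exactly the paper's formula:

```latex
% With Bias(h) = b h^2 and Variance(h) = d h^{-k} / n:
\mathrm{MSE}(h) = b^2 h^4 + \frac{d\,h^{-k}}{n}, \qquad
\mathrm{MSE}'(h) = 4 b^2 h^3 - \frac{k\,d}{n}\,h^{-k-1} = 0
\;\Longrightarrow\; h^{k+4} = \frac{k\,d}{4 b^2 n}
\;\Longrightarrow\; h = \left(\frac{k\,d}{4 b^2 n}\right)^{1/(k+4)}
```

With the constant offset a included, the derivative would pick up an extra 4*a*b*h term, so the paper's closed form could not hold - which is exactly the inconsistency described above.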

Therefore, this commit changes the bias and variance models to not include the constant term any more.
The change in result can be significant - in my test scene, the average bandwidth halved.
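
A compact sketch of the fit-and-minimize procedure described above (illustrative Python with assumed names, not the actual Cycles implementation):

```python
import numpy as np

def optimal_global_bandwidth(hs, biases, variances, k, n):
    """Fit the offset-free models Bias(h) = b*h^2 and Variance(h) = d*h^(-k)/n
    to measured (bandwidth, bias, variance) pairs, then return the analytic
    MSE minimizer h = (k*d / (4*b^2*n))^(1/(k+4))."""
    hs = np.asarray(hs, dtype=float)
    # Each offset-free model has a single basis function, so each
    # least-squares fit is a one-column lstsq.
    b = np.linalg.lstsq(hs[:, None] ** 2,
                        np.asarray(biases, dtype=float), rcond=None)[0][0]
    d = np.linalg.lstsq(hs[:, None] ** (-float(k)),
                        n * np.asarray(variances, dtype=float), rcond=None)[0][0]
    return (k * d / (4.0 * b ** 2 * n)) ** (1.0 / (k + 4))
```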
August 21, 2016, 04:06 (GMT)
Cycles: Further improve CUDA denoising speed by redesigning the design_row

The previous algorithm was:
- Fetch buffer data into the feature vector which was in shared (faster) memory
- Use the feature vector to calculate the weight and the design_row, which was stored in local (slower) memory
- Update the Gramian matrix using the design_row

Now, the problem there is that the most expensive part in terms of memory accesses is the third step, which means that having the design_row in shared memory would be a great improvement.

However, shared memory is extremely limited - for good performance, the number of elements per thread should be odd (to avoid bank conflicts), but even going from the 11 floats that the feature vector needs to 13 already significantly hurts the occupancy.
Therefore, in order to make room for the design_row, it would be great to get rid of the feature vector.

That's the first part of the commit: By changing the order in which the design_row is built, the first two steps can be merged so that the design_row is constructed directly from the buffer data instead of going through the feature vector.
This has a disadvantage - the old design_row construction had an early-abort for zero weights, which was pretty common. With the new structure, that's not possible anymore. However, this is less of a problem on GPUs due to divergence - to gain any speed, all 32 threads in the warp would have to abort anyway.

Now the feature vector doesn't take up memory anymore, but the design_row is still too big - it has up to 23 elements, which is far too much.
It has a useful property, though - the first element is always one, and the last 11 elements are just the squares of the 11 feature elements in between. So, storing 11 floats is enough to have all information, and the squaring can be performed when the design_row is used.
Therefore, the second part of the commit adds specialized functions that accept this reduced design_row and account for these missing elements.
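
A schematic Python illustration of the reduced design_row trick (the real code is CUDA; these names are made up): only the 11 feature values are stored, and the constant leading one plus the 11 squared entries are reconstructed on the fly.

```python
def design_row_element(features, i):
    """Element i of the full 23-element design row [1, f_0..f_10, f_0^2..f_10^2],
    reconstructed from the 11 stored feature values."""
    if i == 0:
        return 1.0                   # leading constant, never stored
    if i <= len(features):
        return features[i - 1]       # entries 1..11: the features themselves
    return features[i - 12] ** 2     # entries 12..22: squared features

def accumulate_gramian(gramian, features, weight):
    """Rank-1 update XtX += weight * row * row^T using only the reduced row."""
    size = 2 * len(features) + 1     # 23 for 11 features
    for i in range(size):
        r_i = design_row_element(features, i)
        for j in range(i, size):     # the Gramian is symmetric; fill the upper triangle
            gramian[i][j] += weight * r_i * design_row_element(features, j)
```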
August 21, 2016, 04:06 (GMT)
Cycles: Fix various issues with the denoising debug passes
August 21, 2016, 04:05 (GMT)
Cycles: Fix undefined filter strength when using standalone denoising
August 21, 2016, 04:05 (GMT)
Cycles: Fix wrong sample variance variance calculation

The missed factor caused the NLM filtering of the buffer variance to essentially reduce to a simple box filter,
which overblurred the buffer variance and therefore caused problems with sharp edges in the shadow buffer.
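
To see why a wrongly scaled variance estimate degrades NLM into a box filter, consider a schematic NLM weight (illustrative Python, not the Cycles kernel): the patch distance is divided by the estimated noise level, so if that estimate carries a large spurious factor, every distance collapses towards zero and all weights become ~1, i.e. a plain box filter.

```python
import numpy as np

def nlm_weight(patch_a, patch_b, noise_variance):
    """Schematic NLM weight: exp(-normalized patch distance)."""
    dist = np.mean((patch_a - patch_b) ** 2) / max(noise_variance, 1e-8)
    # If noise_variance is vastly overestimated, dist -> 0 and the
    # weight -> 1 for every neighbour: a box filter.
    return np.exp(-dist)
```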
August 21, 2016, 04:05 (GMT)
Cycles: Fix wrong offset for feature matrix norm calculation
August 21, 2016, 04:04 (GMT)
Cycles: Add debugging option to CUDA for switching between large L1 cache or large shared memory