April 2, 2021, 15:23 (GMT) |
Cycles: Remove unused defines in CUDA device |
April 2, 2021, 15:23 (GMT) |
Cycles: Enable adaptive sampling tests They were disabled during development, and re-enabling them got lost in one of the previous changes. |
April 2, 2021, 15:22 (GMT) |
Cycles: restore fine grained bounce depth controls Unsure if we want to keep all of this in the end, but useful now for more accurate performance comparisons. |
April 2, 2021, 14:27 (GMT) |
Cycles: Scale all passes with adaptive samples count Do it in the RenderBuffers, similar to how the combined pass was handled. Scaling here is slower than doing it on the device, but the scaling by sample count has to be performed either way. In practice the downside is that the per-pixel inverse sample count scaling is now done from a single thread. However, this can be improved by using tbb to scale the pass from multiple threads. The benefit of this approach is that the pass scaling kernel is not needed, which resolves the ambiguity about when to run it. It also simplifies the code in the sense that the scaling logic is not duplicated in the kernel. |
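The per-pixel scaling described in this commit can be sketched as follows. This is a minimal illustration, not the RenderBuffers code: the function name `scale_pass_by_sample_count`, the flat `std::vector` buffers, and the interleaved component layout are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the Cycles API): divides an accumulated pass by
// the number of samples each pixel actually received.
// `pass` holds `num_components` interleaved floats per pixel.
void scale_pass_by_sample_count(std::vector<float> &pass,
                                const std::vector<int> &sample_count,
                                std::size_t num_components)
{
  assert(pass.size() == sample_count.size() * num_components);

  for (std::size_t pixel = 0; pixel < sample_count.size(); ++pixel) {
    const int samples = sample_count[pixel];
    if (samples <= 0) {
      continue;  // Nothing was accumulated; leave the pixel untouched.
    }
    // Per-pixel inverse sample count, computed once per pixel.
    const float inv_samples = 1.0f / static_cast<float>(samples);
    for (std::size_t c = 0; c < num_components; ++c) {
      pass[pixel * num_components + c] *= inv_samples;
    }
  }
}
```

Each pixel is scaled independently of the others, which is why the outer loop could later be split across threads, as the commit message suggests.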
April 2, 2021, 14:27 (GMT) |
Cycles: Allow const-pointer access to device_memory |
April 2, 2021, 14:22 (GMT) |
Cycles: reduce size of shadow path state by moving to own struct |
April 2, 2021, 14:22 (GMT) |
Cleanup: don't put unused CUDA KernelGlobals on the stack |
April 2, 2021, 14:22 (GMT) |
Cycles: use SoA layout for IntegratorState on the GPU |
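For readers unfamiliar with the term, the difference between an array-of-structures (AoS) and a structure-of-arrays (SoA) layout can be sketched as below. The field names `throughput` and `bounce` and the helper `advance_bounce` are hypothetical and do not reflect the actual IntegratorState members. The commit does not state its motivation; the usual reason for SoA on a GPU is that neighbouring threads reading the same field touch consecutive memory addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: one struct per path, so the fields of a single path
// are adjacent in memory.
struct PathStateAoS {
  float throughput;
  int bounce;
};

// Structure-of-arrays: one array per field, so the same field of
// neighbouring paths is adjacent in memory.
struct PathStateSoA {
  std::vector<float> throughput;
  std::vector<int> bounce;

  explicit PathStateSoA(std::size_t num_paths)
      : throughput(num_paths, 1.0f), bounce(num_paths, 0)
  {
  }
};

// Hypothetical update of one path: every field access indexes its own array.
void advance_bounce(PathStateSoA &state, std::size_t path_index, float attenuation)
{
  assert(path_index < state.bounce.size());
  state.throughput[path_index] *= attenuation;
  state.bounce[path_index] += 1;
}
```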
April 2, 2021, 14:22 (GMT) |
Cycles: put integrator queue pointer in constant memory Instead of passing it around to various functions. |
April 2, 2021, 14:22 (GMT) |
Cycles: remove INTEGRATOR_STATE_COPY macro This will be tricky with SoA, just do it manually in the one place that uses it. |
April 2, 2021, 13:32 (GMT) |
Fix error in Cycles versioning code after removal of branched path |
April 2, 2021, 13:27 (GMT) |
Fix error in removal of NLM denoiser causing CUDA failures |
April 2, 2021, 12:12 (GMT) |
Cycles: Speed up adaptive sampling on CPU - Do an early out in the convergence test, to avoid the error calculation for a pixel which is already known to be converged. - Better thread scheduling in the path trace work, to avoid an extra call of `parallel_for` which has (unmeasurable) overhead. - The biggest change is to stop sampling a pixel in parallel once it has converged. Prior to this change the path trace work would attempt to initialize path state for many samples in a row for this pixel. Timing on sampling a simple file (diffuse monkey on diffuse plane): - master: 3.89 sec - before this change: 4.24 sec - after this change: 4.04 sec |
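The early out and the "stop sampling once converged" behaviour from this commit can be sketched as below. This is a simplified illustration under stated assumptions: `PixelState`, `update_convergence`, `sample_pixel`, and the caller-supplied `estimate_error` callback are hypothetical names, and the real error estimate used by Cycles is not shown.

```cpp
#include <cassert>
#include <functional>

// Hypothetical per-pixel bookkeeping, not the Cycles data structures.
struct PixelState {
  int num_samples = 0;
  bool converged = false;
};

// Returns true when the pixel is converged. The error estimate is only
// evaluated for pixels that are not already known to be converged.
bool update_convergence(PixelState &pixel,
                        float threshold,
                        const std::function<float()> &estimate_error)
{
  if (pixel.converged) {
    return true;  // Early out: skip the error calculation entirely.
  }
  pixel.converged = estimate_error() < threshold;
  return pixel.converged;
}

// Takes up to `max_samples` samples, stopping as soon as the pixel converges
// instead of scheduling the remaining samples. Returns the samples taken.
int sample_pixel(PixelState &pixel,
                 int max_samples,
                 float threshold,
                 const std::function<float()> &estimate_error)
{
  assert(max_samples >= 0);
  int taken = 0;
  while (taken < max_samples && !update_convergence(pixel, threshold, estimate_error)) {
    ++pixel.num_samples;  // Stand-in for tracing one path.
    ++taken;
  }
  return taken;
}
```

Calling `sample_pixel` again on a converged pixel returns immediately without invoking the error estimate, which is the saving the first bullet describes.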
April 1, 2021, 17:14 (GMT) |
Cycles: Move need display update check to own function Currently no functional changes, but this allows for more elaborate logic in its implementation. |
April 1, 2021, 17:14 (GMT) |
Cycles: Implement convergence and filtering kernels for CUDA At the user level this means that adaptive sampling works on CUDA. The missing part for all devices is pass scaling at the end of the render. Need to look into whether there is some smarter trick we can do. |
April 1, 2021, 17:14 (GMT) |
Cycles: Use better naming in RenderScheduler No functional changes. |
April 1, 2021, 17:14 (GMT) |
Cycles: Remove hardcoded CUDA functions from CUDADevice Use more dynamic and flexible CUDADeviceKernels. |
April 1, 2021, 17:14 (GMT) |
Cycles: More accurate time tracking for display update |
April 1, 2021, 17:14 (GMT) |
Cycles: Move adaptive sampling convergence test to own kernel With this change the check is only done after the full contribution of the sample is known. The path tracing kernel is also faster because it no longer does a per-bounce convergence test. The CPU sample distribution now works similarly to the master branch. The timing on CPU is much closer to the master branch, but a more accurate measurement with proper compilation flags is needed. GPU adaptive sampling is temporarily broken. |
April 1, 2021, 17:14 (GMT) |
Cycles: Better display update scheduling for adaptive sampling Adaptive sampling forces a maximum number of samples per render work so that all filtering points happen. On simple scenes this could lead to very frequent updates, which heavily degrades performance. Now the render scheduler ignores display updates if they happen too often, even when adaptive sampling is used. |
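Ignoring display updates that arrive too often can be sketched with a simple time-based throttle. This is only an illustration of the idea, not the RenderScheduler logic: the class name `DisplayUpdateThrottle`, the fixed minimum interval, and passing the current time as an argument are assumptions made to keep the example deterministic.

```cpp
#include <cassert>

// Hypothetical throttle (not the RenderScheduler API): allows a display
// update only when at least `min_interval_seconds` have passed since the
// last accepted update.
class DisplayUpdateThrottle {
 public:
  explicit DisplayUpdateThrottle(double min_interval_seconds)
      : min_interval_(min_interval_seconds)
  {
    assert(min_interval_seconds >= 0.0);
  }

  // Returns true and records the time if the update should happen now.
  // Returns false if the request comes too soon after the previous update.
  bool should_update(double now_seconds)
  {
    if (has_updated_ && now_seconds - last_update_ < min_interval_) {
      return false;  // Too frequent: ignore this update request.
    }
    last_update_ = now_seconds;
    has_updated_ = true;
    return true;
  }

 private:
  double min_interval_;
  double last_update_ = 0.0;
  bool has_updated_ = false;
};
```

A real scheduler would read the time from a clock and might adapt the interval to the measured cost of an update; the explicit `now_seconds` argument here only makes the behaviour easy to follow.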
|