May 27, 2021, 17:15 (GMT) |
Cleanup: refactor shader sorting to support it for more kernels later |
May 27, 2021, 14:34 (GMT) |
Cycles X: Remove unused denoising flags from KernelFilm |
May 27, 2021, 13:48 (GMT) |
CMake: Remove unused WITH_CYCLES_DEBUG option |
May 27, 2021, 13:48 (GMT) |
Cycles X: Cleanup OptiX Curves API flag

- Use proper boolean prefix
- Log as a human-readable boolean
- Add description for Python property |
May 27, 2021, 13:37 (GMT) |
Cycles X: Make OptiX debug flag runtime

Allows enabling OptiX module debugging without a special build of Blender.

Note that, depending on the compilation flags in the developer environment, this could affect render times. |
May 27, 2021, 13:14 (GMT) |
Cycles X: Remove BVH and bounces debug passes

Those are only available when Cycles is compiled with a special flag, and do not work on all configurations. Removing them for simplicity.

If we need something like this, it is better to implement it in a way that is available for official builds as well. |
May 27, 2021, 10:16 (GMT) |
Fix missing update enabling active pixels overlay in Cycles-X |
May 27, 2021, 10:13 (GMT) |
Merge branch 'master' into cycles-x |
May 26, 2021, 09:26 (GMT) |
Merge branch 'master' into cycles-x |
May 21, 2021, 18:04 (GMT) |
Cycles X: Experiment with tile reschedule heuristic

The idea is to add new tiles for rendering when the GPU starts to feel hungry (as opposed to the previous logic, which added new work tiles once the number of paths dropped below a certain threshold).

Some motivation behind this decision:

- The GPU only has so many threads. Having many more active threads might avoid some scheduling latency, but there is a limit to how much it helps.
- Scheduling new tiles early on might have a negative effect on coherency, so allowing more paths to be terminated before re-scheduling keeps the wavefront more coherent and more efficient to compute.

The new code will use the maximum number of threads the GPU has.

```
                           new      old(1)   cycles-x(2)  megakernel(3)
bmw27.blend                10.2251  10.198   10.7419      10.4269
classroom.blend            15.8454  16.7821  17.2907      16.6609
pabellon.blend             9.34677  9.39898  9.61772      9.14966
monster.blend              10.374   10.5923  10.5886      12.0106
barbershop_interior.blend  11.5124  11.777   11.8522      12.5769
junkshop.blend             15.6783  16.085   16.2821      16.5213
pvt_flat.blend             16.3432  16.5704  16.2637      17.4047

(1) cycles-x branch, previous commit e0716af1a4f
(2) cycles-x branch hash ad81074fab1
(3) cycles-x branch hash ef6ce4fa8ca (right before disabling megakernel)
``` |
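The difference between the two scheduling triggers described above can be sketched as follows. This is only an illustration under stated assumptions: `GpuState`, `should_schedule_old`, `should_schedule_new` and `num_paths_to_add` are hypothetical names, not Cycles code, and the sketch assumes that "hungry" means fewer active paths than the GPU has threads.

```python
from dataclasses import dataclass


@dataclass
class GpuState:
    """Hypothetical snapshot of the render state (not a Cycles structure)."""
    max_threads: int       # maximum number of threads the GPU has
    num_active_paths: int  # paths currently alive in the wavefront


def should_schedule_old(state: GpuState, threshold: int) -> bool:
    # Previous logic: add work tiles once the number of paths
    # goes below a fixed threshold.
    return state.num_active_paths < threshold


def should_schedule_new(state: GpuState) -> bool:
    # Assumed reading of the new logic: add work tiles only when there
    # are fewer active paths than GPU threads, so some threads would idle.
    return state.num_active_paths < state.max_threads


def num_paths_to_add(state: GpuState) -> int:
    # Assumption: top the wavefront back up to the GPU thread count.
    return max(0, state.max_threads - state.num_active_paths)
```

With 1000 threads and 1500 active paths, a threshold of 2000 would already trigger the old check, while the thread-based check waits until more paths have terminated.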
May 21, 2021, 13:28 (GMT) |
Cycles X: Align kernels of existing and new paths

Only enqueue new kernels when the existing wavefront is at the intersect closest stage. This seems to positively affect coherency, improving performance:

```
                           new      cycles-x(1)  megakernel(2)
bmw27.blend                10.198   10.6995      10.4269
classroom.blend            16.7821  17.2352      16.6609
pabellon.blend             9.39898  9.65984      9.14966
monster.blend              10.5923  10.5799      12.0106
barbershop_interior.blend  11.777   11.8852      12.5769
junkshop.blend             16.085   16.2971      16.5213
pvt_flat.blend             16.5704  16.3189      17.4047

(1) cycles-x branch hash ad81074fab1
(2) cycles-x branch hash ef6ce4fa8ca (right before disabling megakernel)
```

While pvt_flat (with adaptive sampling) is 1% slower, some other scenes have regained almost all of the performance they had in Cycles-X before the megakernel removal.

Note that coherency is a hypothesis. The performance gain might also be caused by fewer active paths array calculations. |
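A minimal sketch of the alignment rule, assuming the wavefront is considered to be "at" the stage with the most queued paths. The commit does not say how the stage is actually detected, so the `Kernel` enum and `can_enqueue_new_paths` below are hypothetical and only illustrate the idea.

```python
from enum import Enum, auto
from typing import Dict


class Kernel(Enum):
    """Hypothetical subset of wavefront stages."""
    INTERSECT_CLOSEST = auto()
    INTERSECT_SHADOW = auto()
    SHADE_SURFACE = auto()


def can_enqueue_new_paths(queued: Dict[Kernel, int]) -> bool:
    """`queued` maps each kernel to the number of paths waiting for it."""
    active = {kernel: n for kernel, n in queued.items() if n > 0}
    if not active:
        # Nothing is in flight, so there is no wavefront to stay aligned with.
        return True
    # Assumption: the stage with the most queued paths is where the
    # existing wavefront currently is.
    busiest = max(active, key=active.get)
    return busiest is Kernel.INTERSECT_CLOSEST
```

New camera paths start at the intersect closest stage, so admitting them only at that point keeps old and new paths running the same kernels together.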
May 21, 2021, 10:34 (GMT) |
Merge branch 'master' into cycles-x |
May 19, 2021, 17:44 (GMT) |
Cycles X: remove unused megakernel for GPU rendering

This reduces OptiX runtime compilation time to less than a second here.

Differential Revision: https://developer.blender.org/D11313 |
May 19, 2021, 17:16 (GMT) |
Cycles X: Remove usage of mega-kernel

The usage of the mega-kernel is commented out with this change.

There are a few benefits of removing the mega-kernel:

- It takes extra time to compile and space to ship.
- It is not compatible with features like shadow catcher.

The rest of the changes are attempts to avoid performance loss in various scenes. Those changes include:

- Make work tiles smaller in size. This makes the work tile more friendly for greedy scheduling when adaptive sampling is used. Currently this is achieved by keeping the pixel size the same and lowering the number of samples per work tile. The idea behind this is to avoid a dramatic change in the order in which pixels are scheduled for sampling.
- Keep tile size dimensions a power of two. This lowers the number of unused path states (which can be watched with `./bin//blender --debug-cycles --verbose 3 2>&1 | grep "Number of unused path states"`). In own tests it seems that we barely "waste" path states now.
- Make it so tiles are scheduled in the order of samples first. As in: keep pixel-space coherency, similar to how it is done in `get_work_pixel()`.
- Only keep extreme case tests for the tile size calculation. This avoids some unnecessary updates, while still ensuring correct behavior in extremes.

The timings are as follows:

```
RTX 6000 (Turing)
                           new      cycles-x
bmw27.blend                10.8964  10.4269
classroom.blend            17.4476  16.6609
pabellon.blend             9.77167  9.14966
monster.blend              10.3662  12.0106
barbershop_interior.blend  11.9445  12.5769
junkshop.blend             16.3556  16.5213
pvt_flat.blend             16.5317  17.4047

RTX A6000 (Ampere)
                           new      cycles-x
bmw27.blend                7.74059  7.65293
classroom.blend            10.775   10.9143
pabellon.blend             6.00643  5.85334
monster.blend              6.79277  8.0134
barbershop_interior.blend  8.39941  8.47159
junkshop.blend             10.4258  10.9882
pvt_flat.blend             10.2752  10.8821
```

Not entirely happy with the results: there are some very nice speedups interleaved with some slowdowns. The slowdown is within 5%, though, so hopefully we can gain it back with more tricks up our sleeves.

Some things to try:

- Try lowering the tile size in pixels.
- Try better alignment of tile size with the number of threads on a multiprocessor.

This change is a combined brain activity from Brecht and myself.

Differential Revision: https://developer.blender.org/D11311 |
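To illustrate why power-of-two tile dimensions waste fewer path states, here is a small sketch. It assumes a fixed pool of path states whose size is itself a power of two and counts the states left over after filling the pool with whole tiles. The function names and the way width and height are split are assumptions for illustration, not the Cycles tile size calculation.

```python
from typing import Tuple


def largest_power_of_two_at_most(n: int) -> int:
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n.bit_length() - 1)


def tile_size_for_states(max_num_path_states: int,
                         samples_per_tile: int) -> Tuple[int, int]:
    """Return (width, height) of a power-of-two tile that fits the pool."""
    pixels = max(1, max_num_path_states // samples_per_tile)
    exponent = largest_power_of_two_at_most(pixels).bit_length() - 1
    # Split the exponent as evenly as possible between width and height.
    width = 1 << ((exponent + 1) // 2)
    height = 1 << (exponent // 2)
    return width, height


def unused_path_states(max_num_path_states: int, width: int, height: int,
                       samples_per_tile: int) -> int:
    # States left over after filling the pool with as many whole tiles as fit.
    states_per_tile = width * height * samples_per_tile
    return max_num_path_states % states_per_tile
```

For a pool of `1 << 20` states and 4 samples per tile this gives a 512x512 tile with nothing left over, whereas a 500x500 tile with the same sample count would leave 48576 states unused.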
May 19, 2021, 14:09 (GMT) |
Cycles X: More flexible tile scheduling support

Allow tiles with a sample range smaller than the number of samples currently being rendered. The previous code was already trying to support this, but it had some mistakes in the math.

There should currently be no functional changes. |
May 19, 2021, 14:02 (GMT) |
Cycles X: Increase verbosity level for kernel timings

Makes it easier to see the overall kernel execution time without flooding the terminal with output, which could affect the overall timing. |
May 19, 2021, 10:12 (GMT) |
Cycles X: Run init_from_camera kernel for all tiles

Avoids pointer magic which is not necessarily supported by all compute backends, and ensures there is no extra latency caused by multiple kernel launches.

Currently this does not bring performance improvements, but this change opens doors for more compute backends and makes it possible to test different tile slicing and scheduling strategies.

```
                           init_all_tiles  cycles-x
bmw27.blend                10.3444         10.326
classroom.blend            16.476          16.6067
pabellon.blend             9.13914         9.13556
monster.blend              11.9673         11.963
barbershop_interior.blend  12.4566         12.4414
junkshop.blend             16.4764         16.491
pvt_flat.blend             17.288          17.2757
```

Differential Revision: https://developer.blender.org/D11304 |
May 19, 2021, 08:15 (GMT) |
Cleanup: Follow naming convention for private members in Cycles X |
May 19, 2021, 08:15 (GMT) |
Cleanup: Use const qualifier in Cycles X scene parameters comparator |
May 19, 2021, 08:15 (GMT) |
Cleanup: Remove unused table offset access in Cycles X |
|