Blender Git Loki

Blender Git "cycles-x" branch commits.


May 27, 2021, 17:15 (GMT)
Cleanup: refactor shader sorting to support it for more kernels later
May 27, 2021, 14:34 (GMT)
Cycles X: Remove unused denoising flags from KernelFilm
May 27, 2021, 13:48 (GMT)
CMake: Remove unused WITH_CYCLES_DEBUG option
May 27, 2021, 13:48 (GMT)
Cycles X: Cleanup OptiX Curves API flag

- Use proper boolean prefix
- Log as a human-readable boolean
- Add description for Python property
May 27, 2021, 13:37 (GMT)
Cycles X: Make OptiX debug flag runtime

Allows enabling OptiX module debugging without needing a special build
of Blender.

Note that, depending on the compilation flags in the developer environment,
this could affect render times.
May 27, 2021, 13:14 (GMT)
Cycles X: Remove BVH and bounces debug passes

These are only available when Cycles is compiled with a special flag, and
do not work on all configurations. Removing them for simplicity.
If we need something like this, it is better to implement it in a way that
is available in official builds as well.
May 27, 2021, 10:16 (GMT)
Fix missing update when enabling the active pixels overlay in Cycles-X
May 27, 2021, 10:13 (GMT)
Merge branch 'master' into cycles-x
May 26, 2021, 09:26 (GMT)
Merge branch 'master' into cycles-x
May 21, 2021, 18:04 (GMT)
Cycles X: Experiment with tile reschedule heuristic

The idea is to add new tiles for rendering when the GPU starts to feel
hungry (as opposed to the previous logic, which added new work tiles
once the number of paths went below a certain threshold). Some motivation
behind this decision:

- The GPU has only so many threads. Having many more active
threads might avoid some scheduling latency, but there is a limit to how
much it helps.

- Scheduling new tiles early on might have a negative effect on coherency,
so allowing more paths to terminate before rescheduling keeps the
wavefront more coherent and more efficient to compute.

The new code uses the maximum number of threads the GPU has.
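A minimal sketch contrasting the two refill policies, assuming a simplified model in which the scheduler only compares the active path count against a threshold. The function names, the `fraction` parameter of the old policy, and the buffer and thread counts are hypothetical and not taken from the Cycles source.

```python
def should_add_tiles_old(num_active_paths: int, max_num_paths: int,
                         fraction: float = 0.5) -> bool:
    """Hypothetical previous policy: refill once the number of active paths
    drops below a fixed fraction of the path state buffer."""
    return num_active_paths < int(max_num_paths * fraction)


def should_add_tiles_new(num_active_paths: int, max_gpu_threads: int) -> bool:
    """Policy described in the commit: refill only when the GPU is "hungry",
    i.e. there are fewer active paths than threads it can run at once."""
    return num_active_paths < max_gpu_threads


# Hypothetical sizes: a 1M-entry path state buffer and a GPU that keeps
# 72 * 1024 threads resident.
MAX_NUM_PATHS = 1 << 20
MAX_GPU_THREADS = 72 * 1024

for active in (600_000, 400_000, 50_000):
    print(active,
          should_add_tiles_old(active, MAX_NUM_PATHS),
          should_add_tiles_new(active, MAX_GPU_THREADS))
```

With these numbers, at 400,000 active paths the old policy would already add tiles while the new one keeps waiting, letting more paths terminate before the wavefront is refilled.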

```
scene                      new      old(1)   cycles-x(2)  megakernel(3)
bmw27.blend                10.2251  10.198   10.7419      10.4269
classroom.blend            15.8454  16.7821  17.2907      16.6609
pabellon.blend             9.34677  9.39898  9.61772      9.14966
monster.blend              10.374   10.5923  10.5886      12.0106
barbershop_interior.blend  11.5124  11.777   11.8522      12.5769
junkshop.blend             15.6783  16.085   16.2821      16.5213
pvt_flat.blend             16.3432  16.5704  16.2637      17.4047

(1) cycles-x branch, previous commit e0716af1a4f
(2) cycles-x branch, hash ad81074fab1
(3) cycles-x branch, hash ef6ce4fa8ca (right before disabling megakernel)
```
May 21, 2021, 13:28 (GMT)
Cycles X: Align kernels of existing and new paths

Only enqueue new kernels when the existing wavefront is at the
intersect closest stage. This seems to have a positive effect on
coherency, improving performance:

```
scene                      new      cycles-x(1)  megakernel(2)
bmw27.blend                10.198   10.6995      10.4269
classroom.blend            16.7821  17.2352      16.6609
pabellon.blend             9.39898  9.65984      9.14966
monster.blend              10.5923  10.5799      12.0106
barbershop_interior.blend  11.777   11.8852      12.5769
junkshop.blend             16.085   16.2971      16.5213
pvt_flat.blend             16.5704  16.3189      17.4047

(1) cycles-x branch, hash ad81074fab1
(2) cycles-x branch, hash ef6ce4fa8ca (right before disabling megakernel)
```

While pvt_flat (with adaptive sampling) is about 1.5% slower, some
other scenes have regained almost all of the performance compared
to Cycles-X before the megakernel removal.

Note that improved coherency is only a hypothesis. The performance gain
might also be caused by fewer active paths array calculations.
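A minimal sketch of the alignment rule, assuming a toy scheduler that tracks how many paths are queued for each kernel and always runs the kernel with the most queued paths (that selection rule is an assumption here). The helper names and the `shade_surface` stand-in kernel are hypothetical; only the intersect closest stage comes from the text above.

```python
from collections import Counter
from typing import Optional

INTERSECT_CLOSEST = "intersect_closest"


def pick_next_kernel(queue_counts: Counter) -> Optional[str]:
    """Assumed scheduling rule: run the kernel with the most queued paths."""
    candidates = {k: n for k, n in queue_counts.items() if n > 0}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def maybe_enqueue_new_paths(queue_counts: Counter, next_kernel: Optional[str],
                            num_free_states: int, num_pending: int) -> int:
    """Add fresh camera paths only when the existing wavefront is about to run
    intersect_closest, so new and existing paths share that kernel launch.
    Returns how many paths were added."""
    if next_kernel != INTERSECT_CLOSEST:
        return 0
    added = min(num_free_states, num_pending)
    # Assumption: freshly initialized camera paths are queued for intersect_closest.
    queue_counts[INTERSECT_CLOSEST] += added
    return added


counts = Counter({"shade_surface": 5000, INTERSECT_CLOSEST: 2000})
print(maybe_enqueue_new_paths(counts, pick_next_kernel(counts), 10_000, 3000))

counts = Counter({INTERSECT_CLOSEST: 6000, "shade_surface": 1000})
print(maybe_enqueue_new_paths(counts, pick_next_kernel(counts), 2500, 3000))
```

In the first case the wavefront is about to shade, so no new paths are added even though path states are free; in the second case new paths join the existing ones in the same intersect closest launch.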
May 21, 2021, 10:34 (GMT)
Merge branch 'master' into cycles-x
May 19, 2021, 17:44 (GMT)
Cycles X: remove unused megakernel for GPU rendering

This reduces OptiX runtime compilation time to less than a second here.

Differential Revision: https://developer.blender.org/D11313
May 19, 2021, 17:16 (GMT)
Cycles X: Remove usage of mega-kernel

The usage of the mega-kernel is commented out with this change.

There are a few benefits to removing the mega-kernel:

- It takes extra time to compile and space to ship.
- It is not compatible with features like shadow catcher.

The rest of the changes are an attempt to avoid performance
loss in various scenes. These changes include:

- Make the work tile smaller. This makes the work tile friendlier to
greedy scheduling when adaptive sampling is used.
Currently this is achieved by keeping the tile size in pixels the same
and lowering the number of samples per work tile. The idea behind this
is to avoid a dramatic change in the order in which pixels are
scheduled for sampling.

- Keep tile size dimensions a power of two.
This lowers the number of unused path states, which can be watched with:

./bin/blender --debug-cycles --verbose 3 2>&1 | grep "Number of unused path states"

In my own tests it seems that we barely "waste" path states now.

- Make it so tiles are scheduled in sample order first.
That is: keep pixel-space coherency, similar to how it is done
in `get_work_pixel()`.

- Only keep extreme-case tests for the tile size calculation.
This avoids some unnecessary updates while still ensuring correct
behavior in the extreme cases.
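The tile sizing and ordering points above can be illustrated with a small sketch. It assumes that one tile occupies `tile_w * tile_h * samples_per_tile` path states and that the path state count is itself a power of two; the helper names and the numbers are hypothetical. The ordering function is one interpretation of "samples first": every sample range of a pixel tile is scheduled before moving on to the next pixel tile.

```python
from typing import List, Tuple


def prev_power_of_two(x: int) -> int:
    """Largest power of two that is <= x (x must be >= 1)."""
    return 1 << (x.bit_length() - 1)


def unused_path_states(num_path_states: int, tile_w: int, tile_h: int,
                       samples_per_tile: int) -> int:
    """Path states left idle after packing as many whole tiles as fit.
    One tile needs tile_w * tile_h * samples_per_tile path states."""
    work_per_tile = tile_w * tile_h * samples_per_tile
    return num_path_states % work_per_tile


def schedule_order(num_tiles_x: int, num_tiles_y: int, total_samples: int,
                   samples_per_tile: int) -> List[Tuple[int, int, int, int]]:
    """Emit (tile_x, tile_y, start_sample, num_samples) so that all sample
    ranges of one pixel tile come before the next pixel tile."""
    order = []
    for ty in range(num_tiles_y):
        for tx in range(num_tiles_x):
            for start in range(0, total_samples, samples_per_tile):
                count = min(samples_per_tile, total_samples - start)
                order.append((tx, ty, start, count))
    return order


# Hypothetical path state count (a power of two).
NUM_PATH_STATES = 1 << 20
print(unused_path_states(NUM_PATH_STATES, 64, 64, 4))  # power-of-two tile
print(unused_path_states(NUM_PATH_STATES, 60, 60, 4))  # non-power-of-two tile
print(prev_power_of_two(60))
print(schedule_order(2, 1, 8, 4))
```

With a 2^20 path state buffer, 64x64 tiles with 4 samples each fill the buffer exactly, while 60x60 tiles with 4 samples leave 11,776 path states idle.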

The timings are as follows:
```
RTX 6000 (Turing)

scene                      new      cycles-x
bmw27.blend                10.8964  10.4269
classroom.blend            17.4476  16.6609
pabellon.blend             9.77167  9.14966
monster.blend              10.3662  12.0106
barbershop_interior.blend  11.9445  12.5769
junkshop.blend             16.3556  16.5213
pvt_flat.blend             16.5317  17.4047

RTX A6000 (Ampere)

scene                      new      cycles-x
bmw27.blend                7.74059  7.65293
classroom.blend            10.775   10.9143
pabellon.blend             6.00643  5.85334
monster.blend              6.79277  8.0134
barbershop_interior.blend  8.39941  8.47159
junkshop.blend             10.4258  10.9882
pvt_flat.blend             10.2752  10.8821
```

I am not entirely happy with the results: there are some very nice speedups
interleaved with some slowdowns. However, the slowdowns are mostly within 5%
(the largest is pabellon on Turing, at about 7%), so hopefully we can gain
them back with more tricks up our sleeves.
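The relative change per scene can be computed directly from the two timing tables. This is plain arithmetic on the numbers quoted above, with positive values meaning the new code is slower.

```python
# Render timings copied from the tables above: (new, cycles-x).
TIMINGS = {
    "RTX 6000 (Turing)": {
        "bmw27.blend": (10.8964, 10.4269),
        "classroom.blend": (17.4476, 16.6609),
        "pabellon.blend": (9.77167, 9.14966),
        "monster.blend": (10.3662, 12.0106),
        "barbershop_interior.blend": (11.9445, 12.5769),
        "junkshop.blend": (16.3556, 16.5213),
        "pvt_flat.blend": (16.5317, 17.4047),
    },
    "RTX A6000 (Ampere)": {
        "bmw27.blend": (7.74059, 7.65293),
        "classroom.blend": (10.775, 10.9143),
        "pabellon.blend": (6.00643, 5.85334),
        "monster.blend": (6.79277, 8.0134),
        "barbershop_interior.blend": (8.39941, 8.47159),
        "junkshop.blend": (10.4258, 10.9882),
        "pvt_flat.blend": (10.2752, 10.8821),
    },
}


def relative_change(new: float, old: float) -> float:
    """Percent change in time; positive means the new code is slower."""
    return (new - old) / old * 100.0


for gpu, scenes in TIMINGS.items():
    print(gpu)
    for scene, (new, old) in scenes.items():
        print(f"  {scene:<27}{relative_change(new, old):+6.1f}%")
```

On these numbers, pabellon on Turing is the only scene more than 5% slower (about 6.8%), while monster is about 14% faster on Turing and about 15% faster on Ampere.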

Some things to try:
- Try lowering the tile size in pixels.
- Try better alignment of the tile size with the number of threads on a
multiprocessor.

This change is a combined brain activity from Brecht and myself.

Differential Revision: https://developer.blender.org/D11311
May 19, 2021, 14:09 (GMT)
Cycles X: More flexible tile scheduling support

Allow tiles with a sample range smaller than the number of samples
currently being rendered.

The previous code already tried to support this, but it had some
mistakes in the math.

There should currently be no functional changes.
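A minimal sketch of what supporting a sub-range of samples per tile involves, assuming a work tile described by a pixel rectangle plus `start_sample` and `num_samples`, and a work index layout in which the sample varies fastest. The `WorkTile` class and both helpers are hypothetical; the point is that offsets are derived from the tile's own sample count, not from the total number of samples being rendered.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WorkTile:
    """Hypothetical work tile: pixel rectangle plus a sample sub-range."""
    x: int
    y: int
    w: int
    h: int
    start_sample: int
    num_samples: int


def split_sample_range(start_sample: int, num_samples: int,
                       samples_per_tile: int) -> List[Tuple[int, int]]:
    """Cover [start_sample, start_sample + num_samples) with (start, count)
    chunks of at most samples_per_tile; the last chunk may be shorter."""
    ranges = []
    end = start_sample + num_samples
    s = start_sample
    while s < end:
        n = min(samples_per_tile, end - s)
        ranges.append((s, n))
        s += n
    return ranges


def work_pixel(tile: WorkTile, work_index: int) -> Tuple[int, int, int]:
    """Map a work index inside the tile to (x, y, sample). Samples vary
    fastest; divide by the tile's own num_samples, not the render total."""
    sample_offset = work_index % tile.num_samples
    pixel_offset = work_index // tile.num_samples
    y_offset = pixel_offset // tile.w
    x_offset = pixel_offset - y_offset * tile.w
    return tile.x + x_offset, tile.y + y_offset, tile.start_sample + sample_offset


print(split_sample_range(0, 10, 4))
tile = WorkTile(x=16, y=8, w=4, h=2, start_sample=8, num_samples=2)
print([work_pixel(tile, i) for i in range(4)])
```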
May 19, 2021, 14:02 (GMT)
Cycles X: Increase verbosity level for kernel timings

Makes it easier to access the overall kernel execution time without
flooding the terminal, which could affect the overall timing.
May 19, 2021, 10:12 (GMT)
Cycles X: Run init_from_camera kernel for all tiles

Avoids pointer magic, which is not necessarily supported by all compute
backends, and ensures there is no extra latency caused by
multiple kernel launches.

Currently this does not bring performance improvements, but this change
opens the door to more compute backends and makes it possible to test
different tile slicing and scheduling strategies.
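The idea of a single launch over all tiles can be sketched as follows: every global work index is mapped back to its tile and to a local index within that tile. This sketch uses an exclusive prefix sum over per-tile work sizes plus a binary search; the actual kernel may locate tiles differently, and all names here are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def tile_work_offsets(work_sizes: List[int]) -> List[int]:
    """Exclusive prefix sum: offsets[i] is the first global index of tile i.
    Assumes every tile has a positive work size."""
    return list(accumulate(work_sizes, initial=0))[:-1]


def locate(global_index: int, offsets: List[int], total: int) -> Tuple[int, int]:
    """Map a global work index to (tile_index, local_index)."""
    if not 0 <= global_index < total:
        raise IndexError("work index out of range")
    tile_index = bisect_right(offsets, global_index) - 1
    return tile_index, global_index - offsets[tile_index]


def launch_all_tiles(work_sizes: List[int]) -> List[Tuple[int, int]]:
    """Simulate one launch covering every tile instead of one launch per tile."""
    offsets = tile_work_offsets(work_sizes)
    total = sum(work_sizes)
    return [locate(i, offsets, total) for i in range(total)]


work = launch_all_tiles([6, 4, 8])
print(work[5], work[6], work[17])
```

Because each work item finds its own tile, no per-tile pointer offsetting or separate launches are needed, which matches the motivation given in the commit message.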

```
scene                      init_all_tiles  cycles-x
bmw27.blend                10.3444         10.326
classroom.blend            16.476          16.6067
pabellon.blend             9.13914         9.13556
monster.blend              11.9673         11.963
barbershop_interior.blend  12.4566         12.4414
junkshop.blend             16.4764         16.491
pvt_flat.blend             17.288          17.2757
```

Differential Revision: https://developer.blender.org/D11304
May 19, 2021, 08:15 (GMT)
Cleanup: Follow naming convention for private members in Cycles X
May 19, 2021, 08:15 (GMT)
Cleanup: Use const qualifier in Cycles X scene parameters comparator
May 19, 2021, 08:15 (GMT)
Cleanup: Remove unused table offset access in Cycles X
Made by: Miika Hämäläinen | Last updated: 07.11.2014 14:18 | MiikaH's Pages a.k.a. MiikaHweb | 2003-2021