Blender Git Log

Blender Git "cycles-x" branch commits.


July 16, 2021, 12:37 (GMT)
Cycles X: Support shadow catcher behind transparent object

This turned out to be rather straightforward. Initially it seemed
less obvious, until we started counting the number of samples for the
shadow catcher pass.

Note that this only covers the Transparent BSDF. The Glass BSDF
cannot be supported because it refracts light, which cannot be stored
in a shadow catcher pass.

Differential Revision: https://developer.blender.org/D11946
July 16, 2021, 12:32 (GMT)
Cycles X: Support Transparent Glass for shadow catcher

Improves support of Glass BSDF in front of a shadow catcher.
July 16, 2021, 11:44 (GMT)
Fix fully transparent shadow catcher pass without catchers

Makes it so behavior of a shadow catcher pass is always predictable:
it is always possible to multiply it with a backdrop, regardless of
presence of shadow catcher object in the scene.

The downside is that this change allocates extra memory to store an
empty shadow catcher pass and makes the denoiser do extra work. This
could be avoided, but it would require tricky checks, and the
situation is unlikely to be common enough to justify making the code
more complex.

Differential Revision: https://developer.blender.org/D11945
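
A minimal sketch of why a pass that can always be multiplied is
convenient for compositing. It assumes the pass without any catcher
objects is neutral (all ones); that value, the pixel layout and both
function names are illustrative assumptions, not taken from the
commit.

```python
def composite_over_backdrop(backdrop, shadow_catcher):
    """Multiply each backdrop pixel by the matching shadow catcher pixel."""
    return [
        tuple(b * s for b, s in zip(backdrop_px, catcher_px))
        for backdrop_px, catcher_px in zip(backdrop, shadow_catcher)
    ]


def neutral_shadow_catcher(num_pixels):
    """Assumed pass content when the scene has no catchers: no darkening."""
    return [(1.0, 1.0, 1.0)] * num_pixels


backdrop = [(0.2, 0.4, 0.6), (1.0, 0.5, 0.25)]

# With a neutral pass the backdrop comes through unchanged, so the
# compositing setup does not need a special case for "no catchers".
unchanged = composite_over_backdrop(backdrop, neutral_shadow_catcher(2))

# A pass with a half-intensity shadow on the first pixel darkens it.
shadowed = composite_over_backdrop(
    backdrop, [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
)
```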
July 15, 2021, 16:22 (GMT)
Cycles X: Ignore shadow catcher from holdout collection

Differential Revision: https://developer.blender.org/D11937
July 15, 2021, 16:19 (GMT)
Fix wrong render result after cryptomatte commit

Was checking the wrong field to see whether there are any cryptomatte
passes in the scene.
July 15, 2021, 15:15 (GMT)
Cycles X: Bring back cryptomatte post-processing

This is the non-accurate mode, used for both CPU and GPU, which runs
as a post-processing pass after all samples have finished. It is
driven by the render scheduler, since it knows when path tracing has
finished.

Compared to regular Cycles this makes it so the cryptomatte pass is
properly sorted with adaptive sampling enabled.

The accurate CPU implementation, which used to be done via the Coverage
class, is not yet hooked back up. This needs to happen either via the
kernel or via the PathTraceWork. The current state of the patch should
make it trivial to bring the accurate implementation back.

This change also fixes missing denoising when using constant time
rendering.

Differential Revision: https://developer.blender.org/D11934
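
A rough sketch of the kind of sorting a cryptomatte post-processing
step performs once all samples are done. It assumes each pixel holds a
list of (id, weight) pairs that should end up ordered by descending
weight; the data layout and function names are assumptions made for
illustration, not the Cycles implementation.

```python
def sort_cryptomatte_slots(pixel_slots):
    """Order one pixel's (id, weight) pairs by descending weight."""
    return sorted(pixel_slots, key=lambda slot: slot[1], reverse=True)


def postprocess_cryptomatte(pass_pixels):
    """Sort every pixel independently, after path tracing has finished."""
    return [sort_cryptomatte_slots(slots) for slots in pass_pixels]


# Two pixels with unsorted accumulated weights.
pass_pixels = [
    [(7, 0.1), (3, 0.6), (9, 0.3)],
    [(3, 0.25), (7, 0.75)],
]
sorted_pass = postprocess_cryptomatte(pass_pixels)
```

Running the sort once at the end means it does not matter how many
samples each pixel received, which is the situation adaptive sampling
creates.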
July 15, 2021, 15:00 (GMT)
Cycles X: Implement path compaction for shadow catcher

The demo file is BMW27 with the ground set as a shadow catcher.
The observed performance improvement is about 5% on an RTX 5000.

The general idea is to schedule new tiles in a way that we always
leave space for the shadow catcher. Roughly, we first schedule 50%
of path states from the maximum number of paths, then 25% and so on.
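
The "50%, then 25% and so on" idea can be sketched as a halving
sequence of scheduling budgets. This reads "and so on" as repeated
halving, which is an assumption; the function name and the
`min_budget` cutoff are hypothetical.

```python
def schedule_budgets(max_num_paths, min_budget=1):
    """Return successive numbers of path states to schedule.

    Each round schedules half as many states as the previous one, so
    the total never exceeds max_num_paths and room is left for states
    created by shadow catcher splits.
    """
    budgets = []
    budget = max_num_paths // 2
    while budget >= min_budget:
        budgets.append(budget)
        budget //= 2
    return budgets


budgets = schedule_budgets(1024, min_budget=64)
```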

Summary of changes:

- Replace constant offset of shadow catcher state with an atomically
incrementing index.

- Add new kernel to count number of states which can still split.

Could experiment with some atomics so that both a path split and a
path termination decrease a value, and adding new paths increases
it. Not sure this will give better performance.

- Remove terminated paths kernel from scheduling.
The paths are compacted, so we know they are in the beginning of
the array.
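
The two mechanisms in this list can be sketched on the host side as
follows. This assumes that compaction moves the active states to the
front of the array, and it uses `itertools.count` as a stand-in for a
GPU atomic increment; the class and function names are hypothetical.

```python
import itertools


def compact_active_states(states):
    """Move active states to the front in place and return their count.

    Terminated states are represented by None in this sketch.
    """
    active = [state for state in states if state is not None]
    num_active = len(active)
    states[:] = active + [None] * (len(states) - num_active)
    return num_active


class SplitStateAllocator:
    """Hands out slots for split states from an incrementing index."""

    def __init__(self, first_free_index):
        self._next_index = itertools.count(first_free_index)

    def allocate(self):
        # On a GPU this would be an atomic fetch-and-add.
        return next(self._next_index)


states = ["a", None, "b", None, "c"]
num_active = compact_active_states(states)
allocator = SplitStateAllocator(num_active)
first_slot = allocator.allocate()
second_slot = allocator.allocate()
```

Because the active states form one contiguous range after compaction,
no separate pass is needed to find where they are.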

Differential Revision: https://developer.blender.org/D11932
July 15, 2021, 14:59 (GMT)
Cycles X: Tweak max number of states seen by tile scheduler

This is required for shadow catchers so that the tile scheduler gives
work which can fit into the number of allowed camera rays. Use a value
smaller than the maximum number of states to prepare the code for
state compaction or re-scheduling for the shadow catcher.

Interestingly, this has a positive effect on regular rendering here
with an RTX 5000:
```
file                       new      cycles-x
bmw27.blend                12.445   12.2104
classroom.blend            24.4949  24.4508
pabellon.blend             11.3019  11.4407
monster.blend              13.409   13.4491
barbershop_interior.blend  18.6601  18.8364
junkshop.blend             26.3212  27.051
pvt_flat.blend             22.7389  22.9345
```

For future development we might try to make the tile scheduler give
smaller tiles with a smaller number of samples, and rely on the path
work GPU to request as many tiles as fit into the path states. We need
to be careful though, because there are downsides in terms of the
memory bandwidth needed to pass work tiles to the init_from kernels.
July 15, 2021, 14:59 (GMT)
Fix Cycles X adaptive sampling convergence check

The optimization of atomics and reduction was wrong: the warp voting
functions operate on the threads of a warp (obviously), and the result
of the vote is to be accumulated once for every warp.

The thread index is measured within a block, not within a warp: a
block can have a large (GPU-dependent) number of threads, while a warp
has only 32 threads.

Now the code performs the vote and atomically adds to the result.

This fixes sampling possibly stopping too early on the GPU, but
because the old code could have finished too soon, the absolute render
time may go up.

This is one of those things that is a bit hard to see in a real file,
but the same approach was giving wrong results during development of
the shadow catcher occupancy improvement. So far the best way to
visualize the problem was to force `converged` to always be false and
print the number of pixels and active pixels after running the kernel.
Before this change the number of active pixels was much smaller than
the number of pixels; now those values match.
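
The difference between accumulating once per block and once per warp
can be simulated on the CPU. This is only an illustration of the
counting logic: it assumes the old code let the thread with index 0 in
the block add its warp's vote, which is one reading of the description
above, and all names are hypothetical.

```python
WARP_SIZE = 32  # threads per warp, as stated in the commit message


def warp_vote(block, thread_index):
    """Count active threads in the warp that thread_index belongs to."""
    warp_start = (thread_index // WARP_SIZE) * WARP_SIZE
    return sum(block[warp_start:warp_start + WARP_SIZE])


def count_active(active, block_size, per_warp):
    """Simulate accumulating warp votes into one shared counter."""
    total = 0  # stands in for the atomically updated counter
    for block_start in range(0, len(active), block_size):
        block = active[block_start:block_start + block_size]
        for thread_index in range(len(block)):
            if per_warp:
                # Fixed: one thread per warp adds the vote.
                adds_vote = thread_index % WARP_SIZE == 0
            else:
                # Assumed old behavior: one thread per block adds it.
                adds_vote = thread_index == 0
            if adds_vote:
                total += warp_vote(block, thread_index)
    return total


all_active = [True] * 256
old_count = count_active(all_active, block_size=128, per_warp=False)
new_count = count_active(all_active, block_size=128, per_warp=True)
```

With 256 active pixels and blocks of 128 threads, the per-block
variant only sees the first warp of each block, which matches the
symptom of the active pixel count being much smaller than the pixel
count.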
July 15, 2021, 11:55 (GMT)
Cycles X: restore estimation of kernel memory usage for host memory fallback

This makes it so that we don't allocate scene memory on the device, only to
then find out later it has to move back to the host.

Integrator working memory is now allocated before loading the kernels and
allocating scene memory. This way it is included in the estimated kernel
memory usage, and makes it less likely to be moved to the host.

Differential Revision: https://developer.blender.org/D11922
July 15, 2021, 09:28 (GMT)
Cycles X: Tweaks to the multi-device balancing

There are a few ideas behind this change:

- Base the balancing on equalizing the actual time devices spend
rendering, rather than trying to estimate this via
performance-per-unit-work.

This gives a better estimate and convergence than the old calculation
on the pabellon.blend.

- Perform the first re-balancing based on accumulated statistics after
a short period of time rather than after the first sample. This allows
more accurate statistics to be accumulated.

- Perform re-balancing more often even in the headless render when
the balance is not ideal yet.

There are some other changes, such as performing rebalancing before
path tracing. This way it seems to be easier to write the logic in the
scheduler.
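
One way to picture "equalizing the actual time devices spend
rendering" is a step that shifts work away from devices that took
longer than average. This is a hypothetical heuristic written for
illustration; the formula, the `step` parameter and the function name
are assumptions, not the balancing code from the patch.

```python
def rebalance_weights(weights, times, step=0.5):
    """Nudge per-device work fractions towards equal render times.

    weights: current fraction of the work given to each device.
    times:   measured render time of each device for that work.
    """
    mean_time = sum(times) / len(times)
    adjusted = [
        # Slower than average gets less work, faster gets more.
        max(w * (1.0 + step * (mean_time - t) / mean_time), 1e-6)
        for w, t in zip(weights, times)
    ]
    total = sum(adjusted)
    return [w / total for w in adjusted]


# Device 0 finished in 10 s, device 1 needed 20 s for equal shares.
new_weights = rebalance_weights([0.5, 0.5], [10.0, 20.0])

# When both devices take the same time the split is left alone.
same_weights = rebalance_weights([0.3, 0.7], [5.0, 5.0])
```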

Headless render on RTX 5000 GPU and i9-11900k CPU:
```
file                       new      cycles-x
bmw27.blend                14.8814  20.0281
classroom.blend            30.025   26.9318
pabellon.blend             13.1679  12.6133
monster.blend              16.4408  16.3826
barbershop_interior.blend  22.83    19.9255
junkshop.blend             28.7097  27.2703
pvt_flat.blend             24.7341  21.8464
```

F12 render on the same configuration:
```
file                       new      cycles-x
bmw27.blend                13.5106  13.9074
classroom.blend            31.3891  31.7155
pabellon.blend             12.3674  49.053
monster.blend              14.4754  13.6263
barbershop_interior.blend  24.8804  23.999
junkshop.blend             29.1324  27.267
pvt_flat.blend             25.6206  22.6731
```

While this helps a lot for the pabellon file, other files seem to
experience a slowdown. It is a bit hard to find a good balance
between how often to perform device load rebalancing and how occupied
to keep the devices.

There is also some measurable deviation in the render times, depending
on previous load and such. For example, pvt_flat.blend varies between
~23 and ~27 seconds. This probably has something to do with the
thermal profile and the fact that we balance quickly and then schedule
a big chunk of work to render.

Not totally satisfied, but seems that overall this is a better
heuristic.

Differential Revision: https://developer.blender.org/D11897
July 15, 2021, 08:05 (GMT)
Cleanup: Cycles X compilation warnings
July 14, 2021, 16:47 (GMT)
Fix error loading non-existent shadow kernel pass after recent changes
July 14, 2021, 15:29 (GMT)
Cycles X: make OptiX 7.3 the minimum required SDK version

This ensures the new faster builtin curve intersection is used, and lets us
simplify the code a bit.

Differential Revision: https://developer.blender.org/D11866
July 14, 2021, 15:24 (GMT)
Cycles X: restore shadow pass

Differential Revision: https://developer.blender.org/D11896
July 14, 2021, 15:23 (GMT)
Cycles X: reduce GPU state memory usage when some features are not enabled

In particular: volumes, subsurface, denoising and light passes.

In a scene without these features, we go from 538MB to 346MB for the state
memory usage. This also improves performance, presumably due to reduced
memory traffic.

Differential Revision: https://developer.blender.org/D11915
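
The saving comes from not allocating per-path state for features a
scene does not use. A minimal sketch of that bookkeeping follows; the
byte sizes are made-up placeholders and do not reproduce the 538MB and
346MB figures above, and all names are hypothetical.

```python
# Hypothetical per-path sizes in bytes, chosen only for illustration.
CORE_STATE_BYTES = 128
FEATURE_STATE_BYTES = {
    "volume": 48,
    "subsurface": 32,
    "denoising": 24,
    "light_passes": 40,
}


def state_bytes_per_path(enabled_features):
    """Size of one path state given the enabled feature names."""
    return CORE_STATE_BYTES + sum(
        size
        for name, size in FEATURE_STATE_BYTES.items()
        if name in enabled_features
    )


def state_memory_bytes(num_paths, enabled_features):
    """Total state memory for num_paths concurrently active paths."""
    return num_paths * state_bytes_per_path(enabled_features)


everything = state_memory_bytes(1_000_000, set(FEATURE_STATE_BYTES))
nothing_extra = state_memory_bytes(1_000_000, set())
```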
July 14, 2021, 15:23 (GMT)
Cycles X: change requested device features to bitflags

So that they can be shared between host and device.

Differential Revision: https://developer.blender.org/D11914
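
Bitflags pack a set of requested features into one integer, which is
easy to pass between host and device code. The sketch below shows the
general pattern with Python's `enum.IntFlag`; the flag names and the
helper are hypothetical and are not the identifiers used in Cycles.

```python
from enum import IntFlag


class KernelFeature(IntFlag):
    """Hypothetical feature flags, one bit per feature."""
    VOLUME = 1 << 0
    SUBSURFACE = 1 << 1
    DENOISING = 1 << 2
    LIGHT_PASSES = 1 << 3


def has_feature(requested, feature):
    """True when every bit of feature is set in requested."""
    return (requested & feature) == feature


requested = KernelFeature.VOLUME | KernelFeature.DENOISING

# The whole request is a plain integer that both sides can interpret.
as_integer = int(requested)
round_tripped = KernelFeature(as_integer)
```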
July 14, 2021, 15:23 (GMT)
Cycles X: use less memory for float3 integrator state on GPU

Allocate a different device_only_memory size depending on whether the
device is CPU or GPU, since for GPU we don't align to 16 bytes for SSE.

Also adds some sanity checks and ensures float3 is not used in device_vector
since it's incompatible for sharing data between CPU and GPU.

Differential Revision: https://developer.blender.org/D11913
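
The size difference is simple arithmetic: three 4-byte floats occupy
12 bytes when packed, but 16 bytes when padded for SSE alignment. The
sketch below assumes the GPU uses the packed 12-byte layout; the
function names are hypothetical.

```python
FLOAT_BYTES = 4
SSE_ALIGNMENT_BYTES = 16


def float3_stride_bytes(device_is_gpu):
    """Bytes per float3 element for the given device type."""
    if device_is_gpu:
        return 3 * FLOAT_BYTES  # packed, no padding (assumed)
    return SSE_ALIGNMENT_BYTES  # padded to 16 bytes for SSE


def float3_buffer_bytes(num_elements, device_is_gpu):
    """Allocation size for a buffer of float3 values."""
    return num_elements * float3_stride_bytes(device_is_gpu)


gpu_bytes = float3_buffer_bytes(1000, device_is_gpu=True)
cpu_bytes = float3_buffer_bytes(1000, device_is_gpu=False)
```

The differing strides are also why a buffer laid out for one device
type cannot be shared directly with the other.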
July 14, 2021, 15:23 (GMT)
Cleanup: remove disabled OpenCL implementation

To be replaced with something else for non-NVIDIA devices later. Makes it
easier to do some of the upcoming changes.

Differential Revision: https://developer.blender.org/D11912
July 14, 2021, 14:28 (GMT)
Merge branch 'master' into cycles-x