Blender Git Log

Blender Git "cycles-x" branch commits.

July 9, 2021, 09:13 (GMT)
Cycles X: Reduce memory usage of OIDN denoiser

The idea is to perform denoising in the render buffer's
denoised pass without allocating temporary color.

This optimization covers every scenario in which OIDN is used:
single- and multi-device rendering.

This also fixes certain artifacts caused by previous memory
optimizations, which led to feedback loops.

Differential Revision: https://developer.blender.org/D11852
July 9, 2021, 09:05 (GMT)
Fix access to shadow catcher pass without catcher objects in Cycles X

This fixes the following setup:
- Have a scene without shadow catcher objects
- Enable the Shadow Catcher pass
- Render with F12

Differential Revision: https://developer.blender.org/D11850
July 9, 2021, 08:58 (GMT)
Cycles X: Cache result of scene's has shadow catcher test

The scene is tagged for an update of the flag from the object's update
tag. This currently does not follow the typical node update path, for a
couple of reasons:

- The scene is not a node, and the object manager does not store any
state at this time.

- The flag could be accessed before the scene device update.
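
A minimal sketch of the caching idea described above, assuming a lazily
recomputed flag that is invalidated from an object's update tag. The
`Scene` and `Object` types, the method names, and the `num_scans` counter
are hypothetical stand-ins for illustration, not the actual Cycles code.

```cpp
#include <vector>

// Hypothetical stand-in for a scene object.
struct Object {
  bool is_shadow_catcher = false;
};

// Hypothetical stand-in for the scene; only the caching logic is shown.
class Scene {
 public:
  std::vector<Object> objects;

  // Called from an object's update tag: marks the cached flag as stale.
  void tag_shadow_catcher_modified() { shadow_catcher_dirty_ = true; }

  // Recomputes the flag on demand, so it is valid even when it is
  // queried before any device update has happened.
  bool has_shadow_catcher()
  {
    if (shadow_catcher_dirty_) {
      has_shadow_catcher_ = false;
      for (const Object &object : objects) {
        if (object.is_shadow_catcher) {
          has_shadow_catcher_ = true;
          break;
        }
      }
      shadow_catcher_dirty_ = false;
      ++num_scans;
    }
    return has_shadow_catcher_;
  }

  // Number of full object scans performed (for illustration only).
  int num_scans = 0;

 private:
  bool has_shadow_catcher_ = false;
  bool shadow_catcher_dirty_ = true;
};
```

Repeated queries reuse the cached value; the object list is only scanned
again after the scene has been tagged.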
July 9, 2021, 08:49 (GMT)
Cycles X: Reduce memory usage of OptiX denoiser

The idea is to perform denoising in the render buffer's denoised pass
without allocating temporary color.

This optimization covers every scenario in which the OptiX denoiser is
used: single- and multi-device rendering.

The change is a bit bigger than it could be from the minimal functional
point of view, but I kept running into issues, and it seemed helpful to
make certain parts clearer.

Differential Revision: https://developer.blender.org/D11847
July 9, 2021, 08:45 (GMT)
Merge branch 'master' into cycles-x
July 9, 2021, 08:22 (GMT)
Merge branch 'master' into cycles-x
July 8, 2021, 15:46 (GMT)
Fix adaptive sampling with denoising in Cycles X

Writing to the combined pass cannot assume that the combined pass has
an offset of 0: the pass at offset 0 may be a denoised pass.

Use explicit offset, similar to other places where accumulation to the
combined pass happens.
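
The fix above can be illustrated with a small sketch, assuming an
interleaved per-pixel buffer layout. `PassLayout` and
`accumulate_combined` are hypothetical names introduced here, not
identifiers from the Cycles kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout description of an interleaved render buffer.
struct PassLayout {
  int pass_stride;      // Number of floats stored per pixel.
  int combined_offset;  // Offset of the combined pass within a pixel.
};

// Accumulate one RGBA sample into the combined pass of a single pixel.
inline void accumulate_combined(std::vector<float> &buffer,
                                const PassLayout &layout,
                                std::size_t pixel_index,
                                const float rgba[4])
{
  assert(layout.combined_offset >= 0);
  assert(layout.combined_offset + 4 <= layout.pass_stride);
  assert((pixel_index + 1) * layout.pass_stride <= buffer.size());

  float *pixel = buffer.data() + pixel_index * layout.pass_stride;
  // Use the explicit offset: the pass stored at offset 0 may be a
  // different one (for example a denoised pass).
  float *combined = pixel + layout.combined_offset;
  for (int i = 0; i < 4; ++i) {
    combined[i] += rgba[i];
  }
}
```

With a layout such as `PassLayout{8, 4}`, the first four floats of every
pixel are left untouched and only floats 4 to 7 receive the sample.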
July 8, 2021, 11:20 (GMT)
Fix shadow catcher denoiser after recent changes in Cycles X

Accessing the shadow catcher pass requires the combined pass.
This means that the combined pass cannot be modified in-place before
the shadow catcher pass is calculated.
July 8, 2021, 10:41 (GMT)
Fix crash accessing Shadow Catcher pass without catchers in Cycles X
July 7, 2021, 08:55 (GMT)
Cycles X: Reduce OIDN memory usage for shadow catcher and multi-device

Read compositing passes in-place, avoiding extra memory allocation for
the OIDN pass.

Differential Revision: https://developer.blender.org/D11840
July 6, 2021, 15:38 (GMT)
Cycles X: Reduce OIDN memory usage with multi-device render

Allow OIDN to modify render buffers in-place, without allocating
extra temporary buffers.

Currently memory is only saved for non-composited passes (combined,
shadow catcher matte). Memory allocation can be avoided for the
composited passes as well, but that requires passing a row stride to
the pass accessor, which is not yet possible.

Differential Revision: https://developer.blender.org/D11826
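
To show what a row stride parameter would make possible, here is a
minimal sketch that writes a block of pass values into a sub-region of a
larger buffer. The function `write_pass_rows` and its parameters are
assumptions made for illustration; they are not the Cycles pass accessor
API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Write a width x height block of single-channel values into `dst`.
// Consecutive destination rows are `dst_row_stride` floats apart, so the
// destination can be a window inside a wider buffer.
inline void write_pass_rows(const std::vector<float> &src,
                            int width,
                            int height,
                            float *dst,
                            int dst_row_stride)
{
  assert(width > 0 && height > 0);
  assert(dst_row_stride >= width);
  assert(src.size() == static_cast<std::size_t>(width) * height);

  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      dst[y * dst_row_stride + x] = src[y * width + x];
    }
  }
}
```

Without the stride, the writer has to assume tightly packed rows, which
forces a temporary buffer whenever the destination is wider than the
block being written.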
July 6, 2021, 15:31 (GMT)
Fix alpha in denoised shadow catcher pass in Cycles X

Seems it was wrong since the initial implementation.

The issue is that for the shadow catcher pass we cannot copy alpha
from the input pass, as the pass is calculated from other passes.

We decided to rely on the implicit knowledge that composited passes are
always opaque. This avoids the complications and memory cost of storing
a full RGBA buffer for noisy composited passes.

Differential Revision: https://developer.blender.org/D11825
July 6, 2021, 13:51 (GMT)
Cycles X: Shading performance improvements by changing inlining behavior for SVM

The shading kernels (shade_surface, ...) are heavily limited by memory. I found several hotspots
where execution was stalled waiting for spills to be loaded back into registers. This can be
adjusted by changing the inlining logic:

For example, the compiler did not inline "kernel_write_denoising_features" (even though it
was marked __inline__), which caused it to force synchronization before the function call.
Forcing it inline avoided that and got rid of that hotspot.

Then there was cubic texture filtering and NanoVDB, which introduced huge code chunks
into each texture sampling evaluation (increasing register and instruction cache pressure),
even though they are rarely actually used. Making them __noinline__ outsources that
overhead to only occur when actually used.

Another case is the SVM. The compiler currently converts the node type switch statement
into a binary searched branch sequence. This means depending on the SVM node hit, the
GPU has to branch over large portions of code, which increases instruction cache pressure
immensely (GPU is fetching lots of code even for stuff it immediately jumps away from
again, while jumping through the binary searched branches). This can be reduced somewhat
by making all the node functions __noinline__, so that the GPU only has to branch over a
bunch of call instructions, rather than all the inlined code.

The SVM "offset" value is now passed by value into the node functions and returned through the
function return value, so that the compiler keeps it in a register. Previously, when it was passed
as a pointer, the OptiX compiler was forced to move it into local memory (functions are compiled
separately there, so the compiler is unaware of how that pointer is used).

Differential Revision: https://developer.blender.org/D11816
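
The two SVM changes (non-inlined node functions, and an offset passed by
value and returned) can be sketched on the host side as follows. This is
a toy interpreter under stated assumptions: the node types, the encoding
of the node array, and the `SKETCH_NOINLINE` macro are hypothetical and
only mirror the structure described above, not the actual SVM code.

```cpp
#include <cassert>
#include <vector>

// Portable stand-in for a "do not inline" qualifier (hypothetical macro).
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define SKETCH_NOINLINE __declspec(noinline)
#else
#  define SKETCH_NOINLINE
#endif

// Hypothetical node types of the toy interpreter.
enum NodeType : int { NODE_END = 0, NODE_VALUE = 1, NODE_ADD = 2 };

// Encoding: NODE_VALUE, out_slot, value.
// The offset is taken by value and the advanced offset is returned, so
// the caller never has to expose its address.
SKETCH_NOINLINE int node_value(const std::vector<int> &nodes, float *stack, int offset)
{
  const int out = nodes[offset++];
  const int value = nodes[offset++];
  stack[out] = static_cast<float>(value);
  return offset;
}

// Encoding: NODE_ADD, slot_a, slot_b, out_slot.
SKETCH_NOINLINE int node_add(const std::vector<int> &nodes, float *stack, int offset)
{
  const int a = nodes[offset++];
  const int b = nodes[offset++];
  const int out = nodes[offset++];
  stack[out] = stack[a] + stack[b];
  return offset;
}

// The switch only contains calls, so skipping a case means branching
// over a call instruction rather than over an inlined node body.
inline float eval_nodes(const std::vector<int> &nodes)
{
  float stack[8] = {};
  int offset = 0;
  while (true) {
    assert(offset < static_cast<int>(nodes.size()));
    const int type = nodes[offset++];
    switch (type) {
      case NODE_VALUE:
        offset = node_value(nodes, stack, offset);
        break;
      case NODE_ADD:
        offset = node_add(nodes, stack, offset);
        break;
      case NODE_END:
      default:
        // End of program (or unknown node type): result is in slot 0.
        return stack[0];
    }
  }
}
```

Whether a compiler actually keeps `offset` in a register depends on the
target and optimization level; the sketch only shows the calling
convention that makes it possible.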
July 6, 2021, 11:38 (GMT)
Fix crash rendering some scenes after master merge

Now that lamps are handled as primitives in intersections, we must include
them in the bitmask used for packing primitive type bits.
July 6, 2021, 10:38 (GMT)
Cycles X: Make pass definition more robust to changes

Previously, adding, removing, or even reordering passes in
kernel_types.h would likely break the display pass enum.

This was because the python enum was relying on an exact match of
enum item values.

Now we do an identifier-based lookup via `Pass::get_type_enum`,
which makes it safer to change passes in the kernel without the risk
of breaking display passes.

Additionally, conversion of a pass to a string now also happens via
`Pass::get_type_enum`.

All in all, it is the pass type enum which is the source of truth
with this change.

Differential Revision: https://developer.blender.org/D11823
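
A minimal sketch of an identifier-based lookup, assuming a single table
that ties each enum value to a stable string. The commit does not show
the signature of `Pass::get_type_enum`, so the enum, the table, and both
helper functions below are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical pass enum. Values can be reordered freely because
// nothing outside this file depends on the numeric values.
enum PassType { PASS_NONE = 0, PASS_COMBINED, PASS_DEPTH, PASS_SHADOW_CATCHER };

struct PassTypeItem {
  PassType type;
  const char *identifier;
};

// Single source of truth: one table ties each value to its identifier.
inline const PassTypeItem *pass_type_items()
{
  static const PassTypeItem items[] = {
      {PASS_COMBINED, "COMBINED"},
      {PASS_DEPTH, "DEPTH"},
      {PASS_SHADOW_CATCHER, "SHADOW_CATCHER"},
      {PASS_NONE, nullptr},  // Sentinel terminating the table.
  };
  return items;
}

// Identifier to enum value; returns PASS_NONE for unknown identifiers.
inline PassType pass_type_from_identifier(const char *identifier)
{
  assert(identifier != nullptr);
  for (const PassTypeItem *item = pass_type_items(); item->identifier; ++item) {
    if (std::strcmp(item->identifier, identifier) == 0) {
      return item->type;
    }
  }
  return PASS_NONE;
}

// Enum value to identifier; returns an empty string for unknown values.
inline const char *pass_type_as_string(PassType type)
{
  for (const PassTypeItem *item = pass_type_items(); item->identifier; ++item) {
    if (item->type == type) {
      return item->identifier;
    }
  }
  return "";
}
```

Because both directions go through the same table, a scripting-side enum
only needs to agree on the identifier strings, not on the integer values.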
July 6, 2021, 10:26 (GMT)
Cycles X: Allow viewing denoising passes in viewport

Can be used without denoiser configured (acting as if the denoising
data passes are enabled in the view layer options).

Differential Revision: https://developer.blender.org/D11821
July 6, 2021, 10:25 (GMT)
Cycles X: Only copy denoised passes for multi-device render

No functional changes, and the timing of the denoising process should
stay roughly the same. The change opens the door to letting denoisers
modify data in-place, which avoids extra allocation in the denoisers and
lowers the peak memory of the denoising process.

Differential Revision: https://developer.blender.org/D11815
July 6, 2021, 09:36 (GMT)
Merge branch 'master' into cycles-x
July 6, 2021, 09:13 (GMT)
Cycles X: Reduce memory usage when denoising in multi-device render

The idea is to create a full big tile buffer on the actual device
which will be used for denoising. This avoids OptiX creating yet
another copy of the render buffers on the actual device.

Mainly moving some lines around from DeviceDenoiser to Denoiser
to make logic more accessible by all denoisers, and in the path
tracer.

Assume allocation is cheaper than data transfer, so that some TODOs
are marked as done.

It's possible to reduce memory even further by allowing OIDN and
OptiX to modify the copy of the render buffers in-place, as it can
be thrown away. Considering this an independent further improvement
which is not tackled in this change.

Differential Revision: https://developer.blender.org/D11814
July 5, 2021, 15:23 (GMT)
Cycles X: Ensure buffers zero/copy happens in a desired order

Use the GPU queue to perform buffer copy from/to and zero operations,
so that things happen in the proper order with `render_samples()`.

Seems to solve artifacts when using OptiX denoiser in viewport and
dual GPU rendering.

Differential Revision: https://developer.blender.org/D11809
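
The ordering guarantee described above can be sketched with a host-side
in-order queue. `InOrderQueue`, `enqueue_zero`, and
`enqueue_render_sample` are hypothetical names standing in for a GPU
queue and its operations; this is not the Cycles device queue API.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical in-order work queue standing in for a GPU queue.
class InOrderQueue {
 public:
  void enqueue(std::function<void()> work)
  {
    assert(work != nullptr);
    pending_.push(std::move(work));
  }

  // Run all enqueued work strictly in submission order.
  void synchronize()
  {
    while (!pending_.empty()) {
      std::function<void()> work = std::move(pending_.front());
      pending_.pop();
      work();
    }
  }

 private:
  std::queue<std::function<void()>> pending_;
};

// Zeroing goes through the queue instead of touching the buffer
// immediately, so it cannot overtake or lag behind rendering work.
// The buffer must outlive the queued work.
inline void enqueue_zero(InOrderQueue &queue, std::vector<float> &buffer)
{
  queue.enqueue([&buffer] {
    for (float &value : buffer) {
      value = 0.0f;
    }
  });
}

// Stand-in for rendering one sample: adds 1 to every element.
inline void enqueue_render_sample(InOrderQueue &queue, std::vector<float> &buffer)
{
  queue.enqueue([&buffer] {
    for (float &value : buffer) {
      value += 1.0f;
    }
  });
}
```

If the zeroing were done outside the queue, it could run before or after
already queued samples; routing it through the same queue fixes the
order to the order of submission.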
Made by: Miika Hämäläinen. Last updated: 07.11.2014 14:18. MiikaH's Pages a.k.a. MiikaHweb | 2003-2021