Blender Git Log

Blender Git "fracture_modifier-master" branch commits.


January 20, 2017, 10:25 (GMT)
Cycles: BVH-related SSE optimization

Several ideas here:

- Optimize calculation of near_{x,y,z} in a way that does not require
3 if() statements per update, which avoids negative effect of wrong
branch prediction.

- Optimization of direction clamping for BVH.

- Optimization of point/direction transform.

Brings ~1.5% speedup again, depending on the scene (unfortunately, this
speedup can't be summed across all previous commits because the speedup of
each of the changes varies from scene to scene, but it still seems to
be a nice solid speedup of a few percent on Linux, and a bigger speedup was
reported on Windows).

Once again, thanks Maxym for inspiration!

Still TODO: We have multiple places where we need to calculate near
x,y,z indices in BVH; for now it's only done for the main BVH traversal.
Will try to move this calculation to a utility function and see if
that can be easily re-used across all the BVH flavors.
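
The branch-free near/far selection described in this commit can be sketched in scalar C++. This is only an illustration under stated assumptions, not the code from the commit: the struct `NearFarSketch`, the helper `sign_bit()` and the 0/1 index convention are all made up for the example, and the commit itself works with SSE intrinsics rather than scalar bit tricks.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical container for per-axis child ordering (not a Cycles type).
struct NearFarSketch {
  int near_x, near_y, near_z;
  int far_x, far_y, far_z;
};

// Returns 1 if the sign bit of f is set, 0 otherwise.
static inline int sign_bit(float f)
{
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<int>(bits >> 31);
}

// Reference version: three data-dependent branches per update.
static inline NearFarSketch near_far_branchy(float idir_x, float idir_y, float idir_z)
{
  NearFarSketch r;
  if (idir_x >= 0.0f) { r.near_x = 0; r.far_x = 1; } else { r.near_x = 1; r.far_x = 0; }
  if (idir_y >= 0.0f) { r.near_y = 0; r.far_y = 1; } else { r.near_y = 1; r.far_y = 0; }
  if (idir_z >= 0.0f) { r.near_z = 0; r.far_z = 1; } else { r.near_z = 1; r.far_z = 0; }
  return r;
}

// Branch-free version: the sign bit is used directly as the near index.
// Assumption: inputs are neither -0.0f nor NaN, where this differs from
// the comparison-based reference above.
static inline NearFarSketch near_far_branchless(float idir_x, float idir_y, float idir_z)
{
  const int sx = sign_bit(idir_x);
  const int sy = sign_bit(idir_y);
  const int sz = sign_bit(idir_z);
  NearFarSketch r = {sx, sy, sz, sx ^ 1, sy ^ 1, sz ^ 1};
  return r;
}
```

Calling `near_far_branchless(1.0f, -2.0f, 0.5f)` gives the same ordering as the branchy reference, without any conditional jump that could be mispredicted.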
January 20, 2017, 10:25 (GMT)
Cycles: Move QBVH near/far offset calculation to a utility function

Just preparing for a new optimization to be used in all traversal implementations.

Should be no measurable difference.
January 20, 2017, 10:25 (GMT)
Cycles: Enable SSE math optimization for AVX kernels

This gives about 5% speedup for AVX processors.

Benefit of such optimization on other microarchitectures is still
under investigation.
January 20, 2017, 10:25 (GMT)
Cycles: Implement SSE-optimized path of util_max_axis()

The idea here is to avoid if statements which could cause wrong
branch prediction.

Gives a small but measurable speedup of up to ~1%. Still nice :)

Inspired by Maxym Dmytrychenko, thanks!
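
To illustrate the idea of finding the largest axis without if statements, here is a scalar sketch. It is not the SSE code from the commit: `Float3Sketch`, both function names and the lookup-table approach are assumptions chosen for the example. The three comparisons are packed into a bit mask that indexes a small table, so the result is selected by data flow rather than by branching.

```cpp
// Hypothetical stand-in for a 3-component float vector.
struct Float3Sketch {
  float x, y, z;
};

// Reference version with nested conditionals.
static inline int max_axis_branchy(const Float3Sketch &v)
{
  if (v.x > v.y) {
    return (v.x > v.z) ? 0 : 2;
  }
  return (v.y > v.z) ? 1 : 2;
}

// Table-driven version: bit 0 = (x > y), bit 1 = (x > z), bit 2 = (y > z).
// Each table entry is the answer the reference version gives for that mask.
static inline int max_axis_branchless(const Float3Sketch &v)
{
  static const int table[8] = {2, 2, 2, 0, 1, 2, 1, 0};
  const int mask = int(v.x > v.y) | (int(v.x > v.z) << 1) | (int(v.y > v.z) << 2);
  return table[mask];
}
```

Both functions return the same axis index for every input, including ties, because the table was filled in from the reference logic case by case.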
January 20, 2017, 10:25 (GMT)
Cycles: Use new SSE version of offset calculation for all QBVH flavors

Gives up to ~1% speedup again.

While it seems to be small, it's still nice since the code is now actually
cleaner than it used to be.
January 20, 2017, 10:25 (GMT)
Cycles: Add AVX2 path to subsurface triangle intersection

Similar to the regular triangle intersection case. Gives about 3% speedup rendering
an SSS object on my desktop.

Question: how to avoid such a code duplication in a nice way without speed loss?
January 20, 2017, 10:25 (GMT)
Cycles: Cleanup, style
January 20, 2017, 10:25 (GMT)
Cycles: Disable optimization of operator / for float3

This was giving some speedup but made intersection tests fail
from a watertightness point of view.

Needs deeper investigation, but it needs to be fixed quickly for
the studio.
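
The commit message does not say what the optimized operator did. Purely as an assumption, the sketch below takes a common fast-division trick, multiplying by a reciprocal, and shows that it can differ from a true division in the last bit. Differences of that size can plausibly upset edge tests that must agree between neighboring triangles. The function names are hypothetical.

```cpp
// Correctly rounded IEEE division: a single rounding step.
static inline float divide_exact(float a, float b)
{
  return a / b;
}

// Hypothetical "fast" path: the reciprocal is rounded first and the
// product is rounded again, so the result may be off by one ulp.
static inline float divide_via_reciprocal(float a, float b)
{
  const float inv_b = 1.0f / b;
  return a * inv_b;
}
```

With plain single-precision arithmetic (no fast-math flags), `divide_exact(5.0f, 3.0f)` and `divide_via_reciprocal(5.0f, 3.0f)` round to neighboring floats, while inputs such as `6.0f` and `2.0f` give identical results.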
January 20, 2017, 10:25 (GMT)
Fix T49630: Cycles: Swapped shader and bake kernels

The problem here was, as the title says, that the two kernels were swapped.
Since shader evaluation is only used for building the sampling map when World MIS is enabled, rendering without it would still work fine, although baking was also broken.
January 20, 2017, 10:25 (GMT)
Cycles: Fix another OpenCL logging issue

Previously an error message would be printed whenever the OpenCL build produced output.
However, some frameworks seem to print extra information even if the build succeeded, so now the actual returned error is checked as well.
When --debug-cycles is activated, the build output will always be printed, otherwise it only gets printed if there was an error.
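
The rule described above can be written as a small predicate. This is a sketch only: `should_print_build_log()` and its parameters are hypothetical names, not functions from the Cycles or OpenCL code.

```cpp
#include <string>

// Hypothetical helper: decides whether the OpenCL build output is printed.
// build_failed  : the build call returned an error code
// debug_enabled : corresponds to running with --debug-cycles
// build_log     : whatever text the framework produced during the build
static inline bool should_print_build_log(bool build_failed,
                                          bool debug_enabled,
                                          const std::string &build_log)
{
  if (build_log.empty()) {
    return false;  // nothing to print
  }
  // Output alone is no longer treated as an error.
  return build_failed || debug_enabled;
}
```

With this rule, a successful build that emits extra information stays silent unless debugging is enabled, which matches the behavior described in the commit message.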
January 20, 2017, 10:25 (GMT)
Cycles: Use const reference for register variables in non-OpenCL code

This is something tested by @LazyDodo and suggested by Maxym to make
MSVC happier.
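
One way to pass arguments by const reference only outside OpenCL is a macro that expands differently per target. The sketch below shows the pattern under assumptions: `EXAMPLE_OPENCL_BUILD`, `EXAMPLE_CONST_REF`, `Vec3Sketch` and `dot_sketch()` are hypothetical names invented here, and the source does not describe which mechanism the commit actually uses.

```cpp
// Hypothetical macro: by-value in OpenCL C (which has no references),
// const reference everywhere else.
#ifdef EXAMPLE_OPENCL_BUILD
#  define EXAMPLE_CONST_REF(type) const type
#else
#  define EXAMPLE_CONST_REF(type) const type &
#endif

// Hypothetical stand-in for a 3-component float vector.
struct Vec3Sketch {
  float x, y, z;
};

// The same function signature compiles for both targets.
static inline float dot_sketch(EXAMPLE_CONST_REF(Vec3Sketch) a,
                               EXAMPLE_CONST_REF(Vec3Sketch) b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Call sites look the same either way, for example `dot_sketch(a, b)`, so only the macro definition needs to know about the target.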
January 20, 2017, 10:25 (GMT)
Cycles: OpenCL 3d textures support.

Note that volume rendering is not supported yet, this is a step towards that.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D2299
January 20, 2017, 10:25 (GMT)
Cycles: Improve OpenCL kernel compilation logging

The previous refactor changed the code to use a separate logging mechanism to support multithreaded compilation.
However, since that's not supported by any frameworks yet, it just resulted in bad logging behaviour.
So, this commit changes the logging to go directly to stdout/stderr once again by default.
January 20, 2017, 10:25 (GMT)
Fix build error with WITH_CYCLES_NATIVE_ONLY and recent AVX2 changes.
January 20, 2017, 10:25 (GMT)
Cycles: Enable SSE options of math module for AVX2 kernels

Currently this does not give a measurable difference, but it is required
groundwork for some upcoming further optimization of AVX2 kernels.
January 20, 2017, 10:25 (GMT)
Cycles: Split device_opencl.cpp into multiple files for easier maintenance

There are no user-visible changes, just some internal restructuring.

Differential Revision: https://developer.blender.org/D2231
January 20, 2017, 10:25 (GMT)
Cycles: Use more SSE intrinsics for float3 type

This gives about 5% speedup on AVX2 kernels (other kernels still
have SSE disabled for math operations) and this solves the slowdown
of the koro scene mentioned in the previous commit.

The title says it all actually. This commit also contains
changes to pass float3 as const reference in affected functions.

This should make MSVC happier without breaking OpenCL because it's
only done in areas which are ifdef-ed for non-OpenCL.

Another patch based on inspiration from Maxym Dmytrychenko, thanks!
January 20, 2017, 10:25 (GMT)
Cycles: Add new avxf vectorized data type

Based on existing ssef data type and to my knowledge it's also what happens in
Embree nowadays.

Inspired by Maxym Dmytrychenko and required for the upcoming triangle
intersection commit.

Hopefully the copyright message is correct.
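
For readers unfamiliar with this kind of type, here is a rough sketch of what an 8-wide float wrapper can look like. It is not the avxf type from the commit: `avxf_sketch` and its members are assumptions for illustration, and a scalar fallback is included so the example also compiles when AVX is not enabled.

```cpp
#if defined(__AVX__)
#  include <immintrin.h>
#endif

// Hypothetical 8-lane float vector: wraps __m256 when AVX is available,
// otherwise falls back to a plain array.
struct avxf_sketch {
  enum { size = 8 };
#if defined(__AVX__)
  __m256 m256;
  explicit avxf_sketch(__m256 v) : m256(v) {}
  explicit avxf_sketch(float v) : m256(_mm256_set1_ps(v)) {}
  float operator[](int i) const
  {
    float lanes[size];
    _mm256_storeu_ps(lanes, m256);  // unaligned store into a temporary
    return lanes[i];
  }
#else
  float lanes[size];
  explicit avxf_sketch(float v)
  {
    for (int i = 0; i < size; i++) lanes[i] = v;
  }
  float operator[](int i) const { return lanes[i]; }
#endif
};

// Lane-wise addition.
static inline avxf_sketch operator+(const avxf_sketch &a, const avxf_sketch &b)
{
#if defined(__AVX__)
  return avxf_sketch(_mm256_add_ps(a.m256, b.m256));
#else
  avxf_sketch r(0.0f);
  for (int i = 0; i < avxf_sketch::size; i++) r.lanes[i] = a.lanes[i] + b.lanes[i];
  return r;
#endif
}

// Lane-wise multiplication.
static inline avxf_sketch operator*(const avxf_sketch &a, const avxf_sketch &b)
{
#if defined(__AVX__)
  return avxf_sketch(_mm256_mul_ps(a.m256, b.m256));
#else
  avxf_sketch r(0.0f);
  for (int i = 0; i < avxf_sketch::size; i++) r.lanes[i] = a.lanes[i] * b.lanes[i];
  return r;
#endif
}
```

Overloaded operators let vectorized code read like scalar code, for example `a * b + a`, which is the main convenience such a wrapper offers over raw intrinsics.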
January 20, 2017, 10:25 (GMT)
[Windows/Cycles/Clang] Fix compilation error with clang-cl on windows
January 20, 2017, 10:25 (GMT)
Cycles: Implement AVX2 version of triangle_intersect

This commit basically vectorizes the existing code using AVX2 instructions
(without modifying the algorithm itself). This gives quite nice speedups:

BMW: -8%
Classroom: -5%
Cat: -5%
Koro: +1%
Barcelona: -8%

That's on a Linux machine; reported performance improvement on Windows
goes up to 20%.

Not currently sure why Koro is somewhat slower, because it mainly uses
curve intersection tests. Could it be timing noise? Or something with the
cache utilization perhaps? In any case, the speedup in other scenes makes
me think that the current state is acceptable for an initial implementation.

This is again inspired by Maxym Dmytrychenko.
Made by: Miika Hämäläinen | Last updated: 07.11.2014 14:18 | MiikaH:n Sivut a.k.a. MiikaHweb | 2003-2021