Revision 612604f by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: BVH-related SSE optimization Several ideas here: - Optimize calculation of near_{x,y,z} in a way that does not require 3 if() statements per update, which avoids the negative effect of branch misprediction. - Optimization of direction clamping for BVH. - Optimization of point/direction transform. Brings ~1.5% speedup again, depending on the scene (unfortunately, this speedup can't be summed across all previous commits because the speedup of each change varies from scene to scene, but it still seems to be a nice solid speedup of a few percent on Linux, and a bigger speedup was reported on Windows). Once again, thanks Maxym for the inspiration! Still TODO: We have multiple places where we need to calculate near x,y,z indices in BVH; for now it's only done for the main BVH traversal. Will try to move this calculation to a utility function and see if that can be easily re-used across all the BVH flavors. |
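A minimal sketch of the branch-free near/far idea described in the commit above, not the actual Cycles code. It assumes the near and far slots per axis are simply 0 or 1 and derives them from the sign bits of the inverse direction with one movemask instead of three if() statements. The `NearFar` struct and `near_far_indices` function are hypothetical names introduced only for this illustration.

```cpp
#include <cmath>
#if defined(__SSE__)
#  include <xmmintrin.h>
#endif

// Hypothetical result type for illustration; not a Cycles structure.
struct NearFar {
    int near_x, near_y, near_z;
    int far_x, far_y, far_z;
};

// Pick the near/far child slot (0 or 1) per axis from the sign of the
// inverse ray direction, without a branch per axis.
inline NearFar near_far_indices(float idir_x, float idir_y, float idir_z)
{
#if defined(__SSE__)
    // _mm_set_ps takes lanes from high to low, so x ends up in lane 0.
    const __m128 idir = _mm_set_ps(0.0f, idir_z, idir_y, idir_x);
    // Bit i of the mask is the sign bit of lane i.
    const int mask = _mm_movemask_ps(idir);
#else
    // Portable fallback producing the same mask.
    const int mask = (std::signbit(idir_x) ? 1 : 0)
                   | (std::signbit(idir_y) ? 2 : 0)
                   | (std::signbit(idir_z) ? 4 : 0);
#endif
    const int sx = mask & 1;
    const int sy = (mask >> 1) & 1;
    const int sz = (mask >> 2) & 1;
    // Negative direction: near slot is 1 and far slot is 0; otherwise swapped.
    // Note: a sign-bit test treats -0.0f as negative, unlike `>= 0.0f`.
    return NearFar{sx, sy, sz, 1 - sx, 1 - sy, 1 - sz};
}
```

For example, `near_far_indices(1.0f, -2.0f, 3.0f)` yields near slots (0, 1, 0) and far slots (1, 0, 1). The sign-bit shortcut is an assumption of this sketch; code that must match a `>= 0.0f` comparison exactly would need to handle negative zero separately.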
Revision 8ea5cbd by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Move QBVH near/far offset calculation to a utility function Just preparing for a new optimization to be used in all traversal implementations. Should be no measurable difference. |
Revision 98bdc56 by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Enable SSE math optimization for AVX kernels This gives about 5% speedup for AVX processors. Benefit of such optimization on other microarchitectures is still under investigation. |
Revision c5f4837 by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Implement SSE-optimized path of util_max_axis() The idea here is to avoid if statements which could cause branch misprediction. Gives a small but measurable speedup of up to ~1%. Still nice :) Inspired by Maxym Dmytrychenko, thanks! |
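One way to find the largest of three components without if statements is sketched below. This is an illustration under assumptions, not necessarily the code from the commit: three pairwise comparisons are done in a single SSE compare, collapsed into a 3-bit mask, and resolved through a small lookup table. The function name `max_axis_branchless` and the table layout are hypothetical.

```cpp
#if defined(__SSE__)
#  include <xmmintrin.h>
#endif

// Return the index (0 = x, 1 = y, 2 = z) of the largest component.
// Ties resolve toward the later axis, matching a nested `>` comparison chain.
inline int max_axis_branchless(float x, float y, float z)
{
#if defined(__SSE__)
    // _mm_set_ps lists lanes from high to low.
    const __m128 a = _mm_set_ps(y, y, x, x);  // lanes 0..3: x, x, y, y
    const __m128 b = _mm_set_ps(y, z, z, y);  // lanes 0..3: y, z, z, y
    // bit 0: x > y, bit 1: x > z, bit 2: y > z (lane 3 is masked off).
    const int mask = _mm_movemask_ps(_mm_cmpgt_ps(a, b)) & 0x7;
#else
    // Portable fallback building the same 3-bit mask.
    const int mask = (x > y ? 1 : 0) | (x > z ? 2 : 0) | (y > z ? 4 : 0);
#endif
    // Masks 2 and 5 are contradictory orderings and cannot occur for
    // ordinary (non-NaN) inputs; they map to 2 arbitrarily.
    static const char table[8] = {2, 2, 2, 0, 1, 2, 1, 0};
    return table[mask];
}
```

For instance, `max_axis_branchless(3.0f, 1.0f, 2.0f)` sets bits 0 and 1, giving mask 3 and axis 0. The table lookup replaces the nested branches with one indexed load.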
Revision cdf556d by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Use new SSE version of offset calculation for all QBVH flavors Gives up to ~1% speedup again. While it seems small, it is still nice since the code is now actually cleaner than it used to be. |
Revision 10e5835 by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Add AVX2 path to subsurface triangle intersection Similar to the regular triangle intersection case. Gives about 3% speedup rendering an SSS object on my desktop. Question: how to avoid such code duplication in a nice way without speed loss? |
Revision 3a142ec by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Cleanup, style |
Revision 290e37d by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Disable optimization of operator / for float3 This was giving some speedup but made intersection tests fail from a watertight point of view. Needs deeper investigation, but it has to be fixed quickly for the studio. |
Revision 3778475 by Lukas Stockner / Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Fix T49630: Cycles: Swapped shader and bake kernels The problem here was, as the title says, that the two kernels were swapped. Since shader evaluation is only used for building the sampling map when World MIS is enabled, rendering without it would still work fine, although baking was also broken. |
Revision 7bddb79 by Lukas Stockner / Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Fix another OpenCL logging issue Previously an error message would be printed whenever the OpenCL build produced output. However, some frameworks seem to print extra information even if the build succeeded, so now the actual returned error is checked as well. When --debug-cycles is activated, the build output will always be printed, otherwise it only gets printed if there was an error. |
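The logging rule described above can be sketched as a small predicate. This is an illustration only: the function `should_print_build_log` and the constant `kBuildSuccess` are hypothetical names, and the sketch assumes the build status follows the OpenCL convention where success is 0 (as with `CL_SUCCESS`).

```cpp
#include <string>

// Mirrors the OpenCL convention that CL_SUCCESS is 0 (assumption of this sketch).
constexpr int kBuildSuccess = 0;

// Decide whether compiler output should be shown:
// - nothing to print if the log is empty,
// - always print when debug logging is enabled,
// - otherwise print only if the build actually returned an error.
inline bool should_print_build_log(int build_status,
                                   const std::string& build_log,
                                   bool debug_enabled)
{
    if (build_log.empty()) {
        return false;
    }
    return debug_enabled || build_status != kBuildSuccess;
}
```

With this rule, a successful build that emits informational output stays quiet unless debugging is on, while a failing build (for example status -11, the value of `CL_BUILD_PROGRAM_FAILURE`) always shows its log.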
Revision 87ca259 by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Use const reference for register variables in non-OpenCL code This is something tested by @LazyDodo and suggested by Maxym to make MSVC happier. |
Revision e2dd5f1 by Hristo Gueorguiev / Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: OpenCL 3d textures support. Note that volume rendering is not supported yet, this is a step towards that. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2299 |
Revision edac0e7 by Lukas Stockner / Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Improve OpenCL kernel compilation logging The previous refactor changed the code to use a separate logging mechanism to support multithreaded compilation. However, since that's not supported by any frameworks yet, it just resulted in bad logging behaviour. So, this commit changes the logging to go directly to stdout/stderr once again by default. |
Revision f9b0c10 by Brecht Van Lommel / Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Fix build error with WITH_CYCLES_NATIVE_ONLY and recent AVX2 changes. |
Revision 19203b4 by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Enable SSE options of math module for AVX2 kernels Currently this does not give a measurable difference, but it is required groundwork for some upcoming further optimization of AVX2 kernels. |
Revision 3d7eb44 by Lukas Stockner / Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Split device_opencl.cpp into multiple files for easier maintenance There are no user-visible changes, just some internal restructuring. Differential Revision: https://developer.blender.org/D2231 |
Revision 73bae90 by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Use more SSE intrinsics for float3 type This gives about 5% speedup on AVX2 kernels (other kernels still have SSE disabled for math operations) and this solves the slowdown of the Koro scene mentioned in the previous commit. The title says it all actually. This commit also contains changes to pass float3 as const reference in affected functions. This should make MSVC happier without breaking OpenCL because it's only done in areas which are ifdef-ed for non-OpenCL. Another patch based on inspiration from Maxym Dmytrychenko, thanks! |
Revision 804d471 by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Add new avxf vectorized data type Based on the existing ssef data type, and to my knowledge it's also what happens in Embree nowadays. Inspired by Maxym Dmytrychenko and required for the upcoming triangle intersection commit. Hopefully the copyright message is correct. |
Revision acd02c7 by Ray molenkamp / Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
[Windows/Cycles/Clang] Fix compilation error with clang-cl on windows |
Revision c01d7c8 by Sergey Sharybin (blender-v2.78b-release, blender-v2.78c-release, fracture_modifier, fracture_modifier-master, temp-fracture-modifier-2.8) January 20, 2017, 10:25 (GMT) |
Cycles: Implement AVX2 version of triangle_intersect This commit basically vectorizes existing code using AVX2 instructions (without modifying the algorithm itself). This gives quite nice speedups: BMW: -8%, Classroom: -5%, Cat: -5%, Koro: +1%, Barcelona: -8%. That's on a Linux machine; reported performance improvement on Windows goes up to 20%. Not currently sure why Koro is somewhat slower because it mainly uses curve intersection tests, could be timing noise? Or something with the cache utilization perhaps? In any case, the speedup in other scenes makes me think that the current state is acceptable for an initial implementation. This is again inspired by Maxym Dmytrychenko. |
|