Blender Git Commits
June 8, 2017, 09:35 (GMT) |
Cycles: Adjust split kernel tile updating logic to make rendering a bit faster This makes tiles update less frequently and puts more samples in each batch, which makes rendering faster. This helps a bit with the slowdown seen from D2703. I don't really like tiles not updating as much, it feels much less responsive. Maybe there's another way to go about it? Timings by nirved: https://hastebin.com/ifanihewum.css |
June 8, 2017, 09:35 (GMT) |
Cycles: Pass all buffers to each kernel call for OpenCL Technically, not passing all buffers used by a kernel is undefined behavior. We haven't had any issues with this so far on AMD or Nvidia, but it's known to be a problem with Intel and we received a report from AMD that this is a problem on newer hardware, so we need to make this change at some point. Unfortunately there is a cost to being correct, about 5% for the benchmark scenes. For low sample counts it's even worse, I've seen up to 50% slowdown. For the latter case I think adjusting the tile updating logic can help, but I'm not sure what that would look like yet (it would be a change of just a few lines, however). |
June 8, 2017, 09:19 (GMT) |
Cycles: Faster split branched path tracing by sharing samples with inactive threads Unlike regular path tracing, branched path tracing is usually used with lower sample counts, at least for primary rays. This means that there are fewer samples for the GPU to work on in parallel and rendering is slower. As there is less work overall, there are also more inactive threads during rendering with BPT. This patch makes use of those inactive threads to render branched samples in parallel with other samples. Each thread that is preparing for a branched sample will attempt to find an inactive thread, and if one is found the state for the sample is copied to that thread. Potentially, if there are enough inactive threads, 100s of branched samples could be generated from the same originating thread and run in parallel, giving large speedups. Gives 70% faster render for pavillion midday scene. 20-60% faster on BMW with car paint replaced with SSS/volumes. |
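The hand-off described in this entry can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the Cycles implementation: `PathState`, `share_branched_samples`, and the stack-like queue of inactive thread indices are hypothetical names, and the real kernel state is far larger than the three fields shown here.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical, heavily reduced per-thread path state.
struct PathState {
    bool active;  // whether this thread currently has work
    int pixel;    // pixel the sample belongs to
    int branch;   // index of the branched sample being traced
};

// Try to hand `num_branches` branched samples from thread `origin` to
// inactive threads. `inactive_queue` holds indices of inactive threads and
// `queue_size` is the number of valid entries. Returns how many branches
// were handed off; the caller would trace the remainder itself.
inline int share_branched_samples(std::vector<PathState> &states,
                                  int origin,
                                  int num_branches,
                                  const std::vector<int> &inactive_queue,
                                  std::atomic<int> &queue_size)
{
    assert(origin >= 0 && origin < static_cast<int>(states.size()));
    int shared = 0;
    for (int b = 0; b < num_branches; b++) {
        // Claim one queue slot; fetch_sub returns the size before the decrement.
        int old_size = queue_size.fetch_sub(1);
        if (old_size <= 0) {
            queue_size.fetch_add(1);  // queue was empty, undo the claim
            break;
        }
        int target = inactive_queue[old_size - 1];
        states[target] = states[origin];  // copy the sample state
        states[target].branch = b;
        states[target].active = true;
        shared++;
    }
    return shared;
}
```

With one active thread and three inactive ones, a request for five branches hands off three and leaves the queue empty.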
June 8, 2017, 09:19 (GMT) |
Cycles: Add function to accumulate samples with atomics for split kernel Samples run in parallel need a safe way to accumulate their results with the results of other threads. |
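One common way to accumulate floating-point results safely from several threads is a compare-and-swap loop. The sketch below shows that general technique in standard C++; the function name `atomic_add_float` is hypothetical, and the commit does not say that the kernel function is written this way.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>

// Hypothetical helper: atomically add `value` to `target`.
// Written as a CAS loop because fetch_add on std::atomic<float> is only
// available from C++20 onwards.
inline void atomic_add_float(std::atomic<float> &target, float value)
{
    assert(!std::isnan(value));  // debug-only guard against NaN contributions
    float expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value of `target`
    // in `expected`, so the sum is recomputed on the next iteration.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
}
```

Each thread that finishes a sample would call `atomic_add_float(pixel, contribution)`, so no update is lost when two threads write to the same pixel at once.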
June 8, 2017, 09:19 (GMT) |
Cycles: Add function to dequeue a ray |
June 8, 2017, 09:19 (GMT) |
Cycles: Add atomic decrement functions to util_atomic.h |
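An atomic decrement is the natural building block for taking an entry out of a shared queue, as in the dequeue function from the entry above. The following is a minimal sketch assuming a stack-like queue stored as an array plus an atomic size counter; `dequeue_ray` and `RAY_NONE` are hypothetical names, not the identifiers used in `util_atomic.h` or the kernel.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sentinel returned when the queue is empty.
constexpr int RAY_NONE = -1;

// Remove and return one ray index from the queue, or RAY_NONE if empty.
// `queue` holds ray indices and `queue_size` is the number of valid entries.
inline int dequeue_ray(const int *queue, std::atomic<int> &queue_size)
{
    assert(queue != nullptr);
    // fetch_sub returns the size before the decrement.
    int old_size = queue_size.fetch_sub(1);
    if (old_size <= 0) {
        queue_size.fetch_add(1);  // nothing to take, undo the decrement
        return RAY_NONE;
    }
    return queue[old_size - 1];
}
```

Because each successful caller receives a distinct `old_size`, no two threads can be given the same queue slot.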
June 8, 2017, 09:19 (GMT) |
Cycles: Add kernel to enqueue inactive rays The queue will be used to reuse inactive threads and keep the GPU busier. |
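As a rough illustration of what such an enqueue step could look like, the sketch below has each "thread" check its own ray and, if it is inactive, reserve a queue slot with an atomic increment. The names `RayState`, `enqueue_if_inactive`, and `enqueue_inactive_rays` are assumptions made for this example, and the serial loop merely stands in for a GPU kernel launch.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical two-value ray state; the real kernel tracks more states.
enum RayState { RAY_ACTIVE, RAY_INACTIVE };

// Work done by one thread: append its own index if its ray is inactive.
inline void enqueue_if_inactive(int ray_index,
                                const std::vector<RayState> &states,
                                std::vector<int> &queue,
                                std::atomic<int> &queue_size)
{
    if (states[ray_index] != RAY_INACTIVE) {
        return;
    }
    // fetch_add returns the old size, which is this thread's private slot.
    int slot = queue_size.fetch_add(1);
    assert(slot < static_cast<int>(queue.size()));
    queue[slot] = ray_index;
}

// Serial stand-in for the kernel launch: one call per ray.
inline void enqueue_inactive_rays(const std::vector<RayState> &states,
                                  std::vector<int> &queue,
                                  std::atomic<int> &queue_size)
{
    for (int i = 0; i < static_cast<int>(states.size()); i++) {
        enqueue_if_inactive(i, states, queue, queue_size);
    }
}
```

On a GPU the order of entries in the queue would depend on thread scheduling; only the serial stand-in produces them in index order.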
June 8, 2017, 09:19 (GMT) |
Cycles: Blacklist unsupported OpenCL devices Due to various driver issues with AMD GCN 1 cards we can no longer support these GPUs. This patch makes them unavailable for selection for Cycles rendering. GCN 2 and higher cards are still supported. Please use the most recent drivers available to ensure proper functionality. See here for a list to check which GPUs are supported: https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units |