Blender Git Loki
Revision 8b4acad by Sergey Sharybin (cycles-x) June 29, 2021, 09:55 (GMT)
Cycles X: Initial support of multi-GPU and GPU+CPU rendering

This change makes it possible to render a single frame on multiple GPUs and/or GPU(s)+CPU (as configured in the User Preferences).

Work is split equally along the height of the big tile. In the future this will be revisited to make a better initial guess based on device performance, and to add dynamic re-scheduling and interleaving of scanlines across devices.

The main idea is to move render buffers to a per-work basis, so that the render buffers are always associated with the device the work is being done by. Then, upon access, the read/write is delegated to the work, so that it operates on a specific slice of the source/destination.

There are multiple memory and performance improvements possible, like:

- Copy render result to GPUDisplay from multiple threads (now that it is clear graphics interop cannot be mixed in with the naive update).
- Avoid denoiser buffer re-allocation.
- Avoid creation of temporary buffers in the denoisers when we know that we have a copy of the real buffers.
- Only copy passes needed for the denoiser, and the results of the denoiser.

The current state of `PathTrace::denoise()` is not entirely ideal: it could be split up, and memory usage could be improved. But I think it is good enough for the initial implementation. Further improvements would require changes in the Denoiser API.

Differential Revision: https://developer.blender.org/D11727
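The equal split along the tile height described above can be sketched roughly as follows. This is only an illustration, not the code from this commit: `WorkSlice`, `split_tile_height()` and `slice_offset_in_floats()` are hypothetical names, and both the remainder handling (extra scanlines go to the first slices) and the buffer layout (row-major, `pass_stride` floats per pixel) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration; not a Cycles data structure.
struct WorkSlice {
  int y;      // First scanline of the slice within the big tile.
  int height; // Number of scanlines in the slice.
};

// Split `tile_height` scanlines into `num_works` contiguous slices of
// (nearly) equal height. Assumption: when the height does not divide evenly,
// the first `tile_height % num_works` slices get one extra scanline each.
std::vector<WorkSlice> split_tile_height(int tile_height, int num_works)
{
  assert(tile_height >= 0 && num_works > 0);

  std::vector<WorkSlice> slices;
  slices.reserve(static_cast<std::size_t>(num_works));

  const int base = tile_height / num_works;
  const int remainder = tile_height % num_works;

  int y = 0;
  for (int i = 0; i < num_works; ++i) {
    const int height = base + (i < remainder ? 1 : 0);
    slices.push_back({y, height});
    y += height;
  }
  return slices;
}

// Offset, in floats, of the first pixel of a slice inside a full-tile buffer.
// Assumption: row-major layout with `pass_stride` floats per pixel.
std::size_t slice_offset_in_floats(const WorkSlice &slice, int tile_width, int pass_stride)
{
  return static_cast<std::size_t>(slice.y) * static_cast<std::size_t>(tile_width) *
         static_cast<std::size_t>(pass_stride);
}
```

With a split like this, each work only needs its own `y` and `height` to find the region of the full-size source or destination it is responsible for, which is the kind of per-slice delegation the message describes.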
Commit Details:
Full Hash: 8b4acade6c10d2978584e5819aace106046435f8
Parent Commit: 01363b5
Lines Changed: +499, -161
18 Modified Paths:
/intern/cycles/blender/blender_gpu_display.cpp (+17, -4) (Diff)
/intern/cycles/blender/blender_gpu_display.h (+5, -1) (Diff)
/intern/cycles/integrator/denoiser_device.cpp (+14, -1) (Diff)
/intern/cycles/integrator/pass_accessor.cpp (+1, -1) (Diff)
/intern/cycles/integrator/pass_accessor.h (+10, -0) (Diff)
/intern/cycles/integrator/pass_accessor_cpu.cpp (+4, -3) (Diff)
/intern/cycles/integrator/pass_accessor_gpu.cpp (+4, -2) (Diff)
/intern/cycles/integrator/path_trace.cpp (+201, -71) (Diff)
/intern/cycles/integrator/path_trace.h (+25, -16) (Diff)
/intern/cycles/integrator/path_trace_work.cpp (+87, -6) (Diff)
/intern/cycles/integrator/path_trace_work.h (+42, -14) (Diff)
/intern/cycles/integrator/path_trace_work_cpu.cpp (+8, -4) (Diff)
/intern/cycles/integrator/path_trace_work_cpu.h (+1, -4) (Diff)
/intern/cycles/integrator/path_trace_work_gpu.cpp (+41, -12) (Diff)
/intern/cycles/integrator/path_trace_work_gpu.h (+4, -4) (Diff)
/intern/cycles/kernel/device/cuda/kernel.cu (+16, -8) (Diff)
/intern/cycles/render/gpu_display.cpp (+3, -2) (Diff)
/intern/cycles/render/gpu_display.h (+16, -8) (Diff)