Blender Git Loki
Revision d130c66 by Bastien Montagne (master) December 22, 2017, 21:12 (GMT)
Fix scalability issue in threaded code of Mesh normals computation.

We tried to do as much as possible in a single threaded callback, which led to some nasty tricks, such as fake atomic-based spinlocks for operations that have no atomic intrinsics (float addition, for example). This was OK with a 'standard' low number of working threads (8-16), because collisions were rather rare and the implied memory barriers did not add *that* much overhead. It performed poorly on more powerful systems reaching 100 threads and beyond (like workstations or render farm hardware), where both memory barrier overhead and more frequent collisions have a significant impact on performance.

This was addressed by splitting the process further: we now have three loops (over polys, loops and vertices), and we added an intermediate storage for weighted loop normals. This completely avoids atomic operations in the body of threaded loops, which should fix the scalability issues. It costs us slightly higher temporary memory usage (something like 50 MB per million polygons on average), which looks like an acceptable tradeoff.

Furthermore, tests showed that we could gain an additional ~7% of speed in computing normals of heavy meshes by also parallelizing the last two loops (might be 1 or 2% on overall mesh update at best...).

Note that further tweaking in this code should be possible once Sergey adds the 'minimum batch size' option to the threaded foreach API, since very light loops like the one over loops (a mere v3 addition) require much bigger batches than heavier code (like the one over polys) to keep optimal performance.
Commit Details:
Full Hash: d130c66db436b1fccbbde040839bc4cb5ddaacd2
Parent Commit: 6efd58d
Lines Changed: +48, -26
1 Modified Path:
/source/blender/blenkernel/intern/mesh_evaluate.c (+48, -26)