Saturday, May 29, 2010
Intel SSE performance issue
Particle system render buffer generation has been deeply refactored to obtain better performance. There was a strange performance issue during this process... You can see two versions of the same code below. On an AMD CPU there is no performance difference between the first and the second rendering code fragments, but on an Intel Core i5 the difference is huge: the first version generates only 10M particles per second, while the second one reaches 60M particles per second!
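The original code fragments are not reproduced here, so the following is only a rough C++ sketch of how such a comparison could be set up, not the code from the renderer. Everything in it is an assumption made for illustration: the `Vertex` layout, the function names `fill_in_order` and `fill_interchanged`, and the idea (taken from the comments) that the two versions differ only in the order of the writes to `v[n]`. On an ordinary `std::vector` the two variants will probably run at about the same speed; the slowdown described above may well depend on the kind of memory the real render buffer lives in.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout; the real one is not shown in the post.
struct Vertex {
    float x, y, z, w;
};

// Variant A (assumed): write the four vertices of each particle in ascending order.
void fill_in_order(std::vector<Vertex>& v, std::size_t particles) {
    assert(v.size() >= particles * 4);
    for (std::size_t p = 0; p < particles; ++p) {
        const float f = static_cast<float>(p);
        const std::size_t n = p * 4;
        v[n + 0] = Vertex{f, 0.0f, 0.0f, 1.0f};
        v[n + 1] = Vertex{f, 1.0f, 0.0f, 1.0f};
        v[n + 2] = Vertex{f, 1.0f, 1.0f, 1.0f};
        v[n + 3] = Vertex{f, 0.0f, 1.0f, 1.0f};
    }
}

// Variant B (assumed): same values, but the stores are issued in a different order.
void fill_interchanged(std::vector<Vertex>& v, std::size_t particles) {
    assert(v.size() >= particles * 4);
    for (std::size_t p = 0; p < particles; ++p) {
        const float f = static_cast<float>(p);
        const std::size_t n = p * 4;
        v[n + 2] = Vertex{f, 1.0f, 1.0f, 1.0f};
        v[n + 0] = Vertex{f, 0.0f, 0.0f, 1.0f};
        v[n + 3] = Vertex{f, 0.0f, 1.0f, 1.0f};
        v[n + 1] = Vertex{f, 1.0f, 0.0f, 1.0f};
    }
}

// Checks that two buffers hold exactly the same vertex data.
bool same_contents(const std::vector<Vertex>& a, const std::vector<Vertex>& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].x != b[i].x || a[i].y != b[i].y ||
            a[i].z != b[i].z || a[i].w != b[i].w) {
            return false;
        }
    }
    return true;
}

// Runs `fill` `repeats` times and returns the throughput in particles per second.
template <typename Fill>
double particles_per_second(Fill fill, std::vector<Vertex>& v,
                            std::size_t particles, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        fill(v, particles);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return static_cast<double>(particles) * repeats / elapsed.count();
}
```

To try it, allocate a `std::vector<Vertex>` with four vertices per particle, call `particles_per_second` once with each variant, and compare the two numbers on both CPUs. `same_contents` is there to confirm that both variants really produce the same buffer, so any difference in speed comes from the write order alone.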
What compiler was used?
Maybe this has something to do with the L1 cache? It would be interesting to see the instruction disassembly of the two versions, to check how the compiler handles the interchanged accesses to v[n], and to test other cases on the i5.
The compiler is "Visual C++ 2008 Express Edition". It seems like L1 cache misses occur... I will try to post the disassembly later.
Weird, traditionally Intel has more L1 cache than AMD. What was the performance on AMD? 60M particles per second too?
It would be worth trying GCC to see how it compiles both versions. Have you tried valgrind with cachegrind? I always use these tools to keep my code friendly to the cache and to branch prediction.
What is the status on a modern PC?