Monday, September 21, 2009

Fried meshes

* [20090930] Values for HD4870 and Direct3D11 are updated; the previous values were measured with a debug runtime :(

Rendering huge crowds can be accelerated by baking skinned meshes into static meshes at runtime. But there is no easy way to do it because of API and hardware limitations.

The scene consists of 49 characters, and each character is rendered 6 times. The tables show performance in millions of polygons per second. There are no tests for low-end cards, although the performance gain on such cards is very substantial. There are also no tests for OpenCL because the drivers are incomplete.

The first two rows describe direct rendering of skinned meshes:
"Raw" is a single mesh per draw call.
"Inst" is multiple meshes per draw call.

The other rows describe different baking techniques:
"Raw" corresponds to baking a single character per call, interleaved with character rendering.
"Inst" corresponds to baking 32 characters per call, before any characters are rendered.
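The difference between the two baking modes can be sketched as a batching helper. The batch size of 32 comes from the description above. The `Batch` struct and `makeBakeBatches` function are hypothetical, and handling the remainder as a smaller final batch is an assumption about how the last call is filled.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A contiguous range of characters baked by one call.
struct Batch {
    std::size_t first;  // index of the first character in the batch
    std::size_t count;  // number of characters baked by this call
};

// Split `characters` into bake calls of at most `perCall` characters each.
// perCall == 1 models the "Raw" mode, perCall == 32 the "Inst" mode.
std::vector<Batch> makeBakeBatches(std::size_t characters, std::size_t perCall) {
    std::vector<Batch> batches;
    if (perCall == 0) {
        return batches;  // nothing sensible to do with an empty batch size
    }
    for (std::size_t first = 0; first < characters; first += perCall) {
        batches.push_back({first, std::min(perCall, characters - first)});
    }
    return batches;
}
```

With 49 characters, a batch size of 1 gives 49 bake calls, while a batch size of 32 gives 2 calls covering 32 and 17 characters.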

OpenGL:
PBO: works well on all cards, but instancing is required.
FeedBack: requires a DX10-level card and does not currently work on ATI cards.
CUDA: requires an NVIDIA DX10-level card. Instancing is also very important.

Direct3D9:
R2VB: works fine on ATI and NV40 cards, but this technique cannot be used on NVIDIA DX10 cards.
CUDA: same as CUDA under OpenGL.

Direct3D10:
StreamOut: there is absolutely no problem with this technique.
CUDA: same as CUDA under OpenGL.

Direct3D11:
StreamOut: same as StreamOut under Direct3D10.
DirectCompute: instancing is also very important on NVIDIA hardware.


  1. Regarding the performance of OpenGL and Direct3D11 on the GTX260: can you tell us why OpenGL is so far behind Direct3D11? Do you know if and when OpenGL will reach the same performance?

  2. The performance of the OpenGL driver is not excellent: there is a 20-30% performance drop compared to D3D9/D3D10. We are working on this with the hardware vendors.