We use TeamCity as a continuous integration server to build different configurations of the Unigine engine, tools, SDKs and so on. We also use Trac as an issue tracker. The two had no native integration, but that is solved now: Max wrote a TeamcityIntegration plugin, feel free to use it.
BTW, a full rebuild of Unigine (engine+tools, both release and debug) takes about 5 minutes on a Linux build node, while the Windows one (which has more powerful hardware) spends 12+ minutes on the same build type. So ccache rules, and we do miss it for Visual Studio.
Tuesday, November 23, 2010
Thursday, October 21, 2010
Thursday, September 30, 2010
Friday, September 17, 2010
CUDA vs OpenCL vs SPU Part IV
Finally I've got a radix sort implementation working on AMD OpenCL. See the previous sorting algorithms test for the earlier numbers. And now we have new, more interesting results :) GPU sorting time includes the time to download the data from video memory. The sorted structure is a single uint2 array for bitonic sort and two int arrays for radix sort.
Elements 64768:
CUDA:
GeForce 9600GT:
CUBitonic Time: 0.007 FPS: 140.5679 Mem: 69.46 Mb/s
CURadix Time: 0.003 FPS: 342.8180 Mem: 169.40 Mb/s
GeForce GTX 260:
CUBitonic Time: 0.002 FPS: 403.5513 Mem: 199.41 Mb/s
CURadix Time: 0.001 FPS: 752.4454 Mem: 371.81 Mb/s
GeForce GTX 480:
CUBitonic Time: 0.001 FPS: 966.1836 Mem: 477.43 Mb/s
CURadix Time: 0.001 FPS: 1398.6014 Mem: 691.11 Mb/s
OpenCL:
GeForce 9600GT:
CLBitonic Time: 0.008 FPS: 126.9519 Mem: 62.73 Mb/s
CLRadix Time: 0.004 FPS: 252.9724 Mem: 125.00 Mb/s
GeForce GTX 260:
CLBitonic Time: 0.004 FPS: 266.7378 Mem: 131.81 Mb/s
CLRadix Time: 0.003 FPS: 368.3241 Mem: 182.00 Mb/s
GeForce GTX 480:
CLBitonic Time: 0.002 FPS: 579.3743 Mem: 286.29 Mb/s
CLRadix Time: 0.002 FPS: 614.2506 Mem: 303.53 Mb/s
Radeon HD5870:
CLRadix Time: 0.003 FPS: 372.9952 Mem: 184.31 Mb/s
Single SPU:
quickSort SPU Time: 0.031 FPS: 32.5298 Mem: 32.15 Mb/s
radixSort SPU Time: 0.004 FPS: 226.9632 Mem: 224.30 Mb/s
Elements 1036288:
CUDA:
GeForce 9600GT:
CURadix Time: 0.036 FPS: 27.8808 Mem: 220.43 Mb/s
GeForce GTX 260:
CURadix Time: 0.013 FPS: 74.1400 Mem: 586.17 Mb/s
GeForce GTX 480:
CURadix Time: 0.006 FPS: 161.9958 Mem: 1280.78 Mb/s
OpenCL:
GeForce 9600GT:
CLRadix Time: 0.034 FPS: 29.8312 Mem: 235.85 Mb/s
GeForce GTX 260:
CLRadix Time: 0.013 FPS: 79.2959 Mem: 626.93 Mb/s
GeForce GTX 480:
CLRadix Time: 0.007 FPS: 136.2955 Mem: 1077.59 Mb/s
Radeon HD5870:
CLRadix Time: 0.020 FPS: 49.9800 Mem: 395.15 Mb/s
Single SPU:
quickSort SPU Time: 0.695 FPS: 1.4381 Mem: 22.74 Mb/s
radixSort SPU Time: 0.070 FPS: 14.3275 Mem: 226.55 Mb/s
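A note on reading these logs: the Mem column appears to be the amount of data sorted per second, i.e. element count times element size times FPS. This is inferred from the figures, not stated in the post. The numbers match when a GPU element is taken as 8 bytes (one uint2, or a pair of ints), an SPU element as 16 bytes, and 1 Mb as 2^20 bytes. A minimal sketch of that arithmetic, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: megabytes sorted per second, given the element count,
// the size of one element in bytes and the measured sorts per second ("FPS").
// 1 Mb is taken as 2^20 bytes, which is what reproduces the logged numbers.
double sort_throughput_mb(std::size_t elements, std::size_t element_size, double fps) {
    assert(element_size > 0);
    const double megabytes = double(elements * element_size) / (1024.0 * 1024.0);
    return megabytes * fps;
}
```

For example, `sort_throughput_mb(64768, 8, 140.5679)` gives roughly 69.46, which is the CUBitonic figure for the GeForce 9600GT, and `sort_throughput_mb(64768, 16, 32.5298)` gives roughly 32.15, the quickSort SPU figure.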
Monday, September 6, 2010
Friday, September 3, 2010
Terrain occluder
WorldOccluderTerrain is a new node which can occlude all invisible objects in a very efficient way. A heightmap image is used as the data source and raycasting is performed on the CPU, with cone step mapping as the raycasting optimization. The raycasted image looks ugly, but the number of occluded triangles is very large :)
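To give an idea of how cone stepping speeds up a heightmap raycast, here is a deliberately simplified sketch. It is not the WorldOccluderTerrain code: it works on a 1D heightmap with linear interpolation and uses one global cone (the steepest slope of the whole map) instead of a per-texel cone map. All names are hypothetical. The idea is the same, though: the higher the ray is above the terrain, the further it can safely jump.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Height of a 1D heightmap at position x, linearly interpolated between samples
// spaced one unit apart. Positions outside the map are clamped to the border.
double sample_height(const std::vector<double>& heights, double x) {
    assert(heights.size() >= 2);
    x = std::clamp(x, 0.0, double(heights.size() - 1));
    const std::size_t i = std::min(std::size_t(x), heights.size() - 2);
    const double t = x - double(i);
    return heights[i] * (1.0 - t) + heights[i + 1] * t;
}

// Steepest slope of the heightmap: the single global "cone" used for stepping.
double max_slope(const std::vector<double>& heights) {
    double k = 0.0;
    for (std::size_t i = 0; i + 1 < heights.size(); ++i)
        k = std::max(k, std::fabs(heights[i + 1] - heights[i]));
    return k;
}

// Returns true if the terrain blocks the segment (x0, y0) - (x1, y1).
bool is_occluded(const std::vector<double>& heights,
                 double x0, double y0, double x1, double y1) {
    if (x0 > x1) { std::swap(x0, x1); std::swap(y0, y1); }
    if (x1 - x0 < 1e-9)  // vertical segment: only its lower end can be buried
        return std::min(y0, y1) <= sample_height(heights, x0);
    const double k = max_slope(heights);
    const double dy = (y1 - y0) / (x1 - x0);  // ray slope per unit of x
    const double min_step = 1e-3;             // keeps grazing rays moving
    double x = x0;
    for (;;) {
        const double gap = (y0 + dy * (x - x0)) - sample_height(heights, x);
        if (gap <= 0.0) return true;              // ray is at or below the terrain
        if (x >= x1 || dy >= k) return false;     // reached the end, or out-climbs any slope
        // The terrain cannot close the gap in fewer than gap / (k - dy) units.
        x = std::min(x + std::max(gap / (k - dy), min_step), x1);
    }
}
```

With the heights `{0, 0, 5, 0, 0}`, a segment from (0, 1) to (4, 1) is reported as occluded by the peak, while a segment from (0, 6) to (4, 6) passes above it in a handful of steps.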
Monday, August 23, 2010
Terrain editor
First version of the terrain editor. The texture updates lag a bit because the video is grabbed in real time:
Sunday, June 13, 2010
Saturday, June 12, 2010
PlayStation3 Unigine status X
Each rendered triangle of static and skinned meshes is culled by view frustum and front/back face tests on SPU. Skinning is performed entirely on SPU with both ordinary and dual quaternion skinning modes available. Vertex buffers for particles are generated on SPU with view frustum culling. Particles can be sorted by depth on SPU. Particles are simulated on SPU with awesome performance.
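The per-triangle tests can be sketched in plain C++ (no SPU intrinsics). This is only an illustration under stated assumptions: vertices are already in clip space, the OpenGL convention -w <= x, y, z <= w is used, and front faces are counter-clockwise. The type and function names are hypothetical.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// True if all three vertices lie outside the same frustum plane.
// OpenGL clip space: a visible point satisfies -w <= x, y, z <= w.
bool outside_frustum(const Vec4& a, const Vec4& b, const Vec4& c) {
    const Vec4* vertices[3] = { &a, &b, &c };
    bool out[6] = { true, true, true, true, true, true };
    for (const Vec4* p : vertices) {
        out[0] = out[0] && p->x < -p->w;
        out[1] = out[1] && p->x >  p->w;
        out[2] = out[2] && p->y < -p->w;
        out[3] = out[3] && p->y >  p->w;
        out[4] = out[4] && p->z < -p->w;
        out[5] = out[5] && p->z >  p->w;
    }
    return out[0] || out[1] || out[2] || out[3] || out[4] || out[5];
}

// Sign test on the determinant of the (x, y, w) rows. For vertices in front of
// the camera (w > 0) it has the same sign as the signed area on screen, and it
// needs no perspective divide. Degenerate triangles are treated as back-facing.
bool is_backfacing(const Vec4& a, const Vec4& b, const Vec4& c) {
    const float det = a.x * (b.y * c.w - b.w * c.y)
                    - a.y * (b.x * c.w - b.w * c.x)
                    + a.w * (b.x * c.y - b.y * c.x);
    return det <= 0.0f;
}

// A triangle is skipped if either test rejects it.
bool cull_triangle(const Vec4& a, const Vec4& b, const Vec4& c) {
    return outside_frustum(a, b, c) || is_backfacing(a, b, c);
}
```

Both tests are cheap and independent per triangle, which is what makes this kind of work a natural fit for batch processing.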
Monday, June 7, 2010
PlayStation3 Unigine status IX
PlayStation3 Unigine status VIII
Tuesday, June 1, 2010
PlayStation3 Unigine status VII
Saturday, May 29, 2010
PlayStation3 Unigine status VI
The first successfully implemented SPU-accelerated thing in Unigine is particle system render buffer generation. We have a wide range of particle shapes: billboard, flat, point, length, random. These shapes are generated fully asynchronously on all available SPUs with optional depth sorting of particles, which doesn't affect performance much. The next target to be boosted is MeshSkinned.
Unigine SPU runtime
Insomniac's papers about their SPU shaders and SPU job management inspired me to implement such a system in Unigine. We use raw SPUs with an SPU-based shader loading scheme. The total amount of code in this system, which parses ELF executable files itself and manages SPU shaders, is only 1270 lines :) We can run 10000 SPU shaders with up to 8 parameters each while consuming only 6.7 ms of PPU time. Another tasty feature of this system is the ability to execute any function from an SPU ELF file.
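Parsing ELF by hand sounds scarier than it is. As a small illustration (not the Unigine code, which is not shown here), the sketch below validates the ELF identification bytes and reads the entry point from a 32-bit big-endian image, which is the flavour SPU executables use as far as I know. The function names are hypothetical; the offsets follow the standard Elf32_Ehdr layout. A real loader would go on to walk the program headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Reads a big-endian 32-bit value from a byte buffer.
std::uint32_t read_be32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Returns the entry point address of a 32-bit big-endian ELF image,
// or nothing if the buffer does not look like one.
std::optional<std::uint32_t> elf32_be_entry(const unsigned char* data, std::size_t size) {
    const std::size_t header_size = 52;  // size of Elf32_Ehdr
    if (data == nullptr || size < header_size) return std::nullopt;
    if (data[0] != 0x7f || data[1] != 'E' || data[2] != 'L' || data[3] != 'F')
        return std::nullopt;             // wrong magic
    if (data[4] != 1) return std::nullopt;  // EI_CLASS must be ELFCLASS32
    if (data[5] != 2) return std::nullopt;  // EI_DATA must be ELFDATA2MSB
    return read_be32(data + 24);            // e_entry follows e_version
}
```

Reading fields byte by byte keeps the code independent of the host endianness, so the same parser works on the PPU and on a PC-side tool.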
SPU sorting
As you might know from the public papers, there are only 256 Kb of local memory on an SPU, but DMA requests are very fast... quickSort isn't an appropriate algorithm for the SPU architecture because of branching and the large number of spatially non-coherent memory requests it generates. After several hours of attempts to keep SPU quickSort performance at an acceptable level, including writing a software memory cache for the SPU, the resulting performance is only slightly better than the PPU version, and only on small arrays:
elements: 1012 (16Kb)
quickSort PPU Time: 0.000 FPS: 3717.4719
quickSort SPU Time: 0.000 FPS: 5376.3438
elements: 2024 (32Kb)
quickSort PPU Time: 0.001 FPS: 1923.0769
quickSort SPU Time: 0.000 FPS: 2652.5200
elements: 4048 (64Kb)
quickSort PPU Time: 0.001 FPS: 819.6721
quickSort SPU Time: 0.001 FPS: 908.2652
elements: 8096 (128Kb)
quickSort PPU Time: 0.003 FPS: 398.2477
quickSort SPU Time: 0.002 FPS: 407.3320
elements: 16192 (256Kb)
quickSort PPU Time: 0.005 FPS: 187.7229
quickSort SPU Time: 0.006 FPS: 180.3752
elements: 32384 (512Kb)
quickSort PPU Time: 0.012 FPS: 86.3185
quickSort SPU Time: 0.013 FPS: 78.7030
elements: 64768 (1Mb)
quickSort PPU Time: 0.027 FPS: 37.6322
quickSort SPU Time: 0.029 FPS: 34.2044
elements: 129536 (2Mb)
quickSort PPU Time: 0.056 FPS: 17.8352
quickSort SPU Time: 0.063 FPS: 15.8253
elements: 259072 (4Mb)
quickSort PPU Time: 0.124 FPS: 8.0358
quickSort SPU Time: 0.139 FPS: 7.2073
It was very difficult to sleep after these poor results... The next morning I tried implementing radixSort... The SPU instruction set fits such algorithms very well. Performance of radixSort on local SPU memory turned out to be very good, especially with branching instructions eliminated. Moreover, performance of DMA list operations on SPU (surprise!) is great, and the resulting version of radixSort demonstrates an awesome speedup:
elements: 1012 (16Kb)
radixSort PPU Time: 0.000 FPS: 4081.6326
radixSort SPU Time: 0.000 FPS: 9615.3848
elements: 2024 (32Kb)
radixSort PPU Time: 0.000 FPS: 2617.8010
radixSort SPU Time: 0.000 FPS: 4032.2581
elements: 4048 (64Kb)
radixSort PPU Time: 0.001 FPS: 1333.3334
radixSort SPU Time: 0.000 FPS: 2237.1365
elements: 8096 (128Kb)
radixSort PPU Time: 0.001 FPS: 673.4007
radixSort SPU Time: 0.001 FPS: 1168.2242
elements: 16192 (256Kb)
radixSort PPU Time: 0.003 FPS: 288.8504
radixSort SPU Time: 0.002 FPS: 597.0150
elements: 32384 (512Kb)
radixSort PPU Time: 0.008 FPS: 124.1311
radixSort SPU Time: 0.003 FPS: 298.3294
elements: 64768 (1Mb)
radixSort PPU Time: 0.022 FPS: 45.3700
radixSort SPU Time: 0.007 FPS: 149.8352
elements: 129536 (2Mb)
radixSort PPU Time: 0.049 FPS: 20.5351
radixSort SPU Time: 0.013 FPS: 75.0413
elements: 259072 (4Mb)
radixSort PPU Time: 0.101 FPS: 9.9149
radixSort SPU Time: 0.027 FPS: 37.5587
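For reference, here is what a plain LSD radix sort looks like in portable C++. This is a generic textbook sketch, not the SPU version (which works in local memory and uses DMA lists); the function name and the key/value layout are assumptions for illustration. Note that the inner loops contain no comparisons between keys, only counting and indexed stores, which is why the algorithm suits branch-unfriendly hardware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts keys in ascending order and applies the same permutation to values.
// Four passes of 8 bits each cover a 32-bit key; every pass is stable.
void radix_sort_pairs(std::vector<std::uint32_t>& keys, std::vector<std::uint32_t>& values) {
    assert(keys.size() == values.size());
    const std::size_t n = keys.size();
    std::vector<std::uint32_t> tmp_keys(n), tmp_values(n);
    for (int shift = 0; shift < 32; shift += 8) {
        // Count how many keys fall into each of the 256 buckets for this byte.
        std::array<std::size_t, 256> count{};
        for (std::size_t i = 0; i < n; ++i)
            ++count[(keys[i] >> shift) & 0xff];
        // An exclusive prefix sum turns the counts into starting offsets.
        std::size_t offset = 0;
        for (std::size_t& c : count) {
            const std::size_t bucket_size = c;
            c = offset;
            offset += bucket_size;
        }
        // Stable scatter into the temporary arrays.
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t dst = count[(keys[i] >> shift) & 0xff]++;
            tmp_keys[dst] = keys[i];
            tmp_values[dst] = values[i];
        }
        keys.swap(tmp_keys);
        values.swap(tmp_values);
    }
}
```

The cost is a fixed number of linear passes over the data, so the time grows linearly with the element count, unlike quickSort with its data-dependent branches and scattered accesses.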
Intel SSE performance issue
Particle system render buffer generation has been deeply refactored to obtain better performance. There was a strange performance issue during this process... You can see two versions of the same code below. On an AMD CPU there is no performance difference between the first and the second code fragments. But on an Intel Core i5 the difference is huge: the first version generates only 10M particles per second, while the second one shows 60M particles per second!
Monday, May 17, 2010
PlayStation3 Unigine status V
The PlayStation3 render has been refactored several times :) Now we have a stable and fast render pipeline without any CPU/GPU sync points. Render present time is always positive and we can hide update, physics and command buffer generation time while the GPU renders the previous frame. All Unigine demos run stable and without any rendering artifacts. The main bottleneck is the GPU, and the SPUs should help a lot, especially with geometry culling :)
Sanctuary forward lighting:
Sanctuary pre-pass lighting:
Tropics forward lighting:
Tropics pre-pass lighting:
Heaven forward lighting:
Heaven pre-pass lighting:
Wednesday, May 12, 2010
Thursday, May 6, 2010
Updated ObjectWater shading
Saturday, May 1, 2010
PlayStation3 Unigine status V
The direct port to PlayStation3 is complete. Render and physics work properly with the expected performance. Time for optimizations is coming...
Unigine Editor on PlayStation3
Friday, April 30, 2010
Thursday, April 29, 2010
Monday, April 26, 2010
PlayStation3 Unigine status
First launch of Unigine on PlayStation3:
Engine::init(): can't create log file
Xml::load(): can't open "/unigine.cfg" file
Config::load(): can't open "/unigine.cfg" file
Engine::init(): clear video settings for "RSX Reality Synthesizer"
FileSystem::load_dir(): can't open "/data/" directory
Unigine fatal error
Engine::init(): can't create log file
Xml::load(): can't open "/unigine.cfg" file
Config::load(): can't open "/unigine.cfg" file
Engine::init(): clear video settings for "RSX Reality Synthesizer"
FileSystem::load_dir(): can't open "/data/" directory
Engine::init(): can't initialize filesystem
Sunday, April 25, 2010
PlayStation3 framework status II
PlayStation3 framework is ready for Unigine. Tomorrow will be awesome.
Friday, April 23, 2010
PlayStation3 framework status
The framework for PlayStation3 is almost complete. I just need to write several base classes before the Unigine PS3 migration.
Here are some ugly screens from PlayStation3 ;)
PS3Texture, full 2D support, 3D and Cube will be available soon:
PS3Gui, this is mind-blowing stuff because of full mouse and keyboard support:
PS3Shader, can't say about details, but they work really great ;)
PS3Mesh:
PS3MeshSkinned, vertex shader based skinning for a start:
PS3MeshDynamic:
PS3Particles, just PPU based simulation:
PS3Grass:
Wednesday, April 21, 2010
ATI2 to DXT5 conversion
Sunday, April 18, 2010
Friday, April 16, 2010
Our tessellation talk from GDC
Practical Use of Tessellation in Unigine Heaven Benchmark
by Denis Shergin, Alexander Zaprjagaev
from NVIDIA Game Technology Theater @ GDC
Wednesday, April 14, 2010
Wednesday, April 7, 2010
Hybrid render
Unigine render has severely mutated. A new light pre-pass rendering mode has been added. Forward and light pre-pass rendering can work at the same time. The rendering mode of a light can be easily switched with distance, which gives a kind of light LOD system. Light pre-pass rendering mode doesn't multiply the amount of rendered geometry and results in a moderate quality level (because it uses Phong shading with a fixed specular power). At the same time, forward-rendered lights can have all the shading variations, such as Phong-Rim, Anisotropy and a Fresnel-based specular component.
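The light LOD idea boils down to picking a mode per light from its distance to the camera. A tiny sketch, assuming near lights get the richer forward shading and far lights fall back to light pre-pass; the enum, the function and the threshold parameter are hypothetical names, not the engine API:

```cpp
#include <cassert>

enum class LightMode { Forward, PrePass };

// Hypothetical per-light switch: lights closer than forward_distance keep full
// forward shading, lights further away use the cheaper light pre-pass mode.
LightMode select_light_mode(float distance, float forward_distance) {
    assert(distance >= 0.0f && forward_distance >= 0.0f);
    return distance <= forward_distance ? LightMode::Forward : LightMode::PrePass;
}
```

Because both modes can be active in the same frame, each light can make this choice independently.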
Forward rendering:
Light pre-pass rendering:
Forward rendering:
Light pre-pass rendering:
PS: These screenshots are powered by OpenGL 4.0 tessellation.