KGL rendering limits?
- bbmario
- DCEmu Freak
- Posts: 88
- Joined: Wed Feb 05, 2014 5:58 am
- Has thanked: 9 times
- Been thanked: 3 times
KGL rendering limits?
I'm working on some 3D models for experiments with the KGL renderer, and I wanted to know the rendering limitations. How many vertices/triangles can be displayed? I have some models with 1000~2000 tris.
Re: KGL rendering limits?
Ph3nom should be able to tell you about the limits; he benchmarked his new KGL implementation quite rigorously. There are benchmarks in the example folder of KOS, and you can find threads about it on here.
You can also improve the poly count a lot by calling glEnable/glDisable with GL_KOS_NEARZ_CLIPPING. Turn near z clipping off when you can guarantee that your meshes are entirely in front of the camera, and turn it on when they intersect the near z plane, or you will get glitches (this is a DC limitation).
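To decide per mesh whether the toggle is safe, one option is a conservative bounding-sphere test against the near plane. The following is only a minimal sketch under stated assumptions: `view_sphere_t` and `mesh_needs_nearz_clipping` are hypothetical names, and the sphere is assumed to be already in view space with the camera looking down -Z. Only the `GL_KOS_NEARZ_CLIPPING` toggle in the trailing comment comes from KGL.

```c
#include <stdbool.h>

/* Hypothetical view-space bounding sphere for one mesh. The camera is
   assumed to look down -Z, so points in front of it have negative z. */
typedef struct {
    float center_z; /* view-space z of the sphere center */
    float radius;
} view_sphere_t;

/* Returns true if some part of the sphere may touch the near plane or lie
   behind it, so near-Z clipping must stay enabled for this mesh.
   near_dist is the positive distance from the camera to the near plane. */
static bool mesh_needs_nearz_clipping(view_sphere_t s, float near_dist)
{
    float closest_depth = -s.center_z - s.radius; /* depth of the nearest point */
    return closest_depth <= near_dist;
}

/* With KGL, the result would drive the toggle, roughly:
 *   if (mesh_needs_nearz_clipping(s, near_dist))
 *       glEnable(GL_KOS_NEARZ_CLIPPING);
 *   else
 *       glDisable(GL_KOS_NEARZ_CLIPPING);
 */
```

The test errs on the side of keeping clipping enabled, which costs speed but never produces the glitches described above.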
Wiki & tutorials: http://dcemulation.org/?title=Development
Wiki feedback: viewtopic.php?f=29&t=103940
My libgl playground (not for production): https://bitbucket.org/bogglez/libgl15
My lxdream fork (with small fixes): https://bitbucket.org/bogglez/lxdream
- BlueCrab
- The Crabby Overlord
- Posts: 5652
- Joined: Mon May 27, 2002 11:31 am
- Location: Sailing the Skies of Arcadia
- Has thanked: 9 times
- Been thanked: 69 times
Re: KGL rendering limits?
bbmario wrote: Where can I find this new KGL implementation? Is it in the current SVN?
It is in the libgl git repository, which can easily be fetched along with the rest of the kos-ports libraries.
-
- Insane DCEmu
- Posts: 112
- Joined: Sat Sep 22, 2007 9:43 pm
- Location: Braga - Portugal
- Has thanked: 0
- Been thanked: 0
Re: KGL rendering limits?
bogglez wrote: Ph3nom should be able to tell you about the limits, he benchmarked his new KGL implementation quite rigorously, there are benchmarks in the example folder of KOS and you can find threads about it on here. You can also improve the poly count a lot by calling glEnable/Disable GL_KOS_NEARZ_CLIPPING. You should turn near z clipping off when you can guarantee that your meshes are in front of the camera entirely and turn it on when they intersect with the near z plane or you will get glitches (this is a DC limitation).
That probably explains the glitches I'm having on real HW. I will give it a shot.
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: KGL rendering limits?
bbmario - Did you make any progress with your 3D modeling experiments?
KGL performs well in immediate mode, and even better when submitting arrays.
Tonight I ran a test rendering Quake 3 BSPs.
Test 1: Immediate Mode. Result: ~23.26 fps @ 0.76 million verts/sec.
Test 2: Pre-Processed Arrays Mode. Result: ~57.60 fps @ 2.01 million verts/sec.
Re: KGL rendering limits?
Awesome results!
How about testing glDisable( GL_KOS_NEARZ_CLIPPING ) with such a scene for the objects that are entirely in front of the near plane? Big gains, I bet.
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: KGL rendering limits?
Thanks man!
Frustum culling is rolled out as part of the PVS system for now; in this demo it has been disabled to test raw vertex throughput.
In this demo (not using light maps), I am rendering ~75 arrays (one per texture) that contain anywhere from 3 to ~7000 vertices each through OpenGL, with GL_KOS_NEARZ_CLIPPING enabled and GL_LIGHTING disabled.
But you are right, the vertex throughput would certainly be higher if we glDisable( GL_KOS_NEARZ_CLIPPING ).
For example, a high polygon model, say the player model in a 3rd person game where the player is always at the center of the screen, could hit higher throughput since we could skip clipping the model.
Re: KGL rendering limits?
Do you plan to add those projects to the examples as benchmarks? Would be very useful I bet.
I was also wondering how much of a performance hit clipping is, so vertex throughput with and without clipping could be interesting. I assume it's a big hit on performance though.
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: KGL rendering limits?
So I have made some solid progress on optimizing the clipping algorithm, nearly rewriting the entire thing again.
I have written an assembly routine that will transform a triangle and check vertices for clipping.
The routine takes an input vertex position array, its stride, a uv coord array, its stride, and a pvr_vertex_t output array as parameters.
The input parameters are built to handle glDrawArrays, where each component is stored in a separate array.
If the vertices are completely out, they won't even be pushed out of the registers.
If the vertices are completely in, perspective division will be applied before writing to output.
Currently, if the vertices cross, the vertices are written without perspective divide, to be handled outside of the assembly routine.
For kicks, the assembly code looks like this:
And the code that invokes it is as follows:
So, the results are good: perfect clipping at very high speed!
This map, using the old clipping code, ran at ~28 fps @ 2.14 million verts/sec.
Now, using the new assembly clipping code, this map runs at ~45 fps @ 3.4 million verts/sec.
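The three cases described in this post (triangle completely out, completely in, or crossing) can be illustrated in plain C. This is only a sketch of the decision logic, not the assembly routine itself: `clip_vertex_t`, `tri_class_t`, `classify_triangle`, and `emit_triangle` are hypothetical names, and the near-plane test on `w` is an assumption about how "in" and "out" are decided.

```c
/* Hypothetical clip-space vertex (position after the matrix transform). */
typedef struct { float x, y, z, w; } clip_vertex_t;

typedef enum { TRI_OUT, TRI_IN, TRI_CROSSING } tri_class_t;

/* Assumption: w grows with distance from the camera, and near_w is the
   value of w at the near plane. */
static int vertex_in(const clip_vertex_t *v, float near_w)
{
    return v->w >= near_w;
}

static tri_class_t classify_triangle(const clip_vertex_t tri[3], float near_w)
{
    int inside = vertex_in(&tri[0], near_w) + vertex_in(&tri[1], near_w) +
                 vertex_in(&tri[2], near_w);
    if (inside == 0) return TRI_OUT;
    if (inside == 3) return TRI_IN;
    return TRI_CROSSING;
}

/* Writes nothing for TRI_OUT, perspective-divided vertices for TRI_IN,
   and undivided vertices for TRI_CROSSING (to be clipped by the caller). */
static tri_class_t emit_triangle(const clip_vertex_t tri[3], float near_w,
                                 clip_vertex_t out[3])
{
    tri_class_t c = classify_triangle(tri, near_w);
    if (c == TRI_OUT) return c;
    for (int i = 0; i < 3; i++) {
        out[i] = tri[i];
        if (c == TRI_IN) {
            float inv_w = 1.0f / tri[i].w;
            out[i].x = tri[i].x * inv_w;
            out[i].y = tri[i].y * inv_w;
            out[i].z = tri[i].z * inv_w;
        }
    }
    return c;
}
```

Rejecting or accepting whole triangles early means the expensive edge-intersection work is only done for the few triangles that actually straddle the plane.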
Re: KGL rendering limits?
So you went from 36ms frame time to 22ms? That's a ridiculously huge optimization! Where do you think that comes from? Register pressure?
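For reference, the frame times follow directly from the frame rates quoted above: 1000 / 28 is about 35.7 ms and 1000 / 45 is about 22.2 ms. A tiny helper, with hypothetical names, makes the conversion explicit:

```c
/* Frame time in milliseconds for a given frame rate. */
static double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

/* Speedup factor when going from old_fps to new_fps. */
static double speedup(double old_fps, double new_fps)
{
    return new_fps / old_fps;
}
```

Calling `speedup(28.0, 45.0)` gives roughly 1.6, which matches the jump from 2.14 to 3.4 million verts/sec.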
I didn't have much time recently, but I was getting unhappy with the way I submit vertices. I wanted to split up processing of coordinates, UVs, normals and colors, but I fear that would not make good use of the cache.
EDIT: btw I noticed that you didn't implement glMultiDrawArrays yet. May I suggest moving the current implementation (with a for loop) into glMultiDrawArrays instead, and calling glMultiDrawArrays from glDrawArrays? The advantage is that when you draw multiple objects from one vertex buffer, you only need to perform the init tasks once (load the transform matrix etc).
ref. http://programming4.us/multimedia/8302.aspx
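A rough sketch of that suggestion, with stand-in functions instead of the real libgl internals: everything prefixed `sketch_` is hypothetical, and the counters exist only to show that the setup runs once per call. The parameter order of `sketch_multi_draw_arrays` mirrors OpenGL's glMultiDrawArrays without the mode argument.

```c
static int setup_calls = 0;    /* how often the per-call setup ran */
static int vertices_drawn = 0; /* total vertices submitted */

/* Stands in for the init tasks (loading the transform matrix etc.). */
static void sketch_begin_draw(void)
{
    setup_calls++;
}

/* Stands in for transforming and submitting one range of vertices. */
static void sketch_submit_range(int first, int count)
{
    (void)first;
    vertices_drawn += count;
}

/* The loop lives here, so the setup is shared by all ranges. */
static void sketch_multi_draw_arrays(const int *first, const int *count,
                                     int drawcount)
{
    sketch_begin_draw();
    for (int i = 0; i < drawcount; i++)
        sketch_submit_range(first[i], count[i]);
}

/* The single-range call becomes a thin wrapper. */
static void sketch_draw_arrays(int first, int count)
{
    sketch_multi_draw_arrays(&first, &count, 1);
}
```

Drawing three objects from one vertex buffer through `sketch_multi_draw_arrays` runs the setup once, whereas three separate `sketch_draw_arrays` calls would run it three times.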
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: KGL rendering limits?
So, I am curious what the actual limit of polys we can hit using KOS is, so that it can serve as a benchmark against what we can hit with KGL.
I think I have pretty much found the limit, both using mat_transform_sq and using a custom assembly routine I have written.
My assembly routine takes three input array pointers (position, uv, and argb color), the strides for those arrays, a pvr_vertex_t *dest, and the count of triangles to transform. The routine tested here is written for transforming triangles, so it also writes the pvr_vertex flags to the dest for you.
My code looks like this:
So, for a quick test, I create a vertex buffer in main RAM to serve as the input to the function
So, this loop creates an array of exactly 65664 vertices, or 21,888 triangles.
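As a plain-C illustration of the interface described above (separate strided position, uv, and color arrays in, flagged vertices out), here is a minimal sketch. It is not the assembly routine from this post: `sketch_vertex_t` is a hypothetical stand-in for pvr_vertex_t whose real layout may differ, the two flag values are placeholders rather than the KOS constants, and the column-major matrix layout and the choice to store 1/w in z are assumptions. The sketch also assumes w > 0 for every vertex.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for pvr_vertex_t. */
typedef struct {
    uint32_t flags;
    float x, y, z;
    float u, v;
    uint32_t argb;
} sketch_vertex_t;

/* Placeholder flag values, not taken from the KOS headers. */
enum { SKETCH_VERTEX = 1, SKETCH_VERTEX_EOL = 2 };

/* Transforms tri_count triangles by the column-major 4x4 matrix m,
   applies the perspective divide, and marks every third vertex as the
   end of its triangle. */
static void transform_triangles(const float m[16],
                                const void *pos, size_t pos_stride,
                                const void *uv, size_t uv_stride,
                                const void *argb, size_t argb_stride,
                                sketch_vertex_t *dest, size_t tri_count)
{
    const unsigned char *p = pos, *t = uv, *c = argb;
    for (size_t i = 0; i < tri_count * 3; i++) {
        float in[3], st[2];
        uint32_t color;
        memcpy(in, p, sizeof in);
        memcpy(st, t, sizeof st);
        memcpy(&color, c, sizeof color);

        float x = m[0] * in[0] + m[4] * in[1] + m[8] * in[2] + m[12];
        float y = m[1] * in[0] + m[5] * in[1] + m[9] * in[2] + m[13];
        float w = m[3] * in[0] + m[7] * in[1] + m[11] * in[2] + m[15];
        float inv_w = 1.0f / w;

        dest[i].flags = (i % 3 == 2) ? SKETCH_VERTEX_EOL : SKETCH_VERTEX;
        dest[i].x = x * inv_w;
        dest[i].y = y * inv_w;
        dest[i].z = inv_w;
        dest[i].u = st[0];
        dest[i].v = st[1];
        dest[i].argb = color;

        p += pos_stride;
        t += uv_stride;
        c += argb_stride;
    }
}
```

With tightly packed inputs the strides would be `3 * sizeof(float)`, `2 * sizeof(float)`, and `sizeof(uint32_t)`; a `tri_count` of 21,888 corresponds to the 65664-vertex buffer mentioned above.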
Testing transforming vertices in RAM ( to see only CPU time ), I find my code is actually faster:
mat_transform_sq run 100 times hits 1087msec.
my assembly code run 100 times hits 956msec.
Submitting directly to the PVR while running at the full 60 fps, we can hit an actual 1,313,280 polygons/sec.
Here, my code uses 10msec/frame, where mat_transform_sq uses 11msec/frame.
However, this is a highly optimized example; it is done in a single draw call, and it only submits one pre-compiled pvr_poly_hdr_t.
These triangles are Gouraud shaded, but non-textured. I will need to make another test to see how texturing affects these numbers.
Also, I noticed that when I tested random polygons, the PVR began to choke as triangles overlapped many times.
NullDC suggests 6.44 million verts/sec, but the real number is actually 3,939,840 vertices/sec.
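These figures can be checked from the numbers in this post: 65664 vertices are 21,888 triangles, and at 60 frames per second that is 21,888 x 60 = 1,313,280 triangles/sec and 65664 x 60 = 3,939,840 vertices/sec. The helpers below (hypothetical names) just spell out that arithmetic, plus the per-run CPU time from the 100-run totals:

```c
/* Items submitted per second when the same batch is drawn every frame. */
static long per_second(long per_frame, long frames_per_sec)
{
    return per_frame * frames_per_sec;
}

/* Average time of one run, given the total time for several runs. */
static double ms_per_run(double total_ms, int runs)
{
    return total_ms / runs;
}
```

For the CPU-only test, `ms_per_run(1087.0, 100)` is about 10.9 ms and `ms_per_run(956.0, 100)` is about 9.6 ms, in line with the 11 and 10 msec/frame measured when submitting to the PVR.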
Edit: another test, bypassing KGL and using the SH4/PVR directly.
The Quake 3 vertices are converted to the pvr_vertex_t format, with color and flags pre-set.
The vertices are arranged in arrays, with each array containing all of the vertices that share the same texture.
It is very hard to hit that kind of vertex throughput in a realistic scenario:
(map from here: http://lvlworld.com/review/id:1743)
Testing another BSP, it does in fact seem possible to hit that number:
- bbmario
- DCEmu Freak
- Posts: 88
- Joined: Wed Feb 05, 2014 5:58 am
- Has thanked: 9 times
- Been thanked: 3 times
Re: KGL rendering limits?
I've been learning my way around "old" GL code, since I learned modern GL first (shaders, VBOs, etc.). But so far, so good! Thanks for asking, PH3NOM! By the way, what do you mean by pre-processed arrays?
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: KGL rendering limits?
Shaders are not really possible on DC... I have recently considered some sort of pre-set shader functionality based on a fixed set of shader operations, but a full-blown programmable pipeline is not realistic.
VBOs should be possible to an extent; I have recently added basic VAO functionality (glGenVertexArrays, etc.). By the end of this week I should have a new commit ready to update the API implementing VAOs.
By pre-processed I mean that Quake 3 BSPs store their vertices in a different format than the DC's PVR uses, and the BSP vertices are stored in an indexed array that is intended for use with glDrawElements.
The Pre-Process that I refer to means that before making any render call, I convert the Quake3 BSP vertices into the DC's pvr_vertex_t vertex format.
This involves converting the color format from RGBA as used in the Q3 vertices into ARGB for use with pvr_vertex_t vertices.
In the process, I extract the indexed geometry into a linear array that can be rendered with glDrawArrays(...).
The reason for that is that glDrawArrays is faster than glDrawElements on DC due to the fact that the PVR does not directly support indexed geometry, so the geometry must be un-indexed ( in software by the API ) per frame before submission to the PVR on DC.
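The two pre-processing steps described above, swizzling RGBA bytes into a packed ARGB value and expanding indexed geometry into a linear array, might look roughly like this. The struct layouts are simplified, hypothetical stand-ins: `src_vertex_t` keeps only the fields needed here rather than the full Quake 3 vertex, and `flat_vertex_t` is not the real pvr_vertex_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical source vertex with RGBA color bytes. */
typedef struct {
    float pos[3];
    float uv[2];
    uint8_t rgba[4]; /* r, g, b, a */
} src_vertex_t;

/* Simplified, hypothetical output vertex with packed ARGB color. */
typedef struct {
    float x, y, z;
    float u, v;
    uint32_t argb; /* 0xAARRGGBB */
} flat_vertex_t;

/* Packs r, g, b, a bytes into a single 0xAARRGGBB value. */
static uint32_t rgba_bytes_to_argb(const uint8_t c[4])
{
    return ((uint32_t)c[3] << 24) | ((uint32_t)c[0] << 16) |
           ((uint32_t)c[1] << 8) | (uint32_t)c[2];
}

/* Expands indexed vertices into a linear array, converting each one.
   out must have room for index_count vertices. */
static void unindex_vertices(const src_vertex_t *verts,
                             const uint32_t *indices, size_t index_count,
                             flat_vertex_t *out)
{
    for (size_t i = 0; i < index_count; i++) {
        const src_vertex_t *s = &verts[indices[i]];
        out[i].x = s->pos[0];
        out[i].y = s->pos[1];
        out[i].z = s->pos[2];
        out[i].u = s->uv[0];
        out[i].v = s->uv[1];
        out[i].argb = rgba_bytes_to_argb(s->rgba);
    }
}
```

Doing this once at load time trades extra memory (shared vertices are duplicated) for not having to un-index every frame.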
Re: KGL rendering limits?
PH3NOM wrote: Shaders not really possible on DC... I have recently considered some sort of pre-set shader functionality based on a fixed set of shader operations, but a full-blown programmable pipeline is not realistic.
That sounds like a fun project, but in practice I don't think it's useful. It's way too limited and guaranteed to be slow.
PH3NOM wrote: VBO's should be possible to an extent; I have recently added basic VAO functionality (glGenVertexArray, etc.). By the end of this week I should have a new commit ready to update the API implementing VAO's.
I've already implemented that in my libgl as well. The advantages are not as big as on desktop platforms (since the vertex data cannot be put into VRAM due to transformation), but some exist:
VAO (with or without VBO): no need to set up the vertex attributes every time, saves function calls, branches, etc.
VBO: Basically this gives libgl ownership over the vertex memory. This could allow some optimizations like calculating a bounding volume after glBufferData and using that to improve near clipping?
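The bounding-volume idea could be sketched as follows: compute an axis-aligned box once when the buffer contents are handed over, and keep it next to the data. All names here are hypothetical, and unlike a real glBufferData this sketch only stores the caller's pointer instead of copying the data.

```c
#include <float.h>
#include <stddef.h>

typedef struct { float min[3], max[3]; } aabb_t;

/* Hypothetical buffer object that caches bounds with its vertex data. */
typedef struct {
    const float *positions; /* xyz triples, not copied in this sketch */
    size_t vertex_count;
    aabb_t bounds;
} sketch_buffer_t;

/* Stores the vertex data and computes its bounding box once. */
static void sketch_buffer_data(sketch_buffer_t *buf, const float *xyz,
                               size_t vertex_count)
{
    buf->positions = xyz;
    buf->vertex_count = vertex_count;
    for (int k = 0; k < 3; k++) {
        buf->bounds.min[k] = FLT_MAX;
        buf->bounds.max[k] = -FLT_MAX;
    }
    for (size_t i = 0; i < vertex_count; i++) {
        for (int k = 0; k < 3; k++) {
            float v = xyz[i * 3 + k];
            if (v < buf->bounds.min[k]) buf->bounds.min[k] = v;
            if (v > buf->bounds.max[k]) buf->bounds.max[k] = v;
        }
    }
}
```

At draw time, the cached box could be transformed into view space and tested against the near plane, so that clipping is skipped for buffers that lie entirely in front of it.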