Quake 3 lightmaps - PVR Multi-Texture
-
- DCEmu Crazy Poster
- Posts: 28
- Joined: Wed Apr 23, 2014 3:04 pm
- Has thanked: 0
- Been thanked: 0
Re: Quake 3 lightmaps - PVR Multi-Texture
Interesting. Can the PVR2 accumulation buffers be used to make such an effect, but without additional render passes?
If you set "DST Select" = 1 in the polygon's TSP instruction word, the result of drawing this poly will be stored to the secondary (PVR-internal) buffer.
If you set "SRC Select" = 1, RGBA from the secondary buffer will be used as the source by the blender unit, instead of the normal RGBA coming from the texture/shading unit (which will be ignored).
Btw, imo exactly these features are what Sega called "multitexturing support".
At least one game uses the accumulation buffers to make a similar blurry effect: "Evil Dead - Hail to the King".
But I'm afraid you can't use nullDC to test these features, because nullDC does not emulate the PVR2 accumulation buffers (like many, many other things).
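To make the two flags concrete, here is a minimal sketch of setting them on a raw TSP instruction word. The bit positions (25 for SRC Select, 24 for DST Select) are an assumption from memory of the PVR2 word layout, and the macro and function names are made up for illustration, so check them against the hardware documentation before relying on this.

```c
#include <stdint.h>

/* ASSUMED bit positions in the TSP instruction word; verify against
   PVR2 documentation. The names are hypothetical, not from any SDK. */
#define TSP_SRC_SELECT (1u << 25) /* blender source = secondary buffer */
#define TSP_DST_SELECT (1u << 24) /* blender destination = secondary buffer */

/* Polygon whose blended result is written to the secondary buffer. */
static uint32_t tsp_draw_to_secondary(uint32_t tsp)
{
    return (tsp | TSP_DST_SELECT) & ~TSP_SRC_SELECT;
}

/* Polygon that takes the secondary buffer as its blend source and
   writes the result to the primary buffer. */
static uint32_t tsp_read_from_secondary(uint32_t tsp)
{
    return (tsp | TSP_SRC_SELECT) & ~TSP_DST_SELECT;
}
```

The first helper would be applied to the polygon that fills the secondary buffer, the second to the polygon that blends it back into the normal output.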
-
- Team Screamcast
- Posts: 144
- Joined: Tue Dec 23, 2003 6:04 pm
- Location: Umeå, Sweden
- Has thanked: 0
- Been thanked: 0
- Contact:
Re: Quake 3 lightmaps - PVR Multi-Texture
No, you'll actually need an additional pass to flush the secondary accumulation buffer.
It's mostly used to mask out multitexture effects from the transparent parts of the base texture.
https://github.com/tvspelsfreak/texconv - Converts images into any texture format supported on the DC.
-
- DCEmu Crazy Poster
- Posts: 28
- Joined: Wed Apr 23, 2014 3:04 pm
- Has thanked: 0
- Been thanked: 0
Re: Quake 3 lightmaps - PVR Multi-Texture
Nope, you'll need additional polygon(s) to flush the buffer (or blend it into the primary one), not an additional render pass.
I've seen the buffers used in only two games: Evil Dead and Virtua Fighter 4 (on Naomi2).
-
- Team Screamcast
- Posts: 144
- Joined: Tue Dec 23, 2003 6:04 pm
- Location: Umeå, Sweden
- Has thanked: 0
- Been thanked: 0
- Contact:
Re: Quake 3 lightmaps - PVR Multi-Texture
Yeah, you're right, I should've explained it better. What I meant was you'll have to send the same geometry one more time to flush the secondary accumulation buffer.
EDIT: I thought you were talking about the lightmapping (which already is a one pass solution).
I haven't messed around with the secondary accumulation buffer much. I tried doing stencil reflections with it, but it appears you must use the same geometry to flush it as you used to render to it. Flushing with a mirror plane gave me very weird results... It would be cool to be able to use the buffer in a more flexible way, but I don't know if it's possible.
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: Quake 3 lightmaps - PVR Multi-Texture
PH3NOM wrote:
I have been thinking of a way to do multi-texture much faster than the way I am currently doing it using my build of Open GL.
Basically right now, making 2 passes, I am submitting the geometry twice for each vertex.
This means each vertex possibly gets clipped, lit and transformed each time it is submitted.
My idea is I can simply allow the submission of two separate textures ( opaque + alpha ) with almost no extra cost on the CPU, by computing the output vertex ( lit, clipped, transformed ) once, then copying it into each list ( opaque, alpha ).

I have done just that; my Open GL API now supports Multi-Texturing. Currently only 2 texture units may be bound at a time ( GL_TEXTURE0 = opaque, GL_TEXTURE1 = alpha ).
I have even implemented this with the standard pipeline (glBegin(...)/glEnd()), as well as the vertex buffer pipeline (glDrawArrays()).
This is the very simple function I made to test, and it's working just fine on DC:
Code:
GLfloat VERTEX_ARRAY[4 * 3] = { -1.0f,  1.0f, 0.0f,
                                 1.0f,  1.0f, 0.0f,
                                 1.0f, -1.0f, 0.0f,
                                -1.0f, -1.0f, 0.0f };

GLfloat TEXCOORD_ARRAY[4 * 2] = { 0, 0,
                                  1, 0,
                                  1, 1,
                                  0, 1 };

/* Multi-Texture Example using Open GL Vertex Buffer Submission.
   glClientActiveTexture() must be used for Arrays, instead of glActiveTexture().
   Each texture must receive its own set of UV Coordinates */
void RenderCallback(GLuint texID0, GLuint texID1)
{
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -3.0f);

    /* Enable Vertex and Texture Coord Arrays */
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    /* Activate GL_TEXTURE0, bind the base opaque texture, and for fun, enable bi-linear filtering */
    glClientActiveTexture(GL_TEXTURE0); /* glClientActiveTexture(...) For use with Multi-Texture Arrays */
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texID0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_FILTER, GL_LINEAR);
    glTexCoordPointer(2, GL_FLOAT, 0, TEXCOORD_ARRAY); /* Bind TexCoord Array for GL_TEXTURE0 */

    /* Activate GL_TEXTURE1, bind the texture to blend on top, and for fun, enable bi-linear filtering */
    glClientActiveTexture(GL_TEXTURE1); /* glClientActiveTexture(...) For use with Multi-Texture Arrays */
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texID1);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_FILTER, GL_LINEAR);
    glTexCoordPointer(2, GL_FLOAT, 0, TEXCOORD_ARRAY); /* Bind TexCoord Array for GL_TEXTURE1 */

    /* Set blending modes to be applied to GL_TEXTURE1 */
    glBlendFunc(GL_SRC_ALPHA, GL_DST_ALPHA);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

    /* Bind the Vertex Array */
    glVertexPointer(3, GL_FLOAT, 0, VERTEX_ARRAY);
    glDrawArrays(GL_QUADS, 0, 4);

    /* Disable GL_TEXTURE1 */
    glClientActiveTexture(GL_TEXTURE1);
    glDisable(GL_TEXTURE_2D);

    /* Make sure to set the client active texture back to GL_TEXTURE0 when finished */
    glClientActiveTexture(GL_TEXTURE0);
    glDisable(GL_TEXTURE_2D);

    /* Disable Vertex and Texture Coord Arrays */
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}
[Image: the second texture overlaid on top of the base texture, with only 4 vertices submitted to Open GL]
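The transform-once idea from this post (compute the output vertex a single time, then copy it into both lists) could look roughly like the sketch below. This is not the actual library code: `process_vertex`, `out_vertex` and `submit_multitexture` are hypothetical names, and the "expensive" step is just a translation to keep the example short.

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float x, y, z, u, v; } out_vertex; /* hypothetical output format */

/* Stand-in for the expensive per-vertex work (transform, light, clip).
   Here it is only a translation along z. */
static vec3 process_vertex(vec3 in)
{
    vec3 out = { in.x, in.y, in.z - 3.0f };
    return out;
}

/* Process each vertex once, then write it to both lists, each with its
   own texture coordinates. Returns the number of vertices processed. */
static size_t submit_multitexture(const vec3 *pos,
                                  const float *uv0, const float *uv1,
                                  size_t count,
                                  out_vertex *opaque_list,
                                  out_vertex *alpha_list)
{
    for (size_t i = 0; i < count; ++i) {
        vec3 p = process_vertex(pos[i]);
        out_vertex a = { p.x, p.y, p.z, uv0[2 * i], uv0[2 * i + 1] };
        out_vertex b = { p.x, p.y, p.z, uv1[2 * i], uv1[2 * i + 1] };
        opaque_list[i] = a;
        alpha_list[i] = b;
    }
    return count;
}
```

Compared with submitting the geometry twice, the per-vertex work runs once and only the copy is duplicated.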
Re: Quake 3 lightmaps - PVR Multi-Texture
@PH3NOM:
Are you worried about immediate mode so that old software can be ported easily? Software using immediate mode will perform poorly anyway, so I think you shouldn't worry about optimizing it too much, at least for a first release.
BTW I'm curious about the performance difference between immediate mode, vertex arrays and VBOs using your API and maybe KOS' API. Did you ever benchmark this, by chance?
Wiki & tutorials: http://dcemulation.org/?title=Development
Wiki feedback: viewtopic.php?f=29&t=103940
My libgl playground (not for production): https://bitbucket.org/bogglez/libgl15
My lxdream fork (with small fixes): https://bitbucket.org/bogglez/lxdream
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: Quake 3 lightmaps - PVR Multi-Texture
To be honest, I am not quite sure what you mean by "immediate mode".
Do you mean "Direct Rendering", the way the old KGL submitted vertex data to the PVR, or using "glVertex3f(...)" to submit vertex data, as opposed to glDrawArrays(...)?
It's hard to benchmark Open GL modes against KOS, because KOS itself does not really handle the things that Open GL does.
The closest thing KOS has (by default) is mat_transform_sq(...), and if you follow the thread here, you will see that I was able to obtain better performance by devising my own methods:
viewtopic.php?f=29&t=102181
To that extent, the Vertex Buffer solutions I have devised produced higher throughput than the KOS dma functions.
Re: Quake 3 lightmaps - PVR Multi-Texture
PH3NOM wrote:
To be honest, I am not quite sure what you mean by "immediate mode".
Do you mean "Direct Rendering", the way the old KGL submitted vertex data to the PVR, or that of using "glVertex3f(...)" to submit vertex data, as opposed to glDrawArrays(...).

Direct Rendering = opposite of software rendering
Immediate Mode = Function calls that can only be used between glBegin() and glEnd().
You can optimize the other draw modes much better, for many reasons:
- Fewer function calls (in immediate mode there's at least one glVertex call per vertex, and usually another one each for uv, color and normal)
- Rigid order (glNormal, glColor etc. could be supplied in varying order or be missing for some vertices, which is not the case for glVertexAttrib)
- You know exactly which components will be defined (with immediate mode you don't know whether the last vertex of a thousand will suddenly have a glNormal call preceding it, so you must assume it will be used)
- Since there's no indexing, you cannot use a vertex cache with transformations, lighting etc. already applied (http://home.comcast.net/~tom_forsyth/pa ... e_opt.html)
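To illustrate the last point, here is a small sketch that counts how often the per-vertex work runs for an unindexed stream versus an indexed one. It is only a toy model under stated assumptions: `transform` stands in for transform and lighting, the "cache" is simply one slot per unique vertex, and none of the names come from a real GL implementation.

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

static size_t transform_calls; /* counts how often the "expensive" step runs */

/* Stand-in for transform + lighting. */
static vec3 transform(vec3 v)
{
    vec3 out = { v.x, v.y, v.z - 6.0f };
    ++transform_calls;
    return out;
}

/* Unindexed: every vertex in the stream is transformed, duplicates included. */
static void draw_unindexed(const vec3 *stream, size_t count, vec3 *out)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = transform(stream[i]);
}

/* Indexed: transform each unique vertex once into a cache, then
   assemble primitives by looking the indices up in that cache. */
static void draw_indexed(const vec3 *verts, size_t vert_count,
                         const unsigned char *indices, size_t index_count,
                         vec3 *cache, vec3 *out)
{
    for (size_t i = 0; i < vert_count; ++i)
        cache[i] = transform(verts[i]);
    for (size_t i = 0; i < index_count; ++i)
        out[i] = cache[indices[i]];
}
```

For a cube drawn as six quads, the unindexed path transforms 24 vertices and the indexed path 8. Whether that wins in practice depends on how expensive the lookup and copy are compared with the transform on the target hardware.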
PH3NOM wrote:
It's hard to benchmark Open GL modes against KOS, because KOS itself does not really handle such things that Open GL does.
The closest thing KOS has (by default) is mat_transform_sq(...), and if you follow the thread here, you will see that I was able to obtain better performance by devising my own methods:
viewtopic.php?f=29&t=102181
To that extent, the Vertex Buffer solutions I have devised produced higher throughput than the KOS dma functions.

Thank you for that link and great work!
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: Quake 3 lightmaps - PVR Multi-Texture
Yes, I understand. And thank you for your interest!
That is my motivation for adding support for indexed arrays by implementing glDrawElements(...):
We can light and transform fewer vertices before assembling them into primitives for rasterization.
Code:
GLfloat VERTEX_ARRAY[4 * 3 * 2] = { -1.0f,  1.0f,  1.0f,
                                     1.0f,  1.0f,  1.0f,
                                     1.0f, -1.0f,  1.0f,
                                    -1.0f, -1.0f,  1.0f,
                                    -1.0f,  1.0f, -1.0f,
                                     1.0f,  1.0f, -1.0f,
                                     1.0f, -1.0f, -1.0f,
                                    -1.0f, -1.0f, -1.0f };

GLfloat TEXCOORD_ARRAY[4 * 2 * 2] = { 0, 0,
                                      1, 0,
                                      1, 1,
                                      0, 1,
                                      1, 0,
                                      0, 0,
                                      0, 1,
                                      1, 1 };

GLuint ARGB_ARRAY[4 * 2] = { 0xFFFF0000, 0xFF00FF00, 0xFF0000FF, 0xFFFFFF00,
                             0xFFFF0000, 0xFF00FF00, 0xFF0000FF, 0xFFFFFF00 };

GLubyte INDEX_ARRAY[4 * 6] = { 0, 1, 2, 3,
                               3, 2, 6, 7,
                               7, 6, 5, 4,
                               4, 5, 1, 0,
                               1, 5, 6, 2,
                               0, 4, 7, 3 };

/* Example using Open GL Vertex Buffer Element Submission. */
static GLfloat rx = 1.0f;

void RenderCallback(GLuint texID)
{
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -6.0f);
    glRotatef(rx++, 0, 1, 0);

    /* Enable 2D Texturing and bind the Texture */
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texID);

    /* Enable Vertex, Color and Texture Coord Arrays */
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);

    /* Bind Array Data */
    glColorPointer(1, GL_UNSIGNED_INT, 0, ARGB_ARRAY);
    glTexCoordPointer(2, GL_FLOAT, 0, TEXCOORD_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, VERTEX_ARRAY);

    /* Render the Submitted Vertex Data */
    glDrawElements(GL_QUADS, 4 * 6, GL_UNSIGNED_BYTE, INDEX_ARRAY);

    /* Disable Vertex, Color and Texture Coord Arrays */
    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}
Re: Quake 3 lightmaps - PVR Multi-Texture
PH3NOM wrote:
Yes, I understand. And thank you for your interest!
That is the motivation for me deciding to add support for indexed arrays by implementing glDrawElements(...):
We can light and transform less vertices before assembling into primitives for rasterization.

The pleasure is mine.
By the way, reading those old threads I noticed some things about GL usage in the code that people write on here:
- Triangle strip optimization: Only really makes sense with unindexed data. With a vertex cache it will only give you a tiny performance improvement, while being much more bothersome to use in many ways.
- OpenGL matrix functions and the matrix stack:
Don't use those at all. Instead, define a tree structure for the transforms with each child storing the local transform and the absolute transform including all parent transforms. Changing a parent's transform should then recalculate the transform matrix of each child. This will save you a lot of matrix calculations and stack operations. Matrix calculations will also not be interleaved with drawing operations, so you will make better use of registers and the memory cache.
None of this is done inside of your OpenGL library, but instead in the code of its users.
I just want to point out that those functions are not of importance in your library since they shouldn't be used anyway, and you may want to remove them from the example code, so people new to 3D don't even start using those functions (even though your intent is just to use them for a quick demo).
-
- Insane DCEmu
- Posts: 112
- Joined: Sat Sep 22, 2007 9:43 pm
- Location: Braga - Portugal
- Has thanked: 0
- Been thanked: 0
Re: Quake 3 lightmaps - PVR Multi-Texture
bogglez, by OpenGL matrix functions do you mean glTranslate, glRotate and glScale?
Why should those be avoided?
Do you have an example of how the transform tree should be implemented?
Best Regards
Re: Quake 3 lightmaps - PVR Multi-Texture
Jae686 wrote:
boggelz , by OpenGL matrix functions you mean glTranslate , glRotate and glScale ?
Why should those be avoided ?
Do you have an example of how the transform tree should be implemented ?

The glTranslate, glRotate etc. functions always implicitly work on a global matrix stack. You're constantly pushing and popping matrices that you could actually use again for the same object in the next frame, which would save you some matrix multiplications 30 times per second.
What I'm referring to is often called a "scene graph" (the term is very loosely defined and some people really go overboard with it).
In its basic, sensible form, it just expresses a tree hierarchy of transformations in the scene, each node containing a local and an absolute transform.
For example the root node of the scene graph may be a ship. On the ship there's the captain and a cannon. So the root node gets those two as child nodes. The captain is also wearing a hat, so we need a transformation matrix from the position of the captain to his head in order to place the hat properly, so the captain has the hat as his child node.
To rotate the hat of the captain you just need to change its local transform. The absolute transform of the hat is now outdated, so we multiply the hat's local transform with the captain's absolute transform and we're done.
If the captain moves (his local transform changes), the hat also moves. So we need to update the captain's and hat's absolute transforms.
During all of this the matrices of the cannon and ship were unaffected.
When you draw a frame you walk the scene graph first and check whether a node is "dirty" as described above. If it is, you update the transforms for it and its children. Now when you want to draw any object you just load its absolute transform and start drawing.
You should see how this saves you an incredible amount of matrix transforms for even simple scenes. A basic graphics engine will also perform visibility detection (frustum culling etc), which is easier to do with the scene graph (you need the transform but don't care about parent nodes that aren't visible).
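As a rough illustration of the ship/captain/hat description, here is a minimal scene graph sketch with cached absolute transforms and a dirty flag. All names (`node`, `node_set_local`, `update_transforms`, the `mat4` helpers) are made up for this example, matrices are assumed to be column-major, and `mat_mul_count` exists only to show how few multiplications happen.

```c
#include <stddef.h>

typedef struct { float m[16]; } mat4; /* column-major 4x4 */

typedef struct node {
    mat4 local;                /* transform relative to the parent */
    mat4 absolute;             /* cached parent.absolute * local */
    int dirty;                 /* set when local has changed */
    struct node *first_child;
    struct node *next_sibling;
} node;

static size_t mat_mul_count;   /* for illustration only */

static mat4 mat4_identity(void)
{
    mat4 r = {{ 1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1 }};
    return r;
}

static mat4 mat4_translation(float x, float y, float z)
{
    mat4 r = mat4_identity();
    r.m[12] = x; r.m[13] = y; r.m[14] = z;
    return r;
}

static mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 r;
    ++mat_mul_count;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a->m[k * 4 + row] * b->m[col * 4 + k];
            r.m[col * 4 + row] = s;
        }
    return r;
}

static void node_set_local(node *n, mat4 local)
{
    n->local = local;
    n->dirty = 1;
}

/* Walk the tree once per frame. A node is recomputed if it is dirty
   or if any ancestor was recomputed during this walk. */
static void update_transforms(node *n, const mat4 *parent_abs, int parent_changed)
{
    for (; n != NULL; n = n->next_sibling) {
        int changed = parent_changed || n->dirty;
        if (changed) {
            n->absolute = mat4_mul(parent_abs, &n->local);
            n->dirty = 0;
        }
        update_transforms(n->first_child, &n->absolute, changed);
    }
}
```

After the first walk, a frame in which nothing moved costs zero matrix multiplications, and moving only the captain costs two (captain and hat), while the ship and cannon keep their cached matrices.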
Aside from that you can write special matrix functions for complicated transformations that you commonly use. For example you don't always need a full 4x4 matrix * 4-component vector multiply. Sometimes you don't care about the w component, but you still want the translation. Since that saves a whole dot product (one row * column), which I think the DC supports in hardware, I think you could multiply more matrices on the DC this way too.
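A minimal sketch of such a reduced transform, assuming an affine matrix stored as three rows of four floats (the `affine34` type and function name are hypothetical). It only shows the arithmetic saving of skipping the w row; whether it actually beats the hardware 4x4 path on the DC would have to be measured.

```c
typedef struct { float x, y, z; } vec3;

/* 3x4 affine matrix, row-major: rotation/scale in columns 0..2,
   translation in column 3. The implicit fourth row is (0, 0, 0, 1). */
typedef struct { float r[3][4]; } affine34;

/* Transform a point with three dot products instead of four:
   the resulting w is known to be 1, so it is never computed. */
static vec3 affine_transform_point(const affine34 *m, vec3 p)
{
    vec3 out;
    out.x = m->r[0][0] * p.x + m->r[0][1] * p.y + m->r[0][2] * p.z + m->r[0][3];
    out.y = m->r[1][0] * p.x + m->r[1][1] * p.y + m->r[1][2] * p.z + m->r[1][3];
    out.z = m->r[2][0] * p.x + m->r[2][1] * p.y + m->r[2][2] * p.z + m->r[2][3];
    return out;
}
```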
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: Quake 3 lightmaps - PVR Multi-Texture
Over the weekend, I re-wrote the arrays submission in my Open GL API, including the clipping and lighting mechanism on arrays.
And after tight benchmarking, I also removed the function pointer system I was using before, and replaced that with better optimized pipelined loops.
When clipping is enabled, I have managed to save an entire transform per-vertex, compared to before, by preserving the w component and delaying perspective division until after the clipping stage.
Just as a test, I have run a quick sample of using glDrawArrays(...) with my Open GL API and Quake 3 BSP's.
Since the .bsp is in the romdisk, I cut out the actual textures, but the polygons are in fact textured.
In this demo, I am submitting every single face of the bsp without using the PVS system, and I am clipping every single vertex, and still we are sailing at 60fps with time to spare
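For readers unfamiliar with the clipping change mentioned in this post (keeping w and dividing only after clipping), here is a minimal sketch of that ordering for the near plane. It assumes OpenGL-style clip space, where a vertex is inside the near plane when z + w >= 0, and a standard perspective projection. It is not the library's code, and the function names are made up.

```c
typedef struct { float x, y, z, w; } vec4;

/* Signed distance to the near plane in OpenGL clip space: inside when >= 0. */
static float near_dist(vec4 v)
{
    return v.z + v.w;
}

/* Intersection of edge a->b with the near plane. Assumes the endpoints
   lie on opposite sides of the plane, so the denominator is non-zero. */
static vec4 clip_edge_near(vec4 a, vec4 b)
{
    float da = near_dist(a), db = near_dist(b);
    float t = da / (da - db);
    vec4 r = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
               a.z + t * (b.z - a.z), a.w + t * (b.w - a.w) };
    return r;
}

/* Perspective division, done only after clipping, so no vertex behind
   the near plane (where w could be zero or negative) is ever divided. */
static vec4 perspective_divide(vec4 v)
{
    float inv_w = 1.0f / v.w;
    vec4 r = { v.x * inv_w, v.y * inv_w, v.z * inv_w, inv_w };
    return r;
}
```

Because clipping interpolates all four components linearly in clip space, the clipped vertex needs no second transform; it only needs the divide.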
Re: Quake 3 lightmaps - PVR Multi-Texture
Jae686 wrote: PVS System ? What's a PVS system ?

http://en.wikipedia.org/wiki/Potentially_visible_set
BSP has a very efficient data structure to reduce the amount of polygons you need to render depending on the view point. If ph3nom were to implement the visibility tests, he would be able to render much bigger and more detailed environments. Right now he probably just renders the whole level, maybe with some frustum culling only.
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: Quake 3 lightmaps - PVR Multi-Texture
Yes, the BSP data structure is quite efficient for its time indeed.
The PVS system is a form of occlusion culling. It determines the "Potentially Visible Set" of surfaces based on the "camera" position.
It does not take into account "camera" direction, so it is not quite parallax occlusion.
The BSP also includes bounding boxes for the leaf faces, so you can perform frustum culling on top of the PVS occlusion.
I have implemented the PVS system, and a first pass at nearz frustum culling using the BSP bounding boxes, so at least polygons behind the camera will not be submitted.
But I have disabled that for testing the raw vertex throughput of glDrawArrays vs glDrawElements.
Strangely enough, glDrawArrays is actually faster.
Testing an even larger BSP; Note the CPU time here using glDrawArrays():
Now, look at the CPU time here using glDrawElements():
As I have now pretty tightly optimized things in both cases, I can only guess that unpacking each attribute for each vertex each frame costs more time than simply unpacking first, and then submitting as arrays on the DC...
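For reference, the PVS lookup itself is a single bit test. The sketch below assumes the usual Quake 3 BSP visdata layout (one bit vector of `sz_vecs` bytes per cluster); the struct and function names are made up, and treating negative clusters as visible is just a conservative choice for this example.

```c
/* Visibility data as laid out in the Quake 3 BSP visdata lump (assumed). */
typedef struct {
    int n_vecs;                /* number of clusters */
    int sz_vecs;               /* bytes per cluster bit vector */
    const unsigned char *vecs; /* n_vecs * sz_vecs bytes */
} q3_visdata;

/* Returns 1 if test_cluster is potentially visible from cam_cluster.
   Missing vis data or negative clusters are treated as visible. */
static int cluster_visible(const q3_visdata *vis, int cam_cluster, int test_cluster)
{
    if (vis->vecs == 0 || cam_cluster < 0 || test_cluster < 0)
        return 1;
    return (vis->vecs[cam_cluster * vis->sz_vecs + (test_cluster >> 3)]
            >> (test_cluster & 7)) & 1;
}
```

The camera's cluster comes from walking the BSP tree to the leaf containing the camera position; each candidate leaf is then tested with its own cluster before its faces are submitted.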
Re: Quake 3 lightmaps - PVR Multi-Texture
PH3NOM wrote: Spoiler!

I was wondering whether you can replace those 8 dot products with two matrix multiplications to gain some speed. Put the vectors on the right side into the matrix and multiply it by cv, then do the if checks after the matrix multiplication (or just write "bbox_in = (result[0] > 0) + (result[1] > 0) ..."). That should take 1/4th of the time, plus some overhead to set up the matrix. It may also improve on the branching. Maybe that's faster?

PH3NOM wrote:
Strangely enough, glDrawArrays is actually faster.
As I have now pretty tightly optimized things in both cases, I can only guess that unpacking each attribute for each vertex each frame costs more time than simply unpacking first, and then submitting as arrays on the DC...

That's really hard to comment on without the implementation of glDrawArrays and glDrawElements. Anyway, I think you should be able to release your GL API now! There are some people on this forum waiting for your code for their own projects, should be inspiring.
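Since the bounding-box code under discussion is not shown here, below is a hypothetical sketch of that kind of test: eight dot products, one per box corner, against the plane through the camera position facing along the view direction. It is not PH3NOM's function, and all names are invented. It sums the comparison results instead of branching per corner, in the spirit of the "bbox_in = ..." suggestion.

```c
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns 1 if at least one corner of the box [mins, maxs] lies in front
   of the plane through cam_pos with normal cam_dir, otherwise 0. */
static int bbox_in_front(vec3 mins, vec3 maxs, vec3 cam_pos, vec3 cam_dir)
{
    int in_front = 0;
    for (int i = 0; i < 8; ++i) {
        /* Bits 0..2 of i select the min or max coordinate per axis. */
        vec3 c = { (i & 1) ? maxs.x : mins.x,
                   (i & 2) ? maxs.y : mins.y,
                   (i & 4) ? maxs.z : mins.z };
        vec3 rel = { c.x - cam_pos.x, c.y - cam_pos.y, c.z - cam_pos.z };
        in_front += (dot3(rel, cam_dir) > 0.0f);
    }
    return in_front > 0;
}
```

Note the parentheses around each comparison: in C, `a > 0 + b > 0` parses as `(a > (0 + b)) > 0`, which is not a count of positive results.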
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: Quake 3 lightmaps - PVR Multi-Texture
bogglez wrote:
I was wondering whether you can replace those 8 dot products with two matrix multiplications to gain some speed. Put the vectors on the right side into the matrix and multiply it by cv, then do the if checks after the matrix multiplication (or just write "bbox_in = (result[0] > 0) + (result[1] > 0) ..."). That should take 1/4th the time + some overhead to set up the matrix. It may also improve on the branching. Maybe that's faster?
That's really hard to comment on without the implementation of glDrawArrays and glDrawElements. Anyway, I think you should be able to release your GL API now! There are some people on this forum waiting for your code for their own projects, should be inspiring

Thank you for the encouragement.
I have not even attempted to optimize the bounding box z-culling; I just wrote that function very quickly.
However, a matrix transform is not nearly as fast as a dot product, so the 1/4 time is not quite right.
I think I benchmarked at least 24mil dot operations per second, and ~16mil matrix transforms per second.
And that was not reloading the transform matrix registers each operation, that costs ~11 cycles per operation, so doing that would obviously slow things down further.
And, it seems to be best to use the extended register bank ( matrix register ) only if you are using it to transform multiple vectors.
For the bounding box algorithm, each matrix would only transform 1 vector.
Anywhoo, I decided to disable texturing to see the geometry better.
Using glDrawElements(): ~43fps @ 1.6mil verts/sec.
Using glDrawArrays()(after unpacking the geometry into arrays): 60fps @ 2.3mil verts/sec