Quake 3 lightmaps - PVR Multi-Texture

MetalliC
DCEmu Crazy Poster
Posts: 28
Joined: Wed Apr 23, 2014 3:04 pm
Has thanked: 0
Been thanked: 0

Re: Quake 3 lightmaps - PVR Multi-Texture

Post by MetalliC »

Interesting. Can the PVR2 accumulation buffers be used to produce such an effect without additional render passes?
If you set "DST Select" = 1 in the polygon's TSP instruction word, the result of drawing this poly is stored in the secondary (PVR-internal) buffer.
If you set "SRC Select" = 1, the RGBA from the secondary buffer is used as the source by the blender unit, instead of the normal RGBA coming from the texture/shading unit (which is ignored).
BTW, IMO this is exactly the feature Sega called "multitexturing support".
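
A minimal sketch of how those two bits might be set, for anyone who wants to experiment. It assumes SRC Select is bit 25 and DST Select is bit 24 of the 32-bit TSP instruction word, which matches the source/destination enable shifts in KallistiOS's dc/pvr.h as far as I can tell. The macro and function names are made up for illustration, so check them against your own headers:

```c
#include <stdint.h>

/* Assumed bit positions in the 32-bit TSP instruction word.
   Hypothetical names; verify against your PVR documentation. */
#define TSP_SRC_SELECT_BIT (1u << 25) /* blender reads its source from the secondary buffer */
#define TSP_DST_SELECT_BIT (1u << 24) /* blender writes its result to the secondary buffer */

/* Route this polygon's output into the secondary accumulation buffer. */
uint32_t tsp_render_to_secondary(uint32_t tsp)
{
    return tsp | TSP_DST_SELECT_BIT;
}

/* Read the secondary buffer as blend source and write to the primary buffer. */
uint32_t tsp_read_from_secondary(uint32_t tsp)
{
    return (tsp | TSP_SRC_SELECT_BIT) & ~TSP_DST_SELECT_BIT;
}
```

The first helper would be applied to the polygon that draws into the secondary buffer, the second to a later polygon that reads it back.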

At least one game uses the accumulation buffers for a similar blur effect: "Evil Dead - Hail to the King".

But I'm afraid you can't use nullDC to test these features, because nullDC does not emulate the PVR2 accumulation buffers (like many, many other things :mrgreen:).
Tvspelsfreak
Team Screamcast
Posts: 144
Joined: Tue Dec 23, 2003 6:04 pm
Location: Umeå, Sweden
Has thanked: 0
Been thanked: 0

Post by Tvspelsfreak »

No, you'll actually need an additional pass to flush the secondary accumulation buffer.

It's mostly used to mask out multitexture effects from the transparent parts of the base texture.
https://github.com/tvspelsfreak/texconv - Converts images into any texture format supported on the DC.

Post by MetalliC »

Nope, you'll need additional polygon(s) to flush the buffer (or blend it into the primary one), not a render pass.

I've seen the buffers used in only two games: Evil Dead and Virtua Fighter 4 (on Naomi2).

Post by Tvspelsfreak »

Yeah, you're right, I should've explained it better. What I meant was you'll have to send the same geometry one more time to flush the secondary accumulation buffer.

EDIT: I thought you were talking about the lightmapping (which already is a one pass solution). :oops:
I haven't messed around with the secondary accumulation buffer much. I tried doing stencil reflections with it, but it appears you must use the same geometry to flush it as you used to render to it. Flushing with a mirror plane gave me very weird results... It would be cool to be able to use the buffer in a more flexible way, but I don't know if it's possible.
PH3NOM
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm
Has thanked: 0
Been thanked: 5 times

Post by PH3NOM »

PH3NOM wrote:I have been thinking of a way to do multi-texture much faster than the way I am currently doing it using my build of Open GL.

Basically, right now I am making 2 passes, so every vertex of the geometry is submitted twice.
This means each vertex possibly gets clipped, lit, and transformed each time it is submitted.
My idea is that I can simply allow the submission of two separate textures (opaque + alpha) with almost no extra cost on the CPU, by computing the output vertex (lit, clipped, transformed) once and then copying it into each list (opaque, alpha).
I have done just that: my Open GL API now supports Multi-Texturing. Currently only 2 texture units may be bound at a time (GL_TEXTURE0 = opaque, GL_TEXTURE1 = alpha).
I have even implemented this with the standard pipeline (glBegin(...)/glEnd()), as well as the vertex buffer pipeline (glDrawArrays()).
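
The "process once, copy into both lists" idea can be sketched roughly as below. This is only an illustration under assumptions, not the actual library code: the `out_vertex` layout, `process_vertex()` and `submit_multitexture()` are hypothetical names, and the expensive clip/light/transform step is replaced by a trivial translation so the sketch stays runnable.

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3;              /* input position */
typedef struct { float x, y, z, u, v; } out_vertex;  /* hypothetical output layout */

/* Stand-in for the expensive per-vertex work (clip, light, transform).
   Here it only translates along z and counts how often it runs. */
static out_vertex process_vertex(vec3 p, float u, float v, int *work_counter)
{
    out_vertex o = { p.x, p.y, p.z - 3.0f, u, v };
    ++*work_counter;
    return o;
}

/* Process each vertex once, then write it to both lists.
   uv0/uv1 hold 2 floats per vertex for texture units 0 and 1. */
void submit_multitexture(const vec3 *pos, const float *uv0, const float *uv1,
                         size_t count, out_vertex *opaque_list,
                         out_vertex *alpha_list, int *work_counter)
{
    for (size_t i = 0; i < count; ++i) {
        out_vertex v = process_vertex(pos[i], uv0[2 * i], uv0[2 * i + 1], work_counter);
        opaque_list[i] = v;        /* base texture */
        v.u = uv1[2 * i];          /* only the UVs differ for unit 1 */
        v.v = uv1[2 * i + 1];
        alpha_list[i] = v;         /* texture blended on top */
    }
}
```

With this layout the per-vertex work runs `count` times instead of `2 * count`; the second list only costs a struct copy and two UV writes per vertex.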

This is the very simple function I made to test it, and it's working just fine on DC:

Code: Select all

GLfloat VERTEX_ARRAY[4 * 3] = { -1.0f,  1.0f, 0.0f,
                                 1.0f,  1.0f, 0.0f,
                                 1.0f, -1.0f, 0.0f,
                                -1.0f, -1.0f, 0.0f };

GLfloat TEXCOORD_ARRAY[4 * 2] = { 0, 0,
                                  1, 0,
                                  1, 1,
                                  0, 1 };

/* Multi-Texture Example using Open GL Vertex Buffer Submission.
   glClientActiveTexture() must be used for Arrays, instead of glActiveTexture().
   Each texture must receive its own set of UV Coordinates */
void RenderCallback(GLuint texID0, GLuint texID1)
{
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -3.0f);

    /* Enable Vertex and Texture Coord Arrays */
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    /* Activate GL_TEXTURE0, bind the base opaque texture, and for fun, enable bi-linear filtering */
    glClientActiveTexture(GL_TEXTURE0); /* glClientActiveTexture(...) For use with Multi-Texture Arrays */
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texID0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_FILTER, GL_LINEAR);
    glTexCoordPointer(2, GL_FLOAT, 0, TEXCOORD_ARRAY); /* Bind TexCoord Array for GL_TEXTURE0 */

    /* Activate GL_TEXTURE1, bind the texture to blend on top, and for fun, enable bi-linear filtering */
    glClientActiveTexture(GL_TEXTURE1); /* glClientActiveTexture(...) For use with Multi-Texture Arrays */
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texID1);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_FILTER, GL_LINEAR);
    glTexCoordPointer(2, GL_FLOAT, 0, TEXCOORD_ARRAY); /* Bind TexCoord Array for GL_TEXTURE1 */

    /* Set blending modes to be applied to GL_TEXTURE1 */
    glBlendFunc(GL_SRC_ALPHA, GL_DST_ALPHA);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

    /* Bind the Vertex Array */
    glVertexPointer(3, GL_FLOAT, 0, VERTEX_ARRAY);
    glDrawArrays(GL_QUADS, 0, 4);

    /* Disable GL_TEXTURE1 */
    glClientActiveTexture(GL_TEXTURE1);
    glDisable(GL_TEXTURE_2D);

    /* Make sure to set the client active texture back to GL_TEXTURE0 when finished */
    glClientActiveTexture(GL_TEXTURE0);
    glDisable(GL_TEXTURE_2D);

    /* Disable Vertex and Texture Coord Arrays */
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}
Even though the textures contain no alpha channel, the PVR is used to perform the blending of this texture here:
Image

Overlaid on top of the base texture, only submitting 4 vertices to Open GL:
Image
Jae686
Insane DCEmu
Posts: 112
Joined: Sat Sep 22, 2007 9:43 pm
Location: Braga - Portugal
Has thanked: 0
Been thanked: 0

Post by Jae686 »

I can't wait to try your API. :)
bogglez
Moderator
Posts: 578
Joined: Sun Apr 20, 2014 9:45 am
Has thanked: 0
Been thanked: 0

Post by bogglez »

@PH3NOM:
Are you worried about immediate mode so that old software can be ported easily? Software using immediate mode will perform poorly anyway, so I think you shouldn't worry about optimizing it too much, at least for a first release.
BTW I'm curious about the performance difference between immediate mode, vertex arrays and VBOs using your API and maybe KOS' API. Did you ever benchmark this, by chance?
Wiki & tutorials: http://dcemulation.org/?title=Development
Wiki feedback: viewtopic.php?f=29&t=103940
My libgl playground (not for production): https://bitbucket.org/bogglez/libgl15
My lxdream fork (with small fixes): https://bitbucket.org/bogglez/lxdream

Post by PH3NOM »

To be honest, I am not quite sure what you mean by "immediate mode".

Do you mean "Direct Rendering", the way the old KGL submitted vertex data to the PVR, or using "glVertex3f(...)" to submit vertex data, as opposed to glDrawArrays(...)?

It's hard to benchmark Open GL modes against KOS, because KOS itself does not really handle the things that Open GL does.
The closest thing KOS has (by default) is mat_transform_sq(...), and if you follow the thread here, you will see that I was able to obtain better performance by devising my own methods:
viewtopic.php?f=29&t=102181

To that extent, the Vertex Buffer solutions I have devised produced higher throughput than the KOS dma functions.
BlueCrab
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia
Has thanked: 9 times
Been thanked: 69 times

Post by BlueCrab »

Immediate mode in OpenGL would be the glVertex*() calls.

Post by bogglez »

PH3NOM wrote:To be honest, I am not quite sure what you mean by "immediate mode".

Do you mean "Direct Rendering", the way the old KGL submitted vertex data to the PVR, or using "glVertex3f(...)" to submit vertex data, as opposed to glDrawArrays(...)?
Direct Rendering = opposite of software rendering
Immediate Mode = Function calls that can only be used between glBegin() and glEnd().
You can optimize the other draw modes much better, for many reasons:
- Fewer function calls (in immediate mode there's at least one glVertex call per vertex and usually another one for uv, color, normal each)
- Rigid order (glNormal, glColor etc could be supplied in varying order or be missing for some vertices, not the case for glVertexAttrib)
- You know exactly what components will be defined (with immediate mode you don't know whether the last vertex of a thousand will suddenly have a glNormal call preceding it, so you must assume it will be used)
- Since there's no indexing, you cannot use a vertex cache with transformations and lighting etc already applied (http://home.comcast.net/~tom_forsyth/pa ... e_opt.html)
PH3NOM wrote: It's hard to benchmark Open GL modes against KOS, because KOS itself does not really handle the things that Open GL does.
The closest thing KOS has (by default) is mat_transform_sq(...), and if you follow the thread here, you will see that I was able to obtain better performance by devising my own methods:
viewtopic.php?f=29&t=102181

To that extent, the Vertex Buffer solutions I have devised produced higher throughput than the KOS dma functions.
Thank you for that link and great work!

Post by PH3NOM »

Yes, I understand. And thank you for your interest!

That is the motivation for me deciding to add support for indexed arrays by implementing glDrawElements(...):
We can light and transform fewer vertices before assembling them into primitives for rasterization.

Code: Select all

GLfloat VERTEX_ARRAY[4 * 3 * 2] = { -1.0f,  1.0f,  1.0f,
                                     1.0f,  1.0f,  1.0f,
                                     1.0f, -1.0f,  1.0f,
                                    -1.0f, -1.0f,  1.0f,
                                    -1.0f,  1.0f, -1.0f,
                                     1.0f,  1.0f, -1.0f,
                                     1.0f, -1.0f, -1.0f,
                                    -1.0f, -1.0f, -1.0f };

GLfloat TEXCOORD_ARRAY[4 * 2 * 2] = { 0, 0,
                                      1, 0,
                                      1, 1,
                                      0, 1,
                                      1, 0,
                                      0, 0,
                                      0, 1,
                                      1, 1 };

GLuint ARGB_ARRAY[4 * 2] = { 0xFFFF0000, 0xFF00FF00, 0xFF0000FF, 0xFFFFFF00,
                             0xFFFF0000, 0xFF00FF00, 0xFF0000FF, 0xFFFFFF00 };

GLubyte INDEX_ARRAY[4 * 6] = { 0, 1, 2, 3,
                               3, 2, 6, 7,
                               7, 6, 5, 4,
                               4, 5, 1, 0,
                               1, 5, 6, 2,
                               0, 4, 7, 3 };

/* Example using Open GL Vertex Buffer Element Submission. */
static GLfloat rx = 1.0f;
void RenderCallback(GLuint texID)
{
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -6.0f);

    glRotatef(rx++, 0, 1, 0);

    /* Enable 2D Texturing and bind the Texture */
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texID);

    /* Enable Vertex, Color and Texture Coord Arrays */
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);

    /* Bind Array Data */
    glColorPointer(1, GL_UNSIGNED_INT, 0, ARGB_ARRAY);
    glTexCoordPointer(2, GL_FLOAT, 0, TEXCOORD_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, VERTEX_ARRAY);

    /* Render the Submitted Vertex Data */
    glDrawElements(GL_QUADS, 4 * 6, GL_UNSIGNED_BYTE, INDEX_ARRAY);

    /* Disable Vertex, Color and Texture Coord Arrays */
    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}
Image
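
To illustrate the saving from indexing with the cube above (8 unique vertices, 24 indices), here is a rough sketch that counts how often the per-vertex work runs in each path. It is not the library's implementation; `transform_vertex()`, `draw_indexed()` and `draw_unindexed()` are hypothetical names and the "transform" is a placeholder translation.

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Placeholder for light + transform; counts how often it runs. */
static vec3 transform_vertex(vec3 v, int *work_counter)
{
    vec3 o = { v.x, v.y, v.z - 6.0f };
    ++*work_counter;
    return o;
}

/* Indexed path: transform every unique vertex once, then gather by index.
   `cache` must hold vertex_count entries, `out` must hold index_count entries. */
void draw_indexed(const vec3 *verts, size_t vertex_count,
                  const unsigned char *indices, size_t index_count,
                  vec3 *cache, vec3 *out, int *work_counter)
{
    for (size_t i = 0; i < vertex_count; ++i)
        cache[i] = transform_vertex(verts[i], work_counter);
    for (size_t i = 0; i < index_count; ++i)
        out[i] = cache[indices[i]];
}

/* Non-indexed path: every submitted vertex is transformed again. */
void draw_unindexed(const vec3 *unpacked, size_t count, vec3 *out, int *work_counter)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = transform_vertex(unpacked[i], work_counter);
}
```

For the cube this is 8 transforms instead of 24. The gather step has its own cost, though, so the saving is not automatic and needs to be measured on the target.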

Post by bogglez »

PH3NOM wrote:Yes, I understand. And thank you for your interest!

That is the motivation for me deciding to add support for indexed arrays by implementing glDrawElements(...):
We can light and transform fewer vertices before assembling them into primitives for rasterization.
The pleasure is mine.

By the way, reading those old threads I noticed some things about GL usage in the code that people write on here:

- Triangle strip optimization: Only really makes sense with unindexed data. With a vertex cache it will only give you a tiny performance improvement, while being much more bothersome to use in many ways.

- OpenGL matrix functions and the matrix stack:
Don't use those at all. Instead, define a tree structure for the transforms with each child storing the local transform and the absolute transform including all parent transforms. Changing a parent's transform should then recalculate the transform matrix of each child. This will save you a lot of matrix calculations and stack operations. Matrix calculations will also not be interleaved with drawing operations, so you will make better use of registers and the memory cache.
None of this is done inside of your OpenGL library, but instead in the code of its users.
I just want to point out that those functions are not of importance in your library since they shouldn't be used anyway, and you may want to remove them from the example code, so people new to 3D don't even start using those functions (even though your intent is just to use them for a quick demo).

Post by Jae686 »

bogglez, by OpenGL matrix functions do you mean glTranslate, glRotate and glScale?
Why should those be avoided?

Do you have an example of how the transform tree should be implemented?

Best Regards

Post by bogglez »

Jae686 wrote:bogglez, by OpenGL matrix functions do you mean glTranslate, glRotate and glScale?
Why should those be avoided?

Do you have an example of how the transform tree should be implemented?
The glTranslate, Rotate etc functions always implicitly work on a global matrix stack. You're constantly pushing and popping matrices that you could actually use again for the same object in the next frame, which would save you some matrix multiplications 30 times per second.
What I'm referring to is often called a "scene graph" (the term is very loosely defined and some people really go overboard with it).

In its basic, sensible form, it just expresses a tree hierarchy of transformations in the scene, each node containing a local and an absolute transform.
For example the root node of the scene graph may be a ship. On the ship there's the captain and a cannon. So the root node gets those two as child nodes. The captain is also wearing a hat, so we need a transformation matrix from the position of the captain to his head in order to place the hat properly, so the captain has the hat as his child node.
To rotate the hat of the captain you just need to change its local transform. The absolute transform of the hat is now outdated, so we multiply the hat's local transform with the captain's absolute transform and we're done.
If the captain moves (his local transform changes), the hat also moves. So we need to update the captain's and hat's absolute transforms.
During all of this the matrices of the cannon and ship were unaffected.
When you draw a frame you walk the scene graph first and check whether a node is "dirty" as described above. If it is, you update the transforms for it and its children. Now when you want to draw any object you just load its absolute transform and start drawing.
You should see how this saves you an incredible amount of matrix transforms for even simple scenes. A basic graphics engine will also perform visibility detection (frustum culling etc), which is easier to do with the scene graph (you need the transform but don't care about parent nodes that aren't visible).
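
A minimal sketch of such a transform tree, under a few assumptions: nodes live in an array ordered parent-before-child, matrices are row-major 4x4, and all names (`node`, `scene_update()`, etc.) are made up for this example. It is one possible layout, not a definitive implementation.

```c
#include <stddef.h>

typedef struct { float m[16]; } mat4;   /* row-major 4x4 */

typedef struct node {
    struct node *parent;   /* NULL for the root */
    mat4 local;            /* transform relative to the parent */
    mat4 absolute;         /* cached: parent's absolute * local */
    int dirty;             /* local changed since the last update */
    int changed;           /* absolute was recomputed in the current update */
} node;

mat4 mat4_translation(float x, float y, float z)
{
    mat4 r = {{ 1, 0, 0, x,   0, 1, 0, y,   0, 0, 1, z,   0, 0, 0, 1 }};
    return r;
}

mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a->m[i * 4 + k] * b->m[k * 4 + j];
            r.m[i * 4 + j] = s;
        }
    return r;
}

void node_init(node *n, node *parent, mat4 local)
{
    n->parent = parent;
    n->local = local;
    n->absolute = local;
    n->dirty = 1;
    n->changed = 0;
}

void node_set_local(node *n, mat4 local)
{
    n->local = local;
    n->dirty = 1;
}

/* Walk the nodes (parents must come before their children) and recompute
   only the absolute transforms that are outdated. Returns how many
   matrices were recomputed. */
int scene_update(node *nodes, size_t count)
{
    int recomputed = 0;
    for (size_t i = 0; i < count; ++i) {
        node *n = &nodes[i];
        int parent_changed = (n->parent != NULL) && n->parent->changed;
        n->changed = n->dirty || parent_changed;
        if (n->changed) {
            n->absolute = n->parent ? mat4_mul(&n->parent->absolute, &n->local)
                                    : n->local;
            ++recomputed;
        }
        n->dirty = 0;
    }
    return recomputed;
}
```

With the ship/captain/hat/cannon example, moving the hat recomputes one matrix, moving the captain recomputes two, and the ship and cannon are left alone. When drawing, each object just loads its `absolute` matrix.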

Aside from that, you can write special matrix functions for complicated transformations that you commonly use. For example, you don't always need a full 4x4 matrix * 4-component vector multiply. Sometimes you don't care about the w component, but you still want the translation. Since that saves a whole dot product (one row * column), which I think the DC supports in hardware, I think you could multiply more matrices on the DC this way too.
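
As a small sketch of that kind of special-case function: transforming a point by a matrix whose last row is assumed to be (0, 0, 0, 1) needs only three row products, and the w row is never evaluated. This is plain C for illustration (hypothetical names, row-major layout assumed); it does not use any hardware dot product.

```c
typedef struct { float x, y, z; } vec3;

/* m is a row-major 4x4 matrix whose last row is assumed to be (0, 0, 0, 1).
   The point is treated as (x, y, z, 1), so the translation is applied. */
vec3 transform_point_affine(const float m[16], vec3 p)
{
    vec3 r;
    r.x = m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3];
    r.y = m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7];
    r.z = m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11];
    return r;   /* w would be 1, so its row is skipped */
}
```

This is only valid for affine transforms (rotation, scale, translation); a projection matrix changes w and needs the full multiply.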

Post by PH3NOM »

Over the weekend, I re-wrote the arrays submission in my Open GL API, including the clipping and lighting mechanism on arrays.
And after tight benchmarking, I also removed the function pointer system I was using before, and replaced that with better optimized pipelined loops.
When clipping is enabled, I have managed to save an entire transform per-vertex, compared to before, by preserving the w component and delaying perspective division until after the clipping stage.
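
One way to picture "clip first, divide later": vertices are transformed once with w kept, edges that cross the near plane are cut while still in homogeneous coordinates, and the perspective divide runs last. The sketch below assumes the near plane is expressed as w = w_near and uses hypothetical names; it is not the code from the library.

```c
typedef struct { float x, y, z, w; } vec4;

/* Intersect edge a->b with the plane w = w_near.
   Assumes a and b lie on opposite sides of the plane (a.w != b.w). */
vec4 clip_edge_near(vec4 a, vec4 b, float w_near)
{
    float t = (w_near - a.w) / (b.w - a.w);
    vec4 r = { a.x + t * (b.x - a.x),
               a.y + t * (b.y - a.y),
               a.z + t * (b.z - a.z),
               w_near };
    return r;
}

/* Perspective divide, done only after clipping so w is positive here.
   1/w is kept in the w slot. */
vec4 perspective_divide(vec4 v)
{
    float inv_w = 1.0f / v.w;
    vec4 r = { v.x * inv_w, v.y * inv_w, v.z * inv_w, inv_w };
    return r;
}
```

Because the intersection is computed before the divide, a vertex behind the camera (w <= 0) never reaches the division.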

Just as a test, I have run a quick sample using glDrawArrays(...) with my Open GL API and Quake 3 BSPs.

Since the .bsp is in the romdisk, I cut out the actual textures, but the polygons are in fact textured.

In this demo, I am submitting every single face of the bsp without using the PVS system, and I am clipping every single vertex, and still we are sailing at 60fps with time to spare :-)
dc_opengl_q3bsp_a01.rar
Open GL DC Quake 3 BSP DEMO (C) 2014 PH3NOM
Image

Post by Jae686 »

PVS System ? What's a PVS system ?

Post by bogglez »

Jae686 wrote:PVS System ? What's a PVS system ?
http://en.wikipedia.org/wiki/Potentially_visible_set

BSP has a very efficient data structure to reduce the number of polygons you need to render depending on the viewpoint. If PH3NOM were to implement the visibility tests, he would be able to render much bigger and more detailed environments. Right now he probably just renders the whole level, maybe with some frustum culling only.

Post by PH3NOM »

Yes, the BSP data structure is quite efficient for its time indeed.

The PVS system is a form of occlusion culling. It determines the "Potentially Visible Set" (the surfaces that may be visible) based on the "camera" position.
It does not take into account "camera" direction, so it is not quite parallax occlusion.

The BSP also includes bounding boxes for the leaf faces, so you can perform frustum culling on top of the PVS occlusion.

I have implemented the PVS system, and a first pass at nearz frustum culling using the BSP bounding boxes, so at least polygons behind the camera will not be submitted.

Code: Select all

#include <dc/vec3f.h>

#define POINT_ON_PLANE          0x0
#define POINT_IN_FRONT_OF_PLANE 0x1
#define POINT_BEHIND_PLANE      0x2

byte Q3BSP_PlaneClassifyPoint(Q3_BSP_PLANE *plane, vector3f *point)
{
	float d;

	vec3f_dot(point->x, point->y, point->z, plane->normal.x, plane->normal.y, plane->normal.z, d);

	d += plane->intercept;

	if(d > 0)
		return POINT_IN_FRONT_OF_PLANE;
	else if(d < 0)
		return POINT_BEHIND_PLANE;

	return POINT_ON_PLANE;	
}

int Q3BSP_CalculateCameraLeaf(vector3f *camFrom)
{
	int node = 0;
	
	while(node >= 0)
		if(Q3BSP_PlaneClassifyPoint(&BSP_PLANES[BSP_NODES[node].planeIndex], camFrom) == POINT_IN_FRONT_OF_PLANE)
			node = BSP_NODES[node].front;
		else
			node = BSP_NODES[node].back;

	return ~node;
}

//See if one cluster is visible from another
byte Q3BSP_ClusterIsVisible(int pos, int test)
{
	return (BSP_VIS->bitset[(pos * BSP_VIS->bytesPerCluster) + (test >> 3)] & (1 << (test & 7))) != 0;
}

byte Q3BSP_LeafIsVisible(Q3_BSP_LEAF *leaf, vector3f *cam, vector3f *cv)
{
	float dot;
	byte bbox_in = 0;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->mins[0], cam->y - leaf->mins[1], cam->z - leaf->mins[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->mins[0], cam->y - leaf->mins[1], cam->z - leaf->maxs[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->maxs[0], cam->y - leaf->mins[1], cam->z - leaf->mins[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->maxs[0], cam->y - leaf->mins[1], cam->z - leaf->maxs[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->mins[0], cam->y - leaf->maxs[1], cam->z - leaf->mins[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->mins[0], cam->y - leaf->maxs[1], cam->z - leaf->maxs[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->maxs[0], cam->y - leaf->maxs[1], cam->z - leaf->mins[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->maxs[0], cam->y - leaf->maxs[1], cam->z - leaf->maxs[2], dot);
	if(dot > 0) ++bbox_in;

	return bbox_in;
}

//Calculate which faces to draw given a position & camera frustum
void Q3BSP_CalculateVisibleFaces(vector3f *camera, vector3f *camto)
{
	//Clear the list of faces drawn
	Q3BSP_ClearVisData();

	//calculate the camera leaf
	int cameraLeaf = Q3BSP_CalculateCameraLeaf(camera);

	int cameraCluster = BSP_LEAF[cameraLeaf].cluster;

	vector3f cv = { camera->x - camto->x, camera->y - camto->y, camera->z - camto->z };

	//loop through the leaves
	int i, j, l = Q3BSP_Leaves();
	unsigned char bbox_in;
	for(i = 0; i < l; ++i)
	{
		//if the leaf is not in the PVS, continue
		if(!Q3BSP_ClusterIsVisible(cameraCluster, BSP_LEAF[i].cluster))
			continue;

		bbox_in = Q3BSP_LeafIsVisible(&BSP_LEAF[i], camera, &cv);

		if(!bbox_in) /* CULL Faces in this Leaf */
			continue;

		//loop through faces in this leaf and mark them to be drawn
		if(bbox_in != 8) /* Clip Faces in this Leaf */
		{
			for(j = 0; j < BSP_LEAF[i].numLeafFaces; ++j)
			{
				BSP_FACE_VIS[BSP_LEAF_FACE[BSP_LEAF[i].firstLeafFace+j]] = 1;
				BSP_FACE_CLIP[BSP_LEAF_FACE[BSP_LEAF[i].firstLeafFace+j]] = 1;
			}
		}
		else /* No Culling or clipping - face is completely inside z-plane */
		{
			for(j = 0; j < BSP_LEAF[i].numLeafFaces; ++j)
				BSP_FACE_VIS[BSP_LEAF_FACE[BSP_LEAF[i].firstLeafFace+j]] = 1;
		}
	}
}
But I have disabled that for testing the raw vertex throughput of glDrawArrays vs glDrawElements.

Strangely enough, glDrawArrays is actually faster.
Testing an even larger BSP; Note the CPU time here using glDrawArrays():
Image

Now, look at the CPU time here using glDrawElements():
Image

As I have now pretty tightly optimized things in both cases, I can only guess that unpacking each attribute for each vertex each frame costs more time than simply unpacking first, and then submitting as arrays on the DC...

Post by bogglez »

PH3NOM wrote:

Code: Select all

byte Q3BSP_LeafIsVisible(Q3_BSP_LEAF *leaf, vector3f *cam, vector3f *cv)
{
	float dot;
	byte bbox_in = 0;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->mins[0], cam->y - leaf->mins[1], cam->z - leaf->mins[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->mins[0], cam->y - leaf->mins[1], cam->z - leaf->maxs[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->maxs[0], cam->y - leaf->mins[1], cam->z - leaf->mins[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->maxs[0], cam->y - leaf->mins[1], cam->z - leaf->maxs[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->mins[0], cam->y - leaf->maxs[1], cam->z - leaf->mins[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->mins[0], cam->y - leaf->maxs[1], cam->z - leaf->maxs[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->maxs[0], cam->y - leaf->maxs[1], cam->z - leaf->mins[2], dot);
	if(dot > 0) ++bbox_in;

	vec3f_dot(cv->x, cv->y, cv->z, cam->x - leaf->maxs[0], cam->y - leaf->maxs[1], cam->z - leaf->maxs[2], dot);
	if(dot > 0) ++bbox_in;

	return bbox_in;
}
I was wondering whether you can replace those 8 dot products with two matrix multiplications to gain some speed. Put the vectors on the right side into the matrix and multiply it by cv, then do the if checks after the matrix multiplication (or just write "bbox_in = (result[0] > 0) + (result[1] > 0) ..."). That should take 1/4th the time + some overhead to set up the matrix. It may also improve on the branching. Maybe that's faster?
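
The suggestion can be sketched like this: pack four "cam - corner" vectors as the rows of a 4x4 matrix, multiply by cv, and count the positive results without branching. The names are hypothetical and this is plain C, so it performs the same arithmetic as the eight separate dot products; any speedup would have to come from a hardware matrix-vector instruction, which the sketch does not use.

```c
typedef struct { float x, y, z; } vec3;

/* Multiply a row-major 4x4 matrix by (v.x, v.y, v.z, 0): each output is the
   dot product of one matrix row with v, i.e. four dot products per call. */
void mat_rows_dot(const float m[16], vec3 v, float out[4])
{
    for (int r = 0; r < 4; ++r)
        out[r] = m[r * 4 + 0] * v.x + m[r * 4 + 1] * v.y + m[r * 4 + 2] * v.z;
}

/* Count bounding-box corners c with dot(cv, cam - c) > 0, four at a time. */
int count_corners_in_front(const int mins[3], const int maxs[3], vec3 cam, vec3 cv)
{
    int count = 0;
    for (int half = 0; half < 2; ++half) {      /* two batches of 4 corners */
        float m[16] = { 0 };
        float d[4];
        for (int r = 0; r < 4; ++r) {
            int c = half * 4 + r;               /* bits of c select min or max per axis */
            m[r * 4 + 0] = cam.x - (float)((c & 1) ? maxs[0] : mins[0]);
            m[r * 4 + 1] = cam.y - (float)((c & 2) ? maxs[1] : mins[1]);
            m[r * 4 + 2] = cam.z - (float)((c & 4) ? maxs[2] : mins[2]);
        }
        mat_rows_dot(m, cv, d);
        for (int r = 0; r < 4; ++r)
            count += (d[r] > 0.0f);             /* comparison yields 0 or 1 */
    }
    return count;
}
```

Note the parentheses around each comparison: without them, `+` binds tighter than `>` in C and the sum would not count anything useful.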
Strangely enough, glDrawArrays is actually faster.

As I have now pretty tightly optimized things in both cases, I can only guess that unpacking each attribute for each vertex each frame costs more time than simply unpacking first, and then submitting as arrays on the DC...
That's really hard to comment on without the implementation of glDrawArrays and glDrawElements. Anyway, I think you should be able to release your GL API now! There are some people on this forum waiting for your code for their own projects, should be inspiring :-)

Post by PH3NOM »

bogglez wrote:
I was wondering whether you can replace those 8 dot products with two matrix multiplications to gain some speed. Put the vectors on the right side into the matrix and multiply it by cv, then do the if checks after the matrix multiplication (or just write "bbox_in = (result[0] > 0) + (result[1] > 0) ..."). That should take 1/4th the time + some overhead to set up the matrix. It may also improve on the branching. Maybe that's faster?
Strangely enough, glDrawArrays is actually faster.

As I have now pretty tightly optimized things in both cases, I can only guess that unpacking each attribute for each vertex each frame costs more time than simply unpacking first, and then submitting as arrays on the DC...
That's really hard to comment on without the implementation of glDrawArrays and glDrawElements. Anyway, I think you should be able to release your GL API now! There are some people on this forum waiting for your code for their own projects, should be inspiring :-)
Thank you for the encouragement. 8-)

I have not even attempted to optimize the bounding box z-culling; I just wrote that function very quickly.
However, a matrix transform is not nearly as fast as a dot product, so the 1/4 time is not quite right.
I think I benchmarked at least 24mil dot operations per second, and ~16mil matrix transforms per second.
And that was without reloading the transform matrix registers for each operation; reloading costs ~11 cycles per operation, so doing that would obviously slow things down further.
And, it seems to be best to use the extended register bank ( matrix register ) only if you are using it to transform multiple vectors.
For the bounding box algorithm, each matrix would only transform 1 vector.

Anywhoo, I decided to disable texturing to see the geometry better.
Using glDrawElements(): ~43fps @ 1.6mil verts/sec.
Image

Using glDrawArrays()(after unpacking the geometry into arrays): 60fps @ 2.3mil verts/sec
Image