mat_transform / pvr_prim vs mat_transform_sq

Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

Hi,

I am working on a Blender exporter and a viewer on the DC side. My goal is to export simple meshes to begin with, then to add support for bones and animation. I know there is already an exporter in the KOS source code, but it looks like it does not handle triangle strips at all and generates a large number of independent faces.
I am currently working on generating the triangle strips to feed the Dreamcast (which is not a standard feature of Blender?!).
While doing that, I went through the KOS include files and matrix.s, and I have a question. If anyone has answers, it might save me a lot of testing time.
The question, in summary: which costs more, computing the matrix transforms or submitting the vertices to the PVR?
- Solution one is to transform all vertex coordinates once using mat_transform (avoiding cache thrashing problems), then to submit strips built from the results using pvr_prim calls (which accesses RAM in a non-sequential way).
- Solution two is to transform the strips and submit them immediately through the store queues using mat_transform_sq. It looks more efficient, but it requires transforming the same vertices multiple times when strips have vertices in common.
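
Before benchmarking on hardware, the redundancy in solution two can at least be counted offline. The sketch below assumes a hypothetical layout in which a mesh is a list of strips, each strip being a run of indices into a shared vertex array. The names `strip_t`, `count_unique_transforms` and `count_strip_transforms` are made up for illustration and are not part of KOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: one strip = a run of indices into a shared vertex array. */
typedef struct {
    const unsigned short *indices;
    size_t count;
} strip_t;

/* Solution one: every vertex referenced by at least one strip is transformed once. */
size_t count_unique_transforms(const strip_t *strips, size_t nstrips, size_t nverts)
{
    unsigned char *seen = calloc(nverts ? nverts : 1, 1);
    size_t unique = 0;

    assert(seen != NULL);
    for (size_t s = 0; s < nstrips; s++) {
        for (size_t i = 0; i < strips[s].count; i++) {
            unsigned short idx = strips[s].indices[i];
            assert(idx < nverts);
            if (!seen[idx]) {
                seen[idx] = 1;
                unique++;
            }
        }
    }
    free(seen);
    return unique;
}

/* Solution two: every strip entry is transformed as it is streamed out. */
size_t count_strip_transforms(const strip_t *strips, size_t nstrips)
{
    size_t total = 0;

    for (size_t s = 0; s < nstrips; s++)
        total += strips[s].count;
    return total;
}
```

For two 4-vertex strips that share two vertices, solution one needs 6 transforms and solution two needs 8. The ratio between the two counts only says how much extra transform work the store queue path has to absorb; it does not say which path is faster on the real machine.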

If you have any ideas, feel free to share.

Thanks in advance!!
Ayla
DC Developer
DC Developer
Posts: 142
Joined: Thu Apr 03, 2008 7:01 am
Has thanked: 0
Been thanked: 4 times
Contact:

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Ayla »

Why don't you pre-process the 3D models before loading them on DC?
Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

Pre-process to do what exactly? The role of the exporter is to do as much pre-processing as possible (compute strips). The question is still: is it better to process contiguous strips in RAM if that forces the same vertices to be transformed multiple times?
What exactly do you have in mind?
Ayla
DC Developer
DC Developer
Posts: 142
Joined: Thu Apr 03, 2008 7:01 am
Has thanked: 0
Been thanked: 4 times
Contact:

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Ayla »

For some reason I thought you were about to create the triangle strips on the DC itself. Forget about me :-)
Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

Oh no, the exporter should do all the work and produce structures ready to be processed by the PVR. Fast to load, fast to display (I hope so).
PH3NOM
DC Developer
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm
Has thanked: 0
Been thanked: 5 times

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by PH3NOM »

Bouz wrote:Hi,

The question, in summary: which costs more, computing the matrix transforms or submitting the vertices to the PVR?
- Solution one is to transform all vertex coordinates once using mat_transform (avoiding cache thrashing problems), then to submit strips built from the results using pvr_prim calls (which accesses RAM in a non-sequential way).
- Solution two is to transform the strips and submit them immediately through the store queues using mat_transform_sq. It looks more efficient, but it requires transforming the same vertices multiple times when strips have vertices in common.

If you have any ideas, feel free to share.

Thanks in advance!!
I have worked in 2D before, and the PVR was very fast at receiving vertices. The engine I was working on pushed over 180,000 textured, rotated quads per second, with game logic on top of the render.

Recently I have worked in 3D using KGLX. Matrix transforms seem quite slow by comparison: the render engine I am working on seems to max out at around 30,000 textured, rotated, blended, scaled, and transformed quads or tris per second.

I am not sure if it helps, but here is the function I have written to render a poly depending on the camera viewpoint.

How things look in my code currently (any suggestions appreciated):

Code: Select all

static inline void DCE_GlRenderObj( DCE_OBJ * obj, vector4f campos, vector4f camdst )
{
    /* Set up basic GL render stack */
    glLoadIdentity();
    gluLookAt( campos[x], campos[y], campos[z],    /* View-Point Source */
               camdst[x], camdst[y], camdst[z],    /* View-Point Destination */
               0,         1,         0 );          /* "Up Vector" */

    /* Apply Screen-Space Transformations */
    glTranslatef(obj->pos[x], obj->pos[y], obj->pos[z]);

    /* Apply Object Rotations */
    glRotatef(obj->rot[x], 1.0f, 0.0f, 0.0f);
    glRotatef(obj->rot[y], 0.0f, 1.0f, 0.0f);
    glRotatef(obj->rot[z], 0.0f, 0.0f, 1.0f);

    /* Apply Object Scaling */
    glScalef(obj->scale, obj->scale, obj->scale);

    /* Apply Texture Mapping */
    glBindTexture(GL_TEXTURE_2D, (GLuint)obj->txaddr);

    /* Throw the Object Matrix into the GL Pipeline */
    if(obj->primitive == DCE_RENDER_QUAD)
    {
        glBegin(GL_QUADS);
        glTexCoord2f(obj->uend, obj->vend); glVertex3fv( obj->mat );
        glTexCoord2f(obj->ust,  obj->vend); glVertex3fv( obj->mat+1 );
        glTexCoord2f(obj->ust,  obj->vst);  glVertex3fv( obj->mat+2 );
        glTexCoord2f(obj->uend, obj->vst);  glVertex3fv( obj->mat+3 );
    }
    else
    {
        glBegin(GL_TRIANGLES);
        //glColor3f(1.0,0.0,0.0);
        glTexCoord2f(obj->ust, obj->vst); glVertex3fv( obj->mat );
        //glColor3f(0,1.0,0.0);
        glTexCoord2f(obj->uend/2.0f, obj->vend/2.0f); glVertex3fv( obj->mat+1 );
        //glColor3f(0,0.0,1.0);
        glTexCoord2f(obj->ust, obj->vend); glVertex3fv( obj->mat+2 );
    }
    glEnd(); /* closes whichever glBegin() was opened above */
}
Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

Hi Ph3NOM, thanks for answering.
I have also worked on a 3D engine, based on KGLX a few years ago. You can find a little demo attached (a racing game).
A good way I found to know how much time I had to draw more polygons was to change the screen background color (the bars you can see around the screen).
All my 3D objects are hand-made, so the strips are quite optimized, and yet most of the games that run on the Dreamcast seem to display many more textured and transformed polygons than I do, so it looks like KGL is not the most efficient way to submit polygons ;-)
This is why I am trying to explore lower level ways of submitting polys.
Your code looks OK; I don't think there is a better way to submit vertices to KGL. Judging from the store queue and cache optimizations mentioned in the PVR API code comments, it looks like KGL can't be as efficient as the core API, even if you submit your vertices in a very optimal way.
Even if the PVR API can be very efficient, I still need to look into a few aspects:
- The 3D modelers I have seen so far don't export optimized strips. This forces us to send independent triangles, with many redundant vertices, through the transform and on to the PVR. I am working on an exporter for Blender to solve that problem.
- Test the performance of pvr_prim and mat_transform_sq to determine whether it is better to transform all vertices only once and then submit them in a cache-unfriendly order, or to transform vertices multiple times and submit them using the store queues.

Just as a quick note, the sound in the game is produced by the AICA driver I talked about a few months (years?) ago on this forum, but that is another discussion ;-)

Thanks again!
Attachments
turbolz.elf.gz
Demo of my KGL based racing game
(2.44 MiB) Downloaded 167 times
Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

Hi again,

I had a closer look at your code, and I am not sure I understand why exactly you are doing things like this. Is your engine based on sprites? If you need to render meshes (more than one triangle / quad), you might prefer this approach:
- Set up basic GL render stack. This should be done only once or twice, as it computes the rendering matrix
- Apply Screen-Space Transformations. Same as above
- Apply Object Rotations. Do this only once per object, as, once again, it computes the rendering matrix
- Apply Object Scaling. Same comment
- Apply Texture Mapping. Only when the texture changes
- Submit any triangles / quads you can with the current render matrix and texture configuration
KGL lets you push the current matrix and pop it back (glPushMatrix() / glPopMatrix()). You can use that to save the matrix after "Set up basic GL render stack" and "Apply Screen-Space Transformations" so that you do not recompute it every time you want to render a poly.

I hope this helps. Even if you only render sprites, the Push / Pop system can help a lot with performance.
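
To make the push / pop idea concrete, here is a minimal matrix stack in plain C. It is only a sketch of the mechanism, not KGL's implementation. The `ms_*` names and the fixed stack depth are assumptions for illustration, and only translation is implemented to keep it short.

```c
#include <assert.h>
#include <string.h>

/* Column-major 4x4 matrix, the same element order OpenGL uses. */
typedef struct { float m[16]; } mat4;

#define MS_DEPTH 8
static mat4 ms_stack[MS_DEPTH];
static int  ms_top = 0;            /* ms_stack[ms_top] is the current matrix */

void ms_identity(void)
{
    memset(&ms_stack[ms_top], 0, sizeof(mat4));
    ms_stack[ms_top].m[0]  = 1.0f;
    ms_stack[ms_top].m[5]  = 1.0f;
    ms_stack[ms_top].m[10] = 1.0f;
    ms_stack[ms_top].m[15] = 1.0f;
}

/* Save the current matrix by duplicating it on top of the stack. */
void ms_push(void)
{
    assert(ms_top + 1 < MS_DEPTH);
    ms_stack[ms_top + 1] = ms_stack[ms_top];
    ms_top++;
}

/* Discard the current matrix and return to the saved one. */
void ms_pop(void)
{
    assert(ms_top > 0);
    ms_top--;
}

/* current = current * T(x, y, z): only the last column changes. */
void ms_translate(float x, float y, float z)
{
    float *m = ms_stack[ms_top].m;

    for (int i = 0; i < 4; i++)
        m[12 + i] += m[i] * x + m[4 + i] * y + m[8 + i] * z;
}

const mat4 *ms_current(void)
{
    return &ms_stack[ms_top];
}
```

The camera part is built once per frame. Each object then does a push, applies its own transform, draws, and pops, which restores the camera matrix with a 64-byte copy instead of recomputing it.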
PH3NOM
DC Developer
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm
Has thanked: 0
Been thanked: 5 times

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by PH3NOM »

Yeah actually that helped me realize when I need to push and pop the render matrix.
I only need to apply the camera perspective to the Render Matrix before rendering every Poly. Thanks for that.
Also, rotations can be applied to the objects upon creation (pre-processing).

And right, that is not really optimized for meshes; it handles simple geometry only, currently.

Code: Select all

static inline void DCE_GlRenderObj( DCE_OBJ * obj )
{
    /* Push the Render Matrix onto the Stack */
    glPushMatrix();

    /* Apply Screen-Space Transformations */
    glTranslatef(obj->pos[x], obj->pos[y], obj->pos[z]);

    /* Apply Object Scaling */
    glScalef(obj->scale, obj->scale, obj->scale);

    /* Apply Texture Mapping */
    glBindTexture(GL_TEXTURE_2D, (GLuint)obj->txaddr);

    /* Throw the Object Matrix into the GL Pipeline */
    if(obj->primitive == DCE_RENDER_QUAD)
    {
        glBegin(GL_QUADS);
        glTexCoord2f(obj->uend, obj->vend); glVertex3fv( obj->mat );
        glTexCoord2f(obj->ust,  obj->vend); glVertex3fv( obj->mat+1 );
        glTexCoord2f(obj->ust,  obj->vst);  glVertex3fv( obj->mat+2 );
        glTexCoord2f(obj->uend, obj->vst);  glVertex3fv( obj->mat+3 );
    }
    else
    {
        glBegin(GL_TRIANGLES);
        glTexCoord2f(obj->ust, obj->vst); glVertex3fv( obj->mat );
        glTexCoord2f(obj->uend/2.0f, obj->vend/2.0f); glVertex3fv( obj->mat+1 );
        glTexCoord2f(obj->ust, obj->vend); glVertex3fv( obj->mat+2 );
    }
    glEnd(); /* closes whichever glBegin() was opened above */

    /* Pop the Render Matrix off the Stack */
    glPopMatrix();
}

Attached is a demo. BGM is done using LibS3MPlay that I uploaded on the forums some time ago :P
Attachments
dc-engine-3d.rar
(1.68 MiB) Downloaded 157 times
PH3NOM
DC Developer
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm
Has thanked: 0
Been thanked: 5 times

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by PH3NOM »

Bouz wrote:Oh no, the exporter should do all the work and produce structures ready to process by the PVR. Fast to load, fast to display (I hope so).
I am curious if you have had any success?

Since this post, I have written an exporter to convert 3ds objects into a mesh format optimized for use with glDrawArrays().

Here is the structure I have created for manipulating meshes:

Code: Select all

typedef struct
{
    FOURCC sig;         // Signature = "DCE "
    FOURCC type;        // File Type = "MESH"
    DWORD verts;        // Number of Vertices
    DWORD primitive;    // Primitive type: 3=Triangles 4=Quads
    FLOAT scale;        // Mesh Scale
    vector4f pos;       // Mesh Position ( x, y, z, null )
    vector4f rot;       // Mesh Rotation ( x, y, z, null )
    CHAR texname[116];  // Mesh Texture Name
    DWORD * texaddr;    // Pointer to Texture Address in VRAM (Set by Engine)
    FLOAT * texcoord;   // Array of Texture u/v coordinates
    FLOAT * vert;       // Array of Vertex Data
} DCE_MESH;
To export 3DS models into my structure, I simply unpacked all of the vertex data and texture coordinates, so that they can be traversed linearly without resolving indices at render time.

Code: Select all

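/* Note: "3DS_OBJ" is kept as posted. A C identifier cannot start with a digit,
   so substitute the object type of whichever 3DS loader you use. */
void make_mesh(3DS_OBJ * object, DCE_MESH * mesh)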
{
    DWORD i;
    FLOAT *ptr;
    
    ptr = mesh->texcoord;
    
    for (i=0;i<object->polygons_qty;i++)
    {
        *ptr++ = object->mapcoord[ object->polygon[i].a ].u;
        *ptr++ = object->mapcoord[ object->polygon[i].a ].v;
        *ptr++ = object->mapcoord[ object->polygon[i].b ].u;
        *ptr++ = object->mapcoord[ object->polygon[i].b ].v;
        *ptr++ = object->mapcoord[ object->polygon[i].c ].u;
        *ptr++ = object->mapcoord[ object->polygon[i].c ].v;
    }
    
    ptr = mesh->vert;
    
    for (i=0;i<object->polygons_qty;i++)
    {
        *ptr++ = object->vertex[ object->polygon[i].a ].x;
        *ptr++ = object->vertex[ object->polygon[i].a ].y;
        *ptr++ = object->vertex[ object->polygon[i].a ].z;
        *ptr++ = object->vertex[ object->polygon[i].b ].x;
        *ptr++ = object->vertex[ object->polygon[i].b ].y;
        *ptr++ = object->vertex[ object->polygon[i].b ].z;
        *ptr++ = object->vertex[ object->polygon[i].c ].x;
        *ptr++ = object->vertex[ object->polygon[i].c ].y;
        *ptr++ = object->vertex[ object->polygon[i].c ].z;
    }

}
That part is done on PC, although it started out being done on DC :-)

Once the object is loaded into RAM, it can be rendered with this function call:

Code: Select all

void DCE_GlRenderArray( DCE_MESH * mesh )
{    
    /* Push the Render Matrix onto the Stack */   
    glPushMatrix();
    
    glTranslatef(mesh->pos[x],mesh->pos[y],mesh->pos[z]);

	glRotatef(mesh->rot[x],1.0f,0.0f,0.0f);
	glRotatef(mesh->rot[y],0.0f,1.0f,0.0f);
	glRotatef(mesh->rot[z],0.0f,0.0f,1.0f);
	
    glScalef(mesh->scale,mesh->scale,mesh->scale); 
     
    glBindTexture(GL_TEXTURE_2D, (GLuint)mesh->texaddr); /* texaddr is a pointer; cast as in DCE_GlRenderObj() */
    
    glVertexPointer(3, GL_FLOAT, 0, mesh->vert);    
    
    glTexCoordPointer(2, GL_FLOAT, 0, mesh->texcoord);
    
    if(mesh->primitive==DCE_RENDER_QUAD)
       glDrawArrays(GL_QUADS, 0, mesh->verts*mesh->primitive );
    else
       glDrawArrays(GL_TRIANGLES, 0, mesh->verts*mesh->primitive );      

    /* Pop the Render Matrix off the Stack */  
    glPopMatrix();
}
Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

Hi Ph3nom,

Thanks for the code! No big progress on my side. Over the last few days, I have been learning Python and the Blender API.
I still have to:
- write the stripping algorithm (Blender / Python)
- write the binary exporter (Blender / Python)
- write the Dreamcast code (it seemed to be the hardest part a few months ago; today it is the easiest!)

My target is still performance, so the strips should be part of the final data structure (still this strip obsession). From what I have learnt, it should be possible to have multiple textures per mesh and handle bones (but this is not in my list for today!).

I would be really happy to find the code of this glDrawArrays() function!
Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

It might take longer than expected, as I have just killed my Archlinux install while trying to update Blender. Linux rulezzz :-(
PH3NOM
DC Developer
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm
Has thanked: 0
Been thanked: 5 times

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by PH3NOM »

Bouz wrote:Hi Ph3nom,

Thanks for the code! No big progress on my side. Over the last few days, I have been learning Python and the Blender API.
I still have to:
- write the stripping algorithm (Blender / Python)
- write the binary exporter (Blender / Python)
- write the Dreamcast code (it seemed to be the hardest part a few months ago; today it is the easiest!)
Sounds like fun, good luck with your mesh->strip algorithm! If done right that could be very valuable.
As a starting point, maybe you can start by exporting as vertex arrays first, as I have. After you get the basic implementation working, then advance to the strip optimization. Divide and conquer!
Bouz wrote:My target is still performance, so the strips should be part of the final data structure (still this strip obsession).
I would be really happy to find the code of this glDrawArrays() function!
Yes, to my mind the fastest possible render method (with GL) should be glDrawArrays(GL_TRIANGLE_STRIP, 0, numOfVerts);
http://www.opengl.org/sdk/docs/man/xhtm ... Arrays.xml
Bouz wrote:From what I have learnt, it should be possible to have multiple textures per mesh and handle bones (but this is not in my list for today!).
About textures, do you mean multiple textures per face(single poly), or multiple textures per mesh(group of polys)?

I can think of more than one way to handle multiple textures per mesh on DC, using glDrawArrays().
Primarily, this is achieved by arranging the polygons to be rendered by texture id, or face.
Once arranged, a mesh can be rendered with multiple textures by making a separate render call for each group of faces that share the same texture.
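
The grouping step could be done in the exporter roughly as follows. This is a sketch under assumptions: `face_t`, `batch_t` and `build_batches` are hypothetical names, and it only sorts face records, so the vertex and texture coordinate arrays would still have to be written out in the sorted face order for each batch to be contiguous.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned tex_id;      /* hypothetical texture handle for this face */
    unsigned first_vert;  /* where this face's vertices live in the source data */
} face_t;

typedef struct {
    unsigned tex_id;
    size_t   first_face;  /* index of the first face of the batch, after sorting */
    size_t   face_count;
} batch_t;

static int cmp_face_tex(const void *a, const void *b)
{
    const face_t *fa = a;
    const face_t *fb = b;

    return (fa->tex_id > fb->tex_id) - (fa->tex_id < fb->tex_id);
}

/* Sorts faces by texture and records one batch per run of equal tex_id.
   Returns the number of batches written to out. */
size_t build_batches(face_t *faces, size_t nfaces, batch_t *out, size_t max_out)
{
    size_t n = 0;

    qsort(faces, nfaces, sizeof *faces, cmp_face_tex);

    for (size_t i = 0; i < nfaces; i++) {
        if (n > 0 && out[n - 1].tex_id == faces[i].tex_id) {
            out[n - 1].face_count++;
        } else {
            assert(n < max_out);
            out[n].tex_id = faces[i].tex_id;
            out[n].first_face = i;
            out[n].face_count = 1;
            n++;
        }
    }
    return n;
}
```

With triangle faces stored three vertices each in sorted order, every batch would then become one texture bind followed by one draw call covering vertices 3 * first_face up to 3 * (first_face + face_count).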

What I cannot figure out is how to handle multiple textures per face on DC, using glDrawArrays().
Typically, multi-texturing is accomplished with glActiveTexture()
http://www.java-gaming.org/topics/multi ... #msg189204
The problem is that glActiveTexture() is not supported in the DC's build of GL.
TapamN
DC Developer
DC Developer
Posts: 105
Joined: Sun Oct 04, 2009 11:13 am
Has thanked: 2 times
Been thanked: 90 times

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by TapamN »

You can pretty easily get a decent strip generator running in Blender with this:

http://www.executionunit.com/en/blog/20 ... phoneipad/

The download has a sample Blender script for sending data from Blender, to the stripifier, and receiving the results. It only took minor changes to get it working on my system.

The strip generator was made by nVidia and is written in C++, with its source available. It's designed to generate strips for systems with built-in T&L caches that can recognize when the same vertex is reused and avoid recalculating it. The Dreamcast doesn't have any dedicated T&L processors that do this, so the strips generated aren't the best they could be for the DC, but what it generates is far better than nothing.

One thing of note is that the strip generator likes to output degenerate polygons to change winding order. The PVR does not like getting degenerate polygons when culling is turned off, and will spit out horizontal lines across the tile the degenerate is on. Turning on any culling (CW, CCW, small) will fix it. The attached images show what it looks like.
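
If you want to see how many degenerates a given strip contains, or check a stripifier's output against the original triangles, a strip can be expanded back into a triangle list while skipping them. The function below is an illustrative sketch (the name `strip_to_triangles` is made up); it treats a triangle as degenerate when two of its three indices are equal and flips every odd triangle so the winding stays consistent.

```c
#include <assert.h>
#include <stddef.h>

/* Expands an indexed triangle strip of n entries into independent triangles.
   Triangles with a repeated index (degenerates) are skipped.
   Writes 3 indices per triangle to tris and returns the triangle count. */
size_t strip_to_triangles(const unsigned short *strip, size_t n,
                          unsigned short *tris, size_t max_tris)
{
    size_t out = 0;

    for (size_t i = 0; i + 2 < n; i++) {
        unsigned short a = strip[i];
        unsigned short b = strip[i + 1];
        unsigned short c = strip[i + 2];

        if (a == b || b == c || a == c)
            continue;                   /* degenerate: zero area */

        assert(out < max_tris);
        if (i & 1) {                    /* odd position: swap to keep winding */
            unsigned short t = a;
            a = b;
            b = t;
        }
        tris[3 * out + 0] = a;
        tris[3 * out + 1] = b;
        tris[3 * out + 2] = c;
        out++;
    }
    return out;
}
```

A strip of n entries holds n - 2 triangles, so the difference between n - 2 and the returned count is the number of degenerates the generator inserted.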

Also, be sure you don't accidentally pass invalid parameters from Blender to the strip generator program, otherwise it hangs. Also, set the -cs parameter to something like 100 to get the program to output longer strips.
Attachments
yes cull.png
yes cull.png (47.63 KiB) Viewed 4516 times
no cull.png
no cull.png (66.24 KiB) Viewed 4516 times
PH3NOM
DC Developer
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm
Has thanked: 0
Been thanked: 5 times

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by PH3NOM »

TapamN - Thanks for the info, it seems Bouz has the hard part already done for him!

Are the screens you posted running on DC? If so, would you mind uploading the binary to have a look?

Also, what license is that source code released under?
Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

PH3NOM wrote:Sounds like fun, good luck with your mesh->strip algorithm! If done right that could be very valuable.
As a starting point, maybe you can start by exporting as vertex arrays first, as I have. After you get the basic implementation working, then advance to the strip optimization. Divide and conquer!
Well, in fact I will start with a basic strip algorithm and get the full pipeline working. Then I will work on the algorithm again to make it better.
PH3NOM wrote:Yes to my mind the fastest possible render method ( with GL ) should be to use glDrawArrays(GL_TRIANGLE_STRIP, 0, numOfVerts );
http://www.opengl.org/sdk/docs/man/xhtm ... Arrays.xml
This might be true for recent 3D cards, which handle arrays of vertices and indices, but it is probably not the case for the PVR, which only handles triangle strips. This is why I would like to have a look at the source code of this glDrawArrays function!
PH3NOM wrote:About textures, do you mean multiple textures per face(single poly), or multiple textures per mesh(group of polys)?

I can think of more than one way to handle multiple textures per mesh on DC, using glDrawArrays().
Primarily, this is achieved by arranging the polygons to be rendered by texture id, or face.
Once arranged, a mesh can be rendered with multiple textures by making a separate render call for each group of faces that share the same texture.

What I cannot figure out is how to handle multiple textures per face on DC, using glDrawArrays().
Typically, multi-texturing is accomplished with glActiveTexture()
http://www.java-gaming.org/topics/multi ... #msg189204
Problem is that glActiveTexture() is not supported in the DC's build of GL
Well, I meant multiple textures per mesh, but it is really not a priority. I don't think it is possible to have multiple textures on one triangle on the PVR (this is probably why glActiveTexture is not part of the DC's build of GL).

My Archlinux machine is back, yeeehaaa..
Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

TapamN wrote:You can pretty easily get a decent strip generator running in Blender with this:

http://www.executionunit.com/en/blog/20 ... phoneipad/

The download has a sample Blender script for sending data from Blender, to the stripifier, and receiving the results. It only took minor changes to get it working on my system.

The strip generator was made by nVidia and is written in C++, with its source available. It's designed to generate strips for systems with built-in T&L caches that can recognize when the same vertex is reused and avoid recalculating it. The Dreamcast doesn't have any dedicated T&L processors that do this, so the strips generated aren't the best they could be for the DC, but what it generates is far better than nothing.

One thing of note is that the strip generator likes to output degenerate polygons to change winding order. The PVR does not like getting degenerate polygons when culling is turned off, and will spit out horizontal lines across the tile the degenerate is on. Turning on any culling (CW, CCW, small) will fix it. The attached images show what it looks like.

Also, be sure you don't accidentally pass invalid parameters from Blender to the strip generator program, otherwise it hangs. Also, set the -cs parameter to something like 100 to get the program to output longer strips.
Hi TapamN, thanks for the info! I had found references to this Nvidia tool, but never the tool itself. Apparently, the link to the Nvidia dev site is no longer valid.
Anyway, this is not a big problem, as I find it really interesting to write a stripifier!
T_chan
DC Developer
DC Developer
Posts: 32
Joined: Mon Aug 22, 2011 12:45 pm
Has thanked: 12 times
Been thanked: 22 times

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by T_chan »

You might want to have a look at this one: http://users.telenet.be/tfautre/softdev/tristripper/
PH3NOM
DC Developer
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm
Has thanked: 0
Been thanked: 5 times

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by PH3NOM »

Bouz wrote:Well, in fact I will start with a basic strip algorithm, and make the full pipe work. Then I will work on the algo again to make it better.
Again, good luck! :P
Bouz wrote:
PH3NOM wrote:Yes to my mind the fastest possible render method ( with GL ) should be to use glDrawArrays(GL_TRIANGLE_STRIP, 0, numOfVerts );
http://www.opengl.org/sdk/docs/man/xhtm ... Arrays.xml
This might be true for recent 3D cards, which handle arrays of vertices and indices, but it is probably not the case for the PVR, which only handles triangle strips. This is why I would like to have a look at the source code of this glDrawArrays function!
The motivation behind glDrawArrays() is simple: eliminate function call overhead from the rendering routine, so that less CPU time is spent on render.

Every time a function is called, the caller has to set up the parameters (in registers or on the stack, depending on the calling convention), record where to return to, and jump to the function. If a parameter is passed by reference, only the address of the variable is passed. If it is passed by value, a copy of the value has to be made.

So, let's say we decide not to use glDrawArrays(), and use glVertex3fv() instead.
Every vertex of every triangle (or quad) makes a call to glVertex3fv()
(KGLX implementation)

Code: Select all

void glVertex3fv(GLfloat *v) {
    glVertex4f(v[0], v[1], v[2], 1.0f);
}

void glVertex4f(GLfloat x, GLfloat y, GLfloat z,GLfloat w) {
    GLParam p[5];

    p[0].op=OP_Vertex;
    p[1].f=x;
    p[2].f=y;
    p[3].f=z;
    p[4].f=w;

    gl_add_op(p);
}
Here, we push the address of the function, and the address to return to, and also the address of the vertex vector ( 3 pushes ).
Next, we make a call to glVertex4f, pushing the address of the function, and the address to return to, and since glVertex4f() uses call-by-value, we also must create a copy of all 4 parameters to the function call ( 7 pushes, 4 copies ).
When this is being done thousands of times per frame, every little bit makes a difference.

Here is glDrawArrays() (glapi.c), which I have optimized from the version in KGLX:

Code: Select all

void glDrawArrays( GLenum mode, GLint first, GLsizei count )
{
    GLint i;
    GLint end = first + count; /* one past the last index to submit */
    GLParam p[2];

    p[0].op = OP_Begin; /* glBegin() */
    p[1].i = mode;
    gl_add_op(p);

    /* Submit indices first .. first+count-1. A signed counter avoids the
       signed/unsigned comparison that skipped the loop when first was 0,
       and the bound no longer submits one index too many. */
    p[0].op = OP_ArrayElement; /* glArrayElement() */
    for(i = first; i < end; ++i)
    {
        p[1].i = i;
        gl_add_op(p);
    }

    p[0].op = OP_End; /* glEnd() */
    gl_add_op(p);
}
but I think what you want to see is in gloparray.c
Bouz
DCEmu Junior
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)
Has thanked: 0
Been thanked: 0

Re: mat_transform / pvr_prim vs mat_transform_sq

Post by Bouz »

Ph3nom: thanks for the info, now I know where this function comes from (KGLX). I have downloaded the 0.2 version. I have not gone deep enough into the code to know how the array is processed to produce triangles for the PVR API, but I can already see that gl_add_op itself calls lots of functions.
When I speak of triangle strips, I don't mean using KGL, but using the PVR directly with mat_transform_sq(). The mesh is loaded in memory, and mat_transform_sq() computes the vertex transforms and transfers the results directly to the PVR through the store queue, so I think it should be much faster than anything running under KGLX.
But once again, we will be sure once my system is complete ;-)
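
For readers who have not looked at matrix.s, the work done per vertex is roughly the following, written here as plain portable C. This is a sketch and not the KOS code: `transform_stream` and `vec3` are made-up names, the matrix is assumed to be column-major, and storing 1/w as the output depth is my understanding of what the KOS routines do, so treat that as an assumption. The real mat_transform_sq() writes its results through the store queues, while this version writes to an ordinary array.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Transforms n vertices by a column-major 4x4 matrix, applies the perspective
   divide, and writes the results sequentially to out. */
void transform_stream(const float m[16], const vec3 *in, vec3 *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = m[0] * in[i].x + m[4] * in[i].y + m[8]  * in[i].z + m[12];
        float y = m[1] * in[i].x + m[5] * in[i].y + m[9]  * in[i].z + m[13];
        float w = m[3] * in[i].x + m[7] * in[i].y + m[11] * in[i].z + m[15];
        float inv_w;

        assert(w != 0.0f);
        inv_w = 1.0f / w;

        out[i].x = x * inv_w;
        out[i].y = y * inv_w;
        out[i].z = inv_w;   /* assumed: depth is stored as 1/w */
    }
}
```

The input is read in strip order and the output is written strictly in order, which is the access pattern that suits the store queues. The price is that a vertex shared by two strips goes through this loop twice.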

T_chan: thanks a lot for this URL, the site is full of interesting info! It will help a lot. Of course, the PVR does not have any vertex cache, so not everything applies, but it is a really good page!

Thanks again!