mat_transform / pvr_prim vs mat_transform_sq
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
mat_transform / pvr_prim vs mat_transform_sq
Hi,
I am working on a Blender exporter and a viewer on the DC side. My goal is to export simple meshes to begin with, then to add support for bones and animation. I know there is already an exporter in the KOS source code, but it looks like it does not handle triangle strips at all and generates a large number of independent faces.
I am currently working on the generation of triangle strips to feed the Dreamcast (which is not a standard feature of Blender?!).
While doing that, I have gone through the KOS include files and matrix.s, and I have a question. If anyone has answers, I might save a lot of testing time.
The question, in short: does it take longer to compute matrix transforms or to submit vertices to the PVR?
- Solution one is to compute all vertex coordinates once using mat_transform (avoiding cache thrashing problems), then to submit strips built from those results using pvr_prim calls (which accesses RAM in a non-sequential way).
- Solution two is to compute strips and submit them immediately through the store queues using mat_transform_sq. It looks more efficient, but it requires computing the same vertices multiple times when strips have vertices in common.
If you have any ideas, feel free to share.
Thanks in advance!!
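One way to put rough numbers on this trade-off before benchmarking on hardware is to count how many vertex transforms each solution performs for a given set of strips. The following is a minimal sketch in plain C, assuming strips are stored as index lists into a shared vertex pool. The names `strip_t`, `transforms_shared` and `transforms_streamed` are hypothetical and not part of KOS, and the sketch only counts work; it says nothing about cache or store queue behavior.

```c
#include <assert.h>
#include <stddef.h>

/* A strip is a run of indices into a shared vertex pool (hypothetical layout). */
typedef struct {
    const unsigned short *indices;
    size_t count;
} strip_t;

/* Solution one: every vertex in the pool is transformed exactly once. */
size_t transforms_shared(size_t pool_size)
{
    return pool_size;
}

/* Solution two: every strip entry is transformed as it is streamed out,
 * so a vertex shared by several strips is transformed once per use. */
size_t transforms_streamed(const strip_t *strips, size_t nstrips)
{
    size_t total = 0;
    for (size_t i = 0; i < nstrips; i++) {
        assert(strips[i].count >= 3);   /* a strip needs at least one triangle */
        total += strips[i].count;
    }
    return total;
}
```

The ratio between the two counts gives the redundant transform work that solution two has to win back through sequential memory access.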
-
- DC Developer
- Posts: 142
- Joined: Thu Apr 03, 2008 7:01 am
- Has thanked: 0
- Been thanked: 4 times
- Contact:
Re: mat_transform / pvr_prim vs mat_transform_sq
Why don't you pre-process the 3D models before loading them on DC?
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
Re: mat_transform / pvr_prim vs mat_transform_sq
Pre-process to do what exactly? The role of the exporter is to do as much pre-processing as possible (compute strips). The question is still: is it better to process contiguous strips in RAM if that forces us to compute the same vertices multiple times?
What exactly do you have in mind?
-
- DC Developer
- Posts: 142
- Joined: Thu Apr 03, 2008 7:01 am
- Has thanked: 0
- Been thanked: 4 times
- Contact:
Re: mat_transform / pvr_prim vs mat_transform_sq
For some reason I thought you were about to create the triangle strips on the DC itself. Forget about me
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
Re: mat_transform / pvr_prim vs mat_transform_sq
Oh no, the exporter should do all the work and produce structures ready to be processed by the PVR. Fast to load, fast to display (I hope so).
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: mat_transform / pvr_prim vs mat_transform_sq
Bouz wrote:
Hi,
The question, in short: does it take longer to compute matrix transforms or to submit vertices to the PVR?
- Solution one is to compute all vertex coordinates once using mat_transform (avoiding cache thrashing problems), then to submit strips built from those results using pvr_prim calls (which accesses RAM in a non-sequential way).
- Solution two is to compute strips and submit them immediately through the store queues using mat_transform_sq. It looks more efficient, but it requires computing the same vertices multiple times when strips have vertices in common.
If you have any ideas, feel free to share.
Thanks in advance!!

I have worked in 2D before, and the PVR was very fast at receiving vertices. The engine I was working on pushed over 180,000 textured, rotated quads per second with game logic on top of rendering.
Recently I have worked in 3D using KGLX. Matrix transforms seem quite slow by comparison, as the render engine I am working on seems to max out at around 30,000 textured, rotated, blended, scaled, and transformed quads or triangles per second.
I am not sure if it helps, but here is the function I have written to Render a Poly dependent on the Camera ViewPoint.
How things look in my code currently ( any suggestions appreciated ):
Code: Select all
static inline void DCE_GlRenderObj( DCE_OBJ * obj, vector4f campos, vector4f camdst )
{
    /* Set up basic GL render stack */
    glLoadIdentity();
    gluLookAt( campos[x], campos[y], campos[z],   /* View-Point Source */
               camdst[x], camdst[y], camdst[z],   /* View-Point Destination */
               0, 1, 0 );                         /* "Up Vector" */
    /* Apply Screen-Space Transformations */
    glTranslatef(obj->pos[x],obj->pos[y],obj->pos[z]);
    /* Apply Object Rotations */
    glRotatef(obj->rot[x],1.0f,0.0f,0.0f);
    glRotatef(obj->rot[y],0.0f,1.0f,0.0f);
    glRotatef(obj->rot[z],0.0f,0.0f,1.0f);
    /* Apply Object Scaling */
    glScalef(obj->scale,obj->scale,obj->scale);
    /* Apply Texture Mapping */
    glBindTexture(GL_TEXTURE_2D, (GLuint)obj->txaddr);
    /* Throw the Object Matrix into the GL Pipeline */
    if(obj->primitive==DCE_RENDER_QUAD)
    {
        glBegin(GL_QUADS);
        glTexCoord2f(obj->uend, obj->vend); glVertex3fv( obj->mat );
        glTexCoord2f(obj->ust, obj->vend); glVertex3fv( obj->mat+1 );
        glTexCoord2f(obj->ust, obj->vst); glVertex3fv( obj->mat+2 );
        glTexCoord2f(obj->uend, obj->vst); glVertex3fv( obj->mat+3 );
    }
    else
    {
        glBegin(GL_TRIANGLES);
        //glColor3f(1.0,0.0,0.0);
        glTexCoord2f(obj->ust, obj->vst); glVertex3fv( obj->mat );
        //glColor3f(0,1.0,0.0);
        glTexCoord2f(obj->uend/2.0f, obj->vend/2.0f); glVertex3fv( obj->mat+1 );
        //glColor3f(0,0.0,1.0);
        glTexCoord2f(obj->ust, obj->vend); glVertex3fv( obj->mat+2 );
    }
    glEnd();
}
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
Re: mat_transform / pvr_prim vs mat_transform_sq
Hi Ph3NOM, thanks for answering.
I have also worked on a 3D engine, based on KGLX a few years ago. You can find a little demo attached (a racing game).
A good way I found to know how much time I had to draw more polygons was to change the screen background color (the bars you can see around the screen).
All my 3D objects are hand made, so the strips are quite optimized. Still, most of the games that run on the Dreamcast seem to display many more textured and transformed polygons than I do, so it looks like KGL is not the most efficient way to submit polygons.
This is why I am trying to explore lower level ways of submitting polys.
Your code looks OK, I don't think there is a better way to submit vertices to KGL. Considering the store queue and memory cache optimizations I could see in the PVR API code comments, it looks like KGL can't be as efficient as the core API, even if you submit your vertices in a very optimal way.
Even if the PVR API can be very efficient, I still need to look into a few aspects:
- The 3D modelers I have seen so far don't export optimized strips. This forces us to transform independent triangles and submit them to the PVR with many redundant vertices. I am working on an exporter for Blender to solve that problem.
- Test the performance of pvr_prim and mat_transform_sq to determine whether it is better to transform all vertices only once and then submit them in a cache-unfriendly order, or to compute vertices multiple times and submit them using the store queues.
Just as a quick note, the sound in the game is produced by the AICA driver I talked about a few months (years?) ago on this forum, but that is another discussion.
Thanks again!
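To illustrate the redundancy mentioned above: a strip of n vertices describes n - 2 triangles, while the same triangles submitted independently need 3 * (n - 2) vertices. Below is a minimal sketch of the expansion in plain C, assuming 16-bit indices; `strip_to_triangles` is a hypothetical helper name, not something taken from KOS or Blender.

```c
#include <assert.h>
#include <stddef.h>

/* Expand a triangle strip of n indices into (n - 2) independent triangles.
 * Odd-numbered triangles swap their first two indices so that every
 * triangle keeps the same winding order. Returns the number of indices
 * written to out, which must have room for 3 * (n - 2) entries. */
size_t strip_to_triangles(const unsigned short *strip, size_t n,
                          unsigned short *out)
{
    size_t written = 0;
    assert(n >= 3);
    for (size_t i = 0; i + 2 < n; i++) {
        if (i & 1) {
            out[written++] = strip[i + 1];
            out[written++] = strip[i];
        } else {
            out[written++] = strip[i];
            out[written++] = strip[i + 1];
        }
        out[written++] = strip[i + 2];
    }
    return written;
}
```

A 4-vertex strip becomes 6 indices, and the gap grows with strip length, which is why long strips matter when every submitted vertex has to be transformed.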
- Attachments
-
- turbolz.elf.gz
- Demo of my KGL based racing game
- (2.44 MiB) Downloaded 167 times
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
Re: mat_transform / pvr_prim vs mat_transform_sq
Hi again,
I had a closer look at your code, and I am not sure I understand why exactly you are doing things like this. Is your engine based on sprites? If you need to render meshes (more than one triangle / quad), you might prefer this approach:
- Set up basic GL render stack. This should be done only once or twice, as it computes the rendering matrix
- Apply Screen-Space Transformations. Same as above
- Apply Object Rotations. Do this only once per object, as, once again, it computes the rendering matrix
- Apply Object Scaling. Same comment
- Apply Texture Mapping. Only when the texture changes
- Submit any triangles / quads you can with the current render matrix and texture configuration
In KGL, there is a method that allows you to push the current matrix and pop it back. You can use that system to store the matrix after "Set up basic GL render stack" and "Apply Screen-Space Transformations" to avoid recomputing it every time you want to render a poly.
I hope this helps. Even if you only render sprites, the Push / Pop system can help a lot with performance.
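For readers unfamiliar with the push / pop idea, here is a minimal sketch of a matrix stack in plain C. It is not the KGL implementation; `mat4`, `ms_push`, `ms_pop`, `ms_translate` and the other names are hypothetical, and only translation is implemented to keep it short. The point is that the camera matrix is built once, and each object works on a copy that is discarded afterwards.

```c
#include <assert.h>
#include <string.h>

#define STACK_DEPTH 8

typedef struct { float m[16]; } mat4;   /* column-major, like OpenGL */

static mat4 stack[STACK_DEPTH];
static int top = 0;                     /* stack[top] is the current matrix */

void ms_load_identity(void)
{
    memset(&stack[top], 0, sizeof(mat4));
    stack[top].m[0] = stack[top].m[5] = stack[top].m[10] = stack[top].m[15] = 1.0f;
}

/* Duplicate the current matrix so it can be modified and later restored. */
void ms_push(void)
{
    assert(top + 1 < STACK_DEPTH);
    stack[top + 1] = stack[top];
    top++;
}

/* Drop the current matrix and return to the one saved by ms_push(). */
void ms_pop(void)
{
    assert(top > 0);
    top--;
}

/* Post-multiply the current matrix by a translation, like glTranslatef. */
void ms_translate(float x, float y, float z)
{
    float *m = stack[top].m;
    for (int r = 0; r < 4; r++)
        m[12 + r] += m[r] * x + m[4 + r] * y + m[8 + r] * z;
}

const float *ms_current(void)
{
    return stack[top].m;
}
```

Typical use is one camera setup per frame, then `ms_push()`, per-object transforms, draw, `ms_pop()` for each object.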
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: mat_transform / pvr_prim vs mat_transform_sq
Yeah actually that helped me realize when I need to push and pop the render matrix.
I only need to apply the camera perspective to the Render Matrix before rendering every Poly. Thanks for that.
Also, rotations can be applied to the objects upon creation (pre-processing).
And right, that is not really optimized for meshes; it handles simple geometry only, currently.
Code: Select all
static inline void DCE_GlRenderObj( DCE_OBJ * obj )
{
    /* Push the Render Matrix onto the Stack */
    glPushMatrix();
    /* Apply Screen-Space Transformations */
    glTranslatef(obj->pos[x],obj->pos[y],obj->pos[z]);
    /* Apply Object Scaling */
    glScalef(obj->scale,obj->scale,obj->scale);
    /* Apply Texture Mapping */
    glBindTexture(GL_TEXTURE_2D, (GLuint)obj->txaddr);
    /* Throw the Object Matrix into the GL Pipeline */
    if(obj->primitive==DCE_RENDER_QUAD)
    {
        glBegin(GL_QUADS);
        glTexCoord2f(obj->uend, obj->vend); glVertex3fv( obj->mat );
        glTexCoord2f(obj->ust, obj->vend); glVertex3fv( obj->mat+1 );
        glTexCoord2f(obj->ust, obj->vst); glVertex3fv( obj->mat+2 );
        glTexCoord2f(obj->uend, obj->vst); glVertex3fv( obj->mat+3 );
    }
    else
    {
        glBegin(GL_TRIANGLES);
        glTexCoord2f(obj->ust, obj->vst); glVertex3fv( obj->mat );
        glTexCoord2f(obj->uend/2.0f, obj->vend/2.0f); glVertex3fv( obj->mat+1 );
        glTexCoord2f(obj->ust, obj->vend); glVertex3fv( obj->mat+2 );
    }
    glEnd();
    /* Pop the Render Matrix off the Stack */
    glPopMatrix();
}
Attached is a demo. BGM is done using LibS3MPlay that I uploaded on the forums some time ago
- Attachments
-
- dc-engine-3d.rar
- (1.68 MiB) Downloaded 157 times
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: mat_transform / pvr_prim vs mat_transform_sq
Bouz wrote:
Oh no, the exporter should do all the work and produce structures ready to process by the PVR. Fast to load, fast to display (I hope so).

I am curious if you have had any success?
Since this post, I have written an exporter to convert 3ds objects into a mesh format optimized for use with glDrawArrays().
Here is the structure I have created for manipulating meshes:
Code: Select all
typedef struct
{
FOURCC sig; // Signature = "DCE "
FOURCC type; // File Type = "MESH"
DWORD verts; // Number of Vertices
DWORD primitive; // Primitive type: 3=Triangles 4=Quads
FLOAT scale; // Mesh Scale
vector4f pos; // Mesh Position ( x, y, z, null )
vector4f rot; // Mesh Rotation ( x, y, z, null )
CHAR texname[116]; // Mesh Texture Name
DWORD * texaddr; // Pointer to Texture Address in VRAM (Set by Engine)
FLOAT * texcoord; // Array of Texture u/v coordinates
FLOAT * vert; // Array of Vertex Data
} DCE_MESH;
Code: Select all
void make_mesh(3DS_OBJ * object, DCE_MESH * mesh)
{
DWORD i;
FLOAT *ptr;
ptr = mesh->texcoord;
for (i=0;i<object->polygons_qty;i++)
{
*ptr++ = object->mapcoord[ object->polygon[i].a ].u;
*ptr++ = object->mapcoord[ object->polygon[i].a ].v;
*ptr++ = object->mapcoord[ object->polygon[i].b ].u;
*ptr++ = object->mapcoord[ object->polygon[i].b ].v;
*ptr++ = object->mapcoord[ object->polygon[i].c ].u;
*ptr++ = object->mapcoord[ object->polygon[i].c ].v;
}
ptr = mesh->vert;
for (i=0;i<object->polygons_qty;i++)
{
*ptr++ = object->vertex[ object->polygon[i].a ].x;
*ptr++ = object->vertex[ object->polygon[i].a ].y;
*ptr++ = object->vertex[ object->polygon[i].a ].z;
*ptr++ = object->vertex[ object->polygon[i].b ].x;
*ptr++ = object->vertex[ object->polygon[i].b ].y;
*ptr++ = object->vertex[ object->polygon[i].b ].z;
*ptr++ = object->vertex[ object->polygon[i].c ].x;
*ptr++ = object->vertex[ object->polygon[i].c ].y;
*ptr++ = object->vertex[ object->polygon[i].c ].z;
}
}
Once the object is loaded into ram, it can be rendered with this function call:
Code: Select all
void DCE_GlRenderArray( DCE_MESH * mesh )
{
/* Push the Render Matrix onto the Stack */
glPushMatrix();
glTranslatef(mesh->pos[x],mesh->pos[y],mesh->pos[z]);
glRotatef(mesh->rot[x],1.0f,0.0f,0.0f);
glRotatef(mesh->rot[y],0.0f,1.0f,0.0f);
glRotatef(mesh->rot[z],0.0f,0.0f,1.0f);
glScalef(mesh->scale,mesh->scale,mesh->scale);
glBindTexture(GL_TEXTURE_2D, mesh->texaddr);
glVertexPointer(3, GL_FLOAT, 0, mesh->vert);
glTexCoordPointer(2, GL_FLOAT, 0, mesh->texcoord);
if(mesh->primitive==DCE_RENDER_QUAD)
glDrawArrays(GL_QUADS, 0, mesh->verts*mesh->primitive );
else
glDrawArrays(GL_TRIANGLES, 0, mesh->verts*mesh->primitive );
/* Pop the Render Matrix off the Stack */
glPopMatrix();
}
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
Re: mat_transform / pvr_prim vs mat_transform_sq
Hi Ph3nom,
Thanks for the code! No big progress on my side. Over the last few days, I have learnt Python and the Blender API.
I still have to:
- write the stripping algorithm (Blender / Python)
- write the binary exporter (Blender / Python)
- write the Dreamcast code (this seemed to be the hardest part a few months ago, and is the easiest today!)
My target is still performance, so the strips should be part of the final data structure (still this strip obsession). From what I have learnt, it should be possible to have multiple textures per mesh and handle bones (but this is not on my list for today!).
I would be really happy to find the code of this glDrawArrays() function!
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
Re: mat_transform / pvr_prim vs mat_transform_sq
It might take longer than expected, as I have just killed my Archlinux install while trying to update Blender. Linux rulezzz
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: mat_transform / pvr_prim vs mat_transform_sq
Bouz wrote:
Hi Ph3nom,
Thanks for the code! No big progress on my side. The last days, I have learnt Python and the Blender API.
I still have to:
- write the stripping algorithm (Blender / Python)
- write the binary exporter (Blender / Python)
- write the Dreamcast code (seemed to be the hardest part a few month ago, this is the easiest today!)

Sounds like fun, good luck with your mesh->strip algorithm! If done right, that could be very valuable.
As a starting point, maybe you can start by exporting as vertex arrays first, as I have. After you get the basic implementation working, then advance to the strip optimization. Divide and conquer!

Bouz wrote:
My target is still performance, so the strips should be part of the final data structure (still this strip obsession).
I would be really happy to find the code of this glDrawArrays() function!

Yes, to my mind the fastest possible render method ( with GL ) should be to use glDrawArrays(GL_TRIANGLE_STRIP, 0, numOfVerts );
http://www.opengl.org/sdk/docs/man/xhtm ... Arrays.xml

Bouz wrote:
From what I have learnt, it should be possible to have multiple textures per mesh and handle bones (but this is not in my list for today!).

About textures, do you mean multiple textures per face (single poly), or multiple textures per mesh (group of polys)?
I can think of more than one way to handle multiple textures per mesh on DC, using glDrawArrays().
Primarily, this is achieved by arranging the polygons to be rendered by texture id, or face.
Once arranged, a mesh can be rendered with multiple textures by making a separate render call for each group of faces that share the same texture.
What I cannot figure out is how to handle multiple textures per face on DC, using glDrawArrays().
Typically, multi-texturing is accomplished with glActiveTexture()
http://www.java-gaming.org/topics/multi ... #msg189204
The problem is that glActiveTexture() is not supported in the DC's build of GL
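The "arrange by texture id, then one render call per group" idea can be sketched as follows in plain C. This is only an illustration under assumptions: `face_t` and `batch_by_texture` are hypothetical names, and the actual draw call per batch (glBindTexture followed by glDrawArrays, for instance) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned int tex_id;      /* texture used by this face */
    unsigned short a, b, c;   /* vertex indices */
} face_t;

static int cmp_tex(const void *lhs, const void *rhs)
{
    unsigned int l = ((const face_t *)lhs)->tex_id;
    unsigned int r = ((const face_t *)rhs)->tex_id;
    return (l > r) - (l < r);
}

/* Sort faces by texture and return the number of batches, i.e. the number
 * of texture binds and draw calls needed. batch_start[i] receives the index
 * of the first face of batch i; it must have room for nfaces entries. */
size_t batch_by_texture(face_t *faces, size_t nfaces, size_t *batch_start)
{
    size_t nbatches = 0;
    assert(faces != NULL || nfaces == 0);
    qsort(faces, nfaces, sizeof(face_t), cmp_tex);
    for (size_t i = 0; i < nfaces; i++) {
        if (i == 0 || faces[i].tex_id != faces[i - 1].tex_id)
            batch_start[nbatches++] = i;
    }
    return nbatches;
}
```

Since the grouping does not change at run time, it could just as well be done once in the exporter so the file already stores faces in batch order.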
- TapamN
- DC Developer
- Posts: 105
- Joined: Sun Oct 04, 2009 11:13 am
- Has thanked: 2 times
- Been thanked: 90 times
Re: mat_transform / pvr_prim vs mat_transform_sq
You can pretty easily get a decent strip generator running in Blender with this:
http://www.executionunit.com/en/blog/20 ... phoneipad/
The download has a sample Blender script for sending data from Blender to the stripifier and receiving the results. It only took minor changes to get it working on my system.
The strip generator was made by nVidia and is written in C++, with its source available. It's designed to generate strips for systems with built-in T&L caches that can recognize when the same vertex is reused, to avoid recalculating it. The Dreamcast doesn't have any dedicated T&L processor that does this, so the strips generated aren't the best they could be for the DC, but what it generates is far better than nothing.
One thing of note is that the strip generator likes to output degenerate polygons to change winding order. The PVR does not like getting degenerate polygons when culling is turned off, and will spit out horizontal lines across the tile the degenerate is on. Turning on any culling (CW, CCW, small) will fix it. The attached images show what it looks like.
Also, be sure you don't accidentally pass invalid parameters from Blender to the strip generator program, otherwise it hangs. And set the -cs parameter to something like 100 to get the program to output longer strips.
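If you want to check how many degenerates a generated strip contains before sending it to the DC, a simple index test is enough for the kind that stripifiers insert on purpose. A minimal sketch in plain C, assuming 16-bit indices; `count_degenerates` is a hypothetical helper, not part of the nVidia tool.

```c
#include <assert.h>
#include <stddef.h>

/* Count triangles in a strip that repeat an index and so have zero area. */
size_t count_degenerates(const unsigned short *strip, size_t n)
{
    size_t count = 0;
    assert(n >= 3);
    for (size_t i = 0; i + 2 < n; i++) {
        unsigned short a = strip[i], b = strip[i + 1], c = strip[i + 2];
        if (a == b || b == c || a == c)
            count++;
    }
    return count;
}
```

This only catches repeated indices. Triangles that are degenerate because distinct vertices share the same position would need a comparison on coordinates instead.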
- Attachments
-
- yes cull.png (47.63 KiB) Viewed 4516 times
-
- no cull.png (66.24 KiB) Viewed 4516 times
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: mat_transform / pvr_prim vs mat_transform_sq
TapamN - Thanks for the info, it seems Bouz has the hard part already done for him!
Are the screens you posted running on DC? If so, would you mind uploading the binary to have a look?
Also, what license is that source code released under?
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
Re: mat_transform / pvr_prim vs mat_transform_sq
PH3NOM wrote:
Sounds like fun, good luck with your mesh->strip algorithm! If done right that could be very valuable.
As a starting point, maybe you can start by exporting as vertex arrays first, as I have. After you get the basic implementation working, then advance to the strip optimization. Divide and conquer!

Well, in fact I will start with a basic strip algorithm, and make the full pipe work. Then I will work on the algo again to make it better.

PH3NOM wrote:
Yes to my mind the fastest possible render method ( with GL ) should be to use glDrawArrays(GL_TRIANGLE_STRIP, 0, numOfVerts );
http://www.opengl.org/sdk/docs/man/xhtm ... Arrays.xml

This might be true for recent 3D cards, which handle arrays of vertices and indexes, but it is probably not the case for the PVR, which only handles triangle strips. This is why I would like to have a look at the source code of this glDrawArrays function!

PH3NOM wrote:
About textures, do you mean multiple textures per face(single poly), or multiple textures per mesh(group of polys)?
I can think of more than one way to handle multiple textures per mesh on DC, using glDrawArrays().
Primarily, this is achieved by arranging the polygons to be rendered by texture id, or face.
Once arranged, a mesh can be rendered with multiple textures by making a separate render call for each group of faces that share the same texture.
What I can not realize is how to handle multiple textures per face on DC, using glDrawArrays().
Typically, multi-texturing is accomplished with glActiveTexture()
http://www.java-gaming.org/topics/multi ... #msg189204
Problem is that glActiveTexture() is not supported in the DC's build of GL

Well, I meant multiple textures per mesh, but it is really not a priority. I don't think it is possible to have multiple textures for one triangle on the PVR (this is probably why glActiveTexture is not part of the DC's build of GL).
My Archlinux machine is back, yeeehaaa..
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
Re: mat_transform / pvr_prim vs mat_transform_sq
TapamN wrote:
You get pretty easily get a decent strip generator running in Blender with this:
http://www.executionunit.com/en/blog/20 ... phoneipad/
The download has a sample Blender script for sending data from Blender, to the stripifier, and receiving the results. It only took minor changes to get it working on my system.
The strip generator was made by nVidia, and is written in C++, which its source available. It's designed to generate strips for systems with built-in T&L caches that can recognize when the same vertex reused to avoid recalculating the vertex. The Dreamcast doesn't have any dedicated T&L processors that do this, so the strips generated aren't the best they could be for the DC, but what it generates is far better than nothing.
One thing of note is that the strip generator like to output degenerate polygons to change winding order. The PVR does not like getting degenerate polygons when culling is turned off, and will spit out horizontal lines across the tile the degenerate is on. Turning on any culling (CW, CCW, small) will fix it. The images attached show what it looks like.
Also, be sure you don't accidentally pass invalid parameters from Blender to the strip generator program, otherwise it hangs. Also, set the -cs parameter to something like 100 to get the program to output longer strips.

Hi TapamN, thanks for the info! I found references to this Nvidia tool, but never found the tool itself. Apparently, the link to the Nvidia dev site is no longer valid.
Anyway, this is not a big problem, as I find it really interesting to write a stripifier!
- T_chan
- DC Developer
- Posts: 32
- Joined: Mon Aug 22, 2011 12:45 pm
- Has thanked: 12 times
- Been thanked: 22 times
Re: mat_transform / pvr_prim vs mat_transform_sq
You might want to have a look at this one: http://users.telenet.be/tfautre/softdev/tristripper/
- PH3NOM
- DC Developer
- Posts: 576
- Joined: Fri Jun 18, 2010 9:29 pm
- Has thanked: 0
- Been thanked: 5 times
Re: mat_transform / pvr_prim vs mat_transform_sq
Bouz wrote:
Well, in fact I will start with a basic strip algorithm, and make the full pipe work. Then I will work on the algo again to make it better.

Again, good luck!

Bouz wrote:
PH3NOM wrote:
Yes to my mind the fastest possible render method ( with GL ) should be to use glDrawArrays(GL_TRIANGLE_STRIP, 0, numOfVerts );
http://www.opengl.org/sdk/docs/man/xhtm ... Arrays.xml
This might be true for recent 3D cards, that handle arrays of vertices and indexes, but this is probably not the case for the PVR, that only handles triangle strips. This is why I could like to have a look at the source code of this glDrawArrays function!

The motivation behind glDrawArrays() is simple: eliminate overhead from function calls in the rendering routine, so that less CPU time is used for rendering.
Every time a function is called, the machine must push all of the needed parameters onto the system stack, as well as the address of where to 'jump' to execute the function, and then where to return after the function is done. If the parameters are call-by-reference, we only need to push the address of the variable. Even worse, if the variables are call-by-value, we have to make a copy of the values.
So, let's say we decide not to use glDrawArrays(), and use glVertex3fv() instead.
Every vertex of every triangle ( or quad ) makes a call to glVertex3fv()
( KGLX implementation )
Code: Select all
void glVertex3fv(GLfloat *v) {
glVertex4f(v[0], v[1], v[2], 1.0f);
}
void glVertex4f(GLfloat x, GLfloat y, GLfloat z,GLfloat w) {
GLParam p[5];
p[0].op=OP_Vertex;
p[1].f=x;
p[2].f=y;
p[3].f=z;
p[4].f=w;
gl_add_op(p);
}
Next, we make a call to glVertex4f, pushing the address of the function, and the address to return to, and since glVertex4f() uses call-by-value, we also must create a copy of all 4 parameters to the function call ( 7 pushes, 4 copies ).
When this is being done thousands of times per frame, every little bit makes a difference.
Here is glDrawArrays() (glapi.c) that I have optimized from the version in KGLX
Code: Select all
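void glDrawArrays( GLenum mode, GLint first, GLsizei count )
{
    /* Last index to submit. Kept signed: comparing the signed 'first'
       against an unsigned bound would make -1 wrap to a huge value. */
    GLint last = first + count - 1;
    GLParam p[2];

    p[0].op = OP_Begin;          /* glBegin() */
    p[1].i = mode;
    gl_add_op(p);

    --first;                     /* SH4 pre-increment */
    p[0].op = OP_ArrayElement;   /* glArrayElement() for each index */
    while (first < last)
    {
        p[1].i = ++first;        /* submits first .. first+count-1 */
        gl_add_op(p);
    }

    p[0].op = OP_End;            /* glEnd() */
    gl_add_op(p);
}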
- Bouz
- DCEmu Junior
- Posts: 46
- Joined: Mon May 10, 2010 3:42 pm
- Location: St. Bauzille de Putois (France)
- Has thanked: 0
- Been thanked: 0
Re: mat_transform / pvr_prim vs mat_transform_sq
Ph3nom: thanks for the info, now I know where this function comes from (KGLX). I have downloaded the 0.2 version. I have not gone deep enough into the code to know how the array is processed to produce triangles for the PVR API, but I can already see that the function gl_add_op itself calls lots of functions.
When I speak of triangle strips, I don't want to use KGL, but the PVR directly, with the function mat_transform_sq(). The mesh is loaded in memory, and mat_transform_sq() computes the vertex transforms and transfers the results directly to the PVR through the store queue, so I think it should be much faster than anything running under KGLX.
But once again, we will only be sure once my system is complete.
T_chan: thanks a lot for this URL, the site is full of interesting info! It will help a lot. Of course, the PVR does not have any vertex cache, so not everything applies, but it is a really good page!
Thanks again!
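Whichever submission path wins, the per-vertex work is the same: a 4x4 matrix multiply followed by a perspective divide. Below is a portable sketch of that work in plain C. It is not the KOS assembly, and it rests on two assumptions: the matrix is column-major, and the transformed z is stored as 1/w, which is my understanding of how depth is usually fed to the PVR. `vec3` and `transform_vertices` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Transform n vertices by the column-major 4x4 matrix m and apply the
 * perspective divide. On output, x and y are divided by w and z holds 1/w
 * (assumption: this matches the depth value the hardware expects). */
void transform_vertices(const float m[16], const vec3 *in, vec3 *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = in[i].x, y = in[i].y, z = in[i].z;
        float w = m[3] * x + m[7] * y + m[11] * z + m[15];
        float inv_w;
        assert(w != 0.0f);   /* vertices on the eye plane must be clipped first */
        inv_w = 1.0f / w;
        out[i].x = (m[0] * x + m[4] * y + m[8] * z + m[12]) * inv_w;
        out[i].y = (m[1] * x + m[5] * y + m[9] * z + m[13]) * inv_w;
        out[i].z = inv_w;
    }
}
```

With solution one, `out` would be a buffer in RAM that the strips index into later; with solution two, the same arithmetic runs once per strip entry and the result goes straight out through the store queue.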