OpenGL - New Build in the works

PH3NOM
DC Developer
Posts: 574
Joined: Fri Jun 18, 2010 9:29 pm

OpenGL - New Build in the works

Post by PH3NOM » Fri Feb 01, 2013 9:41 pm

So, I have started a new thread instead of hijacking this one
viewtopic.php?f=29&t=102181&start=60

As I mentioned, I have been working on a new build of OpenGL for the DC.
Still an early work in progress, it is faster than KGL, and I am releasing source in hopes of getting advice on improvements.

I am posting a small demo, with full source included along with a .cdi disc image to burn and test.
gl-particle-test.rar
(744.99 KiB) Downloaded 107 times
** This is a simple "Random Particle Generator" (C) Josh PH3NOM Pearson 2013.
** Written to test my GL API, this example demonstrates several things:
** -GL Pipeline Vertex Throughput ( Also PVR TR Poly Throughput )
** -GL Pipeline Mixed Submission of Opaque and Transparent Polys.
** -KOS C++ Functionality/Speed
** -KOS C++ Dynamic Memory Usage
Use d-pad to move cursor, and press 'start' to begin particle generation.

RyoDC
Mental DCEmu
Posts: 353
Joined: Wed Mar 30, 2011 12:13 pm

Re: OpenGL - New Build in the works

Post by RyoDC » Sat Feb 02, 2013 7:46 am

Cool stuff!
Phenom, so does OpenGL for DC work as fast as the PVR API?

Re: OpenGL - New Build in the works

Post by PH3NOM » Sun Feb 03, 2013 9:24 pm

RyoDC wrote:Cool stuff!
Phenom, so does OpenGL for DC work as fast as the PVR API?
Well, it sends Vertex Data to the PVR directly ( via DMA or SQ, depending on how you decide to compile it :-)), instead of using the KOS PVR functions.
So, in that regard, yes. But we also have to perform matrix transformations on each vertex using the SH4's matrix routines.
That said, GL must also perform certain things behind the scenes to make things easy for the user.
In the end, things are as fast as they can possibly be when conforming to the GL API standards. ( Unless TapamN can step in and help :-) )
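
As a rough illustration of the per-vertex work described above, here is a portable C sketch of transforming one vertex by a 4x4 matrix and then applying the perspective divide. The `vec3` and `mat4` types, the row-major layout and the function name are assumptions made for this example; the library itself uses the SH4's matrix routines, which are not reproduced here.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float m[4][4]; } mat4;   /* row-major: m[row][col] */

/* Multiply the column vector (x, y, z, 1) by the matrix, then divide by w. */
static vec3 transform_vertex(const mat4 *mt, vec3 v)
{
    float out[4];
    for (int r = 0; r < 4; r++)
        out[r] = mt->m[r][0] * v.x + mt->m[r][1] * v.y
               + mt->m[r][2] * v.z + mt->m[r][3];

    assert(out[3] != 0.0f);               /* w == 0 cannot be projected */
    vec3 res = { out[0] / out[3], out[1] / out[3], out[2] / out[3] };
    return res;
}
```

With a pure translation matrix the divide is by 1 and the vertex is simply shifted; with a projection-style last row the divide is where the perspective comes from.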

To show how Texture Binding works in my current GL API, I have created a new demo, "Kamikaze v.0.1"

I have included the full source code. Please note, my GL Library is a work in progress and not meant for outside use.
gltest02-kamikaze01.rar
(834.02 KiB) Downloaded 101 times
Bouz
DCEmu Junior
Posts: 46
Joined: Mon May 10, 2010 3:42 pm
Location: St. Bauzille de Putois (France)

Re: OpenGL - New Build in the works

Post by Bouz » Mon Feb 04, 2013 1:49 pm

Nice work! What solution did you finally choose to handle the three different poly list types?
GyroVorbis
Elysian Shadows Developer
Posts: 1808
Joined: Mon Mar 22, 2004 4:55 pm
Location: #%^&*!!!11one Super Sonic

Re: OpenGL - New Build in the works

Post by GyroVorbis » Mon Feb 04, 2013 4:26 pm

PH3NOM wrote:Well, it sends Vertex Data to the PVR directly ( via DMA or SQ, depending on how you decide to compile it :-)), instead of using the KOS PVR functions.
Does that mean this flag controls whether you use the intermediate RAM buffer + DMA approach (for being able to switch between PVR list types) or the direct rendering with the SQs (and having to submit one list at a time) approach?
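
To make the two approaches in the question concrete, here is a minimal portable sketch of the "intermediate RAM buffer" idea: vertices are queued per list type in any order, then flushed one list at a time. Every name here (`LIST_OP`, `LIST_TR`, `submit`, `flush_in_list_order`) is hypothetical and chosen for illustration; it is not the KOS API and not the code from the uploaded archive.

```c
#include <assert.h>
#include <stddef.h>

enum { LIST_OP, LIST_TR, LIST_COUNT };    /* opaque, translucent */
#define MAX_VERTS 64

typedef struct { float x, y, z; } vert;

static vert   buckets[LIST_COUNT][MAX_VERTS];
static size_t counts[LIST_COUNT];

/* Queue a vertex in RAM under its list type; submission order is free. */
static void submit(int list, vert v)
{
    assert(list >= 0 && list < LIST_COUNT);
    assert(counts[list] < MAX_VERTS);
    buckets[list][counts[list]++] = v;
}

/* Emit every opaque vertex, then every translucent one, and reset the
   buckets. 'out' stands in for the real transfer (DMA or store queues). */
static size_t flush_in_list_order(vert *out)
{
    size_t n = 0;
    for (int list = 0; list < LIST_COUNT; list++) {
        for (size_t i = 0; i < counts[list]; i++)
            out[n++] = buckets[list][i];
        counts[list] = 0;
    }
    return n;
}
```

Writing straight to the hardware skips the copy into `buckets`, at the price of the caller having to submit each list in one contiguous run.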

Re: OpenGL - New Build in the works

Post by PH3NOM » Tue Feb 05, 2013 6:51 pm

Kamikaze v.0.2 is Now Available, with source included.
Press Y to shoot, Use D-Pad to move ship.
When you shoot, you use energy, so keep your eye on the meter!
gltest-kamikaze-02.rar
(1.3 MiB) Downloaded 108 times
Bouz wrote:Nice work! What solution did you finally choose to handle the three different poly list types?
Well, I used the method I described in the other thread
viewtopic.php?f=29&t=102181&start=60
But I only handle OP and TR polys, not PT. What are they used for?
GyroVorbis wrote:
PH3NOM wrote:Well, it sends Vertex Data to the PVR directly ( via DMA or SQ, depending on how you decide to compile it :-)), instead of using the KOS PVR functions.
Does that mean this flag controls whether you use the intermediate RAM buffer + DMA approach (for being able to switch between PVR list types) or the direct rendering with the SQs (and having to submit one list at a time) approach?
No, I am not using the KOS DMA buffer functions; I manage all of that myself. Have a look at GL/gl-render.c in the code I uploaded above. If you look at the last function in that file, "RenderCallback()", you can see that if DMA is enabled, vertex data is sent directly to the TA with the function "pvr_dma_load_ta()". If DMA is not enabled, SQs are used instead:

Code:

sq_cpy((pvr_vertex_t*)  0x10000000, (pvr_vertex_t*)VERT_LIST[OP][i].vertex, 0x20*VERT_LIST[OP][i].vertices );
If you look at pvr.h you will see

Code:

#define PVR_TA_INPUT		0x10000000	/* TA command input */
I am curious if you have made a benchmark of Vertex throughput using the KOS DMA set_vertbuf() stuff?
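
The `0x20` in the copy call above is the size of one vertex in bytes. A small sketch of where that number comes from, using a stand-in struct modelled on the 32-byte textured vertex; `example_vertex` and `vertex_bytes` are names made up for this example, and the real type is `pvr_vertex_t` in pvr.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration: eight 32-bit fields, 32 bytes in total. */
typedef struct {
    uint32_t flags;      /* vertex command / end-of-strip marker */
    float    x, y, z;    /* position */
    float    u, v;       /* texture coordinates */
    uint32_t argb;       /* packed base color */
    uint32_t oargb;      /* packed offset color */
} example_vertex;

_Static_assert(sizeof(example_vertex) == 0x20, "vertex must be 32 bytes");

/* Byte count for a run of vertices, i.e. the length given to the copy. */
static size_t vertex_bytes(size_t vertices)
{
    return vertices * sizeof(example_vertex);
}
```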
BlueCrab
The Crabby Overlord
Posts: 5387
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia

Re: OpenGL - New Build in the works

Post by BlueCrab » Tue Feb 05, 2013 8:50 pm

PH3NOM wrote:But I only handle OP and TR polys, not PT. What are they used for?
Punchthrus are polygons that have essentially one bit of alpha. Either a color is totally visible, or it is totally invisible (like, for instance, ARGB1555 textures). Punchthrus are much faster to render than translucent polygons. The fill-rate for punchthrus is essentially equivalent to opaque polygons.
I am curious if you have made a benchmark of Vertex throughput using the KOS DMA set_vertbuf() stuff?
I think there is a benchmark of vertex throughput with the dma in the KOS examples, if I'm not mistaken. I think it was called serpent_dma or something like that. I also remember there being a version of pvrmark with dma, although it may well not be in the examples.
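
To illustrate the "one bit of alpha" point, here is a small sketch of packing a texel into ARGB1555, where the top bit alone decides whether the texel is drawn. The helper names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Pack 8-bit channels into ARGB1555: 1 alpha bit, then 5 bits per color. */
static uint16_t pack_argb1555(int visible, uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((visible ? 1u : 0u) << 15) |
                      ((unsigned)(r >> 3) << 10) |
                      ((unsigned)(g >> 3) << 5) |
                       (unsigned)(b >> 3));
}

/* A punchthru-style test: the texel is either fully shown or fully hidden. */
static int texel_is_visible(uint16_t texel)
{
    return (texel >> 15) & 1;
}
```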

Re: OpenGL - New Build in the works

Post by PH3NOM » Fri Feb 08, 2013 12:20 pm

BlueCrab wrote:
PH3NOM wrote:But I only handle OP and TR polys, not PT. What are they used for?
Punchthrus are polygons that have essentially one bit of alpha. Either a color is totally visible, or it is totally invisible (like, for instance, ARGB1555 textures). Punchthrus are much faster to render than translucent polygons. The fill-rate for punchthrus is essentially equivalent to opaque polygons.
I am curious if you have made a benchmark of Vertex throughput using the KOS DMA set_vertbuf() stuff?
I think there is a benchmark of vertex throughput with the dma in the KOS examples, if I'm not mistaken. I think it was called serpent_dma or something like that. I also remember there being a version of pvrmark with dma, although it may well not be in the examples.
Oh ok thanks for the info.

But, looking at the "Serpent DMA" in KOS, it is using mat_transform_sq
/* Transform and write vertices to the TA via the store queues */
But that seems to contradict what you have said about mixing vertex submission modes while Vertex DMA is enabled.
So, I still don't understand...

Re: OpenGL - New Build in the works

Post by BlueCrab » Fri Feb 08, 2013 9:22 pm

The store queue stuff that is going on in the serpent_dma example is actually using the store queue to write to the DMA buffer in main ram. I'm guessing the comment was a remnant of some earlier iteration that did use the store queues to write directly to the TA.

pvr_vertbuf_tail() returns a pointer to the vertex buffer for the specified list in main RAM, and since that is being set as the SQ destination, it is definitely writing to main RAM and not to the TA directly.

Basically, the store queues are used there so that the mat_transform_sq() function can still be used to do the transformation, at least that is my guess. I didn't write the example, so I can't say for sure, but that's the only logical reason for it.
Jae686
Insane DCEmu
Posts: 112
Joined: Sat Sep 22, 2007 9:43 pm
Location: Braga - Portugal

Re: OpenGL - New Build in the works

Post by Jae686 » Sun Feb 10, 2013 4:36 pm

:) looking forward to test it (as soon as I have my DC fixed)
Anthony817
Insane DCEmu
Posts: 127
Joined: Wed Mar 10, 2010 1:29 am
Location: Fort Worth, Texas

Re: OpenGL - New Build in the works

Post by Anthony817 » Mon Feb 11, 2013 7:58 pm

Wow! Keep up the great work phenom, always nice to see this site keeping the Dreamcast alive with new homebrew stuff! :mrgreen:

PVR - Clipping Polygons to the View Frustum

Post by PH3NOM » Wed Feb 20, 2013 8:42 pm

So, I have worked out a basic transform stack with my build of GL, and began some tests with rendering different scenes.

Immediately, I observed some strange behaviour of polygons behind other polygons showing through each other.

This was fixed by setting the poly context depth flag to PVR_DEPTHWRITE_ENABLE for opaque polygons.
This is done in my build by the call glEnable(GL_DEPTH_TEST);

Next, I noticed the PVR does not like receiving vertices that are outside of the view frustum; some very nasty results can occur, up to and including a drop in framerate down to 30fps with graphical glitches on screen.

So that brings me to this topic, Clipping polygons to the view frustum. KGL obviously had some problems with clipping, as seen in several demos ( anyone try that "open dynamics" buggy demo? http://www.boob.co.uk/devtools.html ) and some testing I did a while back viewtopic.php?f=29&t=102059

So, I really want to implement a nice but fast clipping algorithm.
I have read a basic outline of the Sutherland-Hodgman algorithm, and I have begun my own implementation loosely based on that outline.
My algorithm basically walks the vertices, determining whether each one is inside the view frustum, and then transforms the vertex if needed.
At first it seems simple, but then you realize vertices may also be added to the new clipped polygon...
For example, I have tested using a bounding box within the view frustum for visual confirmation.

The initial scene, we see a triangle centered within a box
Image

Moving the triangle to the right, without clipping the triangle, the geometry continues beyond the wall of the box
Image

Now, with my clip algorithm enabled, the triangle becomes a quadrilateral, and remains inside of the box
Image
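
The triangle-to-quadrilateral behaviour can be reproduced with a single Sutherland-Hodgman pass against one plane. This is a generic textbook-style sketch, not the implementation discussed in this thread; `vec3` and `clip_polygon_xmax` are names assumed for the example, and only positions are handled.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;

/* Clip a convex polygon against the plane x <= xmax.
   'out' needs room for n + 1 vertices. Returns the new vertex count. */
static int clip_polygon_xmax(const vec3 *in, int n, float xmax, vec3 *out)
{
    assert(n >= 3);
    int count = 0;
    for (int i = 0; i < n; i++) {
        vec3 a = in[i];
        vec3 b = in[(i + 1) % n];
        int a_in = a.x <= xmax;
        int b_in = b.x <= xmax;

        if (a_in)
            out[count++] = a;             /* keep vertices that are inside */

        if (a_in != b_in) {               /* this edge crosses the plane */
            float t = (xmax - a.x) / (b.x - a.x);
            vec3 p = { xmax, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
            out[count++] = p;             /* add the intersection point */
        }
    }
    return count;
}
```

Clipping the triangle (0,0,0), (2,0,0), (0,2,0) at xmax = 1 drops the one outside vertex and adds two intersection points, so three vertices go in and four come out.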

I am posting, mainly because I am curious what solutions other devs here have used to manage 3D clipping for the PVR, any thoughts welcome.

Oh yeah, here is the function I wrote to clip a vertex on the x axis, say to a view frustum x min or max
It calculates the point of intersection between the vertices to the frustum, and transforms the vertex accordingly

Code:

static matrix_t TM __attribute__((aligned(32))) =
{
      { 1.0f, 0.0f, 0.0f, 0.0f },
      { 0.0f, 1.0f, 0.0f, 0.0f },
      { 0.0f, 0.0f, 1.0f, 0.0f },
      { 0.0f, 0.0f, 0.0f, 1.0f }
};

static vector4f dvt; // Displacement Transform Vector

void LineClipFrustumX3fv( vector3f v1, vector3f v2, float fx )
{
     /* Calculate Displacement Vector ( |Dv| ) As a Matrix */
     TM[0][0] = v2[0]-v1[0]; 
     TM[1][1] = v2[1]-v1[1];
     TM[2][2] = v2[2]-v1[2];
     TM[3][3] = 1.0f;
     mat_load(&TM);

     /* Transform Clip Point ( |Dv|*mag ) */
     dvt[0] = dvt[1] = dvt[2] = (fx - v1[0])/TM[0][0]; /* Magnitude */
     dvt[3] = 1.0f;
     mat_trans_nodiv( dvt[0], dvt[1], dvt[2], dvt[3] );
     
     v1[0] += dvt[0];      /* Update the Vertices to Transformed Clip Point */
     v1[1] += dvt[1];
     v1[2] += dvt[2];
}
TapamN
DCEmu Junior
Posts: 42
Joined: Sun Oct 04, 2009 11:13 am

Re: OpenGL - New Build in the works

Post by TapamN » Fri Mar 01, 2013 1:56 pm

PH3NOM wrote:Next, I noticed the PVR does not like receiving vertices that are outside of the view frustum; some very nasty results can occur, up to and including a drop in framerate down to 30fps with graphical glitches on screen.
It's just polygons that cross the near plane that are the problem. (i.e. polygons that are both part in-front of the camera, and part behind it.) You don't need to bother clipping anything else, the PVR handles XY clipping and far clipping for you.
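
A minimal sketch of the test this implies: count how many of a triangle's vertices are in front of the near plane, and only clip when the count is 1 or 2. The enum, the function name, and the convention that the camera looks down -z in view space are all assumptions for this example.

```c
typedef enum { TRI_VISIBLE, TRI_CULLED, TRI_NEEDS_CLIP } tri_class;

/* Classify a triangle from the view-space z of its three vertices.
   "In front" here means z <= -near_dist; flip the comparison if your
   view space looks down +z instead. */
static tri_class classify_near(const float z[3], float near_dist)
{
    int in_front = 0;
    for (int i = 0; i < 3; i++)
        if (z[i] <= -near_dist)
            in_front++;

    if (in_front == 3) return TRI_VISIBLE;    /* submit unchanged */
    if (in_front == 0) return TRI_CULLED;     /* entirely behind: drop it */
    return TRI_NEEDS_CLIP;                    /* straddles the near plane */
}
```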

The depth write enable bit is actually equivalent to the glDepthMask call. An enabled GL_DEPTH_TEST is how things normally work, but a disabled depth test is equivalent to setting the depth compare to GL_ALWAYS and disabling depth writes (but doesn't actually change the glDepthFunc or glDepthMask values).
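
One way to express that mapping, as a sketch: keep the GL state as the application set it, and derive the hardware settings from it at draw time. The types and names below (`gl_depth_state`, `hw_depth`, `resolve_depth`, the `CMP_*` values) are hypothetical and are not taken from KGL or the PVR headers.

```c
typedef enum { CMP_LESS, CMP_LEQUAL, CMP_ALWAYS } depth_cmp;

typedef struct {
    int       depth_test;    /* glEnable / glDisable(GL_DEPTH_TEST) */
    int       depth_mask;    /* glDepthMask */
    depth_cmp depth_func;    /* glDepthFunc */
} gl_depth_state;

typedef struct {
    depth_cmp compare;
    int       write_enable;
} hw_depth;

/* Derive hardware settings without modifying the stored GL state. */
static hw_depth resolve_depth(const gl_depth_state *s)
{
    hw_depth hw;
    if (s->depth_test) {
        hw.compare      = s->depth_func;
        hw.write_enable = s->depth_mask;
    } else {
        hw.compare      = CMP_ALWAYS;    /* every fragment passes */
        hw.write_enable = 0;             /* and the depth buffer is untouched */
    }
    return hw;
}
```

Because the stored values are left alone, re-enabling the depth test restores whatever glDepthFunc and glDepthMask were set to earlier.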

I use the clipping algorithm described in this paper, which is designed to clip triangle strips efficiently.

Also, generating a matrix for each edge you clip seems... uh... extremely inefficient. Normally, you just calculate where the plane intersects the edge (generating a value from 0.0 to 1.0), and then generate vertex data for the intersection point by linearly interpolating the vertex data between the two vertices. No need to mess with matrices.
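
A minimal sketch of that approach for a z plane, interpolating texture coordinates along with the position. The `clip_vert` struct and function names are assumptions for this example, and the caller is expected to have already established that the edge crosses the plane.

```c
#include <assert.h>

typedef struct { float x, y, z, u, v; } clip_vert;

static float lerpf(float a, float b, float t)
{
    return a + (b - a) * t;
}

/* Intersect edge a->b with the plane z = plane_z. The edge must cross the
   plane, which guarantees a.z != b.z and a t value between 0.0 and 1.0. */
static clip_vert clip_edge_z(clip_vert a, clip_vert b, float plane_z)
{
    assert(a.z != b.z);
    float t = (plane_z - a.z) / (b.z - a.z);
    clip_vert p = {
        lerpf(a.x, b.x, t), lerpf(a.y, b.y, t), plane_z,
        lerpf(a.u, b.u, t), lerpf(a.v, b.v, t)
    };
    return p;
}
```

Any other per-vertex attribute (color, for instance) is interpolated with the same `t`.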

Re: OpenGL - New Build in the works

Post by PH3NOM » Fri Mar 01, 2013 9:22 pm

As always, thank you for your input TapamN.

The test I ran indicated X and Y Clipping was also needed, but I will look at it again to see if the said polygons were also actually crossing the Z Near Plane at some point off screen.

For now, calling glDepthMask has the same effect, but thank you for the clarification, as I have focused on sorting more immediate problems first.

About the clipping math, first I worked things out on paper. I think at its core, it is the same as you speak of.
I came up with a formula to determine where 2 vertices intersect a plane, when at least one component of that plane is known.
That known component represents the plane ( i.e. view frustum) that the vertices are known to cross.

Let vertex 1 be |v1|, vertex 2 be |v2|, and the component be c ( 0=x, 1=y, 2=z ), where that component of the plane is a known value, val.

First, I determine the Displacement of the Vertices as ( vector - vector ) |D| = |v2| - |v1|

Next, I determine the Magnitude of Displacement (0.0->1.0) of the known component M = (val-|v1|[c]) / |D|[c]

From there, I multiply the Displacement Vector by the Magnitude of Displacement. (vector*scalar ) |D| *= M

The Clipping point is found by adding the Displacement Vector to the vertex that is outside of the clip region ( vector + vector ) |v1|+=|D|

That's how I figured things out; it can be implemented easily without using matrix math. Please let me know if I am doing more work than needed here.
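
Written out as equations, with p standing for the resulting clip point, the steps above read as follows. The numbers in the worked example are made up purely for illustration.

```latex
\begin{aligned}
\mathbf{D} &= \mathbf{v}_2 - \mathbf{v}_1 \\
M &= \frac{val - v_{1,c}}{D_c} \\
\mathbf{p} &= \mathbf{v}_1 + M\,\mathbf{D} \\[6pt]
\text{Example: } \mathbf{v}_1 &= (3, 0, 0),\quad \mathbf{v}_2 = (1, 2, 0),\quad c = 0,\quad val = 2 \\
\mathbf{D} &= (-2, 2, 0) \\
M &= \frac{2 - 3}{-2} = 0.5 \\
\mathbf{p} &= (3, 0, 0) + 0.5\,(-2, 2, 0) = (2, 1, 0)
\end{aligned}
```

The x component of p equals val, as expected, and M falls between 0.0 and 1.0 because val lies between the x components of the two vertices.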

Regarding speed, I considered that mat_transform claims 15 million verts/sec throughput, and here it's being done without perspective division. Also, I only posted the version that does not apply U/V transformation. I was able to use the loaded matrix to assist in calculating the transformed U/V coordinates... :o I had imagined applying the clip transform while applying the screen-space transform in the pipeline, but it is still too early yet.

Re: OpenGL - New Build in the works

Post by PH3NOM » Wed Mar 13, 2013 10:24 pm

TapamN wrote:Also, generating a matrix for each edge you clip seems... uh... extremely inefficient. Normally, you just calculate where the plane intersects the edge (generating a value from 0.0 to 1.0), and then generate vertex data for the intersection point by linearly interpolating the vertex data between the two vertices. No need to mess with matrices.
Thanks again, that motivated me to try out another method for calculation.
I decided to use the SH4's FMAC thanks to info provided by this guy http://yam.20to4.net/dreamcast/hints/index.html
It wasn't until I worked out my function that I noticed you actually do use it in your code :evil:

FMAC is an SH4 math instruction that KOS does not provide a wrapper for, so I thought I would share my inlined implementation.

Code:

/* SH4 fmac - floating-point multiply/accumulate */
/* Returns a*b+c at the cost of a single floating-point operation */
static inline float FMAC( float a, float b, float c )
{
     register float __FR0 __asm__("fr0") = a; 
     register float __FR1 __asm__("fr1") = b; 
     register float __FR2 __asm__("fr2") = c;      
     
     __asm__ __volatile__( 
        "fmac   fr0, fr1, fr2\n"
        : "=f" (__FR0), "=f" (__FR1), "=f" (__FR2)
        : "0" (__FR0), "1" (__FR1), "2" (__FR2)
        );
        
     return __FR2;       
}
For versatility, it's not too hard to use that for a multiply/decrement:

Code:

/* SH4 fmac - floating-point multiply/decrement */
/* Returns a*b-c at the cost of a single floating-point operation */
static inline float FMDC( float a, float b, float c )
{
     register float __FR0 __asm__("fr0") = a; 
     register float __FR1 __asm__("fr1") = b; 
     register float __FR2 __asm__("fr2") = -c;      
     
     __asm__ __volatile__( 
        "fmac   fr0, fr1, fr2\n"
        : "=f" (__FR0), "=f" (__FR1), "=f" (__FR2)
        : "0" (__FR0), "1" (__FR1), "2" (__FR2)
        );
        
     return __FR2;       
}
Now the code to clip one vertex to another looks like this, including u/v correction, no more wasteful matrix math :lol:

Code:

void LineClipFrustum3fvT( vector3f v1, vector3f v2, float v, BYTE c, float *uva, float *uvb )
{  
     float MAG = (v - v1[c])/(v2[c]-v1[c]); /* Magnitude */
     
     /* Use the SH4's FMAC operation to linear interpolate the U/V data */
     uva[0] = FMAC( ((v2[0]-v1[0])*MAG)/v2[0], uvb[0], uva[0] );    
     uva[1] = FMAC( ((v2[1]-v1[1])*MAG)/v2[1], uvb[1], uva[1] );
     
     /* Use the SH4's FMAC operation to linear interpolate the Vertex data */
     v1[0] = FMAC( v2[0]-v1[0], MAG, v1[0] );
     v1[1] = FMAC( v2[1]-v1[1], MAG, v1[1] );
     v1[2] = FMAC( v2[2]-v1[2], MAG, v1[2] );
}
SiZiOUS
DC Developer
Posts: 386
Joined: Fri Mar 05, 2004 2:22 pm
Location: France

Re: OpenGL - New Build in the works

Post by SiZiOUS » Thu Mar 14, 2013 4:37 am

Woah PH3NOM, your work is just very impressive! :o Keep up the great work! :grin:

I'll try your 'Kamikaze' demo very soon :D

Re: OpenGL - New Build in the works

Post by TapamN » Thu Mar 14, 2013 8:19 am

Uh, you don't really need any assembly to use FMAC instructions. You can have GCC automatically generate them for you for normal C math.

I think on older versions (GCC 3.4) all you had to do was specify -ffast-math. On more recent versions (GCC 4.7), you (also?) have to specify -mfused-madd.

Letting GCC use FMAC itself is more efficient than using inline assembly. With your assembly, it will always have to move things in and out of FR0-FR2, while when GCC can make its own FMACs it can use any registers. GCC also knows how long FMAC instructions take, and can reorder things to run faster, but it doesn't know how long asm blocks take, so it can't optimize the program as well.
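
For comparison, the plain C form is just an ordinary multiply and add. Whether a given compiler actually turns this into an FMAC instruction depends on its version and flags, as described above; that has not been verified here, and the function names are chosen only for this sketch.

```c
/* Plain C multiply-add: no inline assembly and no fixed registers, so the
   compiler is free to pick registers and schedule the operation itself. */
static inline float madd(float a, float b, float c)
{
    return a * b + c;
}

/* Linear interpolation written in the same a*b+c shape. */
static inline float lerp(float from, float to, float t)
{
    return madd(to - from, t, from);
}
```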

But that seems like a much better clipping implementation overall. If you're using Gouraud shading, you also have to calculate lighting like you do with UV and position.

Re: OpenGL - New Build in the works

Post by RyoDC » Fri Mar 15, 2013 12:13 pm

Reminds me of the cost that you pay when switching from managed to unmanaged code and vice versa.
Neoblast
DC Developer
Posts: 312
Joined: Sat Dec 01, 2007 8:51 am

Re: OpenGL - New Build in the works

Post by Neoblast » Mon Mar 18, 2013 8:34 pm

It is indeed impressive, I guess your GL lib is being put to good use right now in some games isn't it :)

Actually I'd like to know the difference in performance you could get with benchmarks on KGL and GL.

I guess your GL is faster, but how much?


Also Shenmue 2 was said to have near 6 million poly count, is that true?

Re: OpenGL - New Build in the works

Post by PH3NOM » Fri Apr 05, 2013 8:57 pm

TapamN wrote:Uh, you don't really need any assembly to use FMAC instructions. You can have GCC automatically generate them for you for normal C math.

I think on older versions (GCC 3.4) all you had to do was specify -ffast-math. On more recent versions (GCC 4.7), you (also?) have to specify -mfused-madd.

Letting GCC use FMAC itself is more efficient than using inline assembly. With your assembly, it will always have to move things in and out of FR0-FR2, while when GCC can make its own FMACs it can use any registers. GCC also knows how long FMAC instructions take, and can reorder things to run faster, but it doesn't know how long asm blocks take, so it can't optimize the program as well.

But that seems like a much better clipping implementation overall. If you're using Gouraud shading, you also have to calculate lighting like you do with UV and position.
Thank you again, TapamN, for your input.
I will test using standard C math and enable the -ffast-math flag in GCC. This is a good lesson for me on how to use the limited number of registers efficiently.

Still using my inlined FMAC, it was easy to interpolate the vertex color ( or lighting ), the hard part was deciding to use packed 32bit color format for implementation with the PVR. And now it seems there will have to be multiple versions of this function to accommodate all of the possible GL texture/color enabled/disabled configurations.
Also, fixed a bug in the U/V interpolation in the last code I posted.

Code:

#define ALPHA 0xFF000000 /* Color components of the PVR's packed 32-bit ARGB format */
#define RED   0x00FF0000
#define GREEN 0x0000FF00
#define BLUE  0x000000FF

static inline void LineClipFrustum3fvTC1ui( vector3f v1, vector3f v2,
                                            float v, BYTE c,
                                            vector2f uva, vector2f uvb,
                                            uint32 *col1, uint32 *col2 )
{
     float MAG = (v - v1[c])/(v2[c]-v1[c]); /* Magnitude */

     /* Extract the color components as floats first; subtracting the raw
        unsigned values would wrap around when col2's component is smaller */
     float a1 = (*col1 & ALPHA)>>24, a2 = (*col2 & ALPHA)>>24;
     float r1 = (*col1 & RED)>>16,   r2 = (*col2 & RED)>>16;
     float g1 = (*col1 & GREEN)>>8,  g2 = (*col2 & GREEN)>>8;
     float b1 = (*col1 & BLUE),      b2 = (*col2 & BLUE);

     /* Apply linear interpolation, then pack it up */
     uint32 a = (uint32)SHFMAC( a2-a1, MAG, a1 );
     uint32 r = (uint32)SHFMAC( r2-r1, MAG, r1 );
     uint32 g = (uint32)SHFMAC( g2-g1, MAG, g1 );
     uint32 b = (uint32)SHFMAC( b2-b1, MAG, b1 );
     *col1 = ( (a<<24) | (r<<16) | (g<<8) | b );

     /* Use the SH4's FMAC operation to linearly interpolate the U/V data */
     uva[0] = SHFMAC( uvb[0]-uva[0], MAG, uva[0] );
     uva[1] = SHFMAC( uvb[1]-uva[1], MAG, uva[1] );

     /* Use the SH4's FMAC operation to linearly interpolate the vertex data */
     v1[0] = SHFMAC( v2[0]-v1[0], MAG, v1[0] );
     v1[1] = SHFMAC( v2[1]-v1[1], MAG, v1[1] );
     v1[2] = SHFMAC( v2[2]-v1[2], MAG, v1[2] );
}
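
For anyone who wants to try the color interpolation on a desktop compiler, here is a portable sketch of the same idea for a packed 32-bit ARGB value. It uses plain C arithmetic rather than the SH4 instruction, and `lerp_channel` and `lerp_argb` are names assumed for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Interpolate one 8-bit channel. The values are widened to float before the
   subtraction so that a descending channel cannot wrap around. */
static uint32_t lerp_channel(uint32_t c1, uint32_t c2, int shift, float t)
{
    float a = (float)((c1 >> shift) & 0xFF);
    float b = (float)((c2 >> shift) & 0xFF);
    return (uint32_t)(a + (b - a) * t) << shift;   /* truncates toward zero */
}

/* Interpolate a packed 32-bit ARGB color, with t between 0.0 and 1.0. */
static uint32_t lerp_argb(uint32_t c1, uint32_t c2, float t)
{
    assert(t >= 0.0f && t <= 1.0f);
    return lerp_channel(c1, c2, 24, t) | lerp_channel(c1, c2, 16, t) |
           lerp_channel(c1, c2,  8, t) | lerp_channel(c1, c2,  0, t);
}
```

The fractional part is truncated rather than rounded, so halfway between 0 and 255 comes out as 127.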
Neoblast wrote:It is indeed impressive, I guess your GL lib is being put to good use right now in some games isn't it :)

Actually I'd like to know the difference in performance you could get with benchmarks on KGL and GL.

I guess your GL is faster, but how much?
Hard to say until I finish all of the clipping algorithm / implementation.
The quadmark example compiled against an earlier build increased from 584k verts/second using KGLX up to 0.96 million verts/second using my build of GL
viewtopic.php?f=29&t=102181&start=40#p1034323