Need help to optimize my display routine

If you have any questions on programming, this is the place to ask them, whether you're a newbie or an experienced programmer. Discussion on programming in general is also welcome. We will help you with programming homework, but we will not do your work for you! Any porting requests must be made in Developmental Ideas.
DoomSlayer
DCEmu Newbie
Posts: 9
Joined: Mon Apr 01, 2019 12:09 pm
Has thanked: 5 times
Been thanked: 1 time

Need help to optimize my display routine

Post by DoomSlayer »

Hello everyone :)

I'm trying to display between 200 and 400 sprites per frame using the PVR API. I think it should be able to handle that, but I'm getting some serious slowdown. Here is how my game works:

Code: Select all

int main()
{
	initMyGame();
	
	while(1)
	{
		UpdateMyGame();
		ScreenFlip();
	}
}

Code: Select all

void PVR_VblankStart()
{
      pvr_wait_ready();
      pvr_scene_begin();
      pvr_list_begin(PVR_LIST_TR_POLY);
}

void PVR_VblankEnd()
{
      pvr_list_finish();
      pvr_scene_finish();
}

void ScreenFlip()
{
      PVR_VblankStart();

      for(std::size_t i=0; i < spritesPVR_counter; ++i) //here spritesPVR_counter is between 200 and 400
      {
        pvr_poly_hdr_t hdr;
        pvr_poly_cxt_t cxt;
        pvr_vertex_t vert;

        if(spritesPVR[i].tex == 255) //for Rectangle, Line
        {
          pvr_poly_cxt_col(&cxt, PVR_LIST_TR_POLY);
          if(spritesPVR[i].isAddBlend)
          {
            cxt.blend.src = PVR_BLEND_ONE;
            cxt.blend.dst = PVR_BLEND_ONE;
          }
          pvr_poly_compile(&hdr, &cxt);
        }
        else //for texture
        {
          pvr_poly_cxt_txr(&cxt, PVR_LIST_TR_POLY, PVR_TXRFMT_ARGB4444 | PVR_TXRFMT_TWIDDLED |  PVR_TXRFMT_VQ_ENABLE, 1024, 1024, myTexture[spritesPVR[i].tex], PVR_FILTER_BILINEAR);
          if(spritesPVR[i].isAddBlend)
          {
            cxt.blend.src = PVR_BLEND_ONE;
            cxt.blend.dst = PVR_BLEND_ONE;
          }
          pvr_poly_compile(&hdr, &cxt);
        }

        pvr_prim(&hdr, sizeof(hdr));

        vert.argb = PVR_PACK_COLOR(spritesPVR[i].argb.alpha, spritesPVR[i].argb.red, spritesPVR[i].argb.green, spritesPVR[i].argb.blue);
        vert.oargb = 0;
        vert.flags = PVR_CMD_VERTEX;

        vert.x = spritesPVR[i].topleft.x;
        vert.y = spritesPVR[i].topleft.y;
        vert.z = spritesPVR[i].z;
        vert.u = spritesPVR[i].uvtopleft.u;
        vert.v = spritesPVR[i].uvtopleft.v;
        pvr_prim(&vert, sizeof(vert));

        vert.x = spritesPVR[i].topright.x;
        vert.y = spritesPVR[i].topright.y;
        vert.z = spritesPVR[i].z;
        vert.u = spritesPVR[i].uvtopright.u;
        vert.v = spritesPVR[i].uvtopright.v;
        pvr_prim(&vert, sizeof(vert));

        vert.x = spritesPVR[i].bottomleft.x;
        vert.y = spritesPVR[i].bottomleft.y;
        vert.z = spritesPVR[i].z;
        vert.u = spritesPVR[i].uvbottomleft.u;
        vert.v = spritesPVR[i].uvbottomleft.v;
        pvr_prim(&vert, sizeof(vert));

        vert.x = spritesPVR[i].bottomright.x;
        vert.y = spritesPVR[i].bottomright.y;
        vert.z = spritesPVR[i].z;
        vert.u = spritesPVR[i].uvbottomright.u;
        vert.v = spritesPVR[i].uvbottomright.v;
        vert.flags = PVR_CMD_VERTEX_EOL;
        pvr_prim(&vert, sizeof(vert));
      }
      for(std::size_t j = 0; j < stringPVR.size(); ++j)
      {
        fontPos.z = stringPVR[j].z;
        fontPos.x = (float)stringPVR[j].x;
        fontPos.y = (float)stringPVR[j].y;
        plx_fcxt_begin(fontCxtAX);
        plx_fcxt_setsize(fontCxtAX, (float)stringPVR[j].fontSize);
        plx_fcxt_setcolor4f(fontCxtAX, 1.0f, ((float)stringPVR[j].color.r)/255.0f, ((float)stringPVR[j].color.g)/255.0f, ((float)stringPVR[j].color.b)/255.0f);
        plx_fcxt_setpos_pnt(fontCxtAX, &fontPos);
        plx_fcxt_draw(fontCxtAX, stringPVR[j].strblit.c_str());
        plx_fcxt_end(fontCxtAX);
      }
      PVR_VblankEnd();
   
}

Code: Select all

pvr_init_params_t init = {{ PVR_BINSIZE_0, PVR_BINSIZE_0, PVR_BINSIZE_16, PVR_BINSIZE_0, PVR_BINSIZE_0},512*1024, 0, 0};
	pvr_init(&init);

I measured how many milliseconds each part takes.

The game update stays within budget: 4 ms at most.

On the other hand, ScreenFlip often goes above 17 ms.

(Note that I'm reading these milliseconds in NullDC, because I don't have a coder's cable :( )

I'm sure I'm missing something in ScreenFlip. Too many pvr_poly_compile calls?

Too much data transfer?

Too many sprites?

I've already tried various things, for example reducing the number of pvr_poly_compile and pvr_poly_cxt_txr calls, but that doesn't seem to change much.

Does anyone see a mistake I made? Or is the Dreamcast just not powerful enough?

Thank you in advance for your help! :D
TapamN
DC Developer
Posts: 104
Joined: Sun Oct 04, 2009 11:13 am
Has thanked: 2 times
Been thanked: 88 times

Re: Need help to optimize my display routine

Post by TapamN »

The sprite count should not be a problem. Even a Saturn or PSX should be able to handle that.

Timing information from emulators won't be accurate to real hardware. While emulators for older consoles can have accurate timing, emulators for the Saturn, the PSX and everything after them typically treat the CPU as faster than the real thing; accurately emulating all the little slowdowns from cache misses would be far too expensive and slow the emulator down too much. They also generally treat the 3D hardware as infinitely fast.

pvr_poly_compile is incredibly slow. You should cache the resulting header and only call pvr_poly_compile when something changes.
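A minimal sketch of that caching, assuming headers are keyed by texture index and blend mode (like the hdr_cache[tex][blend] table that appears later in this thread). hdr_t, compile_header(), get_header() and invalidate_texture() are hypothetical: hdr_t stands in for pvr_poly_hdr_t and compile_header() for the pvr_poly_cxt_txr() + pvr_poly_compile() pair, so the logic can be checked off-target. On the Dreamcast the cached header would then be sent with pvr_prim().

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TEXTURES 64   /* hypothetical limit */
#define BLEND_MODES  2    /* 0 = normal blend, 1 = additive (ONE/ONE) */

/* Hypothetical stand-in for pvr_poly_hdr_t. */
typedef struct {
    uint32_t cmd, mode1, mode2, mode3;
    uint32_t d1, d2, d3, d4;
} hdr_t;

typedef struct {
    hdr_t hdr;
    bool  valid;   /* explicit flag instead of testing for an all-zero header */
} hdr_slot_t;

static hdr_slot_t hdr_cache[MAX_TEXTURES][BLEND_MODES];
static int compile_count = 0;   /* how often the expensive path ran */

/* Hypothetical stand-in for pvr_poly_cxt_txr() + pvr_poly_compile().
   Stores placeholder values, not real PVR header bits. */
static void compile_header(hdr_t *out, int tex, int blend)
{
    compile_count++;
    *out = (hdr_t){0};
    out->mode1 = (uint32_t)tex;
    out->mode2 = (uint32_t)blend;
}

/* Return the header for (tex, blend), compiling it only on first use. */
const hdr_t *get_header(int tex, int blend)
{
    hdr_slot_t *slot = &hdr_cache[tex][blend];
    if (!slot->valid) {
        compile_header(&slot->hdr, tex, blend);
        slot->valid = true;
    }
    return &slot->hdr;
}

/* Call after replacing a texture so its headers are rebuilt on next use. */
void invalidate_texture(int tex)
{
    for (int b = 0; b < BLEND_MODES; b++)
        hdr_cache[tex][b].valid = false;
}
```

With this, drawing 400 sprites that share a handful of texture/blend combinations compiles each combination once; only a new combination, or a texture swap followed by invalidate_texture(), pays for another compile.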

pvr_prim is also slow if you call it per vertex. It would be better to write the vertices of the quad to an array, then send all 4 vertices at once with one pvr_prim call.

These are minor things, but if you don't enable specular in the header, there's no need to zero out the oargb values of the vertices. If you turn off Gouraud shading and switch to flat shading, you can skip setting the argb value of the first two vertices.

Something along the lines of this for the inner loop should work better:

Code: Select all

pvr_vertex_t quad[4];

/* The flags never change, so set them once outside the loop. */
quad[0].flags = PVR_CMD_VERTEX;
quad[1].flags = PVR_CMD_VERTEX;
quad[2].flags = PVR_CMD_VERTEX;
quad[3].flags = PVR_CMD_VERTEX_EOL;

for(std::size_t i=0; i < spritesPVR_counter; ++i) //here spritesPVR_counter is between 200 and 400
{
    /* Send a header that was compiled earlier and cached, instead of calling pvr_poly_compile here. */
    pvr_prim(&header_cache[spritesPVR[i].header_index], sizeof(pvr_poly_hdr_t));
    pvr_vertex_t *vert = quad;

    uint32 color = PVR_PACK_COLOR(spritesPVR[i].argb.alpha, spritesPVR[i].argb.red, spritesPVR[i].argb.green, spritesPVR[i].argb.blue);

    /* With flat shading (PVR_SHADE_FLAT in the header), argb only needs to be set
       on the last two vertices; oargb is skipped because specular is off. */
    vert->x = spritesPVR[i].topleft.x;
    vert->y = spritesPVR[i].topleft.y;
    vert->z = spritesPVR[i].z;
    vert->u = spritesPVR[i].uvtopleft.u;
    vert->v = spritesPVR[i].uvtopleft.v;

    vert++;
    vert->x = spritesPVR[i].topright.x;
    vert->y = spritesPVR[i].topright.y;
    vert->z = spritesPVR[i].z;
    vert->u = spritesPVR[i].uvtopright.u;
    vert->v = spritesPVR[i].uvtopright.v;

    vert++;
    vert->x = spritesPVR[i].bottomleft.x;
    vert->y = spritesPVR[i].bottomleft.y;
    vert->z = spritesPVR[i].z;
    vert->u = spritesPVR[i].uvbottomleft.u;
    vert->v = spritesPVR[i].uvbottomleft.v;
    vert->argb = color;

    vert++;
    vert->x = spritesPVR[i].bottomright.x;
    vert->y = spritesPVR[i].bottomright.y;
    vert->z = spritesPVR[i].z;
    vert->u = spritesPVR[i].uvbottomright.u;
    vert->v = spritesPVR[i].uvbottomright.v;
    vert->argb = color;

    /* Submit all 4 vertices with a single pvr_prim call. */
    pvr_prim(&quad, sizeof(quad));
}
Protofall
DCEmu Freak
Posts: 78
Joined: Sun Jan 14, 2018 8:03 pm
Location: Emu land
Has thanked: 21 times
Been thanked: 18 times

Re: Need help to optimize my display routine

Post by Protofall »

And if you want to enable oargb, you do this:
pvr_poly_cxt_t cxt;
pvr_poly_hdr_t hdr;
pvr_poly_cxt_txr(/* stuff */);
pvr_poly_compile(&hdr, &cxt);
hdr.cmd |= 4;   // Enable oargb
Here's a demo showing oargb used for shading (it was made by TapamN!): https://github.com/Protofall/Homebrew-T ... ests/Shade
Moving Day: A clone of Dr Mario with 8-player support <https://dcemulation.org/phpBB/viewtopic ... 4&t=105389>
A recreation of Minesweeper for the Dreamcast <viewtopic.php?f=34&t=104820>

Twitter <https://twitter.com/ProfessorToffal>
YouTube (Not much there, but there are a few things) <https://www.youtube.com/user/TrueMenfa>
DoomSlayer
DCEmu Newbie
Posts: 9
Joined: Mon Apr 01, 2019 12:09 pm
Has thanked: 5 times
Been thanked: 1 time

Re: Need help to optimize my display routine

Post by DoomSlayer »

Thank you all for your answers !

I modded my serial port to USB, which will let me get the real millisecond values for each frame. I'll test all this tonight.

Thank you for the oargb tip, it could be useful for a future project! I don't use it in this one.
DoomSlayer
DCEmu Newbie
DCEmu Newbie
Posts: 9
Joined: Mon Apr 01, 2019 12:09 pm
Has thanked: 5 times
Been thanked: 1 time

Re: Need help to optimize my display routine

Post by DoomSlayer »

Hi everyone,

I applied TapamN's recommendations. In my case it hasn't changed much and I still get FPS drops, but now that I can test on my Dreamcast I have some new information.

There are some weird things, let me explain.

To get the execution time, I proceed like this:

Code: Select all

int main()
{
	uint64 cputime = 0;
	uint64 cputimeEnd = 0;
	
	uint64 transfertDatatime = 0;
	uint64 transfertDatatimeEnd = 0;

	initMyGame();
	
	while(1)
	{
		cputime = timer_ms_gettime64();
		UpdateMyGame();
		cputimeEnd = timer_ms_gettime64();
		printf("CPU_MS : %llu\n", cputimeEnd - cputime);

		pvr_wait_ready();
		pvr_scene_begin();
		pvr_list_begin(PVR_LIST_TR_POLY);

		transfertDatatime = timer_ms_gettime64();
		ScreenFlip();
		transfertDatatimeEnd = timer_ms_gettime64();
		printf("Transfer Time : %llu\n", transfertDatatimeEnd - transfertDatatime);

		pvr_list_finish();
		pvr_scene_finish();
	}
}
My ScreenFlip function:

Code: Select all

for (std::size_t i = 0; i < spritesPVR_counter; ++i) {
        int id_blend = 0;

        if (spritesPVR[i].tex == 254) {
            spritesPVR[i].tex = 54;
            if (spritesPVR[i].isAddBlend) {
                id_blend = 1;
            }

            if(hdr_cache[spritesPVR[i].tex][id_blend].cmd == 0 && hdr_cache[spritesPVR[i].tex][id_blend].mode1 == 0 && hdr_cache[spritesPVR[i].tex][id_blend].mode2 == 0 && hdr_cache[spritesPVR[i].tex][id_blend].mode3 == 0)
            {
                continue;
            }
        }
        else
        {
            if (spritesPVR[i].isAddBlend) {
                id_blend = 1;
            }
            if(hdr_cache[spritesPVR[i].tex][id_blend].cmd == 0 && hdr_cache[spritesPVR[i].tex][id_blend].mode1 == 0 && hdr_cache[spritesPVR[i].tex][id_blend].mode2 == 0 && hdr_cache[spritesPVR[i].tex][id_blend].mode3 == 0)
            {
                continue;
            }
        }

        pvr_prim(&hdr_cache[spritesPVR[i].tex][id_blend], sizeof(pvr_poly_hdr_t));
        pvr_vertex_t *vert = quad;

        int color = PVR_PACK_COLOR(spritesPVR[i].argb.alpha, spritesPVR[i].argb.red, spritesPVR[i].argb.green, spritesPVR[i].argb.blue);
	
        vert->x = spritesPVR[i].topleft.x;
        vert->y = spritesPVR[i].topleft.y;
        vert->z = spritesPVR[i].z;
        vert->u = spritesPVR[i].uvtopleft.u;
        vert->v = spritesPVR[i].uvtopleft.v;
        vert->argb = color;
        
        vert++;
        vert->x = spritesPVR[i].topright.x;
        vert->y = spritesPVR[i].topright.y;
        vert->z = spritesPVR[i].z;
        vert->u = spritesPVR[i].uvtopright.u;
        vert->v = spritesPVR[i].uvtopright.v;
        vert->argb = color;
        
        vert++;
        vert->x = spritesPVR[i].bottomleft.x;
        vert->y = spritesPVR[i].bottomleft.y;
        vert->z = spritesPVR[i].z;
        vert->u = spritesPVR[i].uvbottomleft.u;
        vert->v = spritesPVR[i].uvbottomleft.v;
        vert->argb = color;
        
        vert++;
        vert->x = spritesPVR[i].bottomright.x;
        vert->y = spritesPVR[i].bottomright.y;
        vert->z = spritesPVR[i].z;
        vert->u = spritesPVR[i].uvbottomright.u;
        vert->v = spritesPVR[i].uvbottomright.v;
        vert->argb = color;
        
        pvr_prim(&quad, sizeof(quad));
    }
    for (std::size_t j = 0; j < stringPVR.size(); ++j) {
        //continue;
        fontPos.z = stringPVR[j].z;
        fontPos.x = (float) stringPVR[j].x;
        fontPos.y = (float) stringPVR[j].y;
        plx_fcxt_begin(fontCxtAX);
        plx_fcxt_setsize(fontCxtAX, (float) stringPVR[j].fontSize);
        plx_fcxt_setcolor4f(fontCxtAX, 1.0f, ((float) stringPVR[j].color.r) / 255.0f, ((float) stringPVR[j].color.g) / 255.0f, ((float) stringPVR[j].color.b) / 255.0f);
        plx_fcxt_setpos_pnt(fontCxtAX, & fontPos);
        plx_fcxt_draw(fontCxtAX, stringPVR[j].strblit.c_str());
        plx_fcxt_end(fontCxtAX);
    }
The problem is that at the point where I get the biggest FPS drop, CPU_MS shows at most 4 ms and Transfer Time at most 2 ms.

I'm a little lost, I admit I don't understand.

Is the problem pvr_wait_ready?
Maybe it shouldn't be used that way.

Or am I not using timer_ms_gettime64() correctly, so the times are wrong?
TapamN
DC Developer
Posts: 104
Joined: Sun Oct 04, 2009 11:13 am
Has thanked: 2 times
Been thanked: 88 times

Re: Need help to optimize my display routine

Post by TapamN »

Your pvr_wait_ready and timer_ms_gettime64 usage looks fine to me.

Do you have any threads for stuff like sound that might be taking up CPU time?

You don't have a lot of printfs going on during gameplay, do you? In the code example of how you added the timing measurements, you're printing every frame. Is that just a simplified version of the code to show how it works, or are you really printing every frame? Console output is slow, and per-frame printfs take a lot of CPU time.
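One way to keep timing output without a printf every frame is to accumulate statistics and print one summary per window of frames. A minimal sketch: frame_stats_t, stats_reset(), stats_add(), stats_report() and the 60-frame window are illustrative names and choices, not part of KOS; the per-frame value would come from timer_ms_gettime64() differences like the ones in the code above.

```c
#include <stdint.h>
#include <stdio.h>

#define REPORT_EVERY 60   /* illustrative: about once per second at 60 FPS */

typedef struct {
    uint64_t min_ms, max_ms, sum_ms;
    unsigned frames;
} frame_stats_t;

void stats_reset(frame_stats_t *s)
{
    s->min_ms = UINT64_MAX;
    s->max_ms = 0;
    s->sum_ms = 0;
    s->frames = 0;
}

/* Record one frame's time. Returns 1 once a full window has been collected. */
int stats_add(frame_stats_t *s, uint64_t ms)
{
    if (ms < s->min_ms) s->min_ms = ms;
    if (ms > s->max_ms) s->max_ms = ms;
    s->sum_ms += ms;
    s->frames++;
    return s->frames >= REPORT_EVERY;
}

/* Print a single summary line for the window, then start a new window. */
void stats_report(frame_stats_t *s)
{
    if (s->frames == 0)
        return;
    printf("frames=%u min=%llu avg=%llu max=%llu ms\n",
           s->frames,
           (unsigned long long)s->min_ms,
           (unsigned long long)(s->sum_ms / s->frames),
           (unsigned long long)s->max_ms);
    fflush(stdout);
    stats_reset(s);
}
```

In the main loop this would be something like `if (stats_add(&st, frame_ms)) stats_report(&st);`, so the console is written once per window instead of every frame, and the max value still catches the frames that spike.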

What timing results do you get for this code? It measures the time of everything all together except for pvr_wait_ready.

Code: Select all

int main()
{
	uint64 frametimeStart = 0;
	uint64 lastframetime = 0;

	initMyGame();
	
	while(1)
	{
		UpdateMyGame();
		
		lastframetime = timer_ms_gettime64() - frametimeStart;
		
		pvr_wait_ready();
		
		frametimeStart = timer_ms_gettime64();
		
		if (some_button_combination_pressed) {
			printf("Frame Time: %llu\n", lastframetime);
			fflush(stdout);
		}
		
		pvr_scene_begin();
		pvr_list_begin(PVR_LIST_TR_POLY);

		ScreenFlip();

		pvr_list_finish();
		pvr_scene_finish();
	}
}
If the frame time is less than 16.6 ms and you're still getting slowdowns, my next guess at this point would be that you're GPU limited. You can get the GPU render time with this:

Code: Select all

pvr_stats_t stats;
pvr_get_stats(&stats);
printf("Render time: %i ms\n", stats.rnd_last_time);
If you get a time over 16 ms, then it's too much for the GPU, and you'll have to find a way to reduce GPU load. How you would do this depends on what you're drawing.
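The CPU-versus-GPU reasoning above can be captured in a small helper that checks each side against the 60 Hz budget separately. This is a sketch; classify_frame() and frame_verdict_t are hypothetical names. cpu_ms would be the frame time measured as in the loop above (everything except pvr_wait_ready), and render_ms the stats.rnd_last_time value from pvr_get_stats().

```c
/* 60 Hz: 1000 ms / 60 frames = 16.67 ms per frame. */
#define FRAME_BUDGET_MS (1000.0 / 60.0)

typedef enum {
    FRAME_OK,          /* both sides fit in one frame */
    FRAME_CPU_BOUND,   /* game update + submission takes too long */
    FRAME_GPU_BOUND,   /* the PVR takes too long to render the scene */
    FRAME_BOTH_BOUND   /* both are over budget */
} frame_verdict_t;

/* cpu_ms:    measured frame time, excluding pvr_wait_ready()
   render_ms: PVR render time for the last scene */
frame_verdict_t classify_frame(double cpu_ms, double render_ms)
{
    int cpu_over = cpu_ms > FRAME_BUDGET_MS;
    int gpu_over = render_ms > FRAME_BUDGET_MS;

    if (cpu_over && gpu_over) return FRAME_BOTH_BOUND;
    if (cpu_over)             return FRAME_CPU_BOUND;
    if (gpu_over)             return FRAME_GPU_BOUND;
    return FRAME_OK;
}
```

For the numbers reported earlier in the thread (about 4 ms of update plus 2 ms of submission), the CPU side fits easily, so a render time above the budget would point at the GPU.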

How big are the sprites? The DC doesn't have a lot of raw fillrate, and it handles opaque polygons a lot better than translucent ones. If you have any large non-transparent sprites as background objects, they should be sent through the opaque list.
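To see why size matters so much more for translucent sprites, here is a deliberately simplified counting model of a tile-based deferred renderer (not cycle-accurate, and not how the PVR is programmed): opaque coverage is resolved per pixel first, so a covered pixel is shaded once no matter how many opaque layers overlap it, while every translucent layer covering a pixel has to be shaded and blended. rect_t and shaded_fragments() are hypothetical, and the model assumes every translucent sprite lies in front of the opaque ones.

```c
#include <stdlib.h>

/* Hypothetical screen-space sprite rectangle. */
typedef struct {
    int x, y, w, h;
    int translucent;   /* 0 = opaque, 1 = translucent */
} rect_t;

/* Count shaded fragments on a width x height screen under the toy model.
   Returns -1 if the coverage buffer cannot be allocated. */
long shaded_fragments(const rect_t *r, int n, int width, int height)
{
    unsigned char *covered = calloc((size_t)width * (size_t)height, 1);
    long translucent_layers = 0;
    long opaque_pixels = 0;

    if (covered == NULL)
        return -1;

    for (int i = 0; i < n; i++) {
        for (int y = r[i].y; y < r[i].y + r[i].h; y++) {
            if (y < 0 || y >= height)
                continue;                                /* clip to the screen */
            for (int x = r[i].x; x < r[i].x + r[i].w; x++) {
                if (x < 0 || x >= width)
                    continue;
                if (r[i].translucent)
                    translucent_layers++;                /* every layer is shaded */
                else
                    covered[(size_t)y * width + x] = 1;  /* visibility only */
            }
        }
    }

    for (size_t p = 0; p < (size_t)width * (size_t)height; p++)
        opaque_pixels += covered[p];

    free(covered);
    return opaque_pixels + translucent_layers;
}
```

On a 320x240 screen, two stacked full-screen opaque layers come out at 76,800 shaded fragments, while one opaque background plus a full-screen translucent overlay comes out at 153,600. The real hardware also spends time sorting translucent layers per pixel when autosort is on, which this count leaves out.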

If you have sprites that have 1-bit alpha and aren't semitransparent (like a Genesis or NES sprite, each pixel is just fully opaque or fully transparent), the GPU can draw them more efficiently if you send them through the punchthrough list.
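Picking the list for each sprite can be done once, when its texture is loaded. A minimal sketch, assuming the texels are still in RAM as ARGB4444 (alpha in the top 4 bits) before upload; sprite_list_t and pick_list_argb4444() are hypothetical names, with the enum values standing in for PVR_LIST_OP_POLY, PVR_LIST_PT_POLY and PVR_LIST_TR_POLY.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum mirroring the opaque, punch-through and translucent lists. */
typedef enum { LIST_OPAQUE, LIST_PUNCHTHRU, LIST_TRANSLUCENT } sprite_list_t;

/* Choose a list from the 4-bit alpha of each ARGB4444 texel:
     every texel alpha == 0xF        -> opaque list
     only alpha 0x0 or 0xF present   -> punch-through list (1-bit alpha)
     any alpha strictly in between   -> translucent list
   Vertex alpha is not considered: a sprite faded through its argb color
   still has to go in the translucent list. */
sprite_list_t pick_list_argb4444(const uint16_t *texels, size_t count)
{
    int has_transparent = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned alpha = texels[i] >> 12;   /* top nibble is alpha */
        if (alpha != 0x0 && alpha != 0xF)
            return LIST_TRANSLUCENT;        /* semi-transparent texel found */
        if (alpha == 0x0)
            has_transparent = 1;
    }
    return has_transparent ? LIST_PUNCHTHRU : LIST_OPAQUE;
}
```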

Are there parts of the screen with a lot of overlapping sprites? It takes time for the GPU to sort the polygons per pixel. You can try turning off the hardware transparency sorting to see if that helps. To do this, change your pvr_init call to set the autosort disable parameter to 1.

Code: Select all

pvr_init_params_t init = {
	{ PVR_BINSIZE_0, PVR_BINSIZE_0, PVR_BINSIZE_16, PVR_BINSIZE_0, PVR_BINSIZE_0},
	512*1024,
	0, //dma
	0, //fsaa
	1 //auto sort disable
};
pvr_init(&init);
It won't look correct without adding CPU sorting, but you can still get a good measurement to find out if sort overhead is a big deal and if it's worth doing sorting on the CPU. If you test this, set the depth mode of the sprites to PVR_DEPTHCMP_ALWAYS or turn off depth writes, so everything always gets drawn. Otherwise, the GPU will skip drawing things that happen to fail the depth test, and you won't get a fair comparison.
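If that test shows sorting is expensive, the CPU-side replacement for 2D sprites is usually a plain back-to-front sort before submission (painter's algorithm). A sketch, assuming the common KOS setup where a larger z is nearer the viewer (depth compare "greater"), so sprites are sent in ascending z; tr_sprite_t, cmp_back_to_front() and sort_translucent() are hypothetical. Sorting whole sprites is only exact when they don't intersect each other, which is normally the case for flat sprites at distinct depths.

```c
#include <stdlib.h>

/* Hypothetical record for one translucent sprite waiting to be drawn. */
typedef struct {
    float z;   /* depth written to the sprite's vertices */
    int   id;  /* index into the game's own sprite array */
} tr_sprite_t;

/* Ascending z = farthest first, assuming larger z is nearer.
   Equal z falls back to id so the order is deterministic (qsort is not stable). */
static int cmp_back_to_front(const void *a, const void *b)
{
    const tr_sprite_t *sa = a;
    const tr_sprite_t *sb = b;

    if (sa->z < sb->z) return -1;
    if (sa->z > sb->z) return 1;
    return (sa->id > sb->id) - (sa->id < sb->id);
}

/* Sort translucent sprites back to front before submitting them. */
void sort_translucent(tr_sprite_t *sprites, size_t n)
{
    qsort(sprites, n, sizeof *sprites, cmp_back_to_front);
}
```

The sorted array would then be walked in order inside the PVR_LIST_TR_POLY list, with the autosort disable value set to 1 as in the pvr_init call above.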
DoomSlayer
DCEmu Newbie
DCEmu Newbie
Posts: 9
Joined: Mon Apr 01, 2019 12:09 pm
Has thanked: 5 times
Been thanked: 1 time

Re: Need help to optimize my display routine

Post by DoomSlayer »

Hi, thank you very much for all of this information! The problem was indeed that I was drawing too many TR (translucent) polygons, and I was able to optimize my rendering.
DoomSlayer
DCEmu Newbie
DCEmu Newbie
Posts: 9
Joined: Mon Apr 01, 2019 12:09 pm
Has thanked: 5 times
Been thanked: 1 time

Re: Need help to optimize my display routine

Post by DoomSlayer »

Hello everyone, I'm running some more tests on the DC. I've read through a lot of forum threads and experimented with both direct rendering and rendering with DMA.

I think I understand most of the rendering concepts, but there are still a few points I'm not sure about.

PVR_BINSIZE_XX: I don't really understand what this entails. What can be achieved by changing these values?

I also have a hard time understanding how translucent polygons behave, I mean what the DC can actually display with them.

Let me explain: I'm running a rather basic test that simulates a game scene with OP, PT and TR textures.

First I draw the OP polygons for my backgrounds, then the PT polygons: the non-fullscreen background layers (for multi-layered backgrounds) and my sprites.

Then I draw the TR polygons. There are only a few of them: some sprites simulating a HUD at the top of the screen, and, very rarely, a background that fades in with transparency.

Here is the pseudocode for my display routine:

Code: Select all

pvr_wait_ready();
pvr_scene_begin();


pvr_list_begin(PVR_LIST_OP_POLY);


for (int i = 0; i < op_counter; i++)
{
    pvr_poly_cxt_txr(&op_context[i], PVR_LIST_OP_POLY, PVR_TXRFMT_RGB565 | PVR_TXRFMT_TWIDDLED | PVR_TXRFMT_VQ_ENABLE, op_width[i], op_height[i], op_tex[i], PVR_FILTER_NONE);
    pvr_poly_compile(&op_hdr[i], &op_context[i]);
    pvr_dr_init(state);
    pvr_prim(&op_hdr[i], sizeof(pvr_poly_hdr_t));

    /* Draw multiple quads like this: */
        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX;      /* and set x, y, z, u, v, argb */
        pvr_dr_commit(vert);

        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX;
        pvr_dr_commit(vert);

        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX;
        pvr_dr_commit(vert);

        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX_EOL;  /* last vertex of the quad */
        pvr_dr_commit(vert);
}

pvr_list_finish();

pvr_list_begin(PVR_LIST_PT_POLY);


for (int i = 0; i < pt_counter; i++)
{
    pvr_poly_cxt_txr(&pt_context[i], PVR_LIST_PT_POLY, PVR_TXRFMT_ARGB1555 | PVR_TXRFMT_TWIDDLED | PVR_TXRFMT_VQ_ENABLE, pt_width[i], pt_height[i], pt_tex[i], PVR_FILTER_NONE);
    pvr_poly_compile(&pt_hdr[i], &pt_context[i]);
    pt_hdr[i].cmd |= 4;                    /* enable specular (oargb) */
    pvr_dr_init(state);
    pvr_prim(&pt_hdr[i], sizeof(pvr_poly_hdr_t));

    /* Draw multiple quads like this: */
        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX;      /* and set x, y, z, u, v, argb, oargb */
        pvr_dr_commit(vert);

        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX;
        pvr_dr_commit(vert);

        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX;
        pvr_dr_commit(vert);

        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX_EOL;  /* last vertex of the quad */
        pvr_dr_commit(vert);
}

pvr_list_finish();

pvr_list_begin(PVR_LIST_TR_POLY);


for (int i = 0; i < tr_counter; i++)
{
    pvr_poly_cxt_txr(&tr_context[i], PVR_LIST_TR_POLY, PVR_TXRFMT_ARGB4444 | PVR_TXRFMT_TWIDDLED | PVR_TXRFMT_VQ_ENABLE, tr_width[i], tr_height[i], tr_tex[i], PVR_FILTER_NONE);
    pvr_poly_compile(&tr_hdr[i], &tr_context[i]);
    tr_hdr[i].cmd |= 4;                    /* enable specular (oargb) */
    pvr_dr_init(state);
    pvr_prim(&tr_hdr[i], sizeof(pvr_poly_hdr_t));

    /* Draw multiple quads like this: */
        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX;      /* and set x, y, z, u, v, argb, oargb */
        pvr_dr_commit(vert);

        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX;
        pvr_dr_commit(vert);

        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX;
        pvr_dr_commit(vert);

        vert = pvr_dr_target(state);
        vert->flags = PVR_CMD_VERTEX_EOL;  /* last vertex of the quad */
        pvr_dr_commit(vert);
}

pvr_list_finish();
pvr_scene_finish();
PVR init:

Code: Select all

pvr_init_params_t params = {
        /* Enable opaque, translucent, and punchthru polygons with size 16 */
        { PVR_BINSIZE_16, PVR_BINSIZE_0, PVR_BINSIZE_16, PVR_BINSIZE_0, PVR_BINSIZE_16 },

        /* Vertex buffer size 512K */
        512 * 1024,

        /* No DMA */
        0,

        /* No FSAA */
        0,

        /* Translucent Autosort disabled. */
        0
    };
During the scene, I send between 1000 and 2500 PT vertices and between 4 and 50 OP vertices. For TR: 4 vertices during the fades, around 80 for the HUD during normal gameplay, and around 200 for backgrounds that appear with a transparency transition.

Tiny TR sprites, like the HUD ones, don't cause any slowdown.

Larger sprites, on the other hand, slow the game down enormously: for example the fade sprite (a black 320x240 sprite stored in a 512x512 TR texture). The same slowdown happens with smaller semi-transparent sprites, for example 200x100.

What I'm wondering is: is it normal for the Dreamcast to struggle with so few TR vertices?

Note that without the TR polygons I'm at a constant 60 FPS; as soon as a somewhat large TR polygon appears, I drop to 30 FPS or even lower.

Another detail: specular is enabled for TR and PT so that I can set the oargb value in the vertices.

I tried disabling autosort; it doesn't change anything, and the slowdowns are exactly the same.

Even though it didn't seem likely to be the solution, I tried rendering with a combination of SQ and DMA, following this tutorial:
https://tinyurl.com/sqdmadc

No change there either, the same slowdown. That's not too surprising, since the problem didn't seem to come from the transfer speed of the vertices to the PVR.

If anyone can enlighten me, thank you in advance. :bow:
BlueCrab
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia
Has thanked: 9 times
Been thanked: 69 times

Re: Need help to optimize my display routine

Post by BlueCrab »

First up... a value of 0 for the last entry in the pvr_init_params_t initializer actually enables autosort, it doesn't disable it. It is kinda backwards, but because of historical code it had to be done that way (KOS always enabled autosort before that parameter was added).

Second, large transparencies will cause the hardware to have to work a lot harder to do rendering. It doesn't surprise me that a few very large transparent polygons would potentially have a drastic influence on the rendering speed. Sadly, it's somewhat inherent in the way that tile-based deferred rendering works...

While this article is about much newer GPUs than the one in the DC, the idea is still the same, and it briefly explains why transparencies slow things down a lot: https://blog.imaginationtech.com/the-dr ... -in-rogue/ .
DoomSlayer
DCEmu Newbie
DCEmu Newbie
Posts: 9
Joined: Mon Apr 01, 2019 12:09 pm
Has thanked: 5 times
Been thanked: 1 time

Re: Need help to optimize my display routine

Post by DoomSlayer »

BlueCrab wrote: Sat Jul 09, 2022 10:38 pm
First up... a value of 0 in the last thing in the pvr_init_params_t structure actually enables autosort, not disables it. It is kinda backwards, but because of historical code it had to be done that way, basically (since KOS' behavior was always to enable it previously before that parameter was added and all).
Oh yes, I knew that. I just re-enabled autosort after seeing that disabling it didn't improve the framerate.
Second, large transparencies will cause the hardware to have to work a lot harder to do rendering. It doesn't surprise me that a few very large transparent polygons would potentially have a drastic influence on the rendering speed. Sadly, it's somewhat inherent in the way that tile-based deferred rendering works...

While this article is about much newer GPUs than what's in the DC, I'm sure, the idea is still the same and they briefly explain why transparencies slow things down a lot: https://blog.imaginationtech.com/the-dr ... -in-rogue/ .
Ah, thank you very much for the link! It's very interesting to see how the PVR works, and I now better understand why large TR sprites reduce the framerate so much. Thanks for taking the time to explain :)
BlueCrab
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia
Has thanked: 9 times
Been thanked: 69 times

Re: Need help to optimize my display routine

Post by BlueCrab »

DoomSlayer wrote: Sun Jul 10, 2022 3:41 am
BlueCrab wrote: Sat Jul 09, 2022 10:38 pm
First up... a value of 0 in the last thing in the pvr_init_params_t structure actually enables autosort, not disables it. It is kinda backwards, but because of historical code it had to be done that way, basically (since KOS' behavior was always to enable it previously before that parameter was added and all).
Oh yes, I knew that. I just re-enabled autosort after seeing that disabling it didn't improve the framerate.
Ok... Just wanted to be sure since it is kinda backwards of what you'd expect (and since the comment said that it was disabled).
Second, large transparencies will cause the hardware to have to work a lot harder to do rendering. It doesn't surprise me that a few very large transparent polygons would potentially have a drastic influence on the rendering speed. Sadly, it's somewhat inherent in the way that tile-based deferred rendering works...

While this article is about much newer GPUs than what's in the DC, I'm sure, the idea is still the same and they briefly explain why transparencies slow things down a lot: https://blog.imaginationtech.com/the-dr ... -in-rogue/ .
Ah, thank you very much for the link! It's very interesting to see how the PVR works, and I now better understand why large TR sprites reduce the framerate so much. Thanks for taking the time to explain :)
I'm happy to have been able to at least help understand what was going on. :)