pvr_mem_malloc / pvr_txr_load / sq_cpy and time ...

Newbie
Insane DCEmu
Posts: 171
Joined: Sat Jul 27, 2013 1:16 pm

Post by Newbie »

Hi everybody,

I load textures in my code, which means moving graphics data from main memory (SH-4) to video memory (PVR). I use "pvr_mem_malloc" and "pvr_txr_load" for this. Because I move quite a large amount of data, it takes time, and I want to try to reduce it. So I dug into KOS and saw that "pvr_txr_load" uses the store queues through the "sq_cpy" function. Is there a way to speed this up? I've also heard about DMA transfers...

Thanks.

/* Load raw texture data from an SH-4 buffer into PVR RAM */
void pvr_txr_load(void * src, pvr_ptr_t dst, uint32 count) {
    if(count % 4)
        count = (count & 0xfffffffc) + 4;

    sq_cpy((uint32 *)dst, (uint32 *)src, count);
}
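Here is a small host-side sketch of the size arithmetic in the two quoted functions, assuming they behave exactly as shown. `pvr_txr_load()` rounds `count` up to a multiple of 4, but `sq_cpy()` shifts its length right by 5 and copies only whole 32-byte blocks. Any tail shorter than 32 bytes is therefore not copied. The helper names are hypothetical.

```c
#include <stdint.h>

/* Same rounding as pvr_txr_load(): round count up to a multiple of 4. */
static uint32_t txr_load_rounded(uint32_t count) {
    if (count % 4)
        count = (count & 0xfffffffcu) + 4;
    return count;
}

/* Bytes sq_cpy() actually writes for a length n: it does n >>= 5 and
   copies that many whole 32-byte blocks, so a shorter tail is dropped. */
static uint32_t sq_cpy_bytes_written(uint32_t n) {
    return (n >> 5) << 5;
}
```

For example, a 70-byte buffer is rounded up to 72 bytes, but only 64 bytes are copied. This matters if you upload buffers whose size is not a multiple of 32 bytes.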

/* copies n bytes from src to dest, dest must be 32-byte aligned */
void * sq_cpy(void *dest, void *src, int n) {
    unsigned int *d = (unsigned int *)(void *)
                      (0xe0000000 | (((unsigned long)dest) & 0x03ffffe0));
    unsigned int *s = src;

    /* Set store queue memory area as desired */
    QACR0 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
    QACR1 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;

    /* fill/write queues as many times necessary */
    n >>= 5;

    while(n--) {
        asm("pref @%0" : : "r"(s + 8));  /* prefetch 32 bytes for next loop */
        d[0] = *(s++);
        d[1] = *(s++);
        d[2] = *(s++);
        d[3] = *(s++);
        d[4] = *(s++);
        d[5] = *(s++);
        d[6] = *(s++);
        d[7] = *(s++);
        asm("pref @%0" : : "r"(d));
        d += 8;
    }

    /* Wait for both store queues to complete */
    d = (unsigned int *)0xe0000000;
    d[0] = d[8] = 0;

    return dest;
}
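The sketch below, based on my reading of the SH-4 store queue description, makes the address arithmetic in `sq_cpy()` easier to follow. It covers three mappings: the store queue area address the loop writes to, the value written to QACR0/QACR1, and the external address the queue is finally flushed to. The function names are hypothetical. 0xa5000000 is used only as an example destination in PVR texture RAM.

```c
#include <stdint.h>

/* Store queue area address sq_cpy() writes through for a destination:
   0xe0000000 plus address bits 25..5 of dest. */
static uint32_t sq_area_addr(uint32_t dest) {
    return 0xe0000000u | (dest & 0x03ffffe0u);
}

/* Value sq_cpy() writes to QACR0/QACR1: address bits 28..26 of dest,
   placed in register bits 4..2. */
static uint32_t sq_qacr_value(uint32_t dest) {
    return ((dest >> 26) << 2) & 0x1cu;
}

/* External address used when a queue is flushed with "pref": QACR bits
   4..2 supply address bits 28..26, the SQ address supplies bits 25..5. */
static uint32_t sq_target_addr(uint32_t sq_addr, uint32_t qacr) {
    return ((qacr & 0x1cu) << 24) | (sq_addr & 0x03ffffe0u);
}
```

So for a destination of 0xa5000000, the loop writes through 0xe1000000 with QACR = 4. The data lands at physical 0x05000000, which is the memory the uncached address 0xa5000000 maps to.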

bogglez
Moderator
Posts: 578
Joined: Sun Apr 20, 2014 9:45 am

Post by bogglez »

Check out the alternative sq_cpy in PH3NOM's libgl in kos-ports. He has claimed some speed improvements over sq_cpy in the past.
If I remember correctly, DMA will not be faster, but it runs in the background, so you could do other work while the DMA copy is still in progress.

Also, can you upload an example texture? Depending on the texture format (paletted, compressed) and resolution, the texture can be larger and take longer to copy.
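To illustrate how format and resolution drive the copy size, the sketch below computes VRAM bytes for a few texture formats without mipmaps. The enum and function names are made up for this example and are not KOS constants. Palettes live in separate palette RAM and are not counted. The VQ case assumes 16-bit VQ with a full 256-entry codebook.

```c
#include <stdint.h>

/* Hypothetical format tags for this sketch only (not KOS constants). */
enum tex_fmt { FMT_16BPP, FMT_PAL8, FMT_PAL4, FMT_VQ16 };

/* VRAM bytes for a w x h texture without mipmaps. */
static uint32_t tex_bytes(enum tex_fmt fmt, uint32_t w, uint32_t h) {
    switch (fmt) {
    case FMT_16BPP: return w * h * 2;        /* ARGB4444, ARGB1555, RGB565 */
    case FMT_PAL8:  return w * h;            /* 8-bit palette index per texel */
    case FMT_PAL4:  return w * h / 2;        /* 4-bit palette index per texel */
    case FMT_VQ16:  return 2048 + w * h / 4; /* 256-entry codebook of 2x2 16-bit
                                                texels, plus 1 index byte per
                                                2x2 block */
    }
    return 0;
}
```

For a 256x256 texture this gives 131072 bytes at 16 bpp and 18432 bytes as 16-bit VQ. That is roughly 7 times less data to push through `sq_cpy()`.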
Wiki & tutorials: http://dcemulation.org/?title=Development
Wiki feedback: viewtopic.php?f=29&t=103940
My libgl playground (not for production): https://bitbucket.org/bogglez/libgl15
My lxdream fork (with small fixes): https://bitbucket.org/bogglez/lxdream
maslevin
DC Developer
Posts: 13
Joined: Thu Apr 02, 2015 11:26 pm

Post by maslevin »

What sort of image loading are you doing?

If you are procedurally converting or generating an image that has to go to VRAM, you should consider rendering directly to the store queues instead of rendering your images to an intermediate buffer which is then copied via the store queues. This allows you to render to one store queue, while the other is transferring your data to its destination.
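Here is a host-side sketch of this idea, assuming the conversion is ARGB8888 to ARGB4444. Pixels are converted straight into 32-byte blocks, alternating between two queue-sized buffers, instead of into a full intermediate image that is copied afterwards. On the console the two buffers would be SQ0/SQ1 in the 0xe0000000 area, and the "flush" would be a `pref`. Here both are simulated with plain arrays and `memcpy`, and all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Host-side stand-in for the two 32-byte SH-4 store queues. */
typedef struct {
    uint32_t q[2][8];  /* SQ0 and SQ1, 8 x 32-bit words each */
} sq_sim_t;

/* Keep the top 4 bits of each 8-bit channel: AARRGGBB -> ARGB. */
static uint16_t argb8888_to_4444(uint32_t p) {
    return (uint16_t)(((p >> 16) & 0xf000) | ((p >> 12) & 0x0f00) |
                      ((p >> 8) & 0x00f0) | ((p >> 4) & 0x000f));
}

/* Convert npix pixels (a multiple of 16) directly into 32-byte blocks,
   filling one queue while the previous block is "flushed" from the
   other. Assumes a little-endian host, as the SH-4 on the DC is. */
static void convert_to_queues(sq_sim_t *sq, uint16_t *dst,
                              const uint32_t *src, uint32_t npix) {
    uint32_t blocks = npix / 16;           /* 16 ARGB4444 texels = 32 bytes */
    for (uint32_t b = 0; b < blocks; b++) {
        uint32_t *q = sq->q[b & 1];        /* alternate SQ0 / SQ1 */
        for (int w = 0; w < 8; w++) {
            uint32_t lo = argb8888_to_4444(src[b * 16 + w * 2]);
            uint32_t hi = argb8888_to_4444(src[b * 16 + w * 2 + 1]);
            q[w] = lo | (hi << 16);        /* two texels per 32-bit word */
        }
        memcpy(dst + b * 16, q, 32);       /* "flush" the queue to dst */
    }
}
```

The point of the design is that no full-size temporary image is ever written to main RAM. Each converted block goes out as soon as it is complete.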

However, if you're just loading lots of static images, I'd recommend the DMA approach.
Newbie
Insane DCEmu
Posts: 171
Joined: Sat Jul 27, 2013 1:16 pm

Post by Newbie »

Well, I am trying to load a bunch of square, VQ-compressed 16-bit ARGB4444 texture files. The data is stored in main RAM first and moved to VRAM when I need it (by allocating and copying).

DMA would not solve my problem, because my problem is simply how long the copy takes.

The alternative sq_cpy in libgl could be interesting. I have the kos-ports archive for KOS 2, but I did not find any reference to "sq_cpy" in the source code in the libgl directory.

Could somebody help me find the source code of this alternative sq_cpy?

PH3NOM himself, perhaps?

Thanks.
PH3NOM
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm

Post by PH3NOM »

The code in question is in gl-pvr.c:

#define TA_SQ_ADDR (unsigned int *)(void *) \
    (0xe0000000 | (((unsigned long)0x10000000) & 0x03ffffe0))

/* Custom version of sq_cpy from KOS for copying vertex data to the PVR */
static inline void pvr_list_submit(void *src, int n) {
    GLuint *d = TA_SQ_ADDR;
    GLuint *s = src;

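    /* fill/write queues as many times necessary. Unlike sq_cpy, n here is
       a count of 32-byte blocks, not bytes, and QACR0/QACR1 are not set:
       they must already point at the TA area (0x10000000). */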
    while(n--) {
        asm("pref @%0" : : "r"(s + 8));  /* prefetch 32 bytes for next loop */
        d[0] = *(s++);
        d[1] = *(s++);
        d[2] = *(s++);
        d[3] = *(s++);
        d[4] = *(s++);
        d[5] = *(s++);
        d[6] = *(s++);
        d[7] = *(s++);
        asm("pref @%0" : : "r"(d));
        d += 8;
    }

    /* Wait for both store queues to complete */
    d = (GLuint *)0xe0000000;
    d[0] = d[8] = 0;
}
That function is used for submitting Vertex Data directly to the PVR's Tile Accelerator via the Store Queues.
That function is not used for submitting textures to the PVR's Texture Memory.
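To make the unit of `n` in `pvr_list_submit()` concrete: it counts 32-byte store queue blocks, and a PVR vertex is exactly one block. The struct below is a hypothetical mirror of the layout of KOS's `pvr_vertex_t` (flags, x, y, z, u, v, argb, oargb), written so it compiles on a host. On the console you would use `pvr_vertex_t` itself.

```c
#include <stdint.h>

/* Hypothetical host mirror of the 32-byte PVR vertex layout. */
typedef struct {
    uint32_t flags;       /* PVR_CMD_VERTEX or PVR_CMD_VERTEX_EOL on the DC */
    float x, y, z;        /* screen-space position */
    float u, v;           /* texture coordinates */
    uint32_t argb;        /* base colour */
    uint32_t oargb;       /* offset colour */
} vert32_t;

_Static_assert(sizeof(vert32_t) == 32, "one vertex must fill one store queue");

/* pvr_list_submit(src, n) counts n in 32-byte blocks, so for an array
   of vertices n is simply the number of vertices. */
static int sq_blocks_for_vertices(int nverts) {
    return (int)(nverts * sizeof(vert32_t) / 32);
}
```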
On gl-texture.c, you will see that I use sq_cpy(...) for uploading textures to the PVR.

What is it you are trying to do, exactly?
For normal use, I do not see any problem with the speed of uploading textures using sq_cpy.

Are you trying to move the textures over every frame? You only need to upload the textures once, then leave them in PVR memory for as long as they are used in your scene.
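Here is a minimal sketch of the upload-once pattern, assuming a fixed set of texture IDs. The table remembers which textures are already resident and only performs the expensive copy on first use. `upload()` is a hypothetical stand-in for `pvr_mem_malloc()` plus `pvr_txr_load()`. The bump allocator exists only to keep the sketch self-contained.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_TEX 64

/* Hypothetical residency table for textures kept in VRAM. */
typedef struct {
    bool     resident[MAX_TEX];
    uint32_t vram_addr[MAX_TEX];
    uint32_t uploads;      /* how many real copies happened */
    uint32_t next_free;    /* trivial bump allocator for the sketch */
} tex_cache_t;

/* Stand-in for pvr_mem_malloc() + pvr_txr_load(). */
static uint32_t upload(tex_cache_t *c, uint32_t bytes) {
    uint32_t addr = c->next_free;
    c->next_free += (bytes + 31) & ~31u;   /* keep 32-byte alignment for sq_cpy */
    c->uploads++;
    return addr;
}

/* Returns the texture's VRAM address, uploading it only the first time.
   id must be < MAX_TEX. */
static uint32_t tex_get(tex_cache_t *c, int id, uint32_t bytes) {
    if (!c->resident[id]) {
        c->vram_addr[id] = upload(c, bytes);
        c->resident[id] = true;
    }
    return c->vram_addr[id];
}
```

After the first call, calling `tex_get()` every frame costs only a table lookup.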

If you HAVE to move textures over every frame (some sort of texture streaming), it would be best to do as maslevin suggests and use a non-blocking DMA approach. That way you can load the textures for the next frame in the background while you are handling the current frame.
Newbie
Insane DCEmu
Posts: 171
Joined: Sat Jul 27, 2013 1:16 pm

Post by Newbie »

Well, at one place in my code I request a bunch of textures stored in RAM to be moved to VRAM and then drawn immediately. The move takes a little time (just slightly too much).

This is why I am asking myself (and you) whether it could be sped up.

KOS uses "pvr_mem_malloc", then "pvr_txr_load", then "sq_cpy" to push the data into VRAM. So I wonder whether "sq_cpy" could be made a little faster to close this small time gap, since I assume that almost all (~90%) of the time spent loading a texture is in "sq_cpy".

If that is not possible, I can't use DMA either, because I would have to rewrite a lot of code to keep drawing the screen while textures load sequentially in the background.

Thanks to all (and to PH3NOM for answering me).
BlueCrab
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia

Post by BlueCrab »

Do you need to upload textures each frame like you're doing? Is there a good reason to be doing that?

Unless you have some massive number of textures where the ones that are used change every frame, you're doing a lot more work than you need to do. As PH3NOM said, you should just upload your textures once and use them over and over again, unless there's some really good reason why you can't.
Newbie
Insane DCEmu
Posts: 171
Joined: Sat Jul 27, 2013 1:16 pm

Post by Newbie »

Hi,

I never said I "upload textures each frame". It happens only once, at one place in my code, but it takes a little time and I want to reduce that time.
bogglez
Moderator
Posts: 578
Joined: Sun Apr 20, 2014 9:45 am

Post by bogglez »

You never posted code or uploaded an example texture as I asked, and left people guessing, so please don't be surprised by vague answers. I suggest that you provide better information in the future.
BlueCrab
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia

Post by BlueCrab »

Newbie wrote:
Well, i have just in a place of my code a request for a bunch of textures stored in RAM to be moved in VRAM then to be drawn immediately. The move takes "a little bit time" (just a very little too much).
That sounds like you're uploading textures right when they're needed (i.e., on a frame-by-frame basis). Hence why I (and probably others) made that assumption.

Regardless, if you know a texture is going to be needed soon, why not spin up the transfer in the background? This could be done either by way of a non-blocking DMA transfer, or a separate thread that uses the store queues. That way, you're not waiting when you need it.
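Here is a host-side sketch of the "separate thread" option, assuming POSIX threads. The copy starts as soon as the texture is known to be needed and is joined just before the texture is drawn. `memcpy` stands in for `sq_cpy()`, and the names are hypothetical. On the console you would use KOS's own thread API. As far as I can tell, you would also need to keep this copy from interleaving with other code that drives the store queues, since QACR0/QACR1 are shared.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request describing one texture to move ahead of time. */
typedef struct {
    const void *src;     /* texture data in main RAM */
    void       *dst;     /* destination (VRAM on the console) */
    size_t      bytes;
    pthread_t   thread;
} prefetch_t;

static void *prefetch_worker(void *arg) {
    prefetch_t *p = arg;
    memcpy(p->dst, p->src, p->bytes);   /* sq_cpy(p->dst, p->src, p->bytes) on the DC */
    return NULL;
}

/* Start the copy as soon as you know the texture will be needed soon. */
static int prefetch_start(prefetch_t *p, void *dst, const void *src, size_t bytes) {
    p->src = src;
    p->dst = dst;
    p->bytes = bytes;
    return pthread_create(&p->thread, NULL, prefetch_worker, p);
}

/* Call right before the texture is first drawn; returns immediately if
   the copy has already finished in the background. */
static int prefetch_wait(prefetch_t *p) {
    return pthread_join(p->thread, NULL);
}
```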

If you can't tell in advance that a texture is needed, you're pretty much stuck waiting. Bulk memory transfers across separate busses take time and there's nothing that can really be done to improve the time spent all that much, unfortunately.
Newbie
Insane DCEmu
Posts: 171
Joined: Sat Jul 27, 2013 1:16 pm

Post by Newbie »

BlueCrab wrote:
If you can't tell in advance that a texture is needed, you're pretty much stuck waiting. Bulk memory transfers across separate busses take time and there's nothing that can really be done to improve the time spent all that much, unfortunately.
OK, I'll try to improve my code.

Thanks everybody.
tonma
DCEmu Freak
Posts: 82
Joined: Thu Mar 10, 2016 7:14 am

Post by tonma »

I have several textures and spritesheets that cannot all fit in VRAM at the same time:
a PNG texture for the level background, and spritesheets for the 2 players and the enemies.

I'd like to keep some enemy spritesheets in main RAM (16 MB) and copy them to VRAM when a new enemy appears.

The function "pvr_txr_load" loads a texture into VRAM (it is one of the pvr functions). But how can I load a PNG directly into RAM? I can't find functions to do that.

Can we find out how much RAM we are using? I know we can read image.size.byte when loading, but maybe there is a better solution.
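On the RAM question, one simple approach (a sketch, not a KOS facility) is to route texture-related allocations through wrappers that keep a running total. `tracked_malloc`, `tracked_free` and `g_bytes_in_use` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Running total of bytes handed out through the wrappers below. */
static size_t g_bytes_in_use;

typedef union {          /* header stored in front of each block */
    size_t size;
    max_align_t align;   /* keeps the returned pointer suitably aligned */
} alloc_hdr_t;

static void *tracked_malloc(size_t n) {
    alloc_hdr_t *h = malloc(sizeof *h + n);
    if (!h)
        return NULL;
    h->size = n;
    g_bytes_in_use += n;
    return h + 1;        /* caller's memory starts after the header */
}

static void tracked_free(void *p) {
    if (!p)
        return;
    alloc_hdr_t *h = (alloc_hdr_t *)p - 1;
    g_bytes_in_use -= h->size;
    free(h);
}
```

Each allocation costs one extra header. In exchange, `g_bytes_in_use` always reflects what is currently staged in main RAM.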
bogglez
Moderator
Posts: 578
Joined: Sun Apr 20, 2014 9:45 am

Post by bogglez »

You shouldn't use PNGs, since they are decompressed when loaded and take a lot of space.
This tutorial should help you
http://dcemulation.org/?title=KMG_Textures
tonma
DCEmu Freak
Posts: 82
Joined: Thu Mar 10, 2016 7:14 am

Post by tonma »

Yes, I have made some tests, and the output is not quite as good as with the TGA/PNG format.
I will post some pictures to show my results.
Some developers think KMG is meant for 3D object textures and not for sprites. Maybe I've done something wrong again. :mrgreen:
bogglez
Moderator
Posts: 578
Joined: Sun Apr 20, 2014 9:45 am

Post by bogglez »

That's not possible. You cannot compare the size of the PNG with the KMG directly. Your PNG loading function decompresses the image, so in memory it will be much bigger than the PNG file.
tonma
DCEmu Freak
Posts: 82
Joined: Thu Mar 10, 2016 7:14 am

Post by tonma »

Sorry, I'm talking about the quality of the picture, not the size. The size is the good point of KMG pictures. The problem is how they look when rendered on screen.

I use the code you uploaded in the compression/size post, so every test is done the same way, and I compress with vqenc: $(KOS_BASE)/utils/vqenc/vqenc -v -t -q -k

In this picture you can see the original PNG on the left and the converted KMG file on the right.
I zoomed in to show the difference better.
[Image: original PNG (left) vs. converted KMG (right), zoomed]

And the original 8-bit PNG:
[Image: original 8-bit PNG]

Sorry for my bad English
Pckid
DCEmu Newbie
Posts: 1
Joined: Fri Mar 18, 2016 8:47 am

Post by Pckid »

Hello tonma, bogglez

I understand tonma's problem. Maybe 5-bit pictures could solve it?

Or what is the best way to get around 80 sprites on the Dreamcast, to create a good old-school 2D neo-retro game?

Thanks
tonma
DCEmu Freak
Posts: 82
Joined: Thu Mar 10, 2016 7:14 am

Post by tonma »

I have made some new tests with KMG.

I tried 4-bit, 8-bit and 24-bit PNGs, with and without transparency. Then I tried another version of vqenc, to rule out a problem with my own build. I also optimized the PNGs with optipng.

I always get the same poor compression quality on screen, as you can see in the pictures (left PNG, right KMG):
[Image: PNG (left) vs. KMG (right)]
[Image: PNG (left) vs. KMG (right)]

If someone wants to try, here is the original PNG file:
[Image: original PNG file]

If I can make a good-quality KMG file, I can start working on my game.
bogglez
Moderator
Posts: 578
Joined: Sun Apr 20, 2014 9:45 am

Post by bogglez »

Hey,

I'm not using vqenc myself, so I seem to have forgotten that it does not support paletted textures. The problem with VQ, if I remember correctly, is that it splits the image into 2x2 pixel blocks and replaces each block with the closest entry from a 256-entry codebook, so fine detail is lost.
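The simplified decoder below shows why VQ struggles with detailed sprites. The whole image is built from at most 256 distinct 2x2 blocks, so any block that is not in the codebook is replaced by the nearest available one at encode time. This is only a sketch, and the function name is hypothetical. I believe the real PVR format stores the indices (and the texels inside each entry) in twiddled order. This sketch ignores that and uses plain row-major order instead.

```c
#include <stdint.h>

/* Simplified 16-bit VQ decode.
   codebook: 256 entries x 4 texels (top-left, top-right, bottom-left,
             bottom-right in this sketch).
   idx:      one byte per 2x2 block, row-major. w and h must be even. */
static void vq_decode(uint16_t *out, uint32_t w, uint32_t h,
                      const uint16_t *codebook, const uint8_t *idx) {
    for (uint32_t by = 0; by < h / 2; by++) {
        for (uint32_t bx = 0; bx < w / 2; bx++) {
            const uint16_t *e = codebook + 4u * idx[by * (w / 2) + bx];
            uint32_t o = (2 * by) * w + 2 * bx;   /* top-left texel of block */
            out[o]         = e[0];
            out[o + 1]     = e[1];
            out[o + w]     = e[2];
            out[o + w + 1] = e[3];
        }
    }
}
```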

I'll try to write a tutorial today on using tvspelfreak's encoder to create paletted textures in a spritesheet, but I may only finish it tomorrow.
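Paletted textures avoid that problem for sprites with few colours: if a sheet uses 256 colours or fewer, an 8-bit palette is lossless. Below is a minimal sketch of building such a palette from 16-bit texels. The function is hypothetical and uses a linear search for clarity. It gives up when more than 256 distinct colours appear, in which case the image would need colour reduction first.

```c
#include <stdint.h>

/* Build an 8-bit palette and index image from n 16-bit texels.
   Returns the number of palette entries used, or -1 if the image
   has more than 256 distinct colours. */
static int make_pal8(const uint16_t *px, uint32_t n,
                     uint16_t pal[256], uint8_t *idx) {
    int used = 0;
    for (uint32_t i = 0; i < n; i++) {
        int k = 0;
        while (k < used && pal[k] != px[i])   /* look for an existing entry */
            k++;
        if (k == used) {                      /* new colour */
            if (used == 256)
                return -1;
            pal[used++] = px[i];
        }
        idx[i] = (uint8_t)k;
    }
    return used;
}
```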
tonma
DCEmu Freak
Posts: 82
Joined: Thu Mar 10, 2016 7:14 am

Post by tonma »

Thanks, you saved my life

I'm very patient :wink: