PVR YUV->UYVY Conversion

PH3NOM
DC Developer
Posts: 574
Joined: Fri Jun 18, 2010 9:29 pm
Has liked: 0
Been liked: 0

PVR YUV->UYVY Conversion

Post by PH3NOM » Tue Jul 05, 2011 10:49 pm

Converting images from one colorspace to another can take a considerable amount of CPU time.

The Dreamcast's PVR video hardware natively supports several useful colorspaces, but YUV420 is not one of them.
YUV420 is the native colorspace for almost all video codecs and for some image formats.

What the PVR does provide is a 'pipeline' for converting YUV420 data to UYVY (YUV422), which offloads the colorspace conversion from the CPU to the GPU.
I guess the PVR core converts the UYVY data to RGB565 before it is drawn to the screen, clamping values to the 0-255 range, but that is beyond our concern here. All we need to know is that UYVY is natively supported by the PVR.
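
For reference, this is roughly what the same repacking costs when done on the CPU. It is only a minimal sketch of turning planar YUV420 into UYVY, not a description of what the PVR does internally. The name yuv420_to_uyvy and its parameters are made up for illustration, and it assumes tightly packed planes with even width and height.

```c
#include <stddef.h>
#include <stdint.h>

/* CPU reference (illustrative only): pack planar YUV420 into UYVY (YUV422).
 * width and height are assumed to be even. Each chroma sample covers a 2x2
 * block of luma, so one chroma row is reused for two output rows. */
void yuv420_to_uyvy(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                    uint8_t *out, size_t width, size_t height)
{
    for (size_t row = 0; row < height; row++) {
        const uint8_t *yrow = y + row * width;
        const uint8_t *urow = u + (row / 2) * (width / 2);
        const uint8_t *vrow = v + (row / 2) * (width / 2);
        uint8_t *dst = out + row * width * 2;   /* 2 bytes per pixel */

        for (size_t x = 0; x < width; x += 2) {
            *dst++ = urow[x / 2];   /* U, shared by both pixels */
            *dst++ = yrow[x];       /* Y of the left pixel      */
            *dst++ = vrow[x / 2];   /* V, shared by both pixels */
            *dst++ = yrow[x + 1];   /* Y of the right pixel     */
        }
    }
}
```

The output takes 2 bytes per pixel, which is why a 512x256 UYVY texture needs 512*256*2 bytes of texture memory.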

YUV->UYVY conversion is done by transferring macroblocks of YUV data through the Tile Accelerator, via DMA transfers.

Before sending YUV data through DMA, some registers need to be set on the PVR:

Code: Select all

    /* Allocate (UYVY) texture space for a 512x256 texture, 2 bytes per pixel */
    pvr_ptr_t pvr_decoded_frame = pvr_mem_malloc(512 * 256 * 2);

    /* Set the PVR YUV converter destination address (offset within VRAM) */
    PVR_SET( PVR_YUV_ADDR, ((uint32)pvr_decoded_frame) & 0xFFFFFF );

    /* Set the YUV texture data configuration */
    /* Byte 4: 0x00 - 0x00 = sending YUV420
                      0x01 = sending YUV422
       Byte 3: 0x00 - 0x00 = render all macroblocks for a single frame into a single polygon
                      0x01 = render each macroblock into its own polygon
       Byte 2: 0x0F - (Texture height/16)-1
       Byte 1: 0x1F - (Texture width/16)-1
    */
    PVR_SET( PVR_YUV_CFG_1, 0x00000F1F );
    PVR_GET( PVR_YUV_CFG_1 );   /* read the register back after writing it */
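
The 0x00000F1F value can be computed from the texture size instead of being hard-coded. The sketch below is based only on the byte layout given in the comment above. yuv_cfg_value is a hypothetical helper, not a KOS function, and it assumes both dimensions are multiples of 16.

```c
#include <stdint.h>

/* Hypothetical helper: build the YUV converter config word from the byte
 * layout described in the comment above.
 * tex_width and tex_height are in pixels, assumed to be multiples of 16. */
uint32_t yuv_cfg_value(unsigned tex_width, unsigned tex_height,
                       int is_yuv422, int per_block_polys)
{
    uint32_t cfg = 0;

    cfg |= (uint32_t)(tex_width / 16 - 1);              /* byte 1 */
    cfg |= (uint32_t)(tex_height / 16 - 1) << 8;        /* byte 2 */
    cfg |= (uint32_t)(per_block_polys ? 1 : 0) << 16;   /* byte 3 */
    cfg |= (uint32_t)(is_yuv422 ? 1 : 0) << 24;         /* byte 4 */
    return cfg;
}
```

For a 512x256 YUV420 texture rendered into a single polygon, yuv_cfg_value(512, 256, 0, 0) gives the same 0x00000F1F used above.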

When sending the YUV data via DMA, a specific address is used:

Code: Select all

#define PVR_DMA_VRAM64  0   /* Transfer to VRAM in interleaved mode */
#define PVR_DMA_VRAM32  1   /* Transfer to VRAM in linear mode */
#define PVR_DMA_TA      2   /* Transfer to the tile accelerator */
#define PVR_DMA_YUV     3   /* Transfer to the YUV converter */

    /* Send the data to the right place */
    if (type == PVR_DMA_TA)
        dest_addr = (((unsigned long)dest) & 0xFFFFFF) | 0x10000000;
    else if (type == PVR_DMA_YUV)
        dest_addr = (((unsigned long)dest) & 0xFFFFFF) | 0x10800000;
    else
        dest_addr = (((unsigned long)dest) & 0xFFFFFF) | 0x11000000;
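
Pulled out as a standalone function, the address selection above can be checked on a PC. pvr_dma_dest_addr is a hypothetical name used only for this sketch; the mask and the three base addresses are taken from the snippet above.

```c
#include <stdint.h>

#define PVR_DMA_VRAM64  0   /* Transfer to VRAM in interleaved mode */
#define PVR_DMA_VRAM32  1   /* Transfer to VRAM in linear mode */
#define PVR_DMA_TA      2   /* Transfer to the tile accelerator */
#define PVR_DMA_YUV     3   /* Transfer to the YUV converter */

/* Hypothetical helper: keep the low 24 bits of dest and OR in the base
 * address of the target selected by type. */
uint32_t pvr_dma_dest_addr(uint32_t dest, int type)
{
    uint32_t offset = dest & 0xFFFFFFu;

    if (type == PVR_DMA_TA)
        return offset | 0x10000000u;
    else if (type == PVR_DMA_YUV)
        return offset | 0x10800000u;
    else
        return offset | 0x11000000u;
}
```

With a destination offset of 0, the YUV case yields exactly 0x10800000.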

YUV420 data must be sent to the PVR YUV converter one 16x16-pixel macroblock at a time, in this order:
U data: 8x8 samples ( 64 bytes )
V data: 8x8 samples ( 64 bytes )
Y data: 16x16 samples ( 256 bytes )
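
The sizes follow directly from those numbers. A small sketch of the arithmetic; the constant and function names are made up for illustration, and the dimensions are assumed to be multiples of 16.

```c
#include <stddef.h>

enum {
    MB_U_BYTES = 8 * 8,     /* 64 bytes of U per macroblock  */
    MB_V_BYTES = 8 * 8,     /* 64 bytes of V per macroblock  */
    MB_Y_BYTES = 16 * 16,   /* 256 bytes of Y per macroblock */
    MB_BYTES   = MB_U_BYTES + MB_V_BYTES + MB_Y_BYTES   /* 384 */
};

/* Number of 16x16 macroblocks in a width x height frame. */
size_t yuv420_macroblock_count(size_t width, size_t height)
{
    return (width / 16) * (height / 16);
}

/* Total bytes sent to the converter for one frame. */
size_t yuv420_stream_bytes(size_t width, size_t height)
{
    return yuv420_macroblock_count(width, height) * MB_BYTES;
}
```

For the 512x256 texture used here that is 512 macroblocks and 196608 bytes per frame, the same as width * height * 3 / 2.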

I think this is where my problem is: sending the data correctly. I have the converter working, but the displayed image is not correct.

Code: Select all

 
#define PVR_YUV_STAT        0x0150      /* The number of YUV macroblocks converted */

/* Hypothetical planar YUV420 image type; the original post does not show it */
typedef struct {
    uint8 *y, *u, *v;
} yuv420_image_t;

/* NOTE: each plane is walked linearly here, so the data must already be
   arranged per macroblock for the output to be correct. */
void pvr_yuv_transfer( yuv420_image_t * image, pvr_ptr_t uyvy_dst, int texWidth, int texHeight ) {

    /* Byte pointers, so that += 64 really advances by 64 bytes */
    uint8 * udst = image->u;
    uint8 * vdst = image->v;
    uint8 * ydst = image->y;

    int mblock = 0;
    int mblocks = (texWidth/16)*(texHeight/16);

    while( mblock < mblocks ) {

        /* U data: 64 bytes */
        dcache_flush_range((uint32)udst, 64);
        while (!pvr_dma_ready());
        pvr_dma_transfer( (void*)udst, (uint32)uyvy_dst, 64, PVR_DMA_YUV, 0, NULL, 0);
        udst += 64;

        /* V data: 64 bytes */
        dcache_flush_range((uint32)vdst, 64);
        while (!pvr_dma_ready());
        pvr_dma_transfer( (void*)vdst, (uint32)uyvy_dst, 64, PVR_DMA_YUV, 0, NULL, 0);
        vdst += 64;

        /* Y data: 256 bytes */
        dcache_flush_range((uint32)ydst, 256);
        while (!pvr_dma_ready());
        pvr_dma_transfer( (void*)ydst, (uint32)uyvy_dst, 256, PVR_DMA_YUV, 0, NULL, 0);
        ydst += 256;

        mblock++;
    }
    printf("PVR: YUV Macroblocks Converted: %i\n", (int)PVR_GET(PVR_YUV_STAT) );

}
I am sure there are some things I'm doing wrong in pvr_yuv_transfer().
As always, any help is appreciated!
Neoblast
DC Developer
Posts: 312
Joined: Sat Dec 01, 2007 8:51 am
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by Neoblast » Wed Jul 06, 2011 8:30 am

Post an example image; that might help show where the error is being made.
Well, two images: one with the correct output and one with the wrong output.
PH3NOM
DC Developer
Posts: 574
Joined: Fri Jun 18, 2010 9:29 pm
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by PH3NOM » Wed Jul 06, 2011 1:14 pm

Looking closer at some other XviD functions, I have made some changes to pvr_yuv_transfer().

Single macroblocks are now converting accurately!

The source image is a 16x16 texture, encoded with XviD (YUV420):
Image

Using the PVR YUV->UYVY conversion, here is the displayed image:
Image
Chilly Willy
DC Developer
Posts: 414
Joined: Thu Aug 20, 2009 11:00 am
Has liked: 0
Been liked: 2 times

Re: PVR YUV->UYVY Conversion

Post by Chilly Willy » Thu Jul 07, 2011 5:15 pm

They're both solid red... looks right to me! :lol:
RyoDC
Mental DCEmu
Posts: 353
Joined: Wed Mar 30, 2011 12:13 pm
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by RyoDC » Sat Jul 09, 2011 11:26 am

What address does the Dreamcast use to directly access a frame of video memory?
How do I try to build a Dreamcast toolchain:
Image
BlueCrab
The Crabby Overlord
Posts: 5410
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia
Has liked: 2 times
Been liked: 16 times

Re: PVR YUV->UYVY Conversion

Post by BlueCrab » Sat Jul 09, 2011 1:09 pm

RyoDC wrote:What address does the Dreamcast use to directly access a frame of video memory?
That's a bit unrelated to the topic at hand, but KOS provides a nice set of variables to access it, namely vram_s (for 16-bit access) and vram_l (for 32-bit access), both of which are declared in <dc/video.h>. As the framebuffers can be swapped around at runtime, use those variables for whatever access you need.

That said, framebuffer writes can be quite slow depending on how you do them. In most situations it's a much better idea to use the PVR for your graphics work.
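
As a rough illustration of 16-bit framebuffer access, here is a sketch that packs a color and stores one pixel. It assumes the video mode is RGB565 and that the buffer is laid out row by row; rgb565 and plot_rgb565 are hypothetical helpers, not KOS functions. On the Dreamcast you would pass vram_s as fb, while here fb is just any uint16_t buffer so the code can be tried on a PC.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into a 16-bit RGB565 value (5 bits red, 6 green, 5 blue). */
static inline uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Store one pixel at (x, y). pitch_pixels is the number of pixels per row. */
void plot_rgb565(uint16_t *fb, int pitch_pixels, int x, int y, uint16_t color)
{
    fb[y * pitch_pixels + x] = color;
}
```

For a 640-pixel-wide mode, a call might look like plot_rgb565(vram_s, 640, 10, 20, rgb565(255, 0, 0)).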
Chilly Willy
DC Developer
Posts: 414
Joined: Thu Aug 20, 2009 11:00 am
Has liked: 0
Been liked: 2 times

Re: PVR YUV->UYVY Conversion

Post by Chilly Willy » Sat Jul 09, 2011 9:50 pm

RyoDC wrote:What address does the Dreamcast use to directly access a frame of video memory?
Let's do a quick overview using code from my port of Doom.

Code: Select all

    base_address = (uint32 *)((uint32)vram_l + vramoffset);
This sets where we start drawing. As BlueCrab mentioned, vram_l holds the base address; vramoffset is a variable the user can set through the GUI to center the picture on a TV (for TV modes, not VGA modes).

Code: Select all

    //start_timer ();
#if 0
    for (j=top; j<height; j++)
        for (i=left; i<width; i++)
            base_address[i + j*lineWidth] = palette[screens[0][i + j*SCREENWIDTH]];
This is the simplest way to draw to the screen directly: fetch a byte for the source pixel, look up its color, and store that color to the screen. It is really slow, mostly due to the byte reads. Try not to read bytes for operations like this; read longs instead, as below.

Code: Select all

#elif 0
    for (j=top; j<height; j++)
        for (i=left; i<width; i+=4)
        {
            uint32 fp = *(uint32 *)&screens[0][i + j*SCREENWIDTH];
            base_address[i + j*lineWidth] = palette[fp&0xff];
            base_address[i + 1 + j*lineWidth] = palette[(fp>>8)&0xff];
            base_address[i + 2 + j*lineWidth] = palette[(fp>>16)&0xff];
            base_address[i + 3 + j*lineWidth] = palette[fp>>24];
        }
This is the first optimization. Read a long from the source, then do four stores to the screen. Much better throughput this way.

The next way uses the store queue to write the screen. This is the fastest way short of using the vdp to write the screen.

Code: Select all

#else
    {
        uint32 dest = (uint32)&base_address[0];
        /* Set store queue memory area as desired */
        QACR0 = (((dest)>>26)<<2)&0x1c;
        QACR1 = (((dest)>>26)<<2)&0x1c;
    }
This sets the upper part of the address the store queue will be using. Note that the store queue requires stricter alignment than writing the screen with the CPU: the destination must be aligned to a cache line (32 bytes, IIRC).

Code: Select all

    for (j=top; j<height; j++)
    {
        uint32 *s = (uint32 *)&screens[0][j*SCREENWIDTH];
        uint32 *d = (uint32 *)(0xe0000000 | (((uint32)&base_address[j*lineWidth]) & 0x03ffffe0));
        for (i=left; i<width; i+=8)
        {
            // copy to vram using store queues
            uint32 fp;
// screens[] is uncached, so prefetch doesn't make a difference
//            if ((i&7) == 0)
//                asm("pref @%0" : : "r" (s + 8)); /* prefetch 32 bytes for next loop */
            fp = *s++;
            d[0] = palette[fp&0xff];
            d[1] = palette[(fp>>8)&0xff];
            d[2] = palette[(fp>>16)&0xff];
            d[3] = palette[fp>>24];
            fp = *s++;
            d[4] = palette[fp&0xff];
            d[5] = palette[(fp>>8)&0xff];
            d[6] = palette[(fp>>16)&0xff];
            d[7] = palette[fp>>24];
            asm("pref @%0" : : "r" (d));
            d += 8;
        }
    }
    {
        /* Wait for both store queues to complete */
        uint32 *d = (uint32 *)0xe0000000;
        d[0] = d[8] = 0;
    }
#endif
    //lock_time += end_timer ();
This uses the store queue to write 32 bytes per loop to the screen. The store queue is basically building a cache line with a set address in the cpu, then flushing that line in a burst write. At least, that's how I understand it.
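
The address math in that store queue code can be isolated and checked on a PC. The two helpers below are hypothetical names; the shifts and masks are copied from the code above, and nothing here touches real hardware.

```c
#include <stdint.h>

/* Value written to QACR0/QACR1 above: the upper address bits of the
 * destination, shifted and masked exactly as in the Doom code. */
uint32_t sq_qacr_value(uint32_t dest)
{
    return ((dest >> 26) << 2) & 0x1c;
}

/* Address inside the 0xe0000000 store queue window for a destination.
 * The 0x03ffffe0 mask drops the low 5 bits, so the result is always
 * 32-byte aligned. */
uint32_t sq_window_addr(uint32_t dest)
{
    return 0xe0000000u | (dest & 0x03ffffe0u);
}
```

Because the low 5 bits are discarded, a destination that is not 32-byte aligned silently maps to the start of its 32-byte line, which is why the alignment requirement matters.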
RyoDC
Mental DCEmu
Posts: 353
Joined: Wed Mar 30, 2011 12:13 pm
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by RyoDC » Tue Jul 12, 2011 7:23 am

Wow, such a big answer! Thank you very much for the info!
Chilly Willy
DC Developer
Posts: 414
Joined: Thu Aug 20, 2009 11:00 am
Has liked: 0
Been liked: 2 times

Re: PVR YUV->UYVY Conversion

Post by Chilly Willy » Tue Jul 12, 2011 4:30 pm

RyoDC wrote:Wow, such a big answer! Thank you very much for the info!
You're welcome. All the info above can be found elsewhere on the forum and in example code, but it's nice to have a concise summary in places rather than having to hunt it all down. Particularly the store queue info. :grin:
RyoDC
Mental DCEmu
Posts: 353
Joined: Wed Mar 30, 2011 12:13 pm
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by RyoDC » Wed Jul 13, 2011 6:20 am

Code: Select all

    //start_timer ();
#if 0
    for (j=top; j<height; j++)
        for (i=left; i<width; i++)
            base_address[i + j*lineWidth] = palette[screens[0][i + j*SCREENWIDTH]];
My version of this function; correct me if I'm wrong =)

Code: Select all

struct point{
short unsigned int x;
short unsigned int y;
};

void copyrgn(int* src, int* dst, point A, point B, point C)
{
    for(int i=0; i<B.x-A.x; i++)
      for(int j=0; j<B.y-A.y; j++)
          *(dst+(i)*height+C.x+j+C.y)=*(src+(i)*height+A.x+j+A.y);
}
Chilly Willy
DC Developer
Posts: 414
Joined: Thu Aug 20, 2009 11:00 am
Has liked: 0
Been liked: 2 times

Re: PVR YUV->UYVY Conversion

Post by Chilly Willy » Wed Jul 13, 2011 11:03 pm

If x is the x coord, you need j*width, not i*height.

dst[i+C.x+(j+C.y)*width] = src[i+A.x+(j+A.y)*width];

Turning the loops from j,i to i,j doesn't change the layout of data in RAM. Do j as the outside loop and i as the inside loop, like in my example, to take advantage of the layout in RAM.
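
Put together, a corrected region copy could look like the sketch below. It assumes A is the top-left corner and B the (exclusive) bottom-right corner of the source rectangle, C is the top-left corner in the destination, and both buffers share the same row width in pixels. point_t and copy_region are illustrative names, not the original poster's code.

```c
#include <stdint.h>

typedef struct {
    unsigned short x;
    unsigned short y;
} point_t;

/* Copy the rectangle [a, b) from src to dst, placing it at c.
 * width is the number of pixels per row in both buffers. */
void copy_region(const uint32_t *src, uint32_t *dst, int width,
                 point_t a, point_t b, point_t c)
{
    int w = b.x - a.x;
    int h = b.y - a.y;

    for (int j = 0; j < h; j++)          /* rows on the outside     */
        for (int i = 0; i < w; i++)      /* columns on the inside   */
            dst[(i + c.x) + (j + c.y) * width] =
                src[(i + a.x) + (j + a.y) * width];
}
```

The inner loop walks consecutive addresses within a row, which matches how the pixels are stored.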
RyoDC
Mental DCEmu
Posts: 353
Joined: Wed Mar 30, 2011 12:13 pm
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by RyoDC » Thu Jul 14, 2011 7:45 am

I tested it in Visual Studio yesterday and it was seriously glitching (stack corrupted). I think I corrupted memory somewhere, but I don't know how. I worked through my function on paper with test values and in theory it should all work. No idea what it could be :(
Chilly Willy
DC Developer
Posts: 414
Joined: Thu Aug 20, 2009 11:00 am
Has liked: 0
Been liked: 2 times

Re: PVR YUV->UYVY Conversion

Post by Chilly Willy » Fri Jul 15, 2011 3:42 am

For one - notice how I use dst[] and src[] rather than pointer arithmetic. Remember that pointer arithmetic automatically scales for the size of the value pointed to. If ptr is a uint32_t pointer, ptr += 1; does NOT add 1 to the value... it adds 4. Keep that in mind when doing things like *(dst + whatever)... whatever will NOT be added to dst, whatever * 4 will be. Notice when I do something similar, I use casting to avoid the issue. Something like *(uint32_t*)((uint32_t)dst + whatever) will add dst as a uint32_t with whatever, recast back as a uint32_t pointer. C can be a little tricky on this pointer stuff. 8-)
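
A tiny sketch of that scaling, runnable on a PC. Note that in C, dst[i] is defined as *(dst + i), so array indexing scales by the element size in exactly the same way. The helper names are made up, and the byte-offset version casts through uint8_t * rather than an integer so it also works where pointers are wider than 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Step by n ELEMENTS: the address moves by n * sizeof(uint32_t) bytes. */
uint32_t *advance_elements(uint32_t *p, size_t n)
{
    return p + n;
}

/* Step by nbytes BYTES: go through a byte pointer, then cast back.
 * nbytes should be a multiple of 4 to keep the result aligned. */
uint32_t *advance_bytes(uint32_t *p, size_t nbytes)
{
    return (uint32_t *)((uint8_t *)p + nbytes);
}

/* Distance in bytes between two uint32_t pointers into the same array. */
ptrdiff_t byte_distance(const uint32_t *from, const uint32_t *to)
{
    return (const uint8_t *)to - (const uint8_t *)from;
}
```

So advance_elements(buf, 1) and advance_bytes(buf, 4) land on the same address.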
RyoDC
Mental DCEmu
Posts: 353
Joined: Wed Mar 30, 2011 12:13 pm
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by RyoDC » Fri Jul 15, 2011 4:13 am

Oh, now I see.... Thank you Chilly... I'm a crap. I'll never create my own 2d game for Dreamcast... :'-(
Chilly Willy
DC Developer
Posts: 414
Joined: Thu Aug 20, 2009 11:00 am
Has liked: 0
Been liked: 2 times

Re: PVR YUV->UYVY Conversion

Post by Chilly Willy » Sat Jul 16, 2011 8:34 pm

RyoDC wrote:Oh, now I see.... Thank you Chilly... I'm a crap. I'll never create my own 2d game for Dreamcast... :'-(
Don't get too discouraged. This is called "experience" and will help you... later down the road. 8-)
BlueCrab
The Crabby Overlord
Posts: 5410
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia
Has liked: 2 times
Been liked: 16 times

Re: PVR YUV->UYVY Conversion

Post by BlueCrab » Sat Jul 16, 2011 8:37 pm

Chilly Willy wrote:
RyoDC wrote:Oh, now I see.... Thank you Chilly... I'm a crap. I'll never create my own 2d game for Dreamcast... :'-(
Don't get too discouraged. This is called "experience" and will help you... later down the road. 8-)
As Chilly Willy implies there, all of us were in similar shoes at some point. :wink:
SWAT
Insane DCEmu
Posts: 191
Joined: Sat Jan 31, 2004 2:34 pm
Location: Russia/Novosibirsk
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by SWAT » Fri Jul 29, 2011 4:16 am

I also tried to use the YUV converter, but ran into the same problem.
Let me add to the information collected here.

You are missing the interrupt handling:

Code: Select all

#define ASIC_EVT_PVR_YUV_DONE 0x0006

static semaphore_t *yuv_done;

static void asic_yuv_evt_handler(uint32 code) {
	sem_signal(yuv_done);
}

// initialization
yuv_done = sem_create(0);
asic_evt_set_handler(ASIC_EVT_PVR_YUV_DONE, asic_yuv_evt_handler);
asic_evt_enable(ASIC_EVT_PVR_YUV_DONE, ASIC_IRQ_DEFAULT);

// then at end of function pvr_yuv_transfer add sem_wait(yuv_done);
It is not necessary to modify the code of pvr_dma_transfer, you can do:

Code: Select all

// Make the transfer blocking (pass -1 as the block argument); otherwise the texture begins to swim.
pvr_dma_transfer ((void *) udst, 0x10800000, 64, PVR_DMA_TA, -1, NULL, 0);
The line:

Code: Select all

printf("PVR: YUV Macroblocks Converted: %i\n", PVR_GET(PVR_YUV_STAT) );
will always output 0. But if you move it into the while loop, you will see the progress.

The converter works well, but the texture is not good :(
It seems that the order of the macroblocks is mixed up.
Attachment: video_artifacts.jpg (my result of video decoding with ffmpeg)
PH3NOM
DC Developer
Posts: 574
Joined: Fri Jun 18, 2010 9:29 pm
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by PH3NOM » Wed Aug 03, 2011 2:55 pm

Hi SWAT!
Thanks for the input.

I still haven't got it fully working.
The thing to keep in mind is that data must be sent per macroblock, not per row of pixels.
It is necessary to rearrange the image data before or while sending it to the PVR.
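
A minimal sketch of that rearranging step on the CPU, under some loud assumptions: the input is tightly packed planar YUV420, width and height are multiples of 16, macroblocks are emitted left to right and top to bottom, and the 256 Y bytes of each block are copied as 16 rows of 16. How the Y bytes must be ordered inside a block is not settled in this thread, so treat that part as a guess; if the hardware wants a different layout, only the last inner loop changes. yuv420_to_macroblocks is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Repack planar YUV420 into per-macroblock order: U (8x8), V (8x8), Y (16x16).
 * out must hold width * height * 3 / 2 bytes.
 * ASSUMPTION: Y is written as 16 rows of 16 bytes per block. */
void yuv420_to_macroblocks(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                           uint8_t *out, size_t width, size_t height)
{
    const size_t cw = width / 2;    /* chroma plane width in samples */

    for (size_t by = 0; by < height / 16; by++) {
        for (size_t bx = 0; bx < width / 16; bx++) {
            /* 8 rows of 8 U samples */
            for (size_t r = 0; r < 8; r++, out += 8)
                memcpy(out, u + (by * 8 + r) * cw + bx * 8, 8);
            /* 8 rows of 8 V samples */
            for (size_t r = 0; r < 8; r++, out += 8)
                memcpy(out, v + (by * 8 + r) * cw + bx * 8, 8);
            /* 16 rows of 16 Y samples */
            for (size_t r = 0; r < 16; r++, out += 16)
                memcpy(out, y + (by * 16 + r) * width + bx * 16, 16);
        }
    }
}
```

With the data repacked like this, each macroblock occupies 384 consecutive bytes, so it could be sent in the U, V, Y order described earlier without any further pointer juggling.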

What is the status of FFMpeg on DC?
How did you get it to build successfully?
I got it to compile, but the example is crashing for me on DC.
SWAT
Insane DCEmu
Posts: 191
Joined: Sat Jan 31, 2004 2:34 pm
Location: Russia/Novosibirsk
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by SWAT » Thu Aug 04, 2011 7:33 am

ffmpeg works partially. There is a problem with format auto-detection, and I think that is where your problem is.
I also had to replace its MPEG audio decoder; the one in the library did not work on the DC.
PH3NOM
DC Developer
Posts: 574
Joined: Fri Jun 18, 2010 9:29 pm
Has liked: 0
Been liked: 0

Re: PVR YUV->UYVY Conversion

Post by PH3NOM » Fri Aug 05, 2011 9:40 am

I thought you had asked about the order of macroblocks for the converter; here is my idea of the order:

Code: Select all

/* Multi-Dimensional Arrays */
void parse_array() {

    int width = 512, height = 256, pixel = 0, xpos = 0, ypos = 0;
    int mblocks = ( (width/16) * (height/16) );
    int uData[width/2][height/2];
    int uBlock[mblocks][64];

    /* Traverse a plane of U-Data, assigning pixel values */
    while( ypos < height/2 ) {
        while( xpos < width/2 ) {
            uData[xpos][ypos] = pixel;
            xpos++;
            pixel++;
        }
        xpos = 0;
        ypos++;
    }

    pixel = 0; xpos = 0; ypos = 0;
    int xBpos = 0, yBpos = 0, block = 0;

    /* Traverse the U-Data again, separating it into 8x8 macroblocks */
    while( ypos < height/2 ) {
        while( xpos < width/2 ) {
            while( yBpos < 8 ) {
                while( xBpos < 8 ) {
                    uBlock[block][pixel] = uData[xpos][ypos];
                    xBpos++;
                    xpos++;
                    pixel++;
                }
                xpos -= 8;
                xBpos = 0;
                yBpos++;
                ypos++;
            }
            pixel = 0;
            block++;
            ypos -= 8;
            xpos += 8;
            yBpos = 0;
            xBpos = 0;
        }
        xpos = 0;
        ypos += 8;
    }

    /* Print out the macroblock U-Data */
    pixel = 0;
    int current_block = 0, total_pixels = 0;
    while( current_block < mblocks ) {
        while( pixel < 64 ) {
            printf( "Pixel: %i - Value: %i\n", total_pixels, uBlock[current_block][pixel] );
            total_pixels++;
            pixel++;
        }
        pixel = 0;
        current_block++;
    }

}