pl_mpegDC ported running but community help needed

Ian Robinson · Post by **Ian Robinson** » Thu Aug 10, 2023 12:47 pm

Name: PL_MPEGDC
Copyright: 7/31/20
Author: Ian micheal + Magnes(Bertholet)
Date: 31/07/23 09:03
Description: Dreamcast preliminary port KallistiOS video PVR without sound
All patents related to MPEG1 and MP2 have expired, so it's completely free now.

We need a proper MPEG API for indie games with a completely free license, so I started porting this. We have it running, but very slowly, of course.
Converting to RGB on the CPU is slow, so I had the idea of using the PVR YUV422 format instead. I got it running, but of course the color conversion is a problem.
Here is the GitHub repository:
https://github.com/ianmicheal/pl_mpegDC

More info and the bleeding-edge experimental versions are in this folder:
https://github.com/ianmicheal/pl_mpegDC ... /%E2%80%9C
Working, but very slow: https://streamable.com/vand9u
Working much faster (100 ms a frame faster), but with wrong colours: https://streamable.com/mzxeyg

Any ideas and help to speed up this full single-file, MIT-licensed library for C/C++ are welcome.

All I can say is: please help.

Twada · Post by **Twada** » Fri Aug 11, 2023 3:30 am

I am glad that you are working on this issue!

I was also working on speeding up pl_mpeg.
Here is my half-finished code.
It's video only, but it can play 320x240 videos at over 100% speed. (This sample is 368x208.)

pl_mpeg5.rar: (7.12 MiB) Downloaded 70 times

Here is what I did.

1. I call plm_decode_video() every frame instead of using plm_decode().
TODO: The high-level functions should be reworked to have reasonable wait times.

2. I use the Dreamcast YUV converter.
That's why the video is decoded block by block. A reference frame is created only for I-frames and P-frames. (lines 2156-2241 of pl_mpeg.c)

3. I replaced the simple and beautiful code with conditionals.
(lines 2430-2936 of pl_mpeg.c) This gave a significant speedup, using Berkeley's MPEG player as a reference.

Things I was unable to do:
1. Proper speed control.
2. Asynchronous processing using threads.
3. Sound.

I need help too. The Dreamcast community will be delighted when the MPEG player is complete!
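
One possible way to approach the missing speed control is to compare each frame's presentation time with a wall clock and wait for the difference. The helper below is only a sketch of that arithmetic, not code from pl_mpeg5.rar: `ms_until_due` and its parameters are hypothetical names, and it assumes the decoder reports each frame's presentation time in seconds and that a millisecond clock is available.

```c
#include <stdint.h>

/* Hypothetical helper: how long to wait before showing a frame.
 * frame_time_s : presentation time of the decoded frame, in seconds
 * start_ms     : clock value (ms) when playback started
 * now_ms       : current clock value (ms)
 * Returns the number of milliseconds to wait, or 0 if the frame is late. */
static uint32_t ms_until_due(double frame_time_s, uint64_t start_ms, uint64_t now_ms)
{
    uint64_t due_ms = start_ms + (uint64_t)(frame_time_s * 1000.0 + 0.5);

    return (due_ms > now_ms) ? (uint32_t)(due_ms - now_ms) : 0;
}
```

On KOS the clock could come from timer_ms_gettime64() and the wait could be a thd_sleep() call, but that wiring is untested here.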

Ian Robinson · Post by **Ian Robinson** » Fri Aug 11, 2023 4:13 am

Wonderful work! I hope we can combine our efforts to get this done.

I'm reading through your changes and they are very smart. Your code is always so nice and well done. You have put in a lot of work; thanks for sharing. It runs great, I just built it from source. The major problem on my side is that converting to RGB on the CPU is just super slow, 170+ ms a frame. I wanted to do what you are doing, but I only got as far as green and pink colours on the display.

lerabot · Post by **lerabot** » Fri Aug 11, 2023 10:47 am

Thank you so much to both of you!
This is great work.

It would be amazing if we could get closer to 640x480, but I don't know if this is possible.

Ian Robinson · Post by **Ian Robinson** » Fri Aug 11, 2023 12:01 pm

Could you explain how to convert multi-planar YUV 4:4:4 to packed YUV 4:2:2? I knew it was possible, but I just could not get it done, and I would like to understand how this works:

Code:

void app_on_video(plm_t *mpeg, plm_frame_t *frame, void *user)
{
    unsigned int *dest = (unsigned int *)disp_tex;
    unsigned int *src;

    /* YUV converter registers: destination address and configuration. */
    volatile unsigned int *d = (volatile unsigned int *)0xa05f8148;
    volatile unsigned int *cfg = (volatile unsigned int *)0xa05f814c;

    /* Texture stride register. */
    volatile unsigned int *stride_reg = (volatile unsigned int *)0xa05f80e4;
    int stride_value;
    int stride = 0;

    int x, y, w, h, i;

    /* Check the frame before dereferencing it. */
    if (!frame)
        return;

    src = (unsigned int *)frame->display;

    /* set frame size. */
    w = frame->width >> 4;
    h = frame->height >> 4;
    stride_value = (w >> 1); /* 16 pixel / 2 */

    /* Set Stride value. */
    *stride_reg &= 0xffffffe0;
    *stride_reg |= stride_value & 0x01f;

    /* Set SQ to YUV converter. */
    *d = ((unsigned int)dest) & 0xffffff;
    *cfg = 0x00000f1f;
    x = *cfg; /* read the register back once */

    // QACR0 = ((((unsigned int)0x10800000) >> 26) << 2) & 0x1c;
    // QACR1 = ((((unsigned int)0x10800000) >> 26) << 2) & 0x1c;

    for (y = 0; y < h; y++)
    {
        for (x = 0; x < w; x++, src += 96)
        {
            sq_cpy((void *)0x10800000, (void *)src, 384);
        }
        if (!stride)
        {
            /* Send dummy mb */
            for (i = 0; i < 32 - w; i++)
            {
                sq_set((void *)0x10800000, 0, 384);
            }
        }
    }
    for (i = 0; i < 16 - h; i++)
    {
        if (!stride)
            sq_set((void *)0x10800000, 0, 384 * 32);
        else
            sq_set((void *)0x10800000, 0, 384 * w);
    }
}

I'm struggling to understand how this is possible. I know something like this has been done before. I see you're saying it "Utilizes the Dreamcast YUV converter.
That's why we're decoding the video block by block. A reference frame is created only for I-frames and P-frames. (lines 2156-2241 of pl_mpeg.c)" But I don't get how it works, to be honest.

Twada · Post by **Twada** » Fri Aug 11, 2023 4:50 pm

Ah, I never cleaned up the values I hardcoded. I am embarrassed.
The YUV converter has two registers.

Code:

    volatile unsigned int *d = (volatile unsigned int *)0xa05f8148;
    volatile unsigned int *cfg = (volatile unsigned int *)0xa05f814c;

    /* Set SQ to YUV converter. */
    *d = ((unsigned int)dest) & 0xffffff;
    *cfg = 0x00000f1f;
    x = *cfg; /* read the register back once */

In KOS these are PVR_YUV_ADDR (0x0148) and PVR_YUV_CFG_1 (0x014c).

First, set the output destination VRAM address in PVR_YUV_ADDR.

Code:

PVR_SET(PVR_YUV_ADDR, (((unsigned int)dest) & 0xffffff));

Next, set the size and format in PVR_YUV_CFG_1.

Code:

PVR_SET(PVR_YUV_CFG_1, ((height << 8) | width));

height and width are the dimensions of the output texture.
You can choose from 32, 64, 128, 256, 512 and 1024; divide the value by 16 and subtract 1.
The actual values to set are therefore 1, 3, 7, 15, 31 and 63.
The sample sets the height to 256 and the width to 512, so the value is 0x00000f1f.
The width can be specified more finely when using stride textures.
For example, a width of 320 gives 320/16-1=19.
I am not using stride textures this time, so I transfer dummy blocks to fill out each row.
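
The register arithmetic described above can be written down as a small helper. This is only a sketch of the calculation from this post; `yuv_dim_code` and `yuv_cfg_value` are made-up names, not KOS functions.

```c
#include <stdint.h>

/* Encode a texture dimension for the YUV converter config register:
 * divide by 16 and subtract 1 (32 -> 1, 64 -> 3, ..., 1024 -> 63). */
static uint32_t yuv_dim_code(uint32_t pixels)
{
    return (pixels / 16u) - 1u;
}

/* Combine the codes as described in the post:
 * height code shifted left by 8, width code in the low bits. */
static uint32_t yuv_cfg_value(uint32_t tex_width, uint32_t tex_height)
{
    return (yuv_dim_code(tex_height) << 8) | yuv_dim_code(tex_width);
}
```

For the 512x256 texture in the sample this gives 0x0f1f, the value that was hardcoded.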

Code:

            /* Send dummy mb */
            for (i = 0; i < 32 - w; i++)
            {
                sq_set((void *)0x10800000, 0, 384);
            }

After setting the two registers, transfer the data to the YUV converter (0x10800000).
The data is sent block by block. Each block is 64 bytes of U data, 64 bytes of V data, and 256 bytes of Y data, for a total of 384 bytes. Do not change the order.
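
To make the 384-byte layout concrete, here is a sketch of it as a C struct with compile-time size checks. The type name `yuv_macroblock_t` and the helper `yuv_frame_bytes` are illustrative assumptions and do not appear in pl_mpeg or KOS.

```c
#include <stddef.h>
#include <stdint.h>

/* One 16x16 macroblock in the order the converter expects: U, V, then Y. */
typedef struct {
    uint8_t u[64];   /* 8x8 chroma U samples */
    uint8_t v[64];   /* 8x8 chroma V samples */
    uint8_t y[256];  /* 16x16 luma samples */
} yuv_macroblock_t;

/* Compile-time checks that the struct matches the 384-byte layout. */
_Static_assert(sizeof(yuv_macroblock_t) == 384, "macroblock must be 384 bytes");
_Static_assert(offsetof(yuv_macroblock_t, v) == 64, "V follows U");
_Static_assert(offsetof(yuv_macroblock_t, y) == 128, "Y follows V");

/* Bytes of macroblock data for a frame whose sides are multiples of 16. */
static size_t yuv_frame_bytes(unsigned width, unsigned height)
{
    return (size_t)(width / 16) * (height / 16) * sizeof(yuv_macroblock_t);
}
```

For the 368x208 sample that is 23 x 13 macroblocks, or 114816 bytes of real data per frame, not counting any dummy blocks.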

I'm modding pl_mpeg to create this 384-byte format at the decoding stage. Since the data is contiguous, it speeds things up quite a bit.
However, I-frames and P-frames require full-frame data, because P-frames and B-frames must refer back to it.
So only for those frames do we also create data for reference. I want to make this smarter.

Ian Robinson · Post by **Ian Robinson** » Fri Aug 11, 2023 10:22 pm

No need to be embarrassed, it works, and it works well.

Thanks for the info. It's a very interesting solution to me.

Post by **|darc|** » Sat Aug 12, 2023 10:49 am

Moving to Programming Discussion as this has quickly shifted from an idea to reality!
Thanks everyone!

Ian Robinson · Post by **Ian Robinson** » Tue Aug 22, 2023 7:48 am

I have also been working on dreamroq: I ported it up to KOS 2.0 and fixed threading and other things (https://github.com/ianmicheal/DREAMROQ-WORKING-SOUND-). It works now, but the sound lags a bit. I wonder if we could use your YUV converter idea on it as well:
https://github.com/ianmicheal/DREAMROQ- ... lib.c#L103 Let me know if you think that's possible. We might also be able to use the sound part of lib dcmc for this MPEG version.

Twada · Post by **Twada** » Fri Aug 25, 2023 5:29 pm

The ROQ format is also very interesting. Thank you for your work!
I'm not sure about the ROQ format, is it similar to VQ textures?
If so, you might be able to use the YUV texture format as well.

I was able to get the sound working thanks to dreamcast.wiki!
However, I found plm_decode_audio() to be ridiculously slow. It's twice as slow as the accelerated plm_decode_video().
It looks like I'll have to work on speeding up the sound from now on.
I might try the dcmc library after that...

Ian Robinson · Post by **Ian Robinson** » Sat Aug 26, 2023 5:28 am

Not only that, the AICA is very slow; you can see this in TapamN's benchmark post: https://dcemulation.org/phpBB/viewtopic ... 8#p1058848

BB Hood · Post by **BB Hood** » Sat Sep 02, 2023 10:46 pm

First off, awesome stuff!! I'm going to look into creating an example that shows how to use the TA to convert YUV420 => YUV422 once I fully grasp it. After playing with your code a bit, I rewrote some of the video-related functions in your main.c file. I cut some stuff that doesn't seem to make a difference, moved other things so they are only done once, and replaced your hard-coded values and magic numbers with PVR_* equivalents and #defines. There is one thing I'm stumped on: where does the 96 come from in for (x = 0; x < w; x++, src += 96)? Do you use GitHub or GitLab at all? It would be a lot easier to share code back and forth. Oh, and join the Discord :^p https://discord.gg/NjwBRKbk

Code:

/* png example for KOS 1.1.x
 * Jeffrey McBeth / Morphogenesis
 * <mcbeth@morphogenesis.2y.net>
 *
 * Heavily borrowed from from 2-D example
 * AndrewK / Napalm 2001
 * <andrewk@napalm-x.com>
 */

#include <kos.h>
#include "perfctr.h"

// #define PL_MPEG_IMPLEMENTATION
#include "pl_mpeg.h"
#define min(a, b) (((a) < (b)) ? (a) : (b))

plm_t *plm;

/* textures */
pvr_ptr_t disp_tex;

snd_stream_hnd_t snd_hnd;
__attribute__((aligned(32))) unsigned char snd_buf[65536 + 16384];

// Output texture width and height initial values
// You can choose from 32, 64, 128, 256, 512, 1024
#define PVR_TEXTURE_WIDTH 512
#define PVR_TEXTURE_HEIGHT 256

pvr_poly_hdr_t hdr;
pvr_vertex_t vert[4];

void setup_graphics()
{
    pvr_poly_cxt_t cxt;

    pvr_poly_cxt_txr(&cxt, PVR_LIST_OP_POLY, PVR_TXRFMT_YUV422 | PVR_TXRFMT_NONTWIDDLED, PVR_TEXTURE_WIDTH, PVR_TEXTURE_HEIGHT, disp_tex, PVR_FILTER_BILINEAR);
    pvr_poly_compile(&hdr, &cxt);

    hdr.mode3 |= PVR_TXRFMT_STRIDE; // Was 0x02000000; which had one too many zeros. Should be 0x0200000

    vert[0].z     = vert[1].z     = vert[2].z     = vert[3].z     = 1.0f; 
    vert[0].argb  = vert[1].argb  = vert[2].argb  = vert[3].argb  = PVR_PACK_COLOR(1.0f, 1.0f, 1.0f, 1.0f);    
    vert[0].oargb = vert[1].oargb = vert[2].oargb = vert[3].oargb = 0;  
    vert[0].flags = vert[1].flags = vert[2].flags = PVR_CMD_VERTEX;         
    vert[3].flags = PVR_CMD_VERTEX_EOL;

    vert[0].x = 1;
    vert[0].y = 1;
    vert[0].u = 0;
    vert[0].v = 0;

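    vert[1].x = 640;
    vert[1].y = 1;
    vert[1].u = 0.71875;    // 368 / 512: matches the 368x208 sample in a 512-wide texture
    vert[1].v = 0.0;

    vert[2].x = 1;
    vert[2].y = 480;
    vert[2].u = 0;
    vert[2].v = 0.8125;     // 208 / 256: matches the 368x208 sample in a 256-high texture

    vert[3].x = 640;
    vert[3].y = 480;
    vert[3].u = 0.71875;
    vert[3].v = 0.8125;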

    // Point to the dest texture in the PVR
    unsigned int *dest = (unsigned int *)disp_tex;

    /* Set SQ to YUV converter. */
    PVR_SET(PVR_YUV_ADDR, (((unsigned int)dest) & 0xffffff));
    // Divide texture width and texture height by 16 and subtract 1.
    // The actual values to set are 1, 3, 7, 15, 31, 63.
    PVR_SET(PVR_YUV_CFG_1, (((PVR_TEXTURE_HEIGHT / 16) - 1) << 8) | ((PVR_TEXTURE_WIDTH / 16) - 1));
    PVR_GET(PVR_YUV_CFG_1);
}

void app_on_video(plm_t *mpeg, plm_frame_t *frame, void *user)
{
    unsigned int *src;

    int x, y, w, h;
    int stride = 0;

    /* Check the frame before dereferencing it. */
    if (!frame)
        return;

    src = (unsigned int *)frame->display;

    /* Set Stride value. */
    // This can be moved outside this function too and only needs to be executed once.
    //https://multimedia.cx/eggs/roq-on-dreamcast/#comment-167893
    //PVR_SET(PVR_TEXTURE_MODULO, 640/32); // -1 not needed ???
    //So if you want a 640*480 texture, you need to set the standard power-of-two-size to be a larger than the real size 
    //(so for a 640*480 real size, you set it to 1024*512) then set the stride bit (PVR_TXRFMT_STRIDE) on the header before you submit it.
    if(stride)
        PVR_SET(PVR_TEXTURE_MODULO, frame->width/32); // -1 needed ??? Not according to your *stride_reg |= stride_value & 0x01f;

    /* set frame size. */
    w = frame->width >> 4;
    h = frame->height >> 4;

    if (!stride) {
        for (y = 0; y < h; y++) {
            for (x = 0; x < w; x++, src += 96) {  // += 384/4(size of int) = 96
                sq_cpy((void *)0x10800000, (void *)src, 384);
            }
            // Send dummy mb
            sq_set((void *)0x10800000, 0, 384 * (32 - w));
        }
    } else {
        for (y = 0; y < h; y++) {
            for (x = 0; x < w; x++, src += 96) {  // += 384/4(size of int) = 96
                sq_cpy((void *)0x10800000, (void *)src, 384);
            }
        }
    }
}

void app_on_audio(plm_t *mpeg, plm_samples_t *samples, void *user)
{
    int size = sizeof(float) * samples->count * 2;
    // SDL_QueueAudio(self->audio_device, samples->interleaved, size);
    // snd_sh4_to_aica_stop();
    // snd_sh4_to_aica((void *)samples->interleaved, 200);
    // snd_sh4_to_aica_start();
}

void *sound_callback(snd_stream_hnd_t hnd, int size, int *size_out)
{
    plm_samples_t *sample = plm_decode_audio(plm);

    /* Do not dereference a missing audio frame. */
    if (sample == NULL)
    {
        *size_out = 0;
        return NULL;
    }

    // if(size > (PLM_AUDIO_SAMPLES_PER_FRAME * 2))
    // {
    //     size = (PLM_AUDIO_SAMPLES_PER_FRAME * 2);
    // }

    *size_out = size;

    //printf("%d::%d ", size, *size_out);

    return (void *)sample->interleaved;
}

/* romdisk */
extern uint8 romdisk_boot[];
KOS_INIT_ROMDISK(romdisk_boot);

int main(void)
{
    int done = 0;
    double elapsed_time = 0.0;
    double current_time = 0.0;
    double last_time = 0.0;

    PMCR_Init(1, PMCR_ELAPSED_TIME_MODE, 2);

    /* init kos  */
    pvr_init_defaults();

    disp_tex = pvr_mem_malloc(PVR_TEXTURE_WIDTH * PVR_TEXTURE_HEIGHT * 2);
    setup_graphics();

    plm = plm_create_with_filename("/rd/sample.mpg");

    if (plm == 0)
        return 0;

    // plm_set_video_decode_callback(plm, app_on_video, 0);
    // plm_set_audio_decode_callback(plm, app_on_audio, 0);
    // plm_set_loop(plm, TRUE);
    plm_set_audio_enabled(plm, 1);

    last_time = (double)timer_ms_gettime64() / 1000.0;

    // snd_stream_init();
    // snd_hnd = snd_stream_alloc(sound_callback, PLM_AUDIO_SAMPLES_PER_FRAME << 3);
    // snd_stream_reinit(snd_hnd, sound_callback);
    // snd_stream_volume(snd_hnd, 0xff);
    // snd_stream_queue_enable(snd_hnd);
    // snd_stream_start(snd_hnd, 44100, 1);
    // snd_stream_queue_go(snd_hnd);

    /* keep drawing frames until start is pressed */
    while (!done)
    {
        MAPLE_FOREACH_BEGIN(MAPLE_FUNC_CONTROLLER, cont_state_t, st)

        if (st->buttons & CONT_START)
            done = 1;

        MAPLE_FOREACH_END()

        pvr_wait_ready();

        // plm_decode(plm, elapsed_time);
        // if (plm_has_ended(plm))
        // {
        //     plm_destroy(plm);
        //     break;
        // }
        // snd_sh4_to_aica_start();

        plm_frame_t *frame = plm_decode_video(plm);
        // plm_samples_t *sample = plm_decode_audio(plm);
        if (!frame)
        {
            break;
        }
        app_on_video(plm, frame, 0);
        // plm_decode_audio(plm);

        // Decode
        current_time = (double)timer_ms_gettime64() / 1000.0;
        elapsed_time = min(current_time - last_time, 1.0 / 30.0);
        last_time = current_time;

        pvr_scene_begin();
        pvr_list_begin(PVR_LIST_OP_POLY);
        pvr_prim(&hdr, sizeof(hdr));
        pvr_prim(&vert[0], sizeof(pvr_vertex_t));
        pvr_prim(&vert[1], sizeof(pvr_vertex_t));
        pvr_prim(&vert[2], sizeof(pvr_vertex_t));
        pvr_prim(&vert[3], sizeof(pvr_vertex_t));
        pvr_list_finish();
        pvr_scene_finish();
    }

    pvr_mem_free(disp_tex);

    // snd_mem_free();
    // snd_mem_shutdown();
    // snd_shutdown();

    return 0;
}

Twada · Post by **Twada** » Sat Sep 02, 2023 11:37 pm

Thank you for considering creating a sample!

YUV data must be sent in 16x16 blocks. The blocks are arranged horizontally.
In the case of YUV420, the U data and V data are 64 bytes each and the Y data is 256 bytes, for a total of 384 bytes. Please do not change this order.
Since I am using a uint32 pointer, 384 is divided by 4 to get 96.
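
The same arithmetic as a sketch in C, assuming the macroblocks are stored back to back, left to right and then top to bottom, which is how the copy loop walks them. The names `MB_BYTES`, `MB_WORDS` and `mb_word_offset` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    MB_BYTES = 64 + 64 + 256,   /* U + V + Y = 384 bytes per macroblock */
    MB_WORDS = MB_BYTES / 4     /* 4 = sizeof(uint32_t), so 96 words */
};

/* Offset, in 32-bit words, of macroblock (mb_x, mb_y) in a buffer that
 * stores macroblocks left to right, then top to bottom. */
static size_t mb_word_offset(unsigned mb_x, unsigned mb_y, unsigned mbs_per_row)
{
    return ((size_t)mb_y * mbs_per_row + mb_x) * MB_WORDS;
}
```

Stepping one macroblock to the right advances a uint32 pointer by 96, which is the `src += 96` in the loop.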

Sorry, I don't use Git or Discord at the moment. They seem convenient, but also difficult...

BB Hood · Post by **BB Hood** » Sat Sep 02, 2023 11:57 pm

Thanks!! I just found this post (https://dcemulation.org/phpBB/viewtopic ... 7#p1027297) by Phenom, and it shows a way to do this with DMA. KOS is currently missing some of the functionality, so I plan to add it soon.

BB Hood · Post by **BB Hood** » Sun Sep 03, 2023 8:08 pm

Twada, I'm almost done with the sample, but I'm not getting something right. This is what I get after the conversion of a YUV420 image. Do you think you can help? I think the problem is the way I'm generating the buffers to send to the TA for conversion.

Code:

static void convert() {
    int i, j, index, x_blk, y_blk;

    unsigned char u_block[64] __attribute__((aligned(32)));
    unsigned char v_block[64] __attribute__((aligned(32)));
    unsigned char y_block[256] __attribute__((aligned(32)));

    for (y_blk = 0; y_blk < PVR_TEXTURE_HEIGHT; y_blk += 16) {
        for (x_blk = 0; x_blk < PVR_TEXTURE_WIDTH; x_blk += 16) {
            
            // Extract U
            for (j = 0; j < 8; ++j) {
                for (i = 0; i < 8; ++i) {
                    index = (y_blk / 2 + j) * (PVR_TEXTURE_WIDTH / 2) + (x_blk / 2 + i);
                    u_block[j * 8 + i] = u_plane[index];
                }
            }

            // Extract V
            for (j = 0; j < 8; ++j) {
                for (i = 0; i < 8; ++i) {
                    index = (y_blk / 2 + j) * (PVR_TEXTURE_WIDTH / 2) + (x_blk / 2 + i);
                    v_block[j * 8 + i] = v_plane[index];
                }
            }

            // Extract Y
            for (j = 0; j < 16; ++j) {
                for (i = 0; i < 16; ++i) {
                    index = (y_blk + j) * PVR_TEXTURE_WIDTH + (x_blk + i);
                    y_block[j * 16 + i] = y_plane[index];
                }
            }

            sq_cpy((void *)PVR_TA_YUV_CONV, (void *)u_block, 64);
            sq_cpy((void *)PVR_TA_YUV_CONV, (void *)v_block, 64);
            sq_cpy((void *)PVR_TA_YUV_CONV, (void *)y_block, 256);
        }

        // Send dummy mb
        //sq_set((void *)PVR_TA_YUV_CONV, 0, 384 * (32 - (PVR_TEXTURE_WIDTH >> 4)));
    }
}

yuv.zip: (910.97 KiB) Downloaded 66 times

Twada · Post by **Twada** » Sun Sep 03, 2023 10:37 pm

Ah, I forgot to mention this.
The Y data is 16x16, 256 bytes, and must follow the macroblock format.
In other words, it must be laid out as four consecutive 8x8 blocks, ordered horizontally first: top-left, top-right, bottom-left, bottom-right.

I created a variable k and rewrote the Y extraction as follows.

Code:

            for (k = 0; k < 4; ++k)
            {
                for (j = 0; j < 8; ++j)
                {
                    for (i = 0; i < 8; ++i)
                    {
                        index = (y_blk + j + (k / 2 * 8)) * PVR_TEXTURE_WIDTH + (x_blk + i) + (k % 2 * 8);
                        y_block[k * 64 + j * 8 + i] = y_plane[index];           
                    }
                }
            }
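
Putting the U, V and corrected Y extraction together, one macroblock could be packed like this. It is a sketch under the layout described in this thread, not the code from yuv.zip: `pack_macroblock` is a hypothetical name, and it takes the planes and the luma width as parameters instead of using globals.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the macroblock whose top-left luma pixel is (x_blk, y_blk) from
 * planar YUV 4:2:0 into the 384-byte converter layout: U (64), V (64),
 * then Y as four 8x8 sub-blocks (top-left, top-right, bottom-left,
 * bottom-right). `width` is the luma row length in pixels. */
static void pack_macroblock(uint8_t out[384],
                            const uint8_t *y_plane,
                            const uint8_t *u_plane,
                            const uint8_t *v_plane,
                            int width, int x_blk, int y_blk)
{
    int i, j, k;

    /* Chroma planes are half size in both directions. */
    for (j = 0; j < 8; ++j) {
        for (i = 0; i < 8; ++i) {
            size_t c = (size_t)(y_blk / 2 + j) * (width / 2) + (x_blk / 2 + i);
            out[j * 8 + i]      = u_plane[c];
            out[64 + j * 8 + i] = v_plane[c];
        }
    }

    for (k = 0; k < 4; ++k) {
        int row0 = y_blk + (k / 2) * 8;   /* sub-blocks 2 and 3: lower half */
        int col0 = x_blk + (k % 2) * 8;   /* sub-blocks 1 and 3: right half */

        for (j = 0; j < 8; ++j) {
            for (i = 0; i < 8; ++i) {
                out[128 + k * 64 + j * 8 + i] =
                    y_plane[(size_t)(row0 + j) * width + (col0 + i)];
            }
        }
    }
}
```

Packing into one contiguous 384-byte buffer would also allow a single transfer per macroblock instead of three separate ones.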

Ian Robinson wrote: ↑Sat Aug 26, 2023 5:28 am Not only that, the AICA is very slow; you can see this in TapamN's benchmark post: https://dcemulation.org/phpBB/viewtopic ... 8#p1058848

And decoding MP2 audio is not fast at all either. The decoding itself is slow.
I'd like to devise a better data arrangement by referring to libmp3, but I'm at my limit. (lines 3759-3781 of pl_mpeg.c)
Attached is a hard-coded version that just produces sound. I need help…

pl_mpeg6.rar: (7.13 MiB) Downloaded 66 times

BB Hood · Post by **BB Hood** » Mon Sep 04, 2023 12:15 am

<333333 Thank you! That fixed it all right. I'm going to take a look at your code and see what I can do.

BB Hood · Post by **BB Hood** » Mon Sep 04, 2023 1:17 am

It plays pretty well when I run it on my Dreamcast. It's late here, but I took a quick look at your sound code in main.c. I think I can help and will share code with you tomorrow. Unfortunately, I have no input on the decoding of the sound data itself. Sorry.

BB Hood · Post by **BB Hood** » Mon Sep 04, 2023 10:38 pm

Twada, sorry. It's taking longer than I thought. The code I wrote took load off the video, but the sound came out awful. Ultimately, what you want to do is run an audio thread that calls snd_stream_poll, instead of calling it every frame:

Code:

/* Thread entry points take and return a void pointer. */
void *snd_thread(void *arg) {
    while(audio_status != AUDIO_STATUS_DONE) {
        snd_stream_poll(snd_hnd);
        thd_sleep(20);
    }

    return NULL;
}

and call your decode function every frame

Code:

plm_samples_t *sample = plm_decode_audio(plm);

and store those results in a ring buffer that the stream callback

Code:

sound_callback(snd_stream_hnd_t hnd, int size, int *size_out)

can simply read from. You will need a mutex around the ring buffer so that the audio thread and the main thread don't touch it at the same time.
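
A minimal sketch of such a mutex-protected ring buffer follows. It is not the dreamroq implementation: `ring_t`, `ring_write`, `ring_read` and the 16 KiB capacity are made up for illustration, and it uses a POSIX pthread mutex so it can be compiled off-target. On the Dreamcast the KOS mutex API could be substituted, which is an assumption not tested here.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16384u   /* arbitrary capacity in bytes for this sketch */

typedef struct {
    uint8_t         data[RING_SIZE];
    size_t          head;   /* next write position */
    size_t          tail;   /* next read position */
    size_t          used;   /* bytes currently stored */
    pthread_mutex_t lock;   /* guards every field above */
} ring_t;

static ring_t ring = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Main thread: append decoded audio. Returns the bytes actually stored. */
static size_t ring_write(ring_t *r, const uint8_t *src, size_t len)
{
    size_t n;

    pthread_mutex_lock(&r->lock);
    if (len > RING_SIZE - r->used)
        len = RING_SIZE - r->used;   /* drop whatever does not fit */
    for (n = 0; n < len; n++) {
        r->data[r->head] = src[n];
        r->head = (r->head + 1) % RING_SIZE;
    }
    r->used += len;
    pthread_mutex_unlock(&r->lock);

    return len;
}

/* Audio callback: take up to len bytes. Returns the bytes actually read. */
static size_t ring_read(ring_t *r, uint8_t *dst, size_t len)
{
    size_t n;

    pthread_mutex_lock(&r->lock);
    if (len > r->used)
        len = r->used;               /* hand out only what is available */
    for (n = 0; n < len; n++) {
        dst[n] = r->data[r->tail];
        r->tail = (r->tail + 1) % RING_SIZE;
    }
    r->used -= len;
    pthread_mutex_unlock(&r->lock);

    return len;
}
```

The main loop would call ring_write() with each decoded block, and the stream callback would call ring_read() and report the number of bytes it actually got.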

I did the above for your code, but I guess my implementation was bad. I copied what I did in my dreamroq repo and it's just not working out. I will need to dig deeper for a better implementation. Again, my apologies for not finding a working solution for you.

Twada · Post by **Twada** » Tue Sep 05, 2023 7:52 am

Thank you for listening to my unreasonable request.
I'm not familiar with stream-related code, so this is a very valuable hint!

Again, we need to speed up audio decoding. I'll try again after I cool my head.
