Questions about DMA poly submission
- Insane DCEmu
- Posts: 145
- Joined: Tue May 02, 2017 3:11 pm
- Has thanked: 3 times
- Been thanked: 34 times
This is really just to get a basic understanding of how DMA submission works. Currently in GLdc I'm submitting vertices via store queues, but I'm considering experimenting with DMA instead, so I have a few questions before I even begin:
- Can DMA submission be faster than SQ submission or are they roughly the same?
- Is DMA async? Could I for example, start submitting vertex data, and then start building the lists for the next frame while that's submitting?
- Can someone give me a brief example of how I'd submit the OP, PT and TR lists of vertex data if I have them ready to go in an array?
- Soul Sold for DCEmu
- Posts: 4865
- Joined: Fri Jul 11, 2003 9:56 pm
- Has thanked: 2 times
- Been thanked: 4 times
Re: Questions about DMA poly submission
I do remember this bit of info
Rand Linden wrote (Thu Dec 29, 2005 2:30 pm):
Agreed -- generally speaking, using the SQ is the fastest method of transferring data.
If you use DMA, you'll have to write the data to ram at some point (which requires flushing) -- and if you're doing the write anyway, might as well just do it to the SQ and be done with it.
That'll also save all the cache thrashing, which can be a *huge* nightmare all its own.
Rand.
Dreamcast forever!!!
- DC Developer
- Posts: 104
- Joined: Sun Oct 04, 2009 11:13 am
- Has thanked: 2 times
- Been thanked: 88 times
Re: Questions about DMA poly submission
"Can DMA submission be faster than SQ submission or are they roughly the same?"
I haven't yet tried using DMA to submit data to the PVR, but I think it can be faster than store queues in certain situations.
I've noticed that submitting large polygons by SQs can slow down the CPU. I think this happens because the tile accelerator has to write many list pointers for large polygons, which keeps the TA busy for a long time. If the CPU keeps feeding the TA more data before it's done with the current strip, the TA signals the SQs (and the CPU) to stall. It's possible that with DMA, the DMA controller would stall instead of the CPU, and the CPU could keep running. So DMA can save CPU time spent submitting data (but it won't speed up the PVR itself).
"Is DMA async? Could I for example, start submitting vertex data, and then start building the lists for the next frame while that's submitting?"
I don't know of a hardware reason that it's not possible. It looks like you could do it by making two buffers and calling pvr_dma_load_ta (with the "block" parameter set to 0) and alternating between them.
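To make the two-buffer idea concrete, here is a rough, untested-on-hardware sketch of the alternation. It does not call KOS at all: fake_dma_load_ta is a hypothetical stand-in that I've given the argument shape I assume pvr_dma_load_ta has (source, byte count, block flag, completion callback, callback data), and it completes immediately so the logic can be run on a PC. LIST_BYTES, build_lists, on_dma_done and run_frames are made-up names too, and the 32-byte alignment and size checks are my assumption about what the DMA engine wants. Check your KOS headers before swapping the real call in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LIST_BYTES 1024  /* hypothetical per-buffer size, a multiple of 32 */

typedef void (*dma_done_cb)(void *data);

/* Hypothetical stand-in for pvr_dma_load_ta (argument shape is an assumption).
   It "finishes" at once; real hardware would fire the callback later. */
static int fake_dma_load_ta(void *src, size_t count, int block,
                            dma_done_cb cb, void *cbdata) {
    assert(((uintptr_t)src % 32) == 0);  /* assumed: source is 32-byte aligned */
    assert(count % 32 == 0);             /* assumed: length is a multiple of 32 */
    (void)block;                         /* 0 = return without waiting */
    if (cb) cb(cbdata);
    return 0;
}

static _Alignas(32) uint8_t buffers[2][LIST_BYTES];
static volatile int dma_busy = 0;
static int frames_submitted = 0;

/* Completion callback: mark the transfer as finished. */
static void on_dma_done(void *data) {
    (void)data;
    dma_busy = 0;
    frames_submitted++;
}

/* Hypothetical scene builder: fills the buffer with this frame's data. */
static void build_lists(uint8_t *dst, int frame) {
    memset(dst, frame & 0xFF, LIST_BYTES);
}

/* Builds and submits `frames` frames, alternating between the two buffers.
   Returns the number of transfers that completed. */
static int run_frames(int frames) {
    int cur = 0;
    frames_submitted = 0;
    for (int f = 0; f < frames; f++) {
        build_lists(buffers[cur], f);  /* CPU work, overlaps the previous transfer */
        while (dma_busy) { }           /* only one transfer in flight at a time */
        /* On hardware: flush the data cache for buffers[cur] here, unless it
           was written through an uncached or store-queue path. */
        dma_busy = 1;
        fake_dma_load_ta(buffers[cur], LIST_BYTES, 0, on_dma_done, NULL);
        cur ^= 1;                      /* next frame is built in the other buffer */
    }
    while (dma_busy) { }               /* drain the last transfer */
    return frames_submitted;
}
```

The point of waiting before each submit (rather than after) is that the build for frame N runs while frame N-1 is still transferring, and by the time a buffer is reused its own transfer has already been waited on.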
"If you use DMA, you'll have to write the data to ram at some point (which requires flushing) -- and if you're doing the write anyway, might as well just do it to the SQ and be done with it."
You don't need to flush the cache if you don't use it. You can use store queues to write directly to RAM. The SH4 also has write buffers to allow fast writes to uncached RAM; I don't know if they're better than store queues, though.
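For the uncached route, a small sketch of the address arithmetic may help. On the SH4 the same physical memory is visible through a cached mirror (P1, starting at 0x80000000) and an uncached mirror (P2, starting at 0xA0000000), and Dreamcast main RAM sits at 0x8C000000 in the cached view. The helper names below are made up for illustration, and this only computes addresses; on hardware you would cast the result to a volatile pointer and write your vertex data through it, so nothing lands in the data cache and no flush is needed before the DMA.

```c
#include <stdint.h>

#define SH4_PHYS_MASK 0x1FFFFFFFu  /* low 29 bits select the physical address */
#define SH4_P1_BASE   0x80000000u  /* cached mirror */
#define SH4_P2_BASE   0xA0000000u  /* uncached mirror */

/* Hypothetical helper: map any P1/P2 address to its uncached (P2) alias. */
static uint32_t to_uncached(uint32_t addr) {
    return (addr & SH4_PHYS_MASK) | SH4_P2_BASE;
}

/* Hypothetical helper: map any P1/P2 address to its cached (P1) alias. */
static uint32_t to_cached(uint32_t addr) {
    return (addr & SH4_PHYS_MASK) | SH4_P1_BASE;
}
```

So a buffer at 0x8C010000 would be written through 0xAC010000. Whether that ends up faster than store queues is the open question from the post above; I haven't measured it.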