Let's Pimp Out a 3rdMix and Push the PVR!!!!

GyroVorbis
Elysian Shadows Developer

Post by GyroVorbis »

So I'm a graphics guy who has done next to zero graphics on DC since rejoining the scene... I was really trying to focus on low-level OS, language, driver, and toolchain kinds of tasks initially, since I'm a bottom-up learner, plus I wanted to do those tasks for myself to learn those areas of computing where I'm less familiar...

Anyway, I've since gotten back into rendering and have finally made myself more familiar with the PVR API. One of the things that has always been on my mind is how 2ndMix needs a new rendition, because it's one of the main examples we test with while developing KOS to look for regressions... and tbh, it's not really using the hardware very well, as it was a very early tech demo. Actually, a change we made the other day got it to dip from 60fps to 30fps, when built with -O2, whenever a bunch of text appears onscreen... So basically that little thing was *barely* hitting 60fps all this time...

So anyway, I dove into the codebase and started working on it. In the end, I went from <100k polys/sec to more than 1.2 million polys/sec, and only had to stop because I was running out of VRAM for storing the vertices, without texture-compressing the font... Here's what the end result wound up looking like.

Here's a still screenshot (looks way better in motion):
Screenshot 2024-01-30 15-55-13.png
Here's a video on Twitter/X:
https://x.com/falco_girgis/status/17510 ... 58590?s=20

What did I do?
Falco Girgis wrote:
1) Changed stars from being triangles to being PVR HW sprites
2) Changed text from being triangle-strips to being PVR HW sprites
3) Redid how stars were stored. Rather than putting X, Y, Z coords in 3 separate arrays, packed them into a struct, using only int16_t for each field. Much easier on the cache.
4) Redid all star update math, moving it from being done as integer math to FP math... which made quite a hilarious amount of difference!
5) Cached the HW sprite strip header for text and only resubmitted it when the color was changed for the fade effect.
6) Moved all rendering away from pvr_prim() and towards the direct rendering API, with pvr_dr_commit() for each vertex. Unfortunately I had to do terrible hacks around the KOS PVR API to do this, since the direct rendering API only supports the regular 32-byte polygon vertex types, not the 64-byte sprite vertex format that I needed here... Fortunately I wound up being able to do disgusting things to the API and alias pointers to work around it... Will have to think about how to fix this in KOS soon, because there was a pretty significant performance increase using direct rendering.
7) Figured out a much more clever way to handle the coloring on the stars, which was a big one... After moving them to HW sprites, I would still have to resubmit a new sprite header each time the color had to change between sprites. At first I developed an intelligent batching system, which used C's qsort() to sort the stars by Z coord (which is what determined color), to reduce the number of mid-render state changes...

That gave an "okay" amount of performance gain, but what really did it was completely doing away with calculating color per-star in software... How? I realized that exact fade-in effect could be done using the hardware fog effect. Once I figured out how to recreate it (took forever to tweak the fog parameters), I wound up only ever needing to submit a single header for all stars within the scene, since color never changes. That's one header for 7k+ sprites! THAT made the biggest difference in polygon throughput.
8) Changed the static, 256-entry integer-based LUT for sin() and cos() to use the actual HW instructions on floats. Gave a little performance boost and made the interpolation and rotations look smoother.
9) Added Z-scaling to the stars so they get bigger realistically as they zoom towards the camera.
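
To make items 3, 4, 7 and 9 a bit more concrete, here's a rough host-side sketch in plain C. It is NOT the actual 3rdMix code and it doesn't touch the PVR at all: star_t, the depth range, the 8 brightness buckets and every function name below are made up for illustration. It just shows the packed int16_t layout, a float-based update, 1/z size scaling, and why sorting by Z cuts down on the number of sprite headers you'd have to submit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed star: 6 bytes per star, one contiguous array,
 * instead of three separate X/Y/Z arrays. */
typedef struct {
    int16_t x, y, z;
} star_t;

/* Hypothetical depth range for the star field. */
#define STAR_Z_NEAR 1
#define STAR_Z_FAR  1024

/* Move a star towards the camera using float math and wrap it to the
 * back once it passes the near plane. Assumes 0 <= speed < depth range. */
static void star_update(star_t *s, float speed) {
    float z = (float)s->z - speed;
    if (z < (float)STAR_Z_NEAR)
        z += (float)(STAR_Z_FAR - STAR_Z_NEAR);
    s->z = (int16_t)z;
}

/* Z-scaling: on-screen size grows as 1/z while the star approaches. */
static float star_size(const star_t *s, float base) {
    assert(s->z >= STAR_Z_NEAR);   /* guards the division below */
    return base * (float)STAR_Z_FAR / (float)s->z;
}

/* Hypothetical mapping from depth to one of 8 brightness levels. */
static int star_bucket(const star_t *s) {
    return (s->z * 8) / (STAR_Z_FAR + 1);
}

/* qsort() comparator: nearest stars first. */
static int star_cmp_z(const void *a, const void *b) {
    const star_t *sa = a, *sb = b;
    return (sa->z > sb->z) - (sa->z < sb->z);
}

static void sort_stars_by_depth(star_t *stars, size_t n) {
    qsort(stars, n, sizeof *stars, star_cmp_z);
}

/* Count the headers a renderer would submit if it sent a new one
 * every time the brightness level changes between consecutive stars. */
static int count_headers(const star_t *stars, size_t n) {
    int headers = 0, last = -1;
    for (size_t i = 0; i < n; ++i) {
        int b = star_bucket(&stars[i]);
        if (b != last) {
            ++headers;
            last = b;
        }
    }
    return headers;
}
```

With sorting, the header count is bounded by the number of brightness levels rather than the number of stars. The fog trick from item 7 goes one step further: the color never changes in software, so the whole field needs exactly one header no matter how the stars are ordered.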
One thing to note: I have yet to do a damn thing to the cube rendering code, which is similarly TERRIBLE. All math is done in software without the SH4 vector/matrix instructions, FIPR and FTRV! Need to fix that!
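
For anyone not familiar with those two instructions: FIPR computes a 4-component inner product, and FTRV multiplies a 4-component vector by the 4x4 matrix held in the FPU's back bank. Here's a portable C reference for the math they perform, just as a sketch. The vec4_t/mat4_t types and function names are made up for illustration, and this is plain software math, not the SH4 instructions themselves or any KOS wrapper around them.

```c
/* Hypothetical 4-component vector and 4x4 matrix types. */
typedef struct { float x, y, z, w; } vec4_t;
typedef struct { float m[4][4]; } mat4_t;   /* indexed as m[column][row] */

/* What FIPR computes: the inner product of two 4-component vectors. */
static float dot4(vec4_t a, vec4_t b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* What FTRV computes: matrix * vector, i.e. four inner products of
 * the matrix rows with the vector. */
static vec4_t mat4_transform(const mat4_t *m, vec4_t v) {
    vec4_t r;
    r.x = m->m[0][0] * v.x + m->m[1][0] * v.y + m->m[2][0] * v.z + m->m[3][0] * v.w;
    r.y = m->m[0][1] * v.x + m->m[1][1] * v.y + m->m[2][1] * v.z + m->m[3][1] * v.w;
    r.z = m->m[0][2] * v.x + m->m[1][2] * v.y + m->m[2][2] * v.z + m->m[3][2] * v.w;
    r.w = m->m[0][3] * v.x + m->m[1][3] * v.y + m->m[2][3] * v.z + m->m[3][3] * v.w;
    return r;
}

/* Helper for the example: identity matrix with a translation column. */
static mat4_t mat4_translation(float tx, float ty, float tz) {
    mat4_t t = {{
        { 1.0f, 0.0f, 0.0f, 0.0f },
        { 0.0f, 1.0f, 0.0f, 0.0f },
        { 0.0f, 0.0f, 1.0f, 0.0f },
        { tx,   ty,   tz,   1.0f }
    }};
    return t;
}
```

The software version is 16 multiplies and 12 adds per cube vertex, which is the work a single FTRV would take over once the matrix is loaded.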

Anyway, here's the source code if anybody wants to play with it or help optimize it further. Trying to think of other things we can do with it, like maybe add something with post-processing, render-to-texture, or even modifier volumes to make it a better demonstration of the PVR and a better regression test in terms of KOS performance and features...
3rdmix.zip
(1.14 MiB)
Ian Robinson
DC Developer

Post by Ian Robinson »

GyroVorbis wrote: Tue Jan 30, 2024 3:58 pm [...]
Sounds right up my street. I can see a few things lol