Imminent GCC 13.1.0 Release: Dreamcast Toolchain Showdown

|darc|
DCEmu Webmaster
Posts: 16375
Joined: Wed Mar 14, 2001 6:00 pm
Location: New Orleans, LA
Has thanked: 104 times
Been thanked: 91 times

Imminent GCC 13.1.0 Release: Dreamcast Toolchain Showdown

Post by |darc| »

With the GCC 13.1.0 release imminent, I figured it was a good time to do some compiler/toolchain performance comparisons. It's time for a showdown!

I hacked together some scripts to automate testing of 24 different compiler configurations.

Hardware used:
US Dreamcast model 1 with 32MB mod, DreamBoot BIOS, loading tests from embedded dcload-ip

Software
KallistiOS Git revision 98f913d
modified version of the pvrmark example in kos/examples/dreamcast/pvr/pvrmark -- modified to avoid the inconsistencies and issues pvrmark has.
pvrmark goes through several phases. This modification waits until pvrmark reaches PHASE_FINAL, then waits until 5 more FPS/PPS debug messages (we'll call these "cycles") have been printed (approximately one every 5 seconds). The software then collects the polygons-per-second scores from the next 25 cycles and averages them. Sometimes a cycle spikes up (printing >90fps, for example) or dips down (to 10-30fps); when that happens, the cycle is dropped from the results and the next cycle is counted instead.
All 24 tests were run and the data collected, then the whole thing was done again: the system was rebooted and the test suite was run a second time, and after that a third time. All in all, it took about 4.5 hours for the Dreamcast to run 3 passes of these tests.
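The scoring rule described above can be sketched in C roughly as follows. This is only an illustration, not the actual test scripts: the `cycle_t` type, the `average_pps` function and the exact outlier thresholds (above 90fps, or 30fps and below) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* One FPS/PPS debug message ("cycle"). Hypothetical type for this sketch. */
typedef struct {
    double fps;   /* frames per second reported for the cycle */
    double pps;   /* polygons per second reported for the cycle */
} cycle_t;

#define WARMUP_CYCLES 5     /* cycles ignored after PHASE_FINAL is reached */
#define SAMPLE_CYCLES 25    /* cycles averaged into the final score */
#define FPS_SPIKE     90.0  /* assumed cutoff: above this is a spike */
#define FPS_DIP       30.0  /* assumed cutoff: at or below this is a dip */

/* A cycle is dropped when its frame rate spiked or dipped. */
static int cycle_is_outlier(const cycle_t *c)
{
    return c->fps > FPS_SPIKE || c->fps <= FPS_DIP;
}

/*
 * Averages the PPS of up to SAMPLE_CYCLES non-outlier cycles that follow
 * the warm-up cycles. Returns how many cycles were actually used and
 * stores the mean in *mean_pps (0.0 when no cycle qualified).
 */
size_t average_pps(const cycle_t *cycles, size_t count, double *mean_pps)
{
    double sum = 0.0;
    size_t used = 0;

    assert(cycles != NULL && mean_pps != NULL);

    for (size_t i = WARMUP_CYCLES; i < count && used < SAMPLE_CYCLES; i++) {
        if (cycle_is_outlier(&cycles[i]))
            continue;           /* dropped; the next cycle is counted instead */
        sum += cycles[i].pps;
        used++;
    }

    *mean_pps = used ? sum / (double)used : 0.0;
    return used;
}
```

A caller would fill the array with every cycle printed after PHASE_FINAL and check that the return value is 25 before trusting the mean.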

Compiler setups tested:
KOS "testing" -- gcc-13.1.0 with Newlib 4.3.0 and binutils 2.40 (PR on GitHub, to be merged soon; since the final tarball hasn't been uploaded yet, we are using GCC 13.1.0 RC2)
KOS "testing" -- gcc-12.2.0 with Newlib 4.3.0 and binutils 2.40
KOS "stable" -- gcc-9.3.0 with Newlib 3.3.0 and binutils 2.34
KOS "legacy" -- gcc-4.7.4 with Newlib 2.0.0 and binutils 2.34

Flags tested:
  • -O1: -O1 -fomit-frame-pointer
  • -O2: -O2 -fomit-frame-pointer
  • -O3: -O3 -fomit-frame-pointer
  • -Os: -Os -fomit-frame-pointer
  • iansflags: -Wall -g -ml -m4-single-only -Os -fomit-frame-pointer -ffast-math -fno-strict-aliasing -fwrapv (Ian Micheal built against these flags during the initial discussion about pvrmark performance, so I included them in this test suite)
  • -O3-fipa-pta: -O3 -fipa-pta -fomit-frame-pointer (suggested by pcercuei)

Results

While the top 3 spots shuffled around in the different tests, a variant of GCC 13.1.0 was always on top. On average, GCC 13.1.0 with -O3-fipa-pta performed the best. Regardless, 4.7.4 had a very impressive showing at -O2.
Screenshot_20230424_201231.png
Binary sizes: Strictly speaking of bin sizes, GCC 13.1.0 wins with -Os. Great, right?
Screenshot_20230424_201752.png
Comparing binary sizes and performance: Well, not so great for GCC 13.1.0 with -Os. Despite its excellent low bin size, performance is near the bottom.
That being said, comparing the top of the performance charts, GCC 13.1.0 at -O3 or -O3-fipa-pta not only edge out 4.7.4-O2 for the performance crown, but they also shave off quite a good chunk of space from the 4.7.4 top performers.
If you really need to shave off bin size, GCC 12.2.0 with -Os or iansflags might be the way to go, as you sacrifice a little bit of performance for a pretty sizeable reduction in binary size.
Screenshot_20230424_202609.png
Beyond
  • I have no idea if pvrmark is truly a good benchmark or if it's representative of the typical work done by homebrew software on a Dreamcast. I figured it was a decent starting point once I got it to (relatively) consistently produce results I could compare. My scripts are adaptable and it would be trivial to replace pvrmark with a different benchmark as long as the benchmark program prints out a score to the console and exits back to dcload.
  • I could try more configurations. Does anyone have a set of CFLAGS they think can beat out these top configs? I'm more than willing to go a round two with this test to see how hard we can push this console with our available tools.
It's thinking...
GyroVorbis
Elysian Shadows Developer
Elysian Shadows Developer
Posts: 1874
Joined: Mon Mar 22, 2004 4:55 pm
Location: #%^&*!!!11one Super Sonic
Has thanked: 80 times
Been thanked: 62 times

Re: Imminent GCC 13.1.0 Release: Dreamcast Toolchain Showdown

Post by GyroVorbis »

Fantastic work!

I've run the libGimbal benchmarks I did for GCC 12, comparing 12 and 13, here: https://github.com/KallistiOS/KallistiO ... 1521057675

I might see if I can take some of the more interesting data points here and run them through my tests as well to get some big app-type runtime information... eventually I want to make the test suites use the performance counters too.

Re: Imminent GCC 13.1.0 Release: Dreamcast Toolchain Showdown

Post by |darc| »

I was asked to run with -flto. This is a single pass of everything with -flto, added to a chart alongside the score3 pass of the previous run.

Very interesting results!!
Screenshot_20230425_125249.png

Re: Imminent GCC 13.1.0 Release: Dreamcast Toolchain Showdown

Post by |darc| »

Dialing things in.

Here are the top tiers:
Screenshot_20230425_173730.png
The newly added flags are:
  • fipa-pta: -fipa-pta
  • fipa-cp-clone: -fipa-cp-clone
  • rbsimple: -freorder-blocks-algorithm=simple

With -flto, 4.7.4 takes the top spot by a nice little margin, but 12.2.0 and 13.1.0 aren't too far behind.
The margin of error for these readings can be around +/- 1,000 points or more, so really, anything you see between roughly 398xxx and 399xxx is effectively tied. I left two separate runs of gcc12.2.0-O3-flto on the chart to demonstrate this: one scored 399136, and another 398249.
So this mostly demonstrates that although 4.7.4 cannot yet be matched when tweaked, you can get 95% of that performance with any of 9.3.0, 12.2.0, and 13.1.0 if you tweak your flags correctly.
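
To make the "tied" reading concrete, here is a small C sketch of the comparison. The `scores_tied` helper is hypothetical, and treating two scores as tied when they differ by no more than the margin is one simple interpretation of the +/- 1,000 figure, not a statistical test.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Returns 1 when two benchmark scores differ by no more than `margin`
 * points, 0 otherwise. Hypothetical helper; the margin comes from the
 * run-to-run noise observed on the console.
 */
int scores_tied(long a, long b, long margin)
{
    assert(margin >= 0);
    return labs(a - b) <= margin;
}
```

With a margin of 1000, the two gcc12.2.0-O3-flto runs above (399136 and 398249, 887 points apart) count as tied.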

I also tried to compile newlib with LTO but it crashed the benchmark.

Bin size at this high performing tier:
Screenshot_20230425_183016.png
13.1.0 is the clear winner when it comes to bin size.


Here's the full list so far:
Screenshot_20230425_173639.png

Re: Imminent GCC 13.1.0 Release: Dreamcast Toolchain Showdown

Post by |darc| »

Final data, for today at least. Re-added binsize to the mix. Here's the same set of data sorted both ways.
Screenshot_20230425_193406.png
Screenshot_20230425_193448.png
  • If you care about binsize, gcc12.2.0-iansflags-flto or gcc12.2.0-Os-flto are almost the best you can get in terms of binsize and still have great performance.
  • If binsize means absolutely nothing to you and you don't care about any modern language features, go with 4.7.4-O3-flto or 4.7.4-O3-fipa-pta-flto.
  • Everyone else could probably just use gcc13.1.0-O2-fipa-cp-clone-rbsimple-flto as a top performer with a sane bin size and all the modern language features.

Re: Imminent GCC 13.1.0 Release: Dreamcast Toolchain Showdown

Post by GyroVorbis »

This needs to be a nice wiki article. Fantastic work.

Re: Imminent GCC 13.1.0 Release: Dreamcast Toolchain Showdown

Post by GyroVorbis »

Okay, if we're gonna be hyping up the toolchain, I'll conduct the hype train for initial C23 and C++23 support.

This is some C++23 crazy contrived bullshit that you can now do, verified on DC:

Code: Select all

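#include <cstddef>
#include <cstdio>

struct StaticMonstrosity {
    // C++23: operator[] may be static and may take any number of arguments.
    template<typename... Args>
    constexpr static std::size_t operator[](Args&&... args) {
        return sizeof...(args);
    }
};

int main(int argc, char **argv) {
    // Prints the number of subscript arguments (4 here).
    printf("%zu\n", StaticMonstrosity{}[1, "lolol", -12.3434, false]);
    return 0;
}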
A constexpr static variadically templated overloaded multidimensional subscript operator. lol.

C23 is actually a pretty SERIOUS leap for how conservative C has always been... This is now valid:

Code: Select all

auto i = 3;
typeof(i) j = 4;

Code: Select all

   typedef struct { 
      bool      k;
      uintptr_t l;
   } SimpleStruct;

   struct ComplexStruct {
      int           i;
      float         j;
      const char*   p;
      SimpleStruct* pInner;
   }* const pStruct = &(const struct ComplexStruct) {
      .i = 23, 
      .j = -124.3f, 
      .p = "lololol",
      .pInner = &(static SimpleStruct) {
         .k = true, 
         .l = 0xdeadbabe
      }
   };
I'll be testing other stdlib stuff as I get the chance and adding things that don't work to the list of toolchain features I'd like to eventually get implemented for us... Things like C++ atomics, std::filesystem, etc.

Re: Imminent GCC 13.1.0 Release: Dreamcast Toolchain Showdown

Post by |darc| »

13.1.0 is now officially out and available in KOS.