GCC 13.2.0 coming soon: compiler info and another performance showdown

|darc|
DCEmu Webmaster
Posts: 16378
Joined: Wed Mar 14, 2001 6:00 pm
Location: New Orleans, LA

Post by |darc| »

GCC 13.2.0 should be releasing on July 27, so I wanted to generate some more data on GCC performance. If you're reading this but haven't read my previous performance thread here, you might wanna do that. I'm using pvrmark again as a benchmark.

Besides just messing around with 13.2.0, I've created KOS patches for, built, and tested many configurations of GCC over the past couple of months: 4.7.4, 4.9.4, 9.3.0, 9.4.0, 9.5.0, 10.1.0, 10.2.0, 10.3.0, 10.4.0, 10.5.0, 11.1.0, 11.2.0, 11.3.0, 11.4.0, 12.1.0, 12.2.0, 12.3.0, 13.1.0, 13.2.0, gcc-rs from git, and GCC 14 from git. I also tested different versions of newlib with different compiler versions. Patches are currently available in the gccdev branch on the KOS GitHub, and at some point in the near future I'll PR those to the main KOS.

Given that I tested a couple dozen toolchain builds with a couple dozen flag sets, with and without LTO, etc., this required compiling several hundred ELF binaries to test. Compiling them isn't so much of a problem, but looping through pvrmark requires a few minutes per build, so in the interest of not having to run my Dreamcast for several days straight, I reduced the number of iterations per test while running pvrmark, compared to the last benchmark round. Like in the last thread, keep this in mind when comparing results against one another, as you can sometimes get large swings of 1000-3000 or more between runs of the same build. Some builds were only run once. Some of the more interesting points (like the high-end scores) I ran multiple times and averaged out, so I could at least attempt to get a more accurate representation.
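To make the averaging idea concrete, here's a minimal Python sketch of how repeated runs could be summarized and compared against run-to-run noise. The scores are made up for illustration (they are not real pvrmark results), and the rule "only trust a gap bigger than the spread" is an assumed rule of thumb, not something pvrmark provides.

```python
from statistics import mean

def summarize(scores):
    """Return (mean, spread) for repeated runs of one build."""
    return mean(scores), max(scores) - min(scores)

# Hypothetical scores for two builds, three runs each (not real data).
build_a = [500_000, 501_500, 502_000]
build_b = [501_000, 503_000, 502_500]

mean_a, spread_a = summarize(build_a)
mean_b, spread_b = summarize(build_b)

# Assumed rule of thumb: treat the gap between two builds as meaningful
# only if it exceeds the larger run-to-run spread of either build.
gap = abs(mean_a - mean_b)
noise = max(spread_a, spread_b)
distinguishable = gap > noise

print(f"gap={gap:.0f} noise={noise} distinguishable={distinguishable}")
```

With these made-up numbers the gap between the averages is smaller than the swing within a single build, so the two builds can't really be ranked against each other.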

Here are my observations so far:
  • GCC 4.7.4, the KallistiOS "legacy" configuration, is still useful to keep around for old code, as it will compile things that modern GCC versions will not. A lot of old code in KOS and kos-ports had to be updated to compile properly on modern GCC. Most of that code was poor code, showing that GCC has become more strict over time.
  • As mentioned in the previous thread, GCC 4.7.4 at O3 with LTO was the fastest compiler configuration of all of them, though with a tremendous binary size. Using it with LTO seems to choke on things more than modern GCC, and in fact, right now KallistiOS won't build with LTO under 4.7.4. I had to patch KOS manually to run this benchmark. Without LTO enabled, it's no longer the fastest option and isn't worth using unless, as mentioned above, you need it for old code compatibility reasons. As it ages, using it can bring up compatibility issues as well. For example, building GCC 4.7.4 on macOS is broken right now (and was broken on *nix a while back, before KOS added a compatibility patch to make it build).
  • GCC 4.9.4 was of interest because it supports fast-math and is available in Compiler Explorer, but it's too buggy and generates screwed up code, so I abandoned working with it at all.
  • Comparing GCC 9.3.0 toolchain with newlib 3.3 and binutils 2.34 vs. GCC 9.3.0 with newlib 4.3.0 and binutils 2.40 didn't really produce anything of interest. I don't think it's worth messing around with anything other than the latest.
  • Comparing within a GCC generation, e.g. 9.3.0 vs. 9.5.0, 10.1.0 vs. 10.5.0, 11.1.0 vs. 11.4.0, etc. does show some trends up or down in performance, but it's always pretty negligible, so it's not worth doing anything more with... there's no magical obscure point release with uber performance hidden away. Might as well use the most up to date version within a generation. Therefore, to keep things simple, I've omitted all the data for these point releases in my charts and just left the interesting bits.
  • Without LTO, GCC 9 and 10 are at an obvious disadvantage. With the best configurations, 9.x is about 2% slower than 10.x, and 10.x is about 3% slower than 11.4.0. 11.4.0 through 13.2.0 are all about the same.
[Chart: GCC performance without LTO (Screenshot_20230721_184602.png)]
  • The story is different with LTO. With LTO enabled, the GCC 10 series has an obvious reproducible speed advantage over 9/11/12/13, but even still, the best 10.5.0 score is only a 1.39% increase in speed over the best 13.2.0 score.
[Chart: GCC performance including LTO (Screenshot_20230721_184828.png)]
(Side note: O3-fipa-pta-rbsimple and O3-fipa-pta-fipa-cp-clone-rbsimple should be redundant, considering O3 already includes fipa-cp-clone. Yet the results can be way off between them in these charts, which demonstrates the difficulty of using this number as a direct comparison between two data points. A more consistent benchmark method is needed. It's also a little odd that some scores ended up exactly the same, but that wasn't a mixup on my part.)
Also: the gray X boxes for 4.7.4 are because 4.7.4 doesn't support -freorder-blocks-algorithm=simple.

  • I didn't really do much comparison of bin size vs. performance this time, but 13.x's -Os flags produce the smallest bins.
  • I think that's everything....
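For anyone who wants to build a similar test matrix, here's a rough Python sketch of how the flag sets could be enumerated per compiler. It is not the script actually used for these charts. It assumes "rbsimple" in the chart labels means -freorder-blocks-algorithm=simple, and the function name and the supports_rbsimple parameter are made up for illustration.

```python
from itertools import product

OPT_LEVELS = ["-O3", "-Os"]
EXTRAS = ["-fipa-pta", "-freorder-blocks-algorithm=simple"]

def flag_sets(supports_rbsimple=True):
    """Yield every flag combination to build for one compiler.

    supports_rbsimple is a hypothetical switch: pass False for a compiler
    (such as 4.7.4) that lacks -freorder-blocks-algorithm=simple.
    """
    extras = [f for f in EXTRAS
              if supports_rbsimple or "reorder-blocks-algorithm" not in f]
    for level, lto in product(OPT_LEVELS, [False, True]):
        # Walk every subset of the optional flags using a bitmask.
        for mask in range(2 ** len(extras)):
            chosen = [f for i, f in enumerate(extras) if (mask >> i) & 1]
            yield [level, *chosen, *(["-flto"] if lto else [])]

for flags in flag_sets(supports_rbsimple=False):
    print(" ".join(flags))
```

Each yielded list can be joined into a CFLAGS string for one build. Even this small matrix doubles with every optional flag added, which is how the binary count gets into the hundreds so quickly.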
The gist of it...
The fastest 4.7.4-LTO configuration is 4.11% faster than the fastest 13.2.0-LTO configuration (most up-to-date).
The fastest 10.5.0-LTO configuration is 1.39% faster than the fastest 13.2.0-LTO configuration (most up-to-date).
If those speed differences are worth it to you, and in the case of 4.7.4 you don't mind the huge bin size, you can pursue those compilers. Otherwise, just stick to the latest 13.2.0; without LTO, 4.7.4 and 10.5.0 are no good.
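For reference, a percentage like the ones above can be computed from two scores as follows. This is a minimal sketch that assumes a higher score means a faster build, and the two scores are invented round numbers chosen only to reproduce a 1.39% gap, not the actual benchmark results.

```python
def percent_faster(score, baseline):
    """Percent by which `score` exceeds `baseline` (higher score = faster)."""
    return (score - baseline) / baseline * 100.0

# Hypothetical scores, not real pvrmark results.
baseline_score = 100_000   # stand-in for the best 13.2.0-LTO score
candidate_score = 101_390  # stand-in for the best 10.5.0-LTO score

print(f"{percent_faster(candidate_score, baseline_score):.2f}% faster")
```

Note that the baseline goes in the denominator, so "A is 1.39% faster than B" and "B is 1.39% slower than A" are not quite the same number.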

Next...
I wanted to test with and without fast-math functions, but pvrmark doesn't really do anything that would emit those instructions, which shows the limitations of using pvrmark for something like this. The next step in this journey is to create a benchmark that more accurately represents an actual workload. In chat we discussed using Quake demos or maybe Harlequest dev edition, which I backed.

Re: GCC 13.2.0 coming soon: compiler info and another performance showdown

Post by |darc| »

GCC 13.2.0 is now released, and it is available to build from the toolchain script in KOS.

Re: GCC 13.2.0 coming soon: compiler info and another performance showdown

Post by |darc| »

I tested all KOS examples under the newest toolchain and found no regressions since 9.3.0. Therefore, the latest toolchain with GCC 13.2.0, Binutils 2.41, Newlib 4.3.0, and GDB 13.2 has been upgraded to the "stable" configuration in KOS. Additionally, config files are available in the dc-chain/config directory for building toolchains based on GCC 4.7.4, 9.3.0, 10.5.0, 11.4.0, 12.3.0, 13.2.0, or 14.0.1.