unable to find a register to spill in class 'FP0_REGS'

Post by **kazade** » Sun Jun 09, 2019 1:23 am

Hi!

Firstly, I know that GCC versions later than 4.7.3 aren't officially supported, but having to work around the partial C++11 support in that version is becoming troublesome so I figured I'd try to update the toolchain.

I've used the GCC 5.2 patches from DreamShell to update GCC and newlib, and to patch a few issues in KOS itself.

Now, here's the problem: the whole of KOS compiles fine, but my GLdc library fails with the following:

GL/matrix.c: In function 'glhLookAtf2':
GL/matrix.c:381:1: error: unable to find a register to spill in class 'FP0_REGS'
 }
 ^
GL/matrix.c:381:1: error: this is the insn:
(insn 47 46 48 2 (parallel [
            (set (reg/v:SF 64 fr0 [ __x ])
                (fma:SF (reg:SF 69 fr5 [orig:172 D.5380 ] [172])
                    (reg:SF 65 fr1 [orig:196 D.5380 ] [196])
                    (reg:SF 67 fr3 [orig:231 D.5380 ] [231])))
            (clobber (reg:SI 155 fpscr1))
            (use (reg:SI 154 fpscr0))
        ]) GL/matrix.c:350 443 {fmasf4_i}
     (expr_list:REG_DEAD (reg:SF 67 fr3 [orig:231 D.5380 ] [231])
        (expr_list:REG_UNUSED (reg:SI 155 fpscr1)
            (nil))))
GL/matrix.c:381: confused by earlier errors, bailing out
make[1]: *** [GL/matrix.o] Error 1

The code it's trying to compile is here: https://gitlab.com/simulant/GLdc/blob/m ... rix.c#L328

Now, I think this just means that GCC ran out of registers, and the problem stems from the usage of vec3f_normalize and vec3f_sub_normalize in that function, but here are my questions:

1. Is this one of those GCC SH4 regressions I hear so much about, or is vec3f_normalize/sub_normalize doing something wrong that GCC < 5 just ignored?
2. Can anyone explain to me exactly what the problem is?

Post by **BlueCrab** » Sun Jun 09, 2019 10:13 am

That could very well be a regression, because GCC shouldn't run out of floating point registers like that. Basically, unless you're using a lot of variables that have the "register" specifier, you shouldn't run into those types of issues.

Post by **kazade** » Sun Jun 09, 2019 11:24 am

Aha! This looks like it could be related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54429

Reported against 4.8 and still open (but assigned).

Post by **mrneo240** » Mon Jun 10, 2019 9:00 am

Enjoy.
Rotations come later, after lunch maybe.
License: Public Domain

Posted inline for web crawlers; the attachment is exactly the same.

diff --git a/kernel/arch/dreamcast/include/dc/vec3f.h b/kernel/arch/dreamcast/include/dc/vec3f.h
index cc92b11..0ebc73c 100644
--- a/kernel/arch/dreamcast/include/dc/vec3f.h
+++ b/kernel/arch/dreamcast/include/dc/vec3f.h
@@ -2,6 +2,7 @@
 
    dc/vec3f.h
    Copyright (C) 2013, 2014 Josh "PH3NOM" Pearson
+   Copyright (C) 2019 HaydenKow
 
 */
 
@@ -13,6 +14,7 @@
     of these.
 
     \author Josh "PH3NOM" Pearson
+    \author HaydenKow aka NeoDC
     \see    dc/matrix.h
 */
 
@@ -44,23 +46,22 @@ typedef struct vec3f {
     \param  w                The result of the calculation.
 */
 #define vec3f_dot(x1, y1, z1, x2, y2, z2, w) { \
-        register float __x __asm__("fr0") = (x1); \
-        register float __y __asm__("fr1") = (y1); \
-        register float __z __asm__("fr2") = (z1); \
-        register float __w __asm__("fr3"); \
-        register float __a __asm__("fr4") = (x2); \
-        register float __b __asm__("fr5") = (y2); \
-        register float __c __asm__("fr6") = (z2); \
-        register float __d __asm__("fr7"); \
         __asm__ __volatile__( \
+                              "fmov %1, fr0\n" \
+                              "fmov %2, fr1\n" \
+                              "fmov %3, fr2\n" \
+                              "fmov %4, fr4\n" \
+                              "fmov %5, fr5\n" \
+                              "fmov %6, fr6\n" \
                               "fldi0 fr3\n" \
                               "fldi0 fr7\n" \
-                              "fipr    fv4,fv0" \
-                              : "+f" (__w) \
-                              : "f" (__x), "f" (__y), "f" (__z), "f" (__w), \
-                              "f" (__a), "f" (__b), "f" (__c), "f" (__d) \
+                              "fipr    fv4,fv0\n" \
+                              "fmov fr4, %0\n" \
+                              : "+f" (w) \
+                              : "f" (x1), "f" (y1), "f" (z1), \
+                              "f" (x2), "f" (y2), "f" (z2) \
+                              : "fr3", "fr7" \
                             ); \
-        w = __w; \
     }
 
 /** \brief  Macro to return scalar Euclidean length of a 3d vector.
@@ -75,20 +76,19 @@ typedef struct vec3f {
     \param  w               The result of the calculation.
 */
 #define vec3f_length(x, y, z, w) { \
-        register float __x __asm__("fr0") = (x); \
-        register float __y __asm__("fr1") = (y); \
-        register float __z __asm__("fr2") = (z); \
-        register float __w __asm__("fr3"); \
         __asm__ __volatile__( \
+                              "fmov %1, fr0\n" \
+                              "fmov %2, fr1\n" \
+                              "fmov %3, fr2\n" \
                               "fldi0 fr3\n" \
                               "fipr  fv0,fv0\n" \
                               "fsqrt fr3\n" \
-                              : "+f" (__w) \
-                              : "f" (__x), "f" (__y), "f" (__z), "f" (__w) \
+                              : "+f" (w) \
+                              : "f" (x), "f" (y), "f" (z), "0f" (w) \
                             ); \
-        w = __w; \
     }
 
+
 /** \brief  Macro to return the Euclidean distance between two 3d vectors.
 
     This macro is an inline assembly operation using the SH4's fast
@@ -104,18 +104,16 @@ typedef struct vec3f {
     \param  w                The result of the calculation.
 */
 #define vec3f_distance(x1, y1, z1, x2, y2, z2, w) { \
-        register float __x  __asm__("fr0") = (x2-x1); \
-        register float __y  __asm__("fr1") = (y2-y1); \
-        register float __z  __asm__("fr2") = (z2-z1); \
-        register float __w  __asm__("fr3"); \
         __asm__ __volatile__( \
-                       "fldi0 fr3\n" \
+                              "fmov %1, fr0\n" \
+                              "fmov %2, fr1\n" \
+                              "fmov %3, fr2\n" \
+                              "fldi0 fr3\n" \
                               "fipr  fv0,fv0\n" \
                               "fsqrt fr3\n" \
                               : "+f" (__w) \
-                              : "f" (__x), "f" (__y), "f" (__z), "f" (__w) \
+                              : "f" (x2-x1), "f" (y2-y1), "f" (z2-z1), "0f" (w) \
                             ); \
-        w = __w; \
     }
 
 /** \brief  Macro to return the normalized version of a vector.
@@ -130,20 +128,19 @@ typedef struct vec3f {
     \param  z               The Z coordinate of vector.
 */
 #define vec3f_normalize(x, y, z) { \
-        register float __x __asm__("fr0") = x; \
-        register float __y __asm__("fr1") = y; \
-        register float __z __asm__("fr2") = z; \
         __asm__ __volatile__( \
+                              "fmov %3, fr0\n" \
+                              "fmov %4, fr1\n" \
+                              "fmov %5, fr2\n" \
                               "fldi0 fr3\n" \
                               "fipr  fv0,fv0\n" \
                               "fsrra fr3\n" \
                               "fmul  fr3, fr0\n" \
                               "fmul  fr3, fr1\n" \
                               "fmul  fr3, fr2\n" \
-                              : "=f" (__x), "=f" (__y), "=f" (__z) \
-                              : "0" (__x), "1" (__y), "2" (__z) \
+                              : "+f" (x), "+f" (y), "+f" (z) \
+                              : "0f" (x), "1f" (y), "2f" (z) \
                               : "fr3" ); \
-        x = __x; y = __y; z = __z; \
     }
 
 /** \brief  Macro to return the normalized version of a vector minus another
@@ -164,22 +161,20 @@ typedef struct vec3f {
     \param  z3               The Z coordinate of output vector.
 */
 #define vec3f_sub_normalize(x1, y1, z1, x2, y2, z2, x3, y3, z3) { \
-        register float __x __asm__("fr0") = x1 - x2; \
-        register float __y __asm__("fr1") = y1 - y2; \
-        register float __z __asm__("fr2") = z1 - z2; \
         __asm__ __volatile__( \
+                              "fmov %3, fr0\n" \
+                              "fmov %4, fr1\n" \
+                              "fmov %5, fr2\n" \
                               "fldi0 fr3\n" \
                               "fipr  fv0,fv0\n" \
                               "fsrra fr3\n" \
                               "fmul  fr3, fr0\n" \
                               "fmul  fr3, fr1\n" \
                               "fmul  fr3, fr2\n" \
-                              : "=f" (__x), "=f" (__y), "=f" (__z) \
-                              : "0" (__x), "1" (__y), "2" (__z) \
+                              : "=f" (x3), "=f" (y3), "=f" (z3) \
+                              : "0f" (x1 - x2), "f" (y1 - y2), "2f" (z1 - z2) \
                               : "fr3" ); \
-        x3 = __x; y3 = __y; z3 = __z; \
     }
-
 /** \brief  Macro to rotate a vector about its origin on the x, y plane.
 
     This macro is an inline assembly operation using the SH4's fast

GCC 5.2 seems to produce fine binaries; with these patches I get the expected behavior.

Post by **kazade** » Mon Jun 10, 2019 12:37 pm

A bit more context: the way the current asm is written can apparently lead to weirdness with optimizations enabled; the patch above makes it less brittle (mrneo can probably explain better).

It would be great if this patch could be applied upstream.

Post by **BlueCrab** » Mon Jun 10, 2019 8:46 pm

There are issues with the patch presented here. First, it clobbers registers without informing the compiler, which will break things in interesting ways later on. Second, it could cause significantly more fmov instructions to be emitted in the final assembly than the code should require. With the code as it is in KOS at the moment, the optimizer can take into account that certain values have to be in certain registers; with the code after this patch it no longer can, and the added fmov instructions will always be emitted, even when they're not needed. This is especially important if multiple operations are strung together in your graphics pipeline (whereas if you're only using one of those operations, you probably won't see any harm from using the patch -- if you fix the clobbered register issues).
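A rough sketch of the two approaches being compared, using a squared-length helper so that only fr0-fr3 are involved. This is not code from KOS or from the patch: the function names are made up for illustration, the SH4 paths assume a single-precision build like the one KOS uses, and the `#else` branches are plain C fallbacks so the file also builds on a host machine.

```c
#include <assert.h>

/* Variant A: pinned register variables (the style KOS uses today).
   Every register the asm touches is an operand, so the compiler knows
   about all of them and can try to compute x, y, z directly in fr0-fr2. */
static inline float length_sq3_pinned(float x, float y, float z)
{
#if defined(__SH4__) && defined(__GNUC__)
    register float rx __asm__("fr0") = x;
    register float ry __asm__("fr1") = y;
    register float rz __asm__("fr2") = z;
    register float rw __asm__("fr3") = 0.0f;   /* fipr result lands in fr3 */
    __asm__("fipr fv0, fv0"
            : "+f"(rw)
            : "f"(rx), "f"(ry), "f"(rz));
    return rw;
#else
    return x * x + y * y + z * z;              /* host fallback */
#endif
}

/* Variant B: explicit fmov instructions (the style of the patch), but with
   every register the asm overwrites listed as a clobber. Because fr0-fr3
   are clobbered, the operands must live elsewhere, so the three input
   moves and the final move are always emitted. */
static inline float length_sq3_fmov(float x, float y, float z)
{
#if defined(__SH4__) && defined(__GNUC__)
    float w;
    __asm__("fmov  %1, fr0\n\t"
            "fmov  %2, fr1\n\t"
            "fmov  %3, fr2\n\t"
            "fldi0 fr3\n\t"
            "fipr  fv0, fv0\n\t"
            "fmov  fr3, %0"
            : "=f"(w)
            : "f"(x), "f"(y), "f"(z)
            : "fr0", "fr1", "fr2", "fr3");
    return w;
#else
    return x * x + y * y + z * z;              /* host fallback */
#endif
}

/* Both variants should agree closely (fipr is an approximate instruction). */
static void check_variants_agree(float x, float y, float z)
{
    float a = length_sq3_pinned(x, y, z);
    float b = length_sq3_fmov(x, y, z);
    float diff = (a > b) ? (a - b) : (b - a);
    assert(diff <= 0.001f * (a + b) + 1e-6f);
}
```

Whether variant A still trips the FP0_REGS reload failure on a given GCC version is exactly the open question in this thread; the sketch only shows where the clobber list and the extra moves come from.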

I'm pretty sure this is really a GCC bug that we shouldn't be working around in hacky/quite possibly slower ways like this patch does. It's bugs like this (optimization bugs and the like) that have kept us back on earlier versions of GCC rather than moving forward to newer versions.

Thus, I will not upstream this patch, at least not in its current form. I may try to look at it sometime in the nearish future to see if there's a better way to solve this issue...

Post by **mrneo240** » Tue Jun 11, 2019 6:42 am

BlueCrab wrote: ↑Mon Jun 10, 2019 8:46 pm
Second, it could end up causing significantly more fmov instructions to need to be output in the final assembly than what the code should require.

Three values happening to sit, in order, in three registers is not going to be common. I doubt it happens often.

In your opinion, BlueCrab, will you ever move KOS away from 4.7.3? Even in 5 years' time?

Post by **BlueCrab** » Tue Jun 11, 2019 10:27 am

mrneo240 wrote: ↑Tue Jun 11, 2019 6:42 am
Hoping 3 values happen to be in order in 3 registers is not going to be common. I would doubt it happens often.

It would be common if you run multiple passes of the various macros in that file with the same parameters. Sure, on the first pass it's unlikely that everything is already where it should be, but on the next call it most likely will be. Plus, as I said, with the code the way it is the optimizer can see ahead that x should be in fr0 and arrange for it to be there.

mrneo240 wrote: ↑Tue Jun 11, 2019 6:42 am
In your opinion, BlueCrab, will you ever move KOS away from 4.7.3? Even in 5 years' time?

If a newer compiler version comes along that doesn't have serious bugs in the SH4 code output, then sure. However, that really hasn't been the case since the 4.7.x line. There's not really a compelling reason (in my opinion) to move to a broken compiler that will produce broken code, just for the sake of having a newer compiler. Yes, new features are nice, but bad code is not. Until someone fixes GCC, I don't see a good reason to move to a new version. Unfortunately, it seems that the GCC developers don't think the bugs that exist are important enough to fix or that there's just nobody doing GCC development that cares about fixing SuperH support at this point, and I don't know anyone else around here with the knowledge of GCC internals to try to fix it ourselves...

Post by **mrneo240** » Tue Jun 11, 2019 6:57 pm

Any hints on where to find examples of bad code being produced?

Post by **BlueCrab** » Tue Jun 11, 2019 9:21 pm

Look through all the bug reports on the GCC bug tracker for results matching "SH" or "SuperH", especially those that also include the term "regression" in the title. For reference, here's the list of all the bugs tagged "[SH]" right now on the bug tracker: https://gcc.gnu.org/bugzilla/buglist.cg ... h=%5BSH%5D

Mind you, not all of those bugs are regressions or bad code being produced -- several are potential improvements for optimization. However, the number of [7/8/9/10 Regression] tagged bugs is quite concerning. The fact that there are several bugs that had been reported in GCC 7.x which are still present in the master branch today kinda shows how little work is being put into fixing the issues, unfortunately.

Post by **kazade** » Wed Jun 12, 2019 1:06 am

So, a bit more info on this particular regression in 5.2:

- it only happens when -fexpensive-optimizations is enabled
- it only affects the call to vec3f_normalize; vec3f_sub_normalize is, weirdly, fine
- it has nothing to do with the registers being marked volatile
- it only happens in this particular place; I suspect the static float matrix array in the same function is a cause

Other than this particular error, everything else seems to compile and run fine. If we can bisect to the problematic commit then maybe we can get the issue resolved quickly and backport the patch.
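Since the failure only shows up with -fexpensive-optimizations, one thing that could be tried while bisecting is switching that pass off for just the affected function. The sketch below uses GCC's `optimize` function attribute on a stand-in function; `NO_EXPENSIVE_OPTS` and `cross3` are hypothetical names, not part of GLdc or KOS, and whether this actually avoids the reload error in glhLookAtf2 is untested. The GCC manual also describes this attribute as a debugging aid rather than something for production code, so passing -fno-expensive-optimizations for the one source file is the more conservative route.

```c
#include <assert.h>

/* GCC-only: disable -fexpensive-optimizations for a single function.
   Other compilers may not implement this attribute, so the macro
   expands to nothing there. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_EXPENSIVE_OPTS __attribute__((optimize("no-expensive-optimizations")))
#else
#define NO_EXPENSIVE_OPTS
#endif

/* Stand-in for the affected function: a plain cross product, one of the
   steps a look-at matrix needs. The output must not alias the inputs. */
NO_EXPENSIVE_OPTS
void cross3(const float a[3], const float b[3], float out[3])
{
    assert(out != a && out != b);
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
}
```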

Post by **nymus** » Wed Jun 12, 2019 2:41 am

Please correct me if I'm wrong regarding gcc's SuperH status... I think some of these have been discussed in these forums before...

- SuperH support was dropped some time ago?
- SuperH's lack of atomic operations limits the extent to which C++11 and above can be supported?

Post by **kazade** » Wed Jun 12, 2019 4:37 am

- SuperH is still (supposedly) supported in GCC, but no one is really maintaining it.
- IIRC the atomic operations in C++11 can be emulated in the standard library with mutexes etc. when atomic instructions aren't available. I vaguely remember an issue where GCC's library doesn't do that, but there was a bug filed to add it.
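To illustrate what "emulated with mutexes" means, here is a minimal sketch in C of a lock-protected integer with fetch-add and load operations. It is not how GCC's runtime or KOS implements anything: `locked_int`, `locked_fetch_add` and `locked_load` are made-up names, and POSIX mutexes stand in for whatever locking primitive the platform provides.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* An integer guarded by a mutex, standing in for an atomic int on a
   target without native atomic instructions. */
typedef struct {
    pthread_mutex_t lock;
    int value;
} locked_int;

#define LOCKED_INT_INIT(v) { PTHREAD_MUTEX_INITIALIZER, (v) }

/* Add delta and return the previous value, like an atomic fetch-add. */
static int locked_fetch_add(locked_int *obj, int delta)
{
    int old;
    assert(obj != NULL);
    pthread_mutex_lock(&obj->lock);
    old = obj->value;
    obj->value = old + delta;
    pthread_mutex_unlock(&obj->lock);
    return old;
}

/* Read the current value under the lock, like an atomic load. */
static int locked_load(locked_int *obj)
{
    int v;
    assert(obj != NULL);
    pthread_mutex_lock(&obj->lock);
    v = obj->value;
    pthread_mutex_unlock(&obj->lock);
    return v;
}
```

The cost of this approach is a lock and unlock around every access, which is why a library would only fall back to it when no suitable instruction exists.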
