Hi,
I am trying to understand how division is done on the SH4.
Since I compile in "single mode", the divisions will be float, not double.
First, I tried disassembling some C code that divides different types of data.
But the assembly output is quite difficult to understand (even with the -O1 option).
There is always a call to the ___udivdi3 / ___floatundisf functions.
All I want is to see some samples that perform these divisions using only SH4 assembly code.
I looked on the Renesas site, but there is only general information about the registers (FR0..FR15 / FPUL); even the programming manual only describes the instructions.
Here are the kinds of division I want to write in assembly:
float MakeDiv(uint64 n, int d)
{
    return n / d;
}
float MakeDiv(int n, float d)
{
    return n / d;
}
float MakeDiv(float n, int d)
{
    return n / d;
}
Besides, I wonder how to load a floating-point constant (for example PI: 3.1415926535897932384626433832795) into an FRn register in assembly?
Thanks for any explanation.
About floating point numbers and division instruction
- Newbie
- Insane DCEmu
- Posts: 171
- Joined: Sat Jul 27, 2013 1:16 pm
- Has thanked: 0
- Been thanked: 0
-
- DC Developer
- Posts: 968
- Joined: Tue Feb 11, 2003 4:12 pm
- Location: In a Dream
- Has thanked: 5 times
- Been thanked: 5 times
Re: About floating point numbers and division instruction
Integer and floating point (fp) division are completely different to the cpu. The compiler uses internal helper functions for integer division because it has to be done with special step-by-step instructions/algorithms. Calling a function keeps that large chunk of code from being repeated at every division. The fpu, on the other hand, has a single instruction (fdiv) that performs division on fp numbers.
In C/C++, mixing fp and integer types triggers promotion/demotion of types, so integers will usually be converted to fp through the fpul register.
The following code shows the usual operations, even when compiled with -O1 (sh-elf-gcc -O1 -S fpudiv.c):
Code: Select all
int fpdiv( int a, float b )
{
    b *= 3.14159;
    return a / b;
}
compiles to:
Code: Select all
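    .file "fpu_div.c"
    .text
    .little
    .text
    .align 1
    .global _fpdiv
    .type _fpdiv, @function
_fpdiv:
    lds r4,fpul         ! a (first integer argument) goes to FPUL
    float fpul,fr1      ! fr1 = (float)a
    mova .L2,r0         ! r0 = address of the constant at .L2
    fmov.s @r0,fr2      ! fr2 = 3.14159
    fmul fr2,fr4        ! fr4 = fr4 * fr2
    fdiv fr4,fr1        ! fr1 = fr1 / fr4
    ftrc fr1,fpul       ! truncate the quotient to an integer in FPUL
    rts
    sts fpul,r0         ! delay slot: integer return value goes to r0
.L3:
    .align 2
.L2:
    .long 1078530000    ! 3.14159 as an IEEE 754 single (0x40490FD0)
    .size _fpdiv, .-_fpdiv
    .ident "GCC: (GNU) 4.7.3"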
behold the mind
inspired by Dreamcast
Re: About floating point numbers and division instruction
OK, thanks for the reply.
First, to load a constant floating-point value such as PI directly into a floating-point register,
the value must be converted to the IEEE 754 single-precision 32-bit format.
There are online converters and source code for this, for example https://www.h-schmidt.net/FloatConverter/IEEE754.html. Then simply hard-code the converted value in
the assembly code.
Thus the value 1078530000 (40490FD0 hex) in the listing above is the IEEE 754 single-precision encoding of the constant 3.14159 used in the C code. PI itself (3.1415926535897932384626433832795) rounds to 40490FDB hex.
That answers my last question.
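The conversion can also be done in C instead of with an online tool. This is only a minimal sketch: `float_bits` is a hypothetical helper name, and it assumes that `float` is an IEEE 754 single, which is the case on the SH4 and on usual desktop compilers.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float (assumed IEEE 754 single).
   memcpy is used instead of a pointer cast to stay within the aliasing rules. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Calling `float_bits(3.14159f)` gives the value that can be placed after a `.long` directive and then loaded with `fmov.s`, as in the listing above.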
Division between a 32-bit integer and a 32-bit float seems easy once a few things are understood:
1) FPUL is used to transfer a value from the integer world to the floating-point world.
2) FLOAT is the instruction that converts a 32-bit integer to the IEEE 754 single-precision 32-bit format.
3) Parameter registers in the floating-point world behave like the integer ones: FR4 = first parameter, FR5 = second parameter, and FR0 holds the result.
Code: Select all
float TraDiv(int n, float d)
{
    return n / d;
}
is equivalent to (I hope I understand):
Code: Select all
    lds r4,fpul     ! r4 (the first integer parameter) is loaded into FPUL
    float fpul,fr0  ! then converted to IEEE 754 single precision in fr0 by FLOAT
    rts
    fdiv fr4,fr0    ! delay slot: fr0 is divided by d, the first floating-point parameter, held in fr4
But for uint64, aka "unsigned long long", I do not understand what happens.
First example:
Code: Select all
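float MakeDiv(unsigned long long n, int d)
{
    return n / d;
}
is translated to:
Code: Select all
__Z7MakeDivyi:
.LFB2:
    .cfi_startproc
    sts.l pr,@-r15       ! save the return address (PR) on the stack
    .cfi_def_cfa_offset 4
    .cfi_offset 17, -4
    mov r6,r7            ! copy d (r6) into r7
    shll r7              ! T = sign bit of d
    mov.l .L6,r0         ! r0 = address of ___udivdi3
    jsr @r0              ! call ___udivdi3 with n in r4/r5 and d in r6/r7
    subc r7,r7           ! delay slot: r7 = -T (all zeros or all ones), so r6/r7 is d widened to 64 bits
    mov r1,r5            ! the quotient comes back in r0/r1: copy r1 to r5
    mov.l .L7,r1         ! r1 = address of ___floatundisf
    jsr @r1              ! call ___floatundisf with the quotient in r4/r5
    mov r0,r4            ! delay slot: copy r0 to r4
    lds.l @r15+,pr       ! restore the return address
    rts                  ! return; the float result is already in fr0
    nop                  ! delay slot
.L6:
    .long ___udivdi3
.L7:
    .long ___floatundisf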
I do not understand this code: there are no floating-point instructions or registers.
It calls two functions and uses the r6 register. I imagine r4 and r5 hold the uint64 and r6 the integer.
But r4 and r5 are overwritten by moving the r1 and r0 values into them...
Second example :
Code: Select all
float MUL_SPEC(float d, unsigned long long n)
{
    return d * n;
}
is translated to:
Code: Select all
__Z8MUL_SPECfy:
.LFB0:
    .cfi_startproc
    fmov.s fr12,@-r15
    .cfi_def_cfa_offset 4
    .cfi_offset 37, -4
    sts.l pr,@-r15
    .cfi_def_cfa_offset 8
    .cfi_offset 17, -8
    mov.l .L2,r1
    jsr @r1
    fmov fr4,fr12
    fmul fr12,fr0
    lds.l @r15+,pr
    rts
    fmov.s @r15+,fr12
.L2:
    .long ___floatundisf
Is ___floatundisf a conversion function that turns an unsigned long long into a floating-point number?
If so, does it automatically take r4 and r5 as its unsigned long long parameter and return the result in fr0?
Thanks for any explanation.
Re: About floating point numbers and division instruction
You have got the right idea.
r0 (together with r1 for 64-bit values) and fr0 are used for return values, so when a function returns, the result will be in those registers. r4-r7 and fr4 upward are used for parameters, so if the parameters for a function we are calling (the callee) are already where they need to be, we can just branch to it. A float trades precision for range: it keeps only a 24-bit significand plus an exponent, so even though a uint64 needs two integer registers, its (rounded) value still fits in one floating point register, unless you need double precision, in which case the result will be in dr0 (i.e. fr0 + fr1).
Integer division is harder than floating point because it has to be exact. This usually requires operating on each and every bit, like you normally do on paper with long division. There are a number of different algorithms for doing this, but they all take many instructions, so rather than duplicate a long block of code, the compiler just calls the internal helper for unsigned long long division, __udivdi3.
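To make the "long division on bits" idea concrete, here is a minimal sketch of shift-and-subtract division in C. It is not the code libgcc uses for __udivdi3, `udiv64_sketch` is a hypothetical name, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

/* Shift-and-subtract (restoring) division: one quotient bit per iteration.
   Illustration only; d must not be zero. */
uint64_t udiv64_sketch(uint64_t n, uint64_t d)
{
    uint64_t q = 0;   /* quotient, built from the top bit down */
    uint64_t r = 0;   /* running remainder, always smaller than d between steps */

    for (int i = 63; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);   /* bring down the next bit of n */
        if (r >= d) {                    /* the divisor fits: subtract it */
            r -= d;
            q |= (uint64_t)1 << i;       /* and set this quotient bit */
        }
    }
    return q;
}
```

The loop runs 64 times with a compare and a conditional subtract in each pass, which gives an idea of why the compiler prefers a single shared helper over inlining the division everywhere.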
In the first example, n is already in r4+r5, where __udivdi3 expects it, so only d needs work: it is widened to 64 bits in r6+r7 before the call. That function will use them as needed and return the result in r0+r1, so we don't care about the original parameters anymore.
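The reason an integer helper is called at all comes from the C conversion rules: with an unsigned long long and an int operand, the int is converted to unsigned long long, the division is a 64-bit integer division, and only the quotient is converted to float. A small sketch of the difference (both function names are made up for this example):

```c
/* Same expression as the first example: integer division, then conversion. */
float make_div_as_written(unsigned long long n, int d)
{
    return n / d;
}

/* Hypothetical variant: convert both operands first, so the division
   itself is a floating-point division. */
float make_div_float(unsigned long long n, int d)
{
    return (float)n / (float)d;
}
```

For n = 7 and d = 2 the first version returns 3.0 and the second 3.5. A negative d is converted to a huge unsigned value in the first version, so 7 / -1 comes out as 0.0 there. On the SH4 the second version would presumably still need __floatundisf to convert n, but the division itself could then be a single fdiv.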
Since an unsigned long long is stored in two integer registers and the fpul register is only 32 bits wide, the conversion cannot be done with a single lds/float pair (we can't just use lds r0, fpul). Therefore, we pass our result from __udivdi3, which is in r0+r1, to __floatundisf, which needs it in r4+r5. When __floatundisf returns, the result will be in fr0 already, so we can just return to the function that called us. Its address was in the pr register, which we saved on the stack (through r15) at the start of our function. Every time a function is going to call another function, it needs to save its caller's return address on the stack, because jsr overwrites pr.
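As a rough idea of what such a conversion has to do, the 64-bit value can be split into two 32-bit halves that are converted separately and then recombined. This is only a sketch under that assumption, not the libgcc implementation of __floatundisf: `u64_to_float_sketch` is a hypothetical name, and because each step rounds, the result may differ from a correctly rounded conversion in the last bit.

```c
#include <stdint.h>

/* Convert an unsigned 64-bit integer to float from its two 32-bit halves.
   Illustration only; rounding may be off by one unit in the last place. */
float u64_to_float_sketch(uint64_t n)
{
    uint32_t hi = (uint32_t)(n >> 32);   /* upper 32 bits */
    uint32_t lo = (uint32_t)n;           /* lower 32 bits */

    /* value = hi * 2^32 + lo, computed in single precision */
    return (float)hi * 4294967296.0f + (float)lo;
}
```

Note that the SH4 float instruction treats FPUL as a signed 32-bit integer, so hand-written assembly would need extra care for halves with the top bit set.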
So you can see you had the right idea with the second example.
It can be hard to follow because the compiler reorders the instructions for performance, but just remember that on sh4 cpus the instruction right after a delayed branch such as jsr or rts (the "delay slot") is executed before the branch takes effect, so:
Code: Select all
__Z8MUL_SPECfy:
.LFB0:
.cfi_startproc
fmov.s fr12,@-r15 ! we need to save registers that a caller expects to be untouched.
.cfi_def_cfa_offset 4
.cfi_offset 37, -4
sts.l pr,@-r15 ! we need to save our caller's address before we call another function (callee)
.cfi_def_cfa_offset 8
.cfi_offset 17, -8
mov.l .L2,r1 ! get ready to call __floatundisf
jsr @r1 ! call __floatundisf
fmov fr4,fr12 ! this "delay slot" will be done before we go. Good thing we saved fr12.
fmul fr12,fr0 ! we're back! our result is in fr0 so we multiply and store the answer in fr0 as expected
! If __floatundisf used fr12, it will restore it just like we do below
lds.l @r15+,pr ! restore our caller's address
rts ! return to our caller
fmov.s @r15+,fr12 ! "delay slot" restore the fr12 that we saved before we go
.L2:
.long ___floatundisf