About floating point numbers and division instruction

Newbie
Insane DCEmu
Posts: 171
Joined: Sat Jul 27, 2013 1:16 pm
Has thanked: 0
Been thanked: 0

About floating point numbers and division instruction

Post by Newbie »

Hi,


I'm trying to understand how division is done on the SH4.

Since I compile in "single mode", the divisions will be float divisions, not double.

First, I compiled some C code that performs divisions on different data types and looked at the generated assembly.
But the assembly is quite difficult to understand (even with -O1).
There is always a call to the ___udivdi3 / ___floatundisf functions.

All I want is to see some samples of division written purely in SH4 assembly.

I tried the Renesas site, but it only has general information about the registers (FR0..FR15 / FPUL); even the programming manual only describes the individual instructions.

Here are the kinds of division I want to write in assembly (C++ overloads of one function):

float MakeDiv(unsigned long long n, int d)
{
	return n / d;
}

float MakeDiv(int n, float d)
{
	return n / d;
}

float MakeDiv(float n, int d)
{
	return n / d;
}

Also, how do I load a floating-point constant (for example PI: 3.1415926535897932384626433832795) into an FRn register in assembly?

Thanks for any explanation.
nymus
DC Developer
Posts: 968
Joined: Tue Feb 11, 2003 4:12 pm
Location: In a Dream
Has thanked: 5 times
Been thanked: 5 times

Re: About floating point numbers and division instruction

Post by nymus »

Integer and floating-point (fp) division are completely different operations for the CPU. The compiler uses internal helper functions for integer division because it has to be done with special step-by-step instructions/algorithms; calling a function avoids repeating a large chunk of code at every division. The FPU, on the other hand, has a single instruction that divides fp numbers: fdiv.

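In C/C++, mixing fp and integer operands triggers the usual arithmetic conversions, so the integer is promoted to fp. On SH4 that conversion goes through the FPUL register (lds rN,fpul then float fpul,frN), and converting a result back to an integer (demotion) goes through FPUL as well, with ftrc.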
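
A small C sketch of these conversions, assuming IEEE 754 single-precision floats. The function names are hypothetical, chosen to mirror the MakeDiv overloads from the first post; the explicit versions spell out the casts the compiler inserts. Note that unsigned long long / int involves no floating point at all until the integer quotient is converted to float.

```c
/* int / float: n is converted to float, then the FPU divides
   (on SH4: lds rN,fpul ; float fpul,frN ; fdiv). */
float div_int_by_float(int n, float d)          { return n / d; }
float div_int_by_float_explicit(int n, float d) { return (float)n / d; }

/* float / int: here d is the operand converted to float. */
float div_float_by_int(float n, int d)          { return n / d; }
float div_float_by_int_explicit(float n, int d) { return n / (float)d; }

/* unsigned long long / int: both operands are integers, so d is converted
   to unsigned long long, the division is an integer one, and only the
   truncated quotient is converted to float on return. */
float div_u64_by_int(unsigned long long n, int d) { return n / d; }
```

So the uint64 overload returns 3.0f for n = 7, d = 2, not 3.5f; cast n to float first if the fractional part matters.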

The following code shows the usual operations when compiled, even with -O1 (sh-elf-gcc -O1 -S fpu_div.c):

Code:

int fpdiv( int a, float b )
{
    b *= 3.14159;
    return a / b;   /* a is converted to float; the quotient is truncated back to int */
}
compiles to:

Code:

	.file	"fpu_div.c"
	.text
	.little
	.text
	.align 1
	.global	_fpdiv
	.type	_fpdiv, @function
_fpdiv:
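	lds	r4,fpul		! a (int, first integer argument in r4) -> FPUL
	float	fpul,fr1	! fr1 = (float)a
	mova	.L2,r0		! r0 = address of the constant at .L2
	fmov.s	@r0,fr2		! fr2 = 3.14159f
	fmul	fr2,fr4		! fr4 = fr4 * fr2 (b is the float argument in fr4)
	fdiv	fr4,fr1		! fr1 = fr1 / fr4
	ftrc	fr1,fpul	! FPUL = (int)fr1, truncated toward zero
	rts	
	sts	fpul,r0		! delay slot: integer result -> r0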
.L3:
	.align 2
.L2:
	.long	1078530000
	.size	_fpdiv, .-_fpdiv
	.ident	"GCC: (GNU) 4.7.3"
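
For reference, a hypothetical C rendering of that listing, one statement per instruction group. It assumes b is a float passed in fr4 (which is what the fmul/fdiv on fr4 imply) and that the constant stored at .L2 is the single-precision value 3.14159f; fpdiv_steps is an illustrative name, not something the compiler produces.

```c
/* Hypothetical step-by-step C equivalent of _fpdiv.
   Each statement is annotated with the SH4 instructions it corresponds to. */
int fpdiv_steps(int a, float b)
{
    float fa = (float)a;   /* lds r4,fpul ; float fpul,fr1 */
    b = b * 3.14159f;      /* mova .L2,r0 ; fmov.s @r0,fr2 ; fmul fr2,fr4 */
    float q = fa / b;      /* fdiv fr4,fr1   (fr1 = fr1 / fr4) */
    return (int)q;         /* ftrc fr1,fpul ; sts fpul,r0 (truncates toward zero) */
}
```

Because ftrc truncates toward zero, fpdiv_steps(-10, 1.0f) gives -3, the same as a C cast.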
behold the mind
inspired by Dreamcast

Re: About floating point numbers and division instruction

Post by Newbie »

OK, thanks for the reply :)

First, to load a constant floating-point value like PI directly into a floating-point register, the value must be converted to the IEEE 754 single-precision (32-bit) format. There are online converters and source code that do this; then you simply hard-code the resulting value in the assembly code: https://www.h-schmidt.net/FloatConverter/IEEE754.html

So the value 1078530000 (0x40490FD0) in the listing above is the IEEE 754 single-precision encoding of 3.14159, the constant from the C source. PI itself (3.1415926535897932384626433832795) rounds to 0x40490FDB (1078530011) in single precision.

That answers my last question.
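
Instead of an online converter, the bit pattern can also be computed with a few lines of C. This is a minimal sketch assuming float is IEEE 754 binary32 (true for the SH4 FPU and common hosts); float_bits is a hypothetical helper name, and its result is the value to put after .long. GNU as also has a .float directive that performs this conversion at assembly time.

```c
#include <stdint.h>
#include <string.h>

/* Returns the IEEE 754 single-precision bit pattern of f, i.e. the
   32-bit value to hard-code with .long and load with fmov.s. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* copy the bytes: well-defined, unlike a pointer cast */
    return u;
}
```

For example, float_bits(3.14159f) is 0x40490FD0 (1078530000), while float_bits(3.14159265358979323846f) is 0x40490FDB (1078530011).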

Division between a 32-bit integer and a 32-bit float seems easy once you understand a few things:
1) FPUL is used to transfer values from the integer world to the floating-point world (lds rN,fpul / sts fpul,rN).
2) FLOAT is the instruction that converts the 32-bit integer in FPUL to IEEE 754 single precision in an FRn register (FTRC does the reverse, truncating toward zero).
3) Parameter registers in the floating-point world behave like the integer ones: FR4 holds the first float parameter, FR5 the second, and FR0 the result.
So:

Code:

float TraDiv(int n, float d)
{
	return n / d;
}
is equivalent to (if I understand correctly):

Code:

	lds	r4,fpul		! r4 (the int parameter n) is loaded into FPUL
	float	fpul,fr0	! FLOAT converts it to IEEE 754 single precision in fr0
	rts			! return to the caller...
	fdiv	fr4,fr0		! ...after this delay-slot instruction: fr0 = fr0 / fr4,
				! where fr4 is the first float parameter (d in the C code)

But for uint64, aka "unsigned long long", I don't understand what happens...

First example:

Code:

float MakeDiv(unsigned long long n, int d)
{
	return n / d;
}
is translated to:

Code:

__Z7MakeDivyi:
.LFB2:
	.cfi_startproc
	sts.l	pr,@-r15
	.cfi_def_cfa_offset 4
	.cfi_offset 17, -4
	mov	r6,r7
	shll	r7
	mov.l	.L6,r0
	jsr	@r0
	subc	r7,r7
	mov	r1,r5
	mov.l	.L7,r1
	jsr	@r1
	mov	r0,r4
	lds.l	@r15+,pr
	rts	
	nop	
	
.L6:
	.long	___udivdi3
.L7:
	.long	___floatundisf

I don't understand this code: there are no floating-point instructions or registers at all...
It calls two functions and uses the r6 register. I imagine r4 and r5 hold the uint64 and r6 the integer.
But r4 and r5 are then overwritten with the values of r1 and r0...

Second example:

Code:

float MUL_SPEC(float d, unsigned long long n)
{
	return d * n;
}
translated to:

Code:

__Z8MUL_SPECfy:
.LFB0:
	.cfi_startproc
	fmov.s	fr12,@-r15
	.cfi_def_cfa_offset 4
	.cfi_offset 37, -4
	sts.l	pr,@-r15
	.cfi_def_cfa_offset 8
	.cfi_offset 17, -8
	mov.l	.L2,r1
	jsr	@r1
	fmov	fr4,fr12
	fmul	fr12,fr0
	lds.l	@r15+,pr
	rts	
	fmov.s	@r15+,fr12	
	
.L2:
	.long	___floatundisf
Here the multiplication is between fr12 and fr0: fr12 holds the value of fr4, which is the first floating-point parameter (d in the C code), but what is fr0?

Is ___floatundisf a conversion function that transforms an unsigned long long into a floating-point number?

So does it automatically take r4 and r5 as its unsigned long long parameter and return the result in fr0?

Thanks for any explanations :)

Re: About floating point numbers and division instruction

Post by nymus »

You have got the right idea. :)

Return values come back in r0 (r0+r1 for a 64-bit value such as a long long) and in fr0 for floats, so when a function returns, the result is already in those registers. Parameters are passed in r4-r7 for integers and fr4-fr11 for floats, so if the parameters for a function we are calling (the callee) are already where they need to be, we can just branch to it. A float trades precision for range: thanks to its 8-bit exponent, a single 32-bit float register can hold the magnitude of any uint64, even though the uint64 itself needs two integer registers, but only 24 significant bits are kept, so large values are rounded. If you need double precision, the result will be in dr0 (i.e. fr0 + fr1).
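
To see concretely how a 32-bit float holds any uint64 value only approximately, here is a small C check, assuming IEEE 754 binary32 floats with round-to-nearest (the helper name u64_exact_in_float is hypothetical). Every uint64 converts to a finite float, but only values with at most 24 significant bits survive the round trip unchanged.

```c
#include <stdint.h>

/* Returns 1 if n converts to float and back without loss. */
int u64_exact_in_float(uint64_t n)
{
    float f = (float)n;             /* always in range: FLT_MAX is about 3.4e38 */
    if (f >= 18446744073709551616.0f)
        return 0;                   /* rounded up to 2^64, which a uint64 cannot hold */
    return (uint64_t)f == n;        /* exact only if no significant bits were lost */
}
```

For instance 2^24 = 16777216 is exact, but 16777217 needs 25 significant bits and comes back as 16777216.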

Integer division is harder here because the SH4 has no single integer divide instruction: it only provides step instructions (DIV0S/DIV0U to set up, then one DIV1 per quotient bit), so the quotient is produced one bit at a time, just like long division on paper. There are a number of different algorithms for this, but they all take many instructions, so rather than duplicate a long block of code, the compiler calls an internal helper function; for unsigned long long that helper is __udivdi3.
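
A C sketch of the bit-by-bit idea, written as plain restoring long division over 64 bits. It only illustrates the principle; it is not libgcc's actual __udivdi3 code (which is organized differently), and udiv64_longhand is a hypothetical name.

```c
#include <stdint.h>

/* Restoring long division, one quotient bit per iteration, most significant first.
   The caller must ensure d != 0. The remainder is stored in *rem if rem is non-null. */
uint64_t udiv64_longhand(uint64_t n, uint64_t d, uint64_t *rem)
{
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; i--) {
        uint64_t carry = r >> 63;          /* bit that the shift below pushes out */
        r = (r << 1) | ((n >> i) & 1);     /* "bring down" the next bit of n */
        if (carry || r >= d) {             /* the divisor fits: subtract it */
            r -= d;                        /* wraps to the right value when carry is set */
            q |= (uint64_t)1 << i;         /* and set this quotient bit */
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```

The 64 loop iterations, each with a compare and a conditional subtract, are exactly the kind of long block you would not want inlined at every division.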

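In the first example, the parameters are almost where __udivdi3 needs them: n is already in r4+r5, and the only preparation is sign-extending d from r6 into r6+r7 (the shll r7 / subc r7,r7 pair, with subc sitting in the jsr delay slot) so that it becomes a 64-bit divisor. __udivdi3 uses them as needed and returns the quotient in r0+r1, so after that we don't care about the original parameters anymore.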
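
The whole first example can be written out in C with the hidden steps made explicit. This is a sketch (MakeDiv_steps is a hypothetical name), and it also shows a side effect of the sign extension: a negative d becomes a huge unsigned divisor.

```c
/* Hypothetical C rendering of the compiled MakeDiv(unsigned long long, int). */
float MakeDiv_steps(unsigned long long n, int d)
{
    /* mov r6,r7 ; shll r7 ; subc r7,r7 : sign-extend d into r6:r7 */
    unsigned long long d64 = (unsigned long long)(long long)d;

    /* jsr ___udivdi3 : (r4:r5) / (r6:r7) -> r0:r1 */
    unsigned long long q = n / d64;

    /* jsr ___floatundisf : quotient moved to r4:r5, result in fr0 */
    return (float)q;
}
```

So a call with n = 100 and d = -1 returns 0.0f rather than -100.0f; if negative divisors are possible, convert the operands to float (or to a signed type) before dividing.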

Since a ( long long ) is stored in two integer registers and the FPUL register is only 32 bits wide, converting a ( long long ) to float has to be done in software (we can't just use lds r0,fpul). Therefore, we pass the result of __udivdi3, which is in r0+r1, to __floatundisf, which expects its argument in r4+r5. When __floatundisf returns, the result is already in fr0, so we can simply return to the function that called us. Its return address was saved on the stack (via r15) at the start of our function: every function that calls another function must save PR first, because jsr overwrites PR with the new return address.
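
To give an idea of what converting a 64-bit integer by hand involves, here is a C sketch of a uint64-to-float conversion using only integer operations: find the highest set bit, keep 24 significant bits, round to nearest with ties to even, then assemble the IEEE 754 fields. It is an illustration with a hypothetical name (u64_to_float), not the source of libgcc's __floatundisf.

```c
#include <stdint.h>
#include <string.h>

/* Software uint64 -> IEEE 754 single conversion, round to nearest, ties to even. */
float u64_to_float(uint64_t n)
{
    if (n == 0)
        return 0.0f;

    int msb = 63;                          /* index of the highest set bit */
    while (!((n >> msb) & 1))
        msb--;

    uint32_t mant;                         /* 24-bit significand incl. the hidden 1 */
    if (msb <= 23) {
        mant = (uint32_t)(n << (23 - msb));        /* fits exactly, no rounding */
    } else {
        int shift = msb - 23;                      /* number of bits dropped */
        uint64_t dropped = n & ((1ULL << shift) - 1);
        uint64_t half = 1ULL << (shift - 1);
        mant = (uint32_t)(n >> shift);
        if (dropped > half || (dropped == half && (mant & 1)))
            mant++;                                /* round to nearest, ties to even */
        if (mant == (1u << 24)) {                  /* rounding carried into a new bit */
            mant >>= 1;
            msb++;
        }
    }

    /* sign 0 | biased exponent | 23 fraction bits */
    uint32_t bits = ((uint32_t)(msb + 127) << 23) | (mant & 0x7FFFFFu);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```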

So you can see you had the right idea with the second example ;)

It can be hard to follow because the compiler reorders instructions for performance, but just remember that on SH4 a delayed branch such as jsr or rts executes the instruction right after it (the "delay slot") before the branch takes effect, so:

Code:

__Z8MUL_SPECfy:
.LFB0:
   .cfi_startproc
   fmov.s   fr12,@-r15 ! we need to save registers that a caller expects to be untouched. 
   .cfi_def_cfa_offset 4
   .cfi_offset 37, -4
   sts.l   pr,@-r15 ! we need to save our caller's address before we call another function (callee)
   .cfi_def_cfa_offset 8
   .cfi_offset 17, -8
   mov.l   .L2,r1 ! get ready to call __floatundisf
   jsr   @r1 ! call __floatundisf
   fmov   fr4,fr12 ! this "delay slot" will be done before we go. Good thing we saved fr12.
   fmul   fr12,fr0 ! we're back! our result is in fr0 so we multiply and store the answer in fr0 as expected
   ! If __floatundisf used fr12, it will restore it just like we do below
   lds.l   @r15+,pr ! restore our caller's address
   rts   ! return to our caller
   fmov.s   @r15+,fr12   ! "delay slot" restore the fr12 that we saved before we go
   
.L2:
   .long   ___floatundisf

Re: About floating point numbers and division instruction

Post by Newbie »

I would only add that __floatundisf (like __udivdi3) is part of libgcc, GCC's low-level runtime support library, not glibc.
I think that covers everything on this topic :)
Thanks.