_asm Anomaly
- AtariOwl
- DCEmu Freak
- Posts: 96
- Joined: Fri May 23, 2008 5:57 am
- Has thanked: 0
- Been thanked: 2 times
_asm Anomaly
OK
I have some matrix code in asm which I am running before my render code.
Trouble is that when the matrix code is in, the OP poly list doesn't display.
This would suggest somehow that the registers I am using for the matrices are interfering with the prefetch or the SQ pointer.
I should maybe use mat transform? Or something
- bogglez
Re: _asm Anomaly
Post code please. And the matrix functions in kos use assembly, so you shouldn't notice any performance difference.
Wiki & tutorials: http://dcemulation.org/?title=Development
Wiki feedback: viewtopic.php?f=29&t=103940
My libgl playground (not for production): https://bitbucket.org/bogglez/libgl15
My lxdream fork (with small fixes): https://bitbucket.org/bogglez/lxdream
- BlueCrab
- The Crabby Overlord
- Posts: 5658
- Joined: Mon May 27, 2002 11:31 am
- Location: Sailing the Skies of Arcadia
- Has thanked: 9 times
- Been thanked: 69 times
Re: _asm Anomaly
bogglez wrote: Post code please. And the matrix functions in kos use assembly, so you shouldn't notice any performance difference.
Not only that, but they're quite optimized, which is difficult to do correctly in many cases.
- AtariOwl
Re: _asm Anomaly
I'm going to use the KOS matrix functions, I think.
The code I had was asm and I think it was interfering with variables. I should have looked into the stack.
On the Jag of course I'd have pushed all the regs onto the stack before I called it and popped them back off at the end.
It actually looks like I'm not messing up the render but the maths... I suspect it's because I wasn't preserving the registers when I called my asm.
Stack is r15?... I'll try it when I have time,
or rewrite it.. decrement is faster than increment, no?
Code: Select all
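! r4 = pointer to 16 floats (a 4x4 matrix); each group of four is transformed by XMTRX
fmov.s @r4+,fr0
fmov.s @r4+,fr1
fmov.s @r4+,fr2
fmov.s @r4+,fr3
ftrv xmtrx,fv0      ! fv0 = XMTRX * fv0
fmov.s @r4+,fr4
fmov.s @r4+,fr5
fmov.s @r4+,fr6
fmov.s @r4+,fr7
ftrv xmtrx,fv4      ! fv4 = XMTRX * fv4
fmov.s @r4+,fr8
fmov.s @r4+,fr9
fmov.s @r4+,fr10
fmov.s @r4+,fr11
ftrv xmtrx,fv8      ! fv8 = XMTRX * fv8
fmov.s @r4+,fr12    ! fr12-fr15 are overwritten here without being saved first
fmov.s @r4+,fr13
fmov.s @r4+,fr14
fmov.s @r4+,fr15
ftrv xmtrx,fv12     ! fv12 = XMTRX * fv12
fschg               ! switch fmov to 64-bit register pairs
fmov dr0,xd0        ! copy the sixteen results into XMTRX, two floats at a time
fmov dr2,xd2
fmov dr4,xd4
fmov dr6,xd6
fmov dr8,xd8
fmov dr10,xd10
fmov dr12,xd12
fmov dr14,xd14
rts
fschg               ! delay slot: switch fmov back to 32-bit moves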
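For checking the asm against something simpler, here is a minimal plain C sketch of what the routine above appears to compute: each group of four floats read through r4 is multiplied by XMTRX (one `ftrv` each), and the sixteen results then replace XMTRX. The names `mat4`, `ftrv_ref` and `mat_apply_ref` are made up for this illustration and are not part of KOS; the column-major layout is an assumption that mirrors how `ftrv` reads xf0..xf15.

```c
/* 4x4 matrix stored column-major as 16 floats, the same order in which
   the SH4 holds XMTRX in xf0..xf15 (hypothetical type for this sketch). */
typedef struct { float m[16]; } mat4;

/* Reference for one "ftrv xmtrx,fvN": out = xm * v. */
static void ftrv_ref(const mat4 *xm, const float v[4], float out[4])
{
    for (int i = 0; i < 4; i++) {
        float sum = 0.0f;
        for (int j = 0; j < 4; j++)
            sum += xm->m[i + 4 * j] * v[j];
        out[i] = sum;
    }
}

/* Reference for the whole routine: transform each 4-float group of src
   by xm, then overwrite xm with the sixteen results. */
static void mat_apply_ref(mat4 *xm, const mat4 *src)
{
    mat4 result;
    for (int col = 0; col < 4; col++)
        ftrv_ref(xm, &src->m[4 * col], &result.m[4 * col]);
    *xm = result;
}
```

Running the same input through this and through the asm routine, then comparing the sixteen floats, should show quickly whether the maths or the register handling is at fault.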
- BlueCrab
Re: _asm Anomaly
If you're doing things in inline assembly, then, yes you'll have to push every register you use onto the stack and pop them back off at the end (or define them as clobbered, which will actually probably be better, since GCC will be able to either avoid using them or preserve them in a better manner than just pushing them all at once, probably).
If you're doing things as an assembly function (i.e., not just using an asm("nop\n\tnop\n\tnop"); type statement in a C file), then you only have to worry about saving the call-preserved floating point registers. Unfortunately, I don't recall what those are off the top of my head, as I rarely write code that deals with floating point stuff on the Dreamcast (and never in assembly).
Pushing things onto the stack is done like this:
Code: Select all
fmov fr12,@-r15
And popping things is done like this:
Code: Select all
fmov @r15+,fr12
That said, as has already been mentioned, you're not likely to do better than the code that's already in KOS. It's been hand-optimized and verified by a bunch of people over the years.
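As a rough illustration of the clobber-list approach mentioned above, here is a minimal sketch in C. The function name `scale_asm` is made up. The SH4 path assumes a GCC SuperH toolchain with the FPU in single-precision mode, and the non-SH fallback is there only so the snippet also builds on a host PC.

```c
/* Multiply two floats. On SH4 this goes through an inline asm block that
   uses fr0 as scratch; elsewhere it falls back to plain C.
   Hypothetical example, not taken from KOS. */
static float scale_asm(float x, float s)
{
#if defined(__sh__)
    float out;
    __asm__ __volatile__(
        "fmov   %1, fr0\n\t"    /* fr0 = x                              */
        "fmul   %2, fr0\n\t"    /* fr0 = fr0 * s                        */
        "fmov   fr0, %0\n\t"    /* out = fr0                            */
        : "=f"(out)             /* output: any FP register              */
        : "f"(x), "f"(s)        /* inputs: any FP registers             */
        : "fr0"                 /* clobber list: fr0 is overwritten     */
    );
    return out;
#else
    return x * s;               /* host fallback, no asm involved       */
#endif
}
```

Because fr0 is named in the clobber list, GCC keeps its own values out of fr0 across the block, so no manual push and pop is needed for it.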
- AtariOwl
Re: _asm Anomaly
ATM I pushed and popped and it's working fine now.
It's from a .s file so it's probably only certain regs.
I'll try it with KOS too.
The speed... isn't bad atm,
we'll see how much it comes down.
- bogglez
Re: _asm Anomaly
@AtariOwl: As far as I remember the floating point registers aren't preserved at all so you have to push and pop all registers that you use, even if you use asm in a C function (gcc will preserve r0-r7 but not any fr).
If you want to learn about assembly programming I think this is a nice project, but in terms of porting your game it's useless to implement this yourself, so I advise you to just use the KOS routines. Your performance bottleneck will be elsewhere.
@BlueCrab: I think the only possible optimization in those matrices is to add unrelated code during latency gaps that are pointed out in the comments. That's probably what ph3nom did in his renderer assembly code.
And I don't quite remember, but I think I remember the potential to add a function mat_mult_array(mat4, mat4*) that would multiply a whole array of matrices by a single matrix (for scene graphs for example). Maybe such a function already exists.. I don't remember.
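To make the `mat_mult_array` idea concrete, here is a minimal plain C sketch. It is purely hypothetical: the `mat4` type, the helper `mat4_mul`, the extra `count` parameter and the column-major layout are all assumptions for illustration, and nothing here is an existing KOS function.

```c
#include <stddef.h>

/* 4x4 matrix, column-major (hypothetical type for this sketch). */
typedef struct { float m[16]; } mat4;

/* out = a * b. A temporary is used so out may alias a or b. */
static void mat4_mul(const mat4 *a, const mat4 *b, mat4 *out)
{
    mat4 r;
    for (int col = 0; col < 4; col++) {
        for (int row = 0; row < 4; row++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a->m[row + 4 * k] * b->m[k + 4 * col];
            r.m[row + 4 * col] = sum;
        }
    }
    *out = r;
}

/* Replace every arr[i] with parent * arr[i], e.g. to push one parent
   transform down onto all children of a scene graph node. */
static void mat_mult_array(const mat4 *parent, mat4 *arr, size_t count)
{
    for (size_t i = 0; i < count; i++)
        mat4_mul(parent, &arr[i], &arr[i]);
}
```

An SH4 version would presumably load the parent into XMTRX once and run four `ftrv` instructions per child, which is where the saving over repeated single multiplies would come from.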
- BlueCrab
Re: _asm Anomaly
bogglez wrote: @AtariOwl: As far as I remember the floating point registers aren't preserved at all so you have to push and pop all registers that you use, even if you use asm in a C function (gcc will preserve r0-r7 but not any fr).
I just looked it up and fr12-fr15 are call-preserved (i.e., they'd have to be backed up in a separate assembly function). All the other floating point registers are call-clobbered with GCC's default setup for SuperH.
So, in summary: If in a separate .s file, you only have to back up fr12, fr13, fr14, and fr15. You're free to use any of the others without backing them up (KOS' matrix functions also show that exact same pattern). If in an asm() statement in a C file, then you need to either back everything up you use or list them as clobbered in the clobber list.
bogglez wrote: @BlueCrab: I think the only possible optimization in those matrices is to add unrelated code during latency gaps that are pointed out in the comments. That's probably what ph3nom did in his renderer assembly code.
Pretty much.
- AtariOwl
Re: _asm Anomaly
OK
So I need to preserve my fregs and integer regs when I make each call?
I've preserved the fregs and it assembles and runs, but my angle calls are getting knackered up, so between push and pop stack routines, my variables are essentially useless.
Currently my clobber list, as it were, is all fregs, no regs.
It's my integer variables that are getting knackered, so I'd better add some regs to my clobber list.
Wait, is that even going to help?
If I'm calling several asm routines, and passing different variables to each, then I need to have a push/pop for each call?
Ugh, frustrating as it's messing up the variables I'm sending to the rotate calls.
Well, I've switched to KOS calls for the moment anyway; it's not exactly time critical at this point. But it would be nice to switch some time-critical segments later, so I will try to figure it out.