_asm Anomaly

AtariOwl
DCEmu Freak
Posts: 96
Joined: Fri May 23, 2008 5:57 am

_asm Anomaly

Post by AtariOwl »

OK

I have some matrix code in asm which I am running before my render code.

Trouble is that when the matrix code is in, the OP poly list doesn't display.
This would suggest that somehow the registers I am using for the matrices are interfering with the prefetch or the SQ pointer.

Maybe I should use mat transform or something?
bogglez
Moderator
Posts: 578
Joined: Sun Apr 20, 2014 9:45 am

Re: _asm Anomaly

Post by bogglez »

Post code please. Also, the matrix functions in KOS are already written in assembly, so you shouldn't notice any performance difference.
Wiki & tutorials: http://dcemulation.org/?title=Development
Wiki feedback: viewtopic.php?f=29&t=103940
My libgl playground (not for production): https://bitbucket.org/bogglez/libgl15
My lxdream fork (with small fixes): https://bitbucket.org/bogglez/lxdream
BlueCrab
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia

Re: _asm Anomaly

Post by BlueCrab »

bogglez wrote: Post code please. And the matrix functions in KOS use assembly, so you shouldn't notice any performance difference.
Not only that, but they're quite optimized, which is difficult to do correctly in many cases. :wink:
AtariOwl
DCEmu Freak
Posts: 96
Joined: Fri May 23, 2008 5:57 am

Re: _asm Anomaly

Post by AtariOwl »

I'm going to use the KOS matrix functions, I think.

The code I had was asm, and I think it was interfering with variables. I should have looked into the stack.

Code: Select all

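	fmov.s @r4+,fr0		! r4 is read as a pointer to 16 consecutive floats
	fmov.s @r4+,fr1
	fmov.s @r4+,fr2
	fmov.s @r4+,fr3
	ftrv xmtrx,fv0		! fv0 = XMTRX * fv0
	fmov.s @r4+,fr4
	fmov.s @r4+,fr5
	fmov.s @r4+,fr6
	fmov.s @r4+,fr7
	ftrv xmtrx,fv4		! fv4 = XMTRX * fv4
	fmov.s @r4+,fr8
	fmov.s @r4+,fr9
	fmov.s @r4+,fr10
	fmov.s @r4+,fr11
	ftrv xmtrx,fv8		! fv8 = XMTRX * fv8
	fmov.s @r4+,fr12	! fr12-fr15 are overwritten here without being saved
	fmov.s @r4+,fr13
	fmov.s @r4+,fr14
	fmov.s @r4+,fr15
	ftrv xmtrx,fv12		! fv12 = XMTRX * fv12
	fschg			! toggle FPSCR.SZ so fmov moves register pairs
	fmov dr0,xd0		! copy the result (fr0-fr15) into XMTRX
	fmov dr2,xd2
	fmov dr4,xd4
	fmov dr6,xd6
	fmov dr8,xd8
	fmov dr10,xd10
	fmov dr12,xd12
	fmov dr14,xd14
	rts
	fschg			! delay slot: toggle FPSCR.SZ back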
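
For readers more comfortable in C, the routine above can be modelled roughly as follows. This is only a sketch: `mat_apply_model` and the flat 16-float layout are names and assumptions made up for illustration. `xmtrx` stands in for the back-bank registers xf0-xf15, and each group of four consecutive floats is treated as one vector, matching the four ftrv instructions.

```c
#include <string.h>

/* Hypothetical C model of the assembly routine above (not KOS code).
 * xmtrx[] stands in for the back-bank registers xf0..xf15; m[] is the
 * block of 16 floats that r4 points at. Element (row i, column j) of a
 * matrix is assumed to live at index j * 4 + i. */
void mat_apply_model(float xmtrx[16], const float m[16])
{
    float out[16];

    for (int c = 0; c < 4; c++) {          /* one ftrv per group of four floats */
        for (int i = 0; i < 4; i++) {
            float sum = 0.0f;
            for (int j = 0; j < 4; j++)
                sum += xmtrx[j * 4 + i] * m[c * 4 + j];
            out[c * 4 + i] = sum;          /* what ends up in fr(c*4+i) */
        }
    }

    /* The fschg / fmov drN,xdN sequence: copy the result into XMTRX. */
    memcpy(xmtrx, out, sizeof out);
}
```

Seen this way, the routine multiplies the matrix already held in XMTRX by the one in memory and leaves the product in XMTRX. The model does not capture the side effect discussed in this thread, namely that all of fr0-fr15 are overwritten along the way.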
On the Jag, of course, I'd have pushed all the regs onto the stack before I called it and popped them back off at the end.
It actually looks like I'm not messing up the render but the maths... I suspect it's because I wasn't preserving the registers when I called my asm.

The stack pointer is r15?... I'll try it when I have time.

Or rewrite it... decrement is faster than increment, no?
BlueCrab
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia

Re: _asm Anomaly

Post by BlueCrab »

If you're doing things in inline assembly, then yes, you'll have to push every register you use onto the stack and pop them back off at the end. Alternatively, list them as clobbered, which will probably be better, since GCC can then either avoid using them or preserve them in a smarter way than just pushing them all at once.
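
As a rough illustration of the clobber-list approach, the sketch below uses fr4 as a scratch register inside an asm statement and names it in the clobber list, so GCC knows not to keep a live value there. The function name `mul_add` is made up for this example, the SH path assumes a target with the SH-4 FPU running in single-precision mode, and the plain C fallback is only there so the snippet also builds on a non-SH host.

```c
/* Hypothetical helper: returns acc + a * b. */
static inline float mul_add(float acc, float a, float b)
{
#if defined(__sh__)
    /* Assumes an SH-4 FPU in single-precision mode. */
    __asm__ (
        "fmov  %1, fr4\n\t"   /* fr4 = a (scratch copy)   */
        "fmul  %2, fr4\n\t"   /* fr4 = a * b              */
        "fadd  fr4, %0\n\t"   /* acc += fr4               */
        : "+f" (acc)          /* acc is read and written  */
        : "f" (a), "f" (b)    /* inputs in FP registers   */
        : "fr4"               /* tell GCC fr4 is trashed  */
    );
    return acc;
#else
    /* Portable fallback so the sketch compiles off-target. */
    return acc + a * b;
#endif
}
```

Without the "fr4" entry, GCC would be free to assume fr4 still holds whatever it put there before the asm statement, which is the kind of silent corruption described earlier in the thread.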

If you're doing things as an assembly function (i.e., not just using an asm("nop\n\tnop\n\tnop"); type statement in a C file), then you only have to worry about saving the call-preserved floating point registers. Unfortunately, I don't recall what those are off the top of my head, as I rarely write code that deals with floating point stuff on the Dreamcast (and never in assembly).

Pushing things onto the stack is done like this:

Code: Select all

fmov fr12,@-r15
And popping things is done like this:

Code: Select all

fmov @r15+,fr12
That said, as has already been mentioned, you're not likely to do better than the code that's already in KOS. It's been hand-optimized and verified by a bunch of people over the years. :wink:
AtariOwl
DCEmu Freak
Posts: 96
Joined: Fri May 23, 2008 5:57 am

Re: _asm Anomaly

Post by AtariOwl »

ATM I pushed and popped, and it's working fine now.
It's from a .s file, so it's probably only certain regs.

I'll try it with KOS too.

The speed isn't bad at the moment;
we'll see how much it comes down.
bogglez
Moderator
Posts: 578
Joined: Sun Apr 20, 2014 9:45 am

Re: _asm Anomaly

Post by bogglez »

@AtariOwl: As far as I remember the floating point registers aren't preserved at all so you have to push and pop all registers that you use, even if you use asm in a C function (gcc will preserve r0-r7 but not any fr).
If you want to learn about assembly programming I think this is a nice project, but in terms of porting your game it's useless to implement this yourself, so I advise you to just use the KOS routines. Your performance bottleneck will be elsewhere.

@BlueCrab: I think the only possible optimization in those matrices is to add unrelated code during latency gaps that are pointed out in the comments. That's probably what ph3nom did in his renderer assembly code.
I don't quite remember, but I think there was potential to add a function mat_mult_array(mat4, mat4*) that would multiply a whole array of matrices by a single matrix (for scene graphs, for example). Maybe such a function already exists... I don't remember.
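
A plain C sketch of what such a helper might look like is below. Nothing here is taken from KOS: `mat4`, `mat4_mul` and this `mat_mult_array` signature (with an added count parameter) are hypothetical, and a real version would presumably keep the shared matrix loaded in XMTRX and use ftrv rather than scalar loops.

```c
#include <stddef.h>

/* Hypothetical 4x4 matrix type, stored as m[column][row]. */
typedef struct { float m[4][4]; } mat4;

/* out = a * b. out may alias a or b because the result is built in a temporary. */
static void mat4_mul(mat4 *out, const mat4 *a, const mat4 *b)
{
    mat4 tmp;

    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a->m[k][r] * b->m[c][k];
            tmp.m[c][r] = sum;
        }
    }
    *out = tmp;
}

/* Replace every matrix in arr[0..count-1] with parent * arr[i]. */
void mat_mult_array(const mat4 *parent, mat4 *arr, size_t count)
{
    for (size_t i = 0; i < count; i++)
        mat4_mul(&arr[i], parent, &arr[i]);
}
```

For a scene graph, `parent` would be a node's world matrix and `arr` the local matrices of its children, so one call updates all of them in place.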
BlueCrab
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia

Re: _asm Anomaly

Post by BlueCrab »

bogglez wrote:@AtariOwl: As far as I remember the floating point registers aren't preserved at all so you have to push and pop all registers that you use, even if you use asm in a C function (gcc will preserve r0-r7 but not any fr).
I just looked it up, and fr12-fr15 are call-preserved (i.e., a separate assembly function has to back them up before using them). All the other floating point registers are call-clobbered with GCC's default setup for SuperH.

So, in summary: If in a separate .s file, you only have to back up fr12, fr13, fr14, and fr15. You're free to use any of the others without backing them up (KOS' matrix functions also show that exact same pattern). If in an asm() statement in a C file, then you need to either back everything up you use or list them as clobbered in the clobber list.
bogglez wrote: @BlueCrab: I think the only possible optimization in those matrices is to add unrelated code during latency gaps that are pointed out in the comments. That's probably what ph3nom did in his renderer assembly code.
Pretty much.
AtariOwl
DCEmu Freak
Posts: 96
Joined: Fri May 23, 2008 5:57 am

Re: _asm Anomaly

Post by AtariOwl »

OK

So I need to preserve my fregs and integer regs when I make each call?
I've preserved the fregs and it assembles and runs, but my angle calls are getting knackered, so between the push and pop stack routines my variables are essentially useless.


Currently my clobber list, as it were, is all fregs and no integer regs.
It's my integer variables that are getting knackered, so I'd better add some regs to my clobber list.
Wait, is that even going to help?

If I'm calling several asm routines and passing different variables to each, then do I need a push/pop for each call?


Ugh, frustrating, as it's messing up the variables I'm sending to the rotate calls.


Well, I've switched to KOS calls for the moment anyway; it's not exactly time critical at this point. But it would be nice to switch some time-critical segments later, so I will try to figure it out.