DWORDs Vs WORDS

Ian Micheal
Soul Sold for DCEmu
Posts: 4865
Joined: Fri Jul 11, 2003 9:56 pm
Has thanked: 2 times
Been thanked: 4 times


Post by Ian Micheal »

Rand pointed out that in NeoGeo CD I'm using WORDs, not DWORDs, and that this would cause a big speed hit. Can someone explain the difference between them, and how I would go about converting from WORDs to DWORDs?


I've been told WORDs are slower on the Dreamcast. Sorry, I don't know much about this at all.

What I know is this:

Bytes, words and dwords are the basic chunks of data used in programming. The processor works with whichever data size suits the instruction it is executing.
A byte is 8 bits, a word is 16 bits (2 bytes) and a dword is 32 bits (4 bytes).

As far as I can tell, the CPU core is using words.
Dreamcast forever!!!
BlackAura
DC Developer
Posts: 9951
Joined: Sun Dec 30, 2001 9:02 am
Has thanked: 0
Been thanked: 1 time

Post by BlackAura »

Just a quick clarification on terminology: On Intel machines, a word is always 16 bits, and then you have strange constructs such as a dword (double word, 32 bits), qword (quad word, 64 bits), and so on. On most other architectures, a word is the amount of data the CPU can deal with in one go. On most modern machines (including the Dreamcast), that is 32 bits. Some have 64-bit words (like AMD's Athlon 64 / Opteron, UltraSPARC, Alpha). (Hitachi's own SH-4 manual does name sizes the Intel way, though: a word is 16 bits and a longword is 32 bits, hence the mov.w and mov.l instructions.)
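For reference, here is a minimal C sketch mapping both naming schemes onto C99 fixed-width types. The typedef names are illustrative only: BYTE/WORD/DWORD in the Windows style, plus hypothetical sh_* names for the SH-4 manual's byte/word/longword (the sizes moved by mov.b/mov.w/mov.l). Nothing here comes from KOS or the emulator.

```c
#include <stdint.h>

/* Intel/Windows-style names: a WORD is always 16 bits (illustrative typedefs). */
typedef uint8_t  BYTE;   /* 8 bits */
typedef uint16_t WORD;   /* 16 bits */
typedef uint32_t DWORD;  /* 32 bits, "double word" */

/* Hitachi SH-4 manual naming, as used by mov.b / mov.w / mov.l.
   These typedef names are hypothetical, chosen for this example. */
typedef uint8_t  sh_byte;      /* mov.b: 8 bits */
typedef uint16_t sh_word;      /* mov.w: 16 bits */
typedef uint32_t sh_longword;  /* mov.l: 32 bits, the SH-4's register width */

/* Compile-time checks that the two vocabularies agree on sizes (C11). */
_Static_assert(sizeof(WORD) == sizeof(sh_word), "both call 16 bits a word");
_Static_assert(sizeof(DWORD) == sizeof(sh_longword), "DWORD and longword are both 32 bits");
```

So whatever a core calls its types, the question that matters for speed on the SH-4 is whether the hot code works on 16-bit or 32-bit quantities.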

For cached memory, the Dreamcast's memory architecture moves data in 32-byte chunks (the size of one cache line). Every time you want to fetch something from memory, even a single byte, the whole 32-byte line has to be read in. Same deal if you're writing to memory.

Inside the processor, you can generally only work with words (in this case 32 bits, or an int in C). The SH-4 has 32-bit registers, and all its arithmetic operations work on 32 bits at a time, because it's a 32-bit machine. Bytes and 16-bit values are sign-extended to 32 bits when they're loaded into a register.

I don't know the exact details of the SH-4's architecture, but when you write a single byte to memory, the CPU is probably going to have to:
1 - Fetch the appropriate word from memory
2 - Modify it to put the byte data in
3 - Write it back out to memory

The problem here is the fetch operation. If you're writing 2048 bytes to memory (for example) one byte at a time, the CPU is going to have to do 2048 fetch operations and 2048 write operations. If you were writing that as 512 words (32-bit), you'd only be doing 512 write operations, and zero read operations.

That completely ignores the cache for the moment (if you write to memory that isn't already in the cache, the CPU will probably still have to fetch the appropriate cache line from memory, depending on the cache mode), but that's basically the problem.
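The 2048-versus-512 arithmetic can be made explicit with a small C sketch. The counters only tally the stores as written in the source; the instructions actually emitted depend on the compiler, which may turn either loop into a memset call. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a buffer one byte at a time: one store per byte.
   Returns the number of stores written in the source. */
size_t fill_bytes(uint8_t *dst, size_t len, uint8_t value) {
    size_t stores = 0;
    for (size_t i = 0; i < len; i++) {
        dst[i] = value;
        stores++;
    }
    return stores;
}

/* Fill the same number of bytes with 32-bit stores.
   Assumes dst is 4-byte aligned and len is a multiple of 4
   (the SH-4 faults on misaligned 32-bit accesses). */
size_t fill_longs(uint32_t *dst, size_t len, uint8_t value) {
    uint32_t pattern = value * 0x01010101u; /* copy the byte into all four byte lanes */
    size_t stores = 0;
    for (size_t i = 0; i < len / 4; i++) {
        dst[i] = pattern;
        stores++;
    }
    return stores;
}
```

Filling 2048 bytes takes 2048 stores with fill_bytes and 512 with fill_longs, and both leave identical memory contents.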

Of course, if you're emulating something that requires 16-bit reads/writes, there's not a lot you can do about it. It depends on what exactly you're emulating.

x86 processors are a little weird in that respect. As far as I know, they can actually operate on words (using Intel terminology now: 16 bits) as well as dwords and individual bytes.

(Wow... I'm actually using something I learned at university. First time that's ever happened)
Quzar
Dream Coder
Posts: 7499
Joined: Wed Jul 31, 2002 12:14 am
Location: Miami, FL
Has thanked: 4 times
Been thanked: 10 times

Post by Quzar »

I understand what you are saying, but how would one use that properly? Just transfer the data in word-sized chunks instead of byte by byte?

Also, @Ian: as for the emulator core, you should make sure you know which definition of "word" it uses. If it follows the Dreamcast-style definition, a word is already 32 bits; if it uses Intel's, a word is 16 bits, and it would be faster using dwords.

You would most likely have to rewrite a large part of it, which would be a lot of work :|
"When you post fewer lines of text than your signature, consider not posting at all." - A Wise Man
Ian Micheal
Soul Sold for DCEmu
Posts: 4865
Joined: Fri Jul 11, 2003 9:56 pm
Has thanked: 2 times
Been thanked: 4 times

Post by Ian Micheal »

Yeah, that's what it's looking like. We want proper speed, but maybe it wouldn't be worth it, since the c68 core is faster. It's just interesting that Rand mentioned it as a tip.
Dreamcast forever!!!
Rand Linden
bleemcast! Creator
Posts: 882
Joined: Wed Oct 17, 2001 7:44 pm
Location: Los Angeles, CA
Has thanked: 0
Been thanked: 0

Post by Rand Linden »

Mostly, the tip was learning how to use the cache and SQs.

The DWORDs vs. WORDs came up with the mysterious (and yet unexplained) "dcache" function.

And, yes, in certain cases, 16bit will be slower than 32bit on DC.

Rand.
ssj4goku128
Insane DCEmu
Posts: 290
Joined: Wed Oct 17, 2001 7:44 pm
Has thanked: 0
Been thanked: 0

Post by ssj4goku128 »

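Well, it all depends on whether you are doing this in assembly or C. I'm assuming the question is how to go from a WORD to a DWORD. In C, I believe you should be able to just cast the 16-bit value (a short) to a 32-bit int; with GCC on the SH-4, int and long are both 32 bits, so casting an int to a long changes nothing. In assembly, use something like the following: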

Code:

toDbl:
	! r4 holds the address of the 16-bit value (first argument in the SH GCC calling convention)
	mov.w	@r4,r0		! 16-bit load; the SH-4 sign-extends it to 32 bits in r0
	rts			! return; the result is in r0
	nop			! delay slot (put extu.w r0,r0 here instead to zero-extend an unsigned value)
Not sure about the assembly part, but the part about typecasting in C should be right.
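In C the widening really is just a cast, but whether the 16-bit value is signed or unsigned decides what ends up in the upper 16 bits. A minimal sketch (the function names are hypothetical). Converting a value above 0x7FFF to int16_t is implementation-defined in C; GCC wraps it, which gives the sign-extended result.

```c
#include <stdint.h>

/* Treat the 16-bit value as signed: the sign bit is copied into the
   upper 16 bits, like the sign extension a mov.w load does on the SH-4. */
int32_t widen_signed(uint16_t raw) {
    return (int32_t)(int16_t)raw;
}

/* Treat the 16-bit value as unsigned: the upper 16 bits are zero,
   like extu.w on the SH-4. */
uint32_t widen_unsigned(uint16_t raw) {
    return (uint32_t)raw;
}
```

With raw = 0xFFFF, widen_signed returns -1 while widen_unsigned returns 65535, so picking the wrong one silently changes an emulator's results.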
"So I gotta be carefull, can't let tha evil of tha money trap me
so when ya see me #@#$%
ya better holla at me "

Tupac Shakur[1971-1996]
Makaveli[1996-????]
q_006
Mental DCEmu
Posts: 415
Joined: Thu Oct 10, 2002 7:18 pm
Has thanked: 0
Been thanked: 0

Post by q_006 »

Rand Linden wrote:Mostly, the tip was learning how to use the cache and SQs.

The DWORDs vs. WORDs came up with the mysterious (and yet unexplained) "dcache" function.

And, yes, in certain cases, 16bit will be slower than 32bit on DC.

Rand.
Well:
Rand Linden wrote: Using the cachable area of memory,
Writing all your data in whatever order you choose,
Flushing the portion of the cache that isn't already replaced.
then
BlackAura wrote: That's pretty much what it's doing. The dcache thingy is a KOS utility function which clears the data cache over a certain range
Well, BlackAura can probably show the function and a code snippet... and probably explain it far better than I can by quoting :D
Rand Linden
bleemcast! Creator
Posts: 882
Joined: Wed Oct 17, 2001 7:44 pm
Location: Los Angeles, CA
Has thanked: 0
Been thanked: 0

Post by Rand Linden »

The explanation previously given isn't sufficiently detailed for me to determine whether or not it does what's claimed.

I'd strongly suspect that it ISN'T using WORDs, but hey, if no one else will bother to dig further, I certainly won't either.

Rand.
Sanchez
DCEmu Ex-Admin
Posts: 1098
Joined: Wed Oct 17, 2001 7:44 pm
Has thanked: 0
Been thanked: 0

Post by Sanchez »

Rand Linden wrote:The explanation previously given isn't sufficiently detailed for me to determine whether or not it does what's claimed.

I'd strongly suspect that it ISN'T using WORDs, but hey, if no one else will bother to dig further, I certainly won't either.

Rand.
Well, here are the function headers I could find related to it:

Code:

void dcache_flush_range(uint32 start, uint32 count);
and the assembly that seems to go with it:

Code:

! This routine just goes through and forces a write-back on the
! specified data range. Use prior to dcache_inval_range if you
! care about the contents.
! r4 is starting address
! r5 is count
_dcache_flush_range:
	! Get ending address from count and align start address
	add	r4,r5
	mov.l	l1align,r0
	and	r0,r4

dflush_loop:
	! Write back the O cache
	ocbwb	@r4

	mov	#0x10,r0	! r4 | 0x1000
	shll8	r0
	or	r4,r0
	ocbwb	@r0
	
	mov	#0x20,r0	! r4 | 0x2000
	shll8	r0
	or	r4,r0
	ocbwb	@r0
	
	mov	#0x30,r0	! r4 | 0x3000
	shll8	r0
	or	r4,r0
	ocbwb	@r0
	
	cmp/hs	r4,r5
	bt/s	dflush_loop
	add	#32,r4		! += L1_CACHE_BYTES

	rts
	nop



	.align	2
l1align:
	.long	~31		! ~(L1_CACHE_BYTES-1)
	
BlackAura or another coder can probably comment a heck of a lot better than I...
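The address walk in that assembly can be modeled in C to see which cache lines it writes back. This is a host-side sketch for illustration only: flush_range_lines is a hypothetical name, not part of KOS, and it records only the primary line address, ignoring the extra ocbwb operations on addr | 0x1000, 0x2000 and 0x3000. It does show that the routine works on whole 32-byte lines, independent of whether the data was originally written as bytes, WORDs or DWORDs.

```c
#include <stddef.h>
#include <stdint.h>

#define L1_CACHE_BYTES 32u

/* Mirrors _dcache_flush_range: end = start + count, align start down to a
   32-byte line, then visit lines until the compare fails. Returns the number
   of lines visited and stores up to max_out of their addresses in out. */
size_t flush_range_lines(uint32_t start, uint32_t count, uint32_t *out, size_t max_out) {
    uint32_t end = start + count;                   /* add r4,r5 */
    uint32_t addr = start & ~(L1_CACHE_BYTES - 1);  /* and with l1align (~31) */
    size_t n = 0;
    int again;
    do {
        if (n < max_out)
            out[n] = addr;                          /* ocbwb @r4 */
        n++;
        again = (end >= addr);                      /* cmp/hs r4,r5: unsigned, old addr */
        addr += L1_CACHE_BYTES;                     /* add #32,r4 runs in the bt/s delay slot */
    } while (again);
    return n;
}
```

Because the compare happens before the delay-slot increment, the loop visits a line or two past start + count (for example, 4 lines for 64 bytes starting at 0x8c010004); writing back a few extra lines is harmless.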
"This is worse than when the Raccoon got in the copier!"
q_006
Mental DCEmu
Posts: 415
Joined: Thu Oct 10, 2002 7:18 pm
Has thanked: 0
Been thanked: 0

Post by q_006 »

uint32 is a 32-bit unsigned int... so wouldn't that be a DWORD? Well, by Intel standards. And yes, I know it's an SH-4, but we're all using Wintel terminology.
Stef.D
DCEmu Respected
Posts: 114
Joined: Wed Oct 15, 2003 1:46 am
Has thanked: 0
Been thanked: 0

Post by Stef.D »

I don't know much about data cache stuff, except that using DWORD data maybe gives better alignment.

But generally on SH-x CPUs, using DWORDs (32-bit variables) performs better than WORDs (16 bits) or BYTEs (8 bits), because most of the time the data gets converted to 32 bits anyway (with an EXTU/EXTS instruction) before being computed... and that's definitely not free ;)
This is also true on a lot of other 32-bit RISC CPUs...
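A small C sketch of that point (the function names are hypothetical). C requires a uint16_t result to wrap at 16 bits, so on a 32-bit CPU like the SH-4 the compiler may have to insert zero-extensions (EXTU.W) to keep the value in range, while a 32-bit accumulator just uses the register as-is. How often the extra instruction actually appears depends on the compiler and optimization level.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit accumulator: the sum must wrap at 65536, which can cost an
   extra zero-extension step on a 32-bit CPU. */
uint16_t sum16(const uint16_t *v, size_t n) {
    uint16_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (uint16_t)(acc + v[i]);
    return acc;
}

/* 32-bit accumulator: the add uses the full register width,
   so no truncation is needed. */
uint32_t sum32(const uint16_t *v, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

For {60000, 10000}, sum32 gives 70000 while sum16 wraps to 4464, so switching a variable from 16 to 32 bits is only safe where the code never relied on that wrap.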

Glad to see there are some cache management methods in KOS, such as dcache_flush_range(...). Does anyone know if a dcache_prefetch_range(...) method exists, just to load a certain part of memory into the cache?
nymus
DC Developer
Posts: 968
Joined: Tue Feb 11, 2003 4:12 pm
Location: In a Dream
Has thanked: 5 times
Been thanked: 6 times

Post by nymus »

The CPU does prefetching automatically, so a prefetch function might mess up the cache or be too complex to mix with normal code. The reason there's a flush function is that, in copy-back mode, the SH-4 only writes dirty data back to memory when a cache line is replaced, so other hardware reading RAM directly (DMA, for example) can see stale data unless you flush first.
behold the mind
inspired by Dreamcast
BlackAura
DC Developer
Posts: 9951
Joined: Sun Dec 30, 2001 9:02 am
Has thanked: 0
Been thanked: 1 time

Post by BlackAura »

There is a prefetch instruction (pref). The SH-4 manual recommends using it if you're about to access a block of 32 bytes and you're fairly certain it isn't already in the cache. If it's in the cache, it'll waste a little time. If it's not, the fetch starts immediately, so you won't stall the pipeline later on.
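The pattern from the manual can be sketched in C with GCC's __builtin_prefetch, requesting the next 32-byte block while the current one is being processed. That GCC emits the SH-4 pref instruction for this builtin is an assumption worth checking in the generated assembly. The function name is hypothetical, and the array is assumed to be 32-byte aligned so that each group of 8 uint32_t values is exactly one cache line.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum 32-bit values, one 32-byte line (8 values) at a time, asking for
   the next line before working on the current one. */
uint32_t sum_with_prefetch(const uint32_t *v, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i += 8) {
        if (i + 8 < n)
            __builtin_prefetch(&v[i + 8]);  /* start fetching the next line now */
        size_t end = (i + 8 < n) ? i + 8 : n;
        for (size_t j = i; j < end; j++)
            acc += v[j];
    }
    return acc;
}
```

The prefetch only pays off if there is enough work per line to hide the memory latency; for a loop this small the gain may be negligible.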
Stef.D
DCEmu Respected
Posts: 114
Joined: Wed Oct 15, 2003 1:46 am
Has thanked: 0
Been thanked: 0

Post by Stef.D »

The SH4 cache seems to be really weird :-/
Auto prefetch can't work in all cases, at least not on the first access to a block.
I guess the prefetch thing isn't available through KOS...
BlackAura
DC Developer
Posts: 9951
Joined: Sun Dec 30, 2001 9:02 am
Has thanked: 0
Been thanked: 1 time

Post by BlackAura »

Stef.D wrote: The SH4 cache seems to be really weird :-/
Yep. It's mostly a standard direct-mapped cache, with 8KB of instruction cache, and 16KB of data cache.
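Since the data cache is direct-mapped, each address can only live in one of 16384 / 32 = 512 lines, and two addresses that map to the same line evict each other. A sketch of that mapping, assuming the default indexing mode in which address bits 13..5 select the line (the OIX index-mode bit changes this); the function names are hypothetical.

```c
#include <stdint.h>

#define OC_LINE_BYTES 32u   /* cache line size */
#define OC_LINES      512u  /* 16 KB / 32 bytes */

/* Which of the 512 lines an address maps to: equivalent to (addr >> 5) & 0x1FF. */
static inline uint32_t oc_line_index(uint32_t addr) {
    return (addr / OC_LINE_BYTES) % OC_LINES;
}

/* Nonzero if two addresses compete for the same cache line. */
static inline int oc_conflict(uint32_t a, uint32_t b) {
    return oc_line_index(a) == oc_line_index(b);
}
```

For example, 0x8c000000 and 0x8c004000 are exactly 16 KB apart, so they map to the same line and would keep evicting each other if a loop alternated between them.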
Stef.D wrote: I guess the prefetch thing isn't available through KOS...
It is - one line of inline assembly. There should probably be a macro for it though.
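A minimal sketch of such a macro. PREFETCH is a hypothetical name, not an existing KOS API; the SH branch assumes GCC's predefined __sh__ macro and GCC extended inline-assembly syntax, and other targets fall back to __builtin_prefetch so the same code still compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical PREFETCH macro: emits the SH-4 "pref" instruction on SH
   targets, and uses the GCC/Clang builtin everywhere else. */
#if defined(__sh__)
#define PREFETCH(addr) __asm__ __volatile__("pref @%0" : : "r"(addr))
#else
#define PREFETCH(addr) __builtin_prefetch(addr)
#endif

/* Example use: hint the line holding *p, then read it. */
static inline uint32_t read_after_prefetch(const uint32_t *p) {
    PREFETCH(p);
    return p[0];
}
```

In real code the prefetch should be issued well before the read, as in the manual's advice, rather than immediately in front of it as in this toy example.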
Stef.D
DCEmu Respected
Posts: 114
Joined: Wed Oct 15, 2003 1:46 am
Has thanked: 0
Been thanked: 0

Post by Stef.D »

BlackAura wrote: [quoting Stef.D: I guess the prefetch thing isn't available through KOS...] It is - one line of inline assembly. There should probably be a macro for it though.
It's already a good thing we have it :)