DWORDs Vs WORDS
-
- Soul Sold for DCEmu
- Posts: 4865
- Joined: Fri Jul 11, 2003 9:56 pm
- Has thanked: 2 times
- Been thanked: 4 times
Rand pointed out that in NeoGeo CD I'm using WORDs, not DWORDs, and that this causes a speed hit. Can someone explain the difference between them, and how I would go about converting from WORDs to DWORDs?
I've been told WORDs are slower on the Dreamcast. I'm sorry, I don't know much about this at all.
What I know is this:
Bytes, words and dwords are the basic chunks of data used in programming. The processor works with whichever data size suits the instruction it is executing.
A byte is 8 bits, a word is 16 bits (2 bytes) and a dword is 32 bits (4 bytes).
The CPU core is using words, as far as I can tell.
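As a rough illustration of those sizes in C, here is a minimal sketch using the fixed-width types from <stdint.h>. The names byte_t, word_t, dword_t and make_dword are made up for this example and are not taken from any emulator core.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  byte_t;   /* 8 bits */
typedef uint16_t word_t;   /* 16 bits (Intel-style "word") */
typedef uint32_t dword_t;  /* 32 bits (Intel-style "dword") */

/* Compile-time checks that the sizes match the description above. */
static_assert(sizeof(byte_t)  == 1, "byte must be 1 byte");
static_assert(sizeof(word_t)  == 2, "word must be 2 bytes");
static_assert(sizeof(dword_t) == 4, "dword must be 4 bytes");

/* Build one dword out of two words: hi lands in the upper 16 bits. */
static dword_t make_dword(word_t hi, word_t lo)
{
    return ((dword_t)hi << 16) | (dword_t)lo;
}
```

Using the fixed-width names avoids guessing what "word" means on a given compiler.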
Dreamcast forever!!!
-
- DC Developer
- Posts: 9951
- Joined: Sun Dec 30, 2001 9:02 am
- Has thanked: 0
- Been thanked: 1 time
Just a quick clarification on terminology: On Intel machines, a word is always 16 bits, and then you have strange constructs such as a dword (double word - 32 bits), qword (quad word - 64 bits), and so on. On every other machine ever, a word is the amount of data the CPU can deal with in one go. On most modern machines (including the Dreamcast), a word is 32 bits. Some have 64-bit words (like AMD's Athlon 64 / Opteron, UltraSPARC, Alpha).
The Dreamcast's memory architecture can only access 32 bytes of memory at a time (the same as the size of one cache line). Every time you want to fetch something from memory, even a single byte, you have to grab 32 bytes. Same deal if you're writing to memory.
Inside the processor, you can generally only do anything with words (in this case, 32 bits, or an int datatype in C). The SH-4 has 32-bit registers, all its operations work on 32 bits at a time, and it can only do anything to 32-bit registers, because it's a 32-bit machine.
I don't know the exact details of the SH-4's architecture, but when you write a single byte to memory, the CPU is probably going to have to:
1 - Fetch the appropriate word from memory
2 - Modify it to put the byte data in
3 - Write it back out to memory
The problem here is the fetch operation. If you're writing 2048 bytes to memory (for example), the CPU is going to have to do 2048 fetch operations and 2048 write operations. If you were writing that as 512 words (32-bit), you'd only be doing 512 write operations, and zero read operations.
That is completely ignoring the cache for the moment (if you write to memory that's not cached, the CPU will probably still have to fetch the appropriate cache line from memory, depending on the cache mode), but that's basically the problem.
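To make the 2048 versus 512 arithmetic concrete, here is a small C sketch that copies the same buffer both ways and counts the store statements executed. It is only a model: the counters count C-level assignments, not actual bus transactions, and the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes one byte at a time; returns the number of stores performed. */
static size_t copy_as_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t stores = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
        stores++;
    }
    return stores;
}

/* Copy n_bytes as 32-bit words; returns the number of stores performed. */
static size_t copy_as_words(uint32_t *dst, const uint32_t *src, size_t n_bytes)
{
    assert(n_bytes % sizeof(uint32_t) == 0);  /* whole words only */
    size_t stores = 0;
    for (size_t i = 0; i < n_bytes / sizeof(uint32_t); i++) {
        dst[i] = src[i];
        stores++;
    }
    return stores;
}
```

For a 2048-byte buffer, the byte version performs 2048 stores and the word version 512, which is the ratio described above. An optimizing compiler may merge or vectorize either loop, so treat the counts as an illustration rather than a measurement.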
Of course, if you're emulating something that requires 16-bit reads/writes, there's not a lot you can do about it. It depends on what exactly you're emulating.
The x86es are a little weird in that respect. As far as I know, they actually can operate on words (using Intel terminology now - 16-bits) as well as dwords and individual bytes.
(Wow... I'm actually using something I learned at university. First time that's ever happened)
- Quzar
- Dream Coder
- Posts: 7499
- Joined: Wed Jul 31, 2002 12:14 am
- Location: Miami, FL
- Has thanked: 4 times
- Been thanked: 10 times
- Contact:
I understand what you are saying, but how would one use that properly? Just transfer the data in word-sized chunks instead of byte by byte?
Also, @ian, as for the emulator core, you should make sure that its definition of a word is the one you are expecting. If it uses the proper word definition for the Dreamcast, a word is 32 bits; if it uses Intel's, a word is 16 bits, and it would be faster using dwords.
You would most likely have to rewrite enough of it that it would be a large amount of work.
"When you post fewer lines of text than your signature, consider not posting at all." - A Wise Man
-
- Soul Sold for DCEmu
- Posts: 4865
- Joined: Fri Jul 11, 2003 9:56 pm
- Has thanked: 2 times
- Been thanked: 4 times
-
- bleemcast! Creator
- Posts: 882
- Joined: Wed Oct 17, 2001 7:44 pm
- Location: Los Angeles, CA
- Has thanked: 0
- Been thanked: 0
- Contact:
-
- Insane DCEmu
- Posts: 290
- Joined: Wed Oct 17, 2001 7:44 pm
- Has thanked: 0
- Been thanked: 0
Well, it all depends on whether you are doing this in assembly or C. I'm assuming the question is how to go from a WORD to a DWORD. In C, I believe you should be able to just typecast from a 16-bit type (short) to a 32-bit one (int or long). In assembly, something like the following:
Code: Select all
toDbl:
mov.l wordAddr, r1 ! wordAddr is a placeholder label holding the address of the 16-bit variable
mov.w @r1, r0 ! Load the word; mov.w sign-extends it to 32 bits
extu.w r0, r0 ! Use this to zero-extend instead, if the value is unsigned
Not sure about the assembly part (could be wrong, I'm about to go to sleep), but the part about typecasting in C should be right.
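For the C side, a minimal sketch of what the typecast does, assuming the fixed-width types from <stdint.h>. The function names are made up for illustration. The point is that widening a 16-bit value is just a cast, and whether the upper bits are filled with zeros or with the sign bit depends on the signedness of the source type.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint32_t) == 2 * sizeof(uint16_t), "a dword is two words");

/* Unsigned: the upper 16 bits become zero (zero extension). */
static uint32_t widen_unsigned(uint16_t w)
{
    return (uint32_t)w;
}

/* Signed: the upper 16 bits copy the sign bit (sign extension). */
static int32_t widen_signed(int16_t w)
{
    return (int32_t)w;
}
```

Note that a cast only changes how one value is handled in a register; it does not change how the data is laid out in memory or how many memory accesses the surrounding code makes.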
"So I gotta be carefull, can't let tha evil of tha money trap me
so when ya see me #@#$%
ya better holla at me "
Tupac Shakur[1971-1996]
Makaveli[1996-????]
-
- Mental DCEmu
- Posts: 415
- Joined: Thu Oct 10, 2002 7:18 pm
- Has thanked: 0
- Been thanked: 0
- Contact:
well:
Rand Linden wrote: Mostly, the tip was learning how to use the cache and SQs.
The DWORDs vs. WORDs came up with the mysterious (and yet unexplained) "dcache" function.
And, yes, in certain cases, 16bit will be slower than 32bit on DC.
Rand.
then
Rand Linden wrote: Using the cachable area of memory,
Writing all your data in whatever order you choose,
Flushing the portion of the cache that isn't already replaced.
well BlackAura can probably show the function and a code snippet.... and probably explain it far better than i can quote it
BlackAura wrote: That's pretty much what it's doing. The dcache thingy is a KOS utility function which clears the data cache over a certain range
-
- bleemcast! Creator
- Posts: 882
- Joined: Wed Oct 17, 2001 7:44 pm
- Location: Los Angeles, CA
- Has thanked: 0
- Been thanked: 0
- Contact:
Rand Linden wrote: The explanation previously given isn't sufficiently detailed for me to determine whether or not it does what's claimed.
I'd strongly suspect that it ISN'T using WORDs, but hey, if no one else will bother to dig further, I certainly won't either.
Rand.
Well, here are the function headers I could find related to it:
Code: Select all
void dcache_flush_range(uint32 start, uint32 count);
Code: Select all
! This routine just goes through and forces a write-back on the
! specified data range. Use prior to dcache_inval_range if you
! care about the contents.
! r4 is starting address
! r5 is count
_dcache_flush_range:
    ! Get ending address from count and align start address
    add     r4,r5
    mov.l   l1align,r0
    and     r0,r4
dflush_loop:
    ! Write back the O cache
    ocbwb   @r4
    mov     #0x10,r0    ! r4 | 0x1000
    shll8   r0
    or      r4,r0
    ocbwb   @r0
    mov     #0x20,r0    ! r4 | 0x2000
    shll8   r0
    or      r4,r0
    ocbwb   @r0
    mov     #0x30,r0    ! r4 | 0x3000
    shll8   r0
    or      r4,r0
    ocbwb   @r0
    cmp/hs  r4,r5
    bt/s    dflush_loop
    add     #32,r4      ! += L1_CACHE_BYTES
    rts
    nop
    .align 2
l1align:
    .long   ~31         ! ~(L1_CACHE_BYTES-1)
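The address arithmetic in that routine (mask the start address with ~31, then step in 32-byte increments) can be sketched in plain C. This is a host-side model only: it does not touch any cache, it is not the KOS implementation, and the exact loop bounds of the assembly may differ from this count. The names line_base and lines_in_range are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define L1_CACHE_BYTES 32u

/* Round an address down to the start of its cache line (the `and` with ~31). */
static uint32_t line_base(uint32_t addr)
{
    return addr & ~(L1_CACHE_BYTES - 1u);
}

/* Count the cache lines overlapped by the byte range [start, start + count). */
static uint32_t lines_in_range(uint32_t start, uint32_t count)
{
    if (count == 0)
        return 0;
    assert(start <= UINT32_MAX - (count - 1u));  /* range must not wrap around */
    uint32_t first = line_base(start);
    uint32_t last  = line_base(start + count - 1u);
    return (last - first) / L1_CACHE_BYTES + 1u;
}
```

Even a 2-byte range can straddle two lines if it starts at the last byte of one, which is why the start address gets aligned before looping.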
"This is worse than when the Raccoon got in the copier!"
- Stef.D
- DCEmu Respected
- Posts: 114
- Joined: Wed Oct 15, 2003 1:46 am
- Has thanked: 0
- Been thanked: 0
- Contact:
I don't know much about data cache stuff, except that using dword data may give better alignment.
But generally on SH-x CPUs, using a DWORD (32-bit variable) performs better than a WORD (16 bits) or BYTE (8 bits), because most of the time the data gets converted to a DWORD (EXTxx instruction) before being computed on anyway... and that's definitely not free.
This is also true on a lot of other 32-bit RISC CPUs...
Glad to see there are some cache management functions in KOS, such as dcache_flush_range(...). Does anyone know if a dcache_prefetch_range(...) function exists? Just to load a certain part of memory into the cache?
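A small C sketch of the 16-bit versus 32-bit variable point: both functions below return the same 16-bit sum, but the first keeps a 16-bit accumulator (so the value has to stay truncated to 16 bits throughout), while the second works in 32 bits and truncates once at the end. Whether a given compiler actually emits an extension instruction per iteration for the first version is an assumption I have not checked. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit accumulator: the running value is narrowed back to 16 bits each pass. */
static uint16_t sum16_narrow(const uint16_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint16_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (uint16_t)(acc + p[i]);
    return acc;
}

/* 32-bit accumulator: do the math at register width, truncate once at the end. */
static uint16_t sum16_wide(const uint16_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += p[i];
    return (uint16_t)acc;
}
```

The general habit this suggests is to keep locals and loop counters at the native 32-bit width and only use 16-bit types where the data layout requires it.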
-
- DC Developer
- Posts: 968
- Joined: Tue Feb 11, 2003 4:12 pm
- Location: In a Dream
- Has thanked: 5 times
- Been thanked: 6 times
The CPU does prefetching automatically, so a prefetch function might mess up the cache or be too complex to mix with normal code. The reason there's a flush function is that the SH-4 does not automatically write dirty data back to memory like conventional caches do.
behold the mind
inspired by Dreamcast
-
- DC Developer
- Posts: 9951
- Joined: Sun Dec 30, 2001 9:02 am
- Has thanked: 0
- Been thanked: 1 time
There is a prefetch instruction. The manual (for the SH-4) recommends that you use it if you're about to access a block of 32 bytes, and you're fairly certain that it isn't going to be in the cache already. If it's in the cache, it'll waste some time. If it's not in the cache, it'll fetch it immediately, so you won't stall the pipeline (probably not the correct terminology) later on.
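From C, one way to ask for a prefetch is the GCC/Clang extension __builtin_prefetch. The sketch below hints the next 32-byte block while summing the current one. I would expect GCC to turn the builtin into the SH-4 prefetch instruction when targeting the Dreamcast, but that is an assumption I have not verified; on targets without a prefetch instruction the builtin does nothing. The function name and the 8-words-per-line constant are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_LINE 8u  /* 32-byte cache line / 4-byte word */

/* Sum an array, hinting the next 32-byte block every 8 words. */
static uint32_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only hint addresses that are still inside the array. */
        if (i % WORDS_PER_LINE == 0 && i + WORDS_PER_LINE < n)
            __builtin_prefetch(&data[i + WORDS_PER_LINE]);
        total += data[i];
    }
    return total;
}
```

As noted above, the hint only pays off when the block is not already in the cache, so it is worth measuring before sprinkling it around.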