Z80 emulation
-
- DC Developer
- Posts: 453
- Joined: Thu May 16, 2002 8:29 am
- Location: ice88's house
- Has thanked: 0
- Been thanked: 0
The problem with Genesis Plus at the moment is the speed of the Z80 emulation. C68K did a lot for the emulation in general, but the Z80 is now holding things back.
Anyone got any experience or ideas on how to speed up the Z80 emulation?
I have been fiddling with 32-byte aligning everything and making everything a 32-bit value, but to no avail...
Read my blog: http://unrational.blogspot.com
-
- Insane DCEmu
- Posts: 190
- Joined: Sun Jun 27, 2004 8:35 pm
- Location: stillwater, ok
- Has thanked: 0
- Been thanked: 0
- Quzar
- Dream Coder
- Posts: 7497
- Joined: Wed Jul 31, 2002 12:14 am
- Location: Miami, FL
- Has thanked: 4 times
- Been thanked: 9 times
That second thing you said... basically. Just like with the M68k, we are the only people really being held back by this. On almost any PC there are assembly emulators, so there is no worry about speed, but we are stuck with a C emulator that just isn't built for speed. Frankly, what I would try with any emulator core would be dynarec, but that is just because I don't understand emulation very well.
"When you post fewer lines of text than your signature, consider not posting at all." - A Wise Man
-
- DCEmu Cool Poster
- Posts: 1048
- Joined: Thu May 16, 2002 5:01 pm
- Location: Madrid, Spain
- Has thanked: 0
- Been thanked: 0
Dynarec is only useful when the CPU clock of the system you want to emulate is high compared to the clock of the CPU doing the emulating. For example, to emulate the PSX's 33MHz CPU on the DC, dynarec (along with SH-4 ASM) is the only way to get full speed. To emulate an Atari 2600, dynarec is not a good choice: it tends to break compatibility, and for a really slow CPU a dynarec emulator can even end up slower than a normal one on some systems.
-
- DC Developer
- Posts: 453
- Joined: Thu May 16, 2002 8:29 am
- Location: ice88's house
- Has thanked: 0
- Been thanked: 0
BlackAura wrote:The Z80 emulator in GP has an accuracy switch. Disabling it disables support for undocumented flags and opcodes, and (supposedly) makes it a bit faster. I didn't notice any difference though.
Hmmm, tried all of those, and it makes no notable difference... It basically needs to be rewritten like C68K was. I just don't know how to start.
-
- DC Developer
- Posts: 9951
- Joined: Sun Dec 30, 2001 9:02 am
- Has thanked: 0
- Been thanked: 1 time
"...and makes no notable difference"
I didn't notice any either.
"it basically needs to be rewritten like C68K was - just I don't know how to start."
Yeah, it either needs to be rewritten from scratch in C, using all the insane optimizations that C68K uses (there's a reason nobody else had written a 68k emulator like that yet: requiring half a gig of RAM to compile is a little much), or written in SH-4 assembly. Emulators written in assembly tend to use the same kind of tricks that C68K uses (stuff like jump tables, not using real function calls, that kind of thing).
Ever written a CPU emulator before?
Most interpretative emulators follow this basic pattern:
Code: Select all
nCycles += cycles_to_run_this_time;
while(nCycles > 0)
{
    /* Fetch */
    instruction = read_from_memory(registers.instruction_pointer);
    /* Decode */
    switch(instruction)
    {
    case 0x??:
        /* Execute */
        do_instruction_??();
        break;
    default:
        die_painfully();
        break;
    }
    /* Consume time */
    nCycles -= cycle_table[instruction];
}
There's a load of documentation about this sort of thing out there. I'll see if I can dig some up...
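To make the pattern above concrete, here is a minimal sketch of the same fetch/decode/execute loop that actually compiles. It handles only four real Z80 opcodes (NOP, INC A, LD A,n, HALT) and skips flag updates entirely. The names `ToyCpu` and `toy_run` are made up for illustration and do not come from Genesis Plus or the MAME core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced CPU state: just what the four opcodes need. */
typedef struct {
    uint16_t pc;      /* program counter */
    uint8_t  a;       /* accumulator */
    int      halted;  /* set by HALT or by an unimplemented opcode */
} ToyCpu;

/* Runs until the cycle budget is spent or the CPU halts.
 * Returns the number of cycles actually consumed. */
int toy_run(ToyCpu *cpu, const uint8_t *mem, int budget)
{
    int cycles = budget;

    assert(cpu != 0 && mem != 0);

    while (cycles > 0 && !cpu->halted) {
        uint8_t op = mem[cpu->pc++];              /* fetch */

        switch (op) {                             /* decode */
        case 0x00:                                /* NOP */
            cycles -= 4;
            break;
        case 0x3C:                                /* INC A (flags omitted) */
            cpu->a++;
            cycles -= 4;
            break;
        case 0x3E:                                /* LD A,n */
            cpu->a = mem[cpu->pc++];
            cycles -= 7;
            break;
        case 0x76:                                /* HALT */
            cpu->halted = 1;
            cycles -= 4;
            break;
        default:                                  /* unimplemented: stop */
            cpu->halted = 1;
            break;
        }
    }
    return budget - cycles;
}
```

Cycle counts are charged inside each case here instead of through a separate `cycle_table`; either works, and the table form is easier to keep in sync with documentation.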
-
- Soul Sold for DCEmu
- Posts: 4865
- Joined: Fri Jul 11, 2003 9:56 pm
- Has thanked: 2 times
- Been thanked: 4 times
I haven't found any change either with the Z80 with any of the flags; the speed change is about 1% on some MAME drivers using that define. Once the target system's Z80 runs at about 4 MHz, it's slow. In many years of working on MAME drivers I've found the Z80 to be near the slowest CPU core. By itself it's fine, but as soon as you add one other core alongside it you can never get it fast enough.
Real pain.
Dreamcast forever!!!
-
- Mental DCEmu
- Posts: 330
- Joined: Sun Mar 23, 2003 10:52 pm
- Has thanked: 0
- Been thanked: 0
Switch statements are rather inefficient for instruction decoding. Best way to do it is with an array of function pointers with the instruction being decoded as the index into the array.
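A rough sketch of the function-pointer approach described above, assuming a toy CPU with three real Z80 opcodes and everything else routed to a stub. All names here (`FtCpu`, `ft_table`, `ft_run` and the handlers) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t pc;
    uint8_t  a;
    int      cycles;  /* remaining cycle budget */
} FtCpu;

typedef void (*FtHandler)(FtCpu *cpu, const uint8_t *mem);

static void ft_nop(FtCpu *cpu, const uint8_t *mem)    { (void)mem; cpu->cycles -= 4; }
static void ft_inc_a(FtCpu *cpu, const uint8_t *mem)  { (void)mem; cpu->a++; cpu->cycles -= 4; }
static void ft_ld_a_n(FtCpu *cpu, const uint8_t *mem) { cpu->a = mem[cpu->pc++]; cpu->cycles -= 7; }
/* Stub for everything not implemented: drain the budget so the loop ends. */
static void ft_unimpl(FtCpu *cpu, const uint8_t *mem) { (void)mem; cpu->cycles = 0; }

/* One entry per opcode byte; the opcode is the index. */
static FtHandler ft_table[256];

void ft_init(void)
{
    for (int i = 0; i < 256; i++)
        ft_table[i] = ft_unimpl;
    ft_table[0x00] = ft_nop;     /* NOP */
    ft_table[0x3C] = ft_inc_a;   /* INC A (flags omitted) */
    ft_table[0x3E] = ft_ld_a_n;  /* LD A,n */
}

void ft_run(FtCpu *cpu, const uint8_t *mem)
{
    assert(ft_table[0] != 0);    /* ft_init() must have been called */
    while (cpu->cycles > 0) {
        uint8_t op = mem[cpu->pc++];
        ft_table[op](cpu, mem);  /* indirect call, one per instruction */
    }
}
```

Every instruction costs an indirect call and return, and the CPU state has to live in memory rather than in registers across handlers, so it is worth measuring against a plain switch before committing to this layout.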
Flag calculation is probably the other expensive part. ASM won't help you very much there, since the SH-4 doesn't do much flag calculation itself. Your best bet is to have tables of precalculated flags for at least the more common operations. I'm pretty sure only the 8-bit instructions change flags, so it's only 64KB of data for each table.
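As an illustration of the precalculated-flag idea, the sketch below builds one 256x256 table (64KB) for 8-bit ADD and then uses it as a single lookup. It covers only the documented S, Z, H, P/V, N and C bits and leaves the two undocumented bits at zero. The names `add_flags`, `init_add_flags` and `add8` are hypothetical. Note that the 16-bit ADD/ADC/SBC on HL also modify flags on a real Z80, so tables like this would not cover every flag-setting instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Documented Z80 flag bits in the F register. */
enum {
    FLAG_C  = 0x01,  /* carry */
    FLAG_N  = 0x02,  /* add/subtract (always 0 for ADD) */
    FLAG_PV = 0x04,  /* parity/overflow (overflow for ADD) */
    FLAG_H  = 0x10,  /* half carry */
    FLAG_Z  = 0x40,  /* zero */
    FLAG_S  = 0x80   /* sign */
};

/* add_flags[a][b] holds the flags produced by ADD A,b when A == a. */
static uint8_t add_flags[256][256];

void init_add_flags(void)
{
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            int     sum = a + b;
            uint8_t res = (uint8_t)sum;
            uint8_t f   = 0;

            if (res & 0x80)                        f |= FLAG_S;
            if (res == 0)                          f |= FLAG_Z;
            if ((a & 0x0F) + (b & 0x0F) > 0x0F)    f |= FLAG_H;
            /* Signed overflow: operands share a sign, result does not. */
            if (~(a ^ b) & (a ^ res) & 0x80)       f |= FLAG_PV;
            if (sum > 0xFF)                        f |= FLAG_C;

            add_flags[a][b] = f;
        }
    }
}

/* ADD with flags taken from the table instead of being computed. */
uint8_t add8(uint8_t a, uint8_t b, uint8_t *flags)
{
    assert(flags != 0);
    *flags = add_flags[a][b];
    return (uint8_t)(a + b);
}
```

A 64KB table is larger than the SH-4's 16KB operand cache, so whether the lookup beats computing the flags directly is something that would need measuring on the hardware.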
-
- DC Developer
- Posts: 9951
- Joined: Sun Dec 30, 2001 9:02 am
- Has thanked: 0
- Been thanked: 1 time
There are quite a few others. It's just a matter of finding one...
Marat Fayzullin's Z80 emulator is not going to be any better than the MAME one. It's slower, and has a lot more bugs.
Marcel de Kogel's Z80em (which I think is based on Marat's one above) is faster and less buggy, but I don't have a clue how well it might work on a Dreamcast. Has optional x86 assembly, which is obviously useless to us.
DOZE and RAZE are both assembly-only, so they're no good to us either. Both are really good emulators though.
Neil Bradley's MZ80 is (I think) both C and assembly, depending on what command-line options you give the generator program.
Other than that, we have the MAME one, which we're already using.
So there are quite a lot of them, but most are x86 assembly only, or are no better than the one we're using.
Mask of Destiny wrote:Switch statements are rather inefficient for instruction decoding. Best way to do it is with an array of function pointers with the instruction being decoded as the index into the array.
Unless the compiler's smart enough to convert the switch statement into a jump table. There's really no way to guarantee that when you're writing the code though, and I don't think GCC is that smart.
-
- Soul Sold for DCEmu
- Posts: 4865
- Joined: Fri Jul 11, 2003 9:56 pm
- Has thanked: 2 times
- Been thanked: 4 times
-
- Insane DCEmu
- Posts: 190
- Joined: Sun Jun 27, 2004 8:35 pm
- Location: stillwater, ok
- Has thanked: 0
- Been thanked: 0
Mask of Destiny wrote:Switch statements are rather inefficient for instruction decoding. Best way to do it is with an array of function pointers with the instruction being decoded as the index into the array.
Flag calculation is probably the other expensive part, ASM won't help you very much there since the SH-4 doesn't do much flag calculation itself. Your best bet is to have tables of precalculated flags for at least the more common operations. I'm pretty sure only the 8-bit instructions change flags so it's only 64KB of data for each table.
I was going to ask: do case statements get optimized by the compiler, or are function pointers a fairly good way to approach that?
I have never done emulation before, but I have done my share of software development (C plus many other languages over the last 10-15 years), though I mainly do IT-type work these days.
-
- DC Developer
- Posts: 80
- Joined: Wed Oct 17, 2001 7:44 pm
- Location: Sweden
- Has thanked: 0
- Been thanked: 1 time
BlackAura wrote:Unless the compiler's smart enough to convert the switch statement into a jump table. There's really no way to guarantee that when you're writing the code though, and I don't think GCC is that smart.
Code: Select all
c++ test code:
--------------8<---------------
switch (argc) {
case 0: printf("0\n"); break;
case 1: printf("1\n"); break;
case 2: printf("2\n"); break;
case 3: printf("3\n"); break;
--------------8<---------------
assembly output:
--------------8<---------------
mova .L10,r0
shll2 r1
mov.l @(r0,r1),r1
braf r1
nop
.L11:
.align 2
.L10:
.long .L3-.L11
.long .L4-.L11
.long .L5-.L11
.long .L6-.L11
.long .L7-.L11
.long .L8-.L11
--------------8<---------------
Though you should probably check the output for your own switch statements to see if it does it there too.
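For anyone who wants to repeat that check, here is a small self-contained file with a dense six-way switch. The function names are made up, and whether a jump table comes out depends on the compiler version and optimization level. Compiling it with GCC's `-S` option (for example `-O2 -S`) writes the assembly to a `.s` file; for the Dreamcast you would use your SH-4 cross compiler, which is often installed as `sh-elf-gcc`, though that name is an assumption about your toolchain.

```c
#include <assert.h>

/* Dense case values 0..5, each doing something different so the compiler
 * cannot trivially fold the whole switch into a data lookup. */
unsigned alu_op(unsigned op, unsigned x)
{
    switch (op) {
    case 0:  return x + 1;
    case 1:  return x - 1;
    case 2:  return x << 1;
    case 3:  return x >> 1;
    case 4:  return ~x;
    case 5:  return x * x;
    default: return 0;
    }
}

/* Tiny sanity check so the behaviour can be confirmed before reading
 * the generated assembly. */
void alu_self_check(void)
{
    assert(alu_op(2, 3) == 6);
}
```

In the assembly, a jump table shows up as a bounds check on `op` followed by an indexed load and an indirect branch, much like the `mova` / `mov.l @(r0,r1)` / `braf` sequence in the output above.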
-
- Insane DCEmu
- Posts: 190
- Joined: Sun Jun 27, 2004 8:35 pm
- Location: stillwater, ok
- Has thanked: 0
- Been thanked: 0
Well, it wouldn't be THAT hard for the compiler to make a jump table. Case values must be constants of some sort, so all the values in the switch statement are known at compile time and there is only a finite, fixed set of jump targets.
I know function pointers are cool to use and easy to call in a loop, and I guess each opcode number can be an array index. However, calling functions causes the stack to at least push and pop the processor's program counter/instruction pointer (as well as other data at times), which could waste clock cycles.
Unless you can make the calls through an array of function pointers into inline calls.
- Stef.D
- DCEmu Respected
- Posts: 114
- Joined: Wed Oct 15, 2003 1:46 am
- Has thanked: 0
- Been thanked: 0
The Z80 is an 8-bit CPU, so you can easily use a 256-case switch statement for instruction decoding... most of the time, C compilers are smart enough to build a 256-entry jump table...
The only reason I used computed labels in C68K is that almost no C compiler supports a 65536-case switch statement (I tried, and it failed with an overflow on almost all compilers)... The fact that C68K takes so much memory to compile is just due to a GCC trick; I'm sure other compilers can compile it without much trouble as long as they support computed labels (Visual C++ doesn't support them).
Using a function pointer table is actually much slower than having a simple jump table... it's the main difference between Musashi and C68K... I'm also using a faster way of fetching instructions (reading data from PC); I could also do it for SP (A7), but the speed improvement isn't worth the effort imo...
Actually, the C Z80 emulators are probably already optimised. I believe MZ80 is the fastest, though I never tried it... I wrote a Z80 core some time ago, but it was in pure x86 ASM :-/ At least I have some knowledge of it and of the undocumented opcodes, so I can quickly rewrite a new one...
I'll have a look at the existing C Z80 cores and see if we can really do better, but I'm not sure...
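For readers who have not seen computed labels before, this is a minimal sketch of the technique using GCC's "labels as values" extension (`&&label` and `goto *ptr`). It is not taken from C68K. The four opcodes are a made-up toy encoding (0 = nop, 1 = inc, 2 = dec, 3 = stop), and `run_threaded` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Executes the toy program in mem until opcode 3 (stop) or until the
 * cycle budget runs out. Stores the accumulator in *a_out and returns
 * the cycles left. Requires GCC or a compiler with the same extension. */
int run_threaded(const uint8_t *mem, uint8_t *a_out, int cycles)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[4] = {
        &&op_nop, &&op_inc_a, &&op_dec_a, &&op_stop
    };
    uint16_t pc = 0;
    uint8_t  a  = 0;

    assert(mem != 0 && a_out != 0);

/* Fetch the next opcode and jump straight to its handler. The mask keeps
 * the index inside the 4-entry table for this toy encoding. */
#define NEXT()                                   \
    do {                                         \
        if (cycles <= 0) goto done;              \
        goto *dispatch[mem[pc++] & 3];           \
    } while (0)

    NEXT();

op_nop:   cycles -= 4;      NEXT();
op_inc_a: a++; cycles -= 4; NEXT();
op_dec_a: a--; cycles -= 4; NEXT();
op_stop:
done:
    *a_out = a;
    return cycles;

#undef NEXT
}
```

Each handler ends with its own indirect jump rather than returning to a central loop, and all state stays in local variables, which is what lets the compiler keep it in registers. That is the practical difference from a table of function pointers.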
-
- DC Developer
- Posts: 9951
- Joined: Sun Dec 30, 2001 9:02 am
- Has thanked: 0
- Been thanked: 1 time
Stef.D wrote:i believe MZ80 is the fastest, i never tried it though
I think it is, but only the assembly version. The C version's not very fast, and (apparently) doesn't work well on the DC anyway.
A 256-way switch statement would certainly be the best way to implement the Z80, but implementing the two-byte instructions would be a pain. As far as I can tell, the first byte usually acts as a modifier, but one of them changes the instruction set for the second byte... Messy. Doing it that way, it'd probably be simpler to implement it in assembly anyway.
- Stef.D
- DCEmu Respected
- Posts: 114
- Joined: Wed Oct 15, 2003 1:46 am
- Has thanked: 0
- Been thanked: 0
BlackAura wrote:I think it is, but only the assembly version. The C version's not very fast, and (apparently) doesn't work well on the DC anyway.
A 256-way switch statement would certainly be the best way to implement the Z80, but implementing the two-byte instructions would be a pain. As far as I can tell, the first byte usually acts as a modifier, but one of them changes the instruction set for the second byte... Messy. Doing it that way, it'd probably be simpler to implement it in assembly anyway.
Instructions of 2 bytes (or more) can cause some trouble with nested switch statements... because prefixes can appear as many times as we want... even if there is no reason to do that!
I guess computed labels will help here, to follow the same implementation I used in my ASM core.
Last edited by Stef.D on Mon Jul 05, 2004 9:40 am, edited 2 times in total.
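One way to cope with repeated prefixes without nesting switches is to strip them in a small loop before dispatching. The sketch below is only a decoder front end with hypothetical names (`Decoded`, `decode_prefixes`), not anyone's actual core. It assumes the usual Z80 behaviour: DD and FD select IX or IY and may repeat with the last one winning, ED discards any earlier DD/FD, and after DD CB or FD CB the displacement byte comes before the final opcode byte.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { IDX_HL, IDX_IX, IDX_IY } IndexMode;

typedef struct {
    IndexMode mode;    /* register that HL-style opcodes refer to */
    uint8_t   table;   /* 0 = main table, 0xCB or 0xED = prefixed table */
    int8_t    disp;    /* displacement, only set for DD CB / FD CB forms */
    uint8_t   opcode;  /* final opcode byte to dispatch on */
    int       length;  /* bytes consumed so far (operands not included) */
} Decoded;

Decoded decode_prefixes(const uint8_t *code)
{
    Decoded d = { IDX_HL, 0, 0, 0, 0 };

    assert(code != 0);

    for (;;) {
        uint8_t b = code[d.length++];

        switch (b) {
        case 0xDD:                 /* may repeat; the last one wins */
            d.mode = IDX_IX;
            continue;
        case 0xFD:
            d.mode = IDX_IY;
            continue;
        case 0xED:                 /* earlier DD/FD prefixes are ignored */
            d.mode   = IDX_HL;
            d.table  = 0xED;
            d.opcode = code[d.length++];
            return d;
        case 0xCB:
            d.table = 0xCB;
            if (d.mode != IDX_HL)  /* DD CB d op: displacement first */
                d.disp = (int8_t)code[d.length++];
            d.opcode = code[d.length++];
            return d;
        default:                   /* ordinary opcode from the main table */
            d.opcode = b;
            return d;
        }
    }
}
```

The result can then index one of three 256-entry tables (main, CB, ED), with `mode` deciding whether HL, IX or IY is used. A real core would also have to charge cycles for each prefix byte, which this sketch leaves out.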