Z80 emulation

Warmtoe
DC Developer
Posts: 453
Joined: Thu May 16, 2002 8:29 am
Location: ice88's house
Has thanked: 0
Been thanked: 0
Contact:


Post by Warmtoe »

The problem with Genesis Plus at the moment is the speed of the Z80 emulation - C68K did a lot for the emulation in general, but the Z80 is now holding things back.

Anyone got any experience / ideas on how to speed up the Z80 emulation?

I have been fiddling with 32-byte aligning everything and making everything a 32-bit value - but to no avail...
BlackAura
DC Developer
Posts: 9951
Joined: Sun Dec 30, 2001 9:02 am
Has thanked: 0
Been thanked: 1 time

Post by BlackAura »

The Z80 emulator in GP has an accuracy switch. Disabling it disables support for undocumented flags and opcodes, and (supposedly) makes it a bit faster. I didn't notice any difference though.
Rev. Layle
Insane DCEmu
Posts: 190
Joined: Sun Jun 27, 2004 8:35 pm
Location: stillwater, ok
Has thanked: 0
Been thanked: 0
Contact:

Post by Rev. Layle »

What makes the Z80 so difficult to emulate efficiently, or has no one tackled it in depth long enough to make any headway on it?
Quzar
Dream Coder
Posts: 7497
Joined: Wed Jul 31, 2002 12:14 am
Location: Miami, FL
Has thanked: 4 times
Been thanked: 9 times
Contact:

Post by Quzar »

That second thing you said... basically. Just like with the M68k, we are the only people really being held back by this. On almost any PC there are assembly emulators, so there is no worry about speed, but we are stuck with a C emulator that just isn't built for speed. Frankly, what I would try with any emulator core is dynarec, but that is just because I don't understand emulation very well :?
"When you post fewer lines of text than your signature, consider not posting at all." - A Wise Man
doragasu
DCEmu Cool Poster
Posts: 1048
Joined: Thu May 16, 2002 5:01 pm
Location: Madrid, Spain
Has thanked: 0
Been thanked: 0

Post by doragasu »

Dynarec is only useful when the clock of the CPU you want to emulate is high compared to the clock of the CPU you're emulating it on. For example, to emulate the PSX's 33MHz CPU on the DC, dynarec is the only way to get it full speed (along with SH4 ASM). To emulate an Atari 2600, though, dynarec is not a good choice: it tends to break compatibility, and for a really slow CPU a dynarec emulator may even end up slower than a normal one on some systems.
Warmtoe
DC Developer
Posts: 453
Joined: Thu May 16, 2002 8:29 am
Location: ice88's house
Has thanked: 0
Been thanked: 0
Contact:

Post by Warmtoe »

BlackAura wrote:The Z80 emulator in GP has an accuracy switch. Disabling it disables support for undocumented flags and opcodes, and (supposedly) makes it a bit faster. I didn't notice any difference though.
Hmmm - tried all of those - and makes no notable difference... it basically needs to be rewritten like C68K was - just I don't know how to start.
BlackAura
DC Developer
Posts: 9951
Joined: Sun Dec 30, 2001 9:02 am
Has thanked: 0
Been thanked: 1 time

Post by BlackAura »

and makes no notable difference
I didn't notice any either...
it basically needs to be rewritten like C68K was - just I don't know how to start.
Yeah, it either needs to be rewritten from scratch in C, using all the insane optimizations that C68K uses (there's a reason nobody else had written a 68k emulator like that yet - requiring half a gig of RAM to compile is a little much), or written in SH-4 assembly. Emulators written in assembly tend to use the same kind of tricks that C68K uses (stuff like jump tables, not using real function calls, that kind of thing).

Ever written a CPU emulator before?

Most interpretative emulators follow this basic pattern:

Code:

nCycles += cycles_to_run_this_time;
while(nCycles > 0)
{
    /* Fetch the opcode and advance past it */
    instruction = read_from_memory(registers.instruction_pointer++);

    /* Decode */
    switch(instruction)
    {
        case 0x??:
            /* Execute */
            do_instruction_??();
            break;
        default:
            die_painfully();
            break;
    }

    /* Consume time */
    nCycles -= cycle_table[instruction];
}
Obviously that varies a lot, especially the decode phase, depending on the CPU you're emulating.
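
To make the pattern above concrete, here is a minimal sketch in C for a four-opcode subset of the Z80 (NOP, INC A, LD A,n, HALT). The `cpu_t` struct, `cpu_run`, and the flat 64 KB memory array are assumptions made for illustration, not code from Genesis Plus or MAME. Flag updates are left out, and HALT simply stops the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down CPU state: only what this sketch needs. */
typedef struct {
    uint8_t  a;          /* accumulator */
    uint16_t pc;         /* program counter */
    int      halted;     /* set by HALT or by an unimplemented opcode */
    uint8_t  mem[65536]; /* flat 64 KB address space */
} cpu_t;

/* T-state cost per opcode; entries not listed default to 0. */
static const uint8_t cycle_table[256] = {
    [0x00] = 4,  /* NOP    */
    [0x3C] = 4,  /* INC A  */
    [0x3E] = 7,  /* LD A,n */
    [0x76] = 4,  /* HALT   */
};

/* Runs until the cycle budget is spent or the CPU halts.
   Returns the number of cycles actually consumed. */
static int cpu_run(cpu_t *cpu, int budget)
{
    int cycles = budget;
    assert(cpu != NULL);

    while (cycles > 0 && !cpu->halted) {
        uint8_t op = cpu->mem[cpu->pc++];             /* fetch */

        switch (op) {                                 /* decode + execute */
        case 0x00: break;                             /* NOP */
        case 0x3C: cpu->a++; break;                   /* INC A (flags omitted) */
        case 0x3E: cpu->a = cpu->mem[cpu->pc++]; break; /* LD A,n */
        case 0x76: cpu->halted = 1; break;            /* HALT */
        default:   cpu->halted = 1; break;            /* unimplemented: stop */
        }

        cycles -= cycle_table[op];                    /* consume time */
    }
    return budget - cycles;
}
```

Loading the bytes 3E 41 3C 76 at address 0 and calling `cpu_run(&cpu, 100)` leaves 0x42 in A and reports 15 cycles used (7 + 4 + 4).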

There's a load of documentation about this sort of thing out there. I'll see if I can dig some up...
Ian Micheal
Soul Sold for DCEmu
Posts: 4865
Joined: Fri Jul 11, 2003 9:56 pm
Has thanked: 2 times
Been thanked: 4 times

Post by Ian Micheal »

I haven't found any change either with the Z80 using any of the flags; the speed change is about 1% on some MAME drivers using that define. Once the target system's Z80 runs at about 4 MHz, it's slow. In many years of working on MAME drivers I've found the Z80 to be near the slowest CPU core. By itself it's fine, but as soon as you add one other core alongside it you can never get it fast enough.

Real pain.
Dreamcast forever!!!
Quzar
Dream Coder
Posts: 7497
Joined: Wed Jul 31, 2002 12:14 am
Location: Miami, FL
Has thanked: 4 times
Been thanked: 9 times
Contact:

Post by Quzar »

Has anyone tried any other Z80 emulators? Or rather, do they exist? I know there are a few others, but I mean other portable ones (other than the MAME CPU core).
Mask of Destiny
Mental DCEmu
Posts: 330
Joined: Sun Mar 23, 2003 10:52 pm
Has thanked: 0
Been thanked: 0

Post by Mask of Destiny »

Switch statements are rather inefficient for instruction decoding. Best way to do it is with an array of function pointers with the instruction being decoded as the index into the array.
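
A minimal sketch of that function-pointer approach, for anyone who has not seen it: one handler per opcode, looked up by the opcode byte. The `regs_t` struct and the handler names are made up for this example, only three single-byte Z80 opcodes are filled in (INC B = 0x04, INC A = 0x3C, LD A,B = 0x78), and flags are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t a, b; } regs_t;   /* hypothetical register file */
typedef void (*op_fn)(regs_t *);

static void op_nop(regs_t *r)    { (void)r; }
static void op_inc_b(regs_t *r)  { r->b++; }        /* 0x04 */
static void op_inc_a(regs_t *r)  { r->a++; }        /* 0x3C */
static void op_ld_a_b(regs_t *r) { r->a = r->b; }   /* 0x78 */

static op_fn op_table[256];

/* Fill every slot so an unimplemented opcode never calls a null pointer. */
static void init_op_table(void)
{
    for (int i = 0; i < 256; i++)
        op_table[i] = op_nop;
    op_table[0x04] = op_inc_b;
    op_table[0x3C] = op_inc_a;
    op_table[0x78] = op_ld_a_b;
}

/* Executes `count` single-byte instructions: the opcode byte is the index. */
static void execute(regs_t *r, const uint8_t *code, size_t count)
{
    assert(r != NULL && code != NULL);
    for (size_t i = 0; i < count; i++)
        op_table[code[i]](r);
}
```

Each instruction costs an indirect call and a return here, which is the overhead people weigh against a jump table.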

Flag calculation is probably the other expensive part, ASM won't help you very much there since the SH-4 doesn't do much flag calculation itself. Your best bet is to have tables of precalculated flags for at least the more common operations. I'm pretty sure only the 8-bit instructions change flags so it's only 64KB of data for each table.
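
A rough sketch of such a precalculated table for 8-bit ADD, indexed by both operands (65536 one-byte entries, so 64KB). The names `add_flags`, `init_add_flags`, and `add8` are assumptions for this example. The undocumented flag bits 3 and 5 are left out, and a complete core would also need to cover the 16-bit operations that touch flags (ADD HL, ADC HL, SBC HL), which this table does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Z80 flag bit positions (documented flags only). */
enum {
    FLAG_C  = 0x01,  /* carry */
    FLAG_N  = 0x02,  /* add/subtract; always clear for ADD */
    FLAG_PV = 0x04,  /* parity/overflow; overflow for ADD */
    FLAG_H  = 0x10,  /* half carry */
    FLAG_Z  = 0x40,  /* zero */
    FLAG_S  = 0x80   /* sign */
};

static uint8_t add_flags[65536];   /* indexed by (a << 8) | b */

static void init_add_flags(void)
{
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            unsigned sum = a + b;
            unsigned r = sum & 0xFF;
            uint8_t f = 0;

            if (r & 0x80)                         f |= FLAG_S;
            if (r == 0)                           f |= FLAG_Z;
            if (((a & 0x0F) + (b & 0x0F)) > 0x0F) f |= FLAG_H;
            if ((a ^ r) & (b ^ r) & 0x80)         f |= FLAG_PV; /* signed overflow */
            if (sum > 0xFF)                       f |= FLAG_C;

            add_flags[(a << 8) | b] = f;
        }
    }
}

/* ADD with the flag work replaced by a single table lookup. */
static uint8_t add8(uint8_t a, uint8_t b, uint8_t *flags)
{
    assert(flags != NULL);
    *flags = add_flags[((unsigned)a << 8) | b];
    return (uint8_t)(a + b);
}
```

Whether a 64KB table beats computing the flags directly depends on how it behaves in the cache, so it is worth measuring on the target.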
BlackAura
DC Developer
Posts: 9951
Joined: Sun Dec 30, 2001 9:02 am
Has thanked: 0
Been thanked: 1 time

Post by BlackAura »

There are quite a few others. It's just a matter of finding one...

Marat Fayzullin's Z80 emulator is not going to be any better than the MAME one. It's slower, and has a lot more bugs.

Marcel de Kogel's Z80em (which I think is based on Marat's one above) is faster and less buggy, but I don't have a clue how well it might work on a Dreamcast. Has optional x86 assembly, which is obviously useless to us.

DOZE and RAZE are both assembly-only, so they're no good to us either. Both are really good emulators though.

Neil Bradley's MZ80 is (I think) both C and assembly, depending on what command-line options you give the generator program.

Other than that, we have the MAME one, which we're already using.

So there are quite a lot of them, but most are x86 assembly only, or are no better than the one we're using.
Switch statements are rather inefficient for instruction decoding. Best way to do it is with an array of function pointers with the instruction being decoded as the index into the array.
Unless the compiler's smart enough to convert the switch statement into a jump table. There's really no way to guarantee that when you're writing the code though, and I don't think GCC is that smart.
Ian Micheal
Soul Sold for DCEmu
Posts: 4865
Joined: Fri Jul 11, 2003 9:56 pm
Has thanked: 2 times
Been thanked: 4 times

Post by Ian Micheal »

MZ80 is what's used in Generator and NeoGeo CD. It crashes for me most of the time when running.
Rev. Layle
Insane DCEmu
Posts: 190
Joined: Sun Jun 27, 2004 8:35 pm
Location: stillwater, ok
Has thanked: 0
Been thanked: 0
Contact:

Post by Rev. Layle »

Mask of Destiny wrote:Switch statements are rather inefficient for instruction decoding. Best way to do it is with an array of function pointers with the instruction being decoded as the index into the array.

Flag calculation is probably the other expensive part, ASM won't help you very much there since the SH-4 doesn't do much flag calculation itself. Your best bet is to have tables of precalculated flags for at least the more common operations. I'm pretty sure only the 8-bit instructions change flags so it's only 64KB of data for each table.
I was going to ask: do case statements get optimized by the compiler, or are function pointers a fairly good way to approach that?

I have never done emulation before, but I have done my share of software development (C plus many other languages over the last 10-15 years); I mainly do IT-type work these days.
quarn
DC Developer
Posts: 80
Joined: Wed Oct 17, 2001 7:44 pm
Location: Sweden
Has thanked: 0
Been thanked: 1 time

Post by quarn »

BlackAura wrote:Unless the compiler's smart enough to convert the switch statement into a jump table. There's really no way to guarantee that when you're writing the code though, and I don't think GCC is that smart.

Code:

c++ test code:
--------------8<---------------

switch (argc) {
                case 0: printf("0\n"); break;
                case 1: printf("1\n"); break;
                case 2: printf("2\n"); break;
                case 3: printf("3\n"); break;
--------------8<---------------

assembly output:
--------------8<---------------
        mova    .L10,r0
        shll2   r1
        mov.l   @(r0,r1),r1
        braf    r1
        nop
.L11:
        .align 2
.L10:
        .long   .L3-.L11
        .long   .L4-.L11
        .long   .L5-.L11
        .long   .L6-.L11
        .long   .L7-.L11
        .long   .L8-.L11
--------------8<---------------
It sure looks like a jump table to me.

Though you should probably check the output for your own switch statements to see if it does it there too.
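
For anyone who wants to repeat the check, here is a small self-contained function to feed to the compiler. The function name and the eight operations are invented for this example; the cases are dense (0 to 7) and each does different work. Compiling it with something like `gcc -O2 -S` and reading the generated `.s` file shows what the compiler chose. Whether a table appears depends on the GCC version, the target, and the optimization level, so no particular output is promised here.

```c
#include <assert.h>

/* Hypothetical 3-bit operation field: dense case values 0..7,
   each case doing different work on the accumulator. */
int alu_step(unsigned op, int acc, int operand)
{
    assert(op < 8);   /* callers are expected to mask the field first */

    switch (op) {
    case 0: acc += operand; break;
    case 1: acc -= operand; break;
    case 2: acc &= operand; break;
    case 3: acc ^= operand; break;
    case 4: acc |= operand; break;
    case 5: acc  = operand; break;
    case 6: acc  = -acc;    break;
    case 7: acc  = ~acc;    break;
    }
    return acc;
}
```

The function is deliberately not `static`, so the compiler still emits it even though nothing in the file calls it.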
Rev. Layle
Insane DCEmu
Posts: 190
Joined: Sun Jun 27, 2004 8:35 pm
Location: stillwater, ok
Has thanked: 0
Been thanked: 0
Contact:

Post by Rev. Layle »

Well, it wouldn't be THAT hard for the compiler to make a jump table. Case values must be constants of some sort, so all the values in the switch statement are known at compile time and there is only a fixed, finite set of jump targets.

I know function pointers are cool to use and easy to call in a loop, and I guess each opcode number can be an array index. However, calling a function causes at least a push and pop of the processor's program counter/instruction pointer on the stack (as well as other data at times), which could waste clock cycles.

Unless you can make the calls through an array of function pointers inline.
Stef.D
DCEmu Respected
Posts: 114
Joined: Wed Oct 15, 2003 1:46 am
Has thanked: 0
Been thanked: 0
Contact:

Post by Stef.D »

The Z80 is an 8-bit CPU, so you can easily use a 256-case switch statement for instruction decoding... most of the time, C compilers are smart enough to build a 256-entry jump table...
The only reason i used computed labels in C68K is that most C compilers don't support a 65536-case switch statement (i tried, and it failed with an overflow on most compilers)... the fact that C68K takes so much memory to compile is just due to a GCC trick; i'm sure other compilers can compile it without much trouble as long as they support computed labels (Visual C++ doesn't support them).
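
For readers who have not met computed labels: they are the GNU C "labels as values" extension, where `&&label` takes a label's address and `goto *ptr` jumps to it. The sketch below uses a made-up four-opcode toy bytecode (0 = halt, 1 = increment, 2 = decrement, 3 = double), not 68k or Z80 opcodes, and is not taken from C68K. It needs GCC or a compatible compiler such as Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Runs the toy bytecode until opcode 0 and returns the accumulator.
   The program must end with a 0 byte. */
static int run(const uint8_t *code)
{
    /* One label address per opcode: this is the dispatch table. */
    static void *const dispatch[4] = { &&op_halt, &&op_inc, &&op_dec, &&op_dbl };
    int acc = 0;

    assert(code != NULL);

/* Fetch the next opcode and jump straight to its handler.
   Masking with 3 keeps the index inside the 4-entry table. */
#define NEXT() goto *dispatch[*code++ & 3]

    NEXT();
op_inc:  acc++;    NEXT();
op_dec:  acc--;    NEXT();
op_dbl:  acc *= 2; NEXT();
op_halt: return acc;

#undef NEXT
}
```

Each handler ends with its own indirect jump to the next handler, so there is no call/return pair and no trip back to a central switch.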

Using a function pointer table is actually a lot slower than having a simple jump table... it's the main difference between Musashi and C68K... i'm also using a faster way of fetching instructions (reading data from PC); i could also do it for SP (A7), but the speed improvement isn't worth the effort imo...

Actually, the Z80 C emulators are probably already optimised; i believe MZ80 is the fastest, i never tried it though.. I wrote a Z80 core some time ago, but it was in pure x86 ASM :-/ At least i have some knowledge of it and of the undocumented opcodes, so i can quickly write a new one...
I'll have a look at the current C Z80 cores and see if we can really do better, but i'm not sure...
Quzar
Dream Coder
Posts: 7497
Joined: Wed Jul 31, 2002 12:14 am
Location: Miami, FL
Has thanked: 4 times
Been thanked: 9 times
Contact:

Post by Quzar »

Wow. That would be nice. A new Z80 core to complement the new 68k core.
BlackAura
DC Developer
Posts: 9951
Joined: Sun Dec 30, 2001 9:02 am
Has thanked: 0
Been thanked: 1 time

Post by BlackAura »

i believe MZ80 is the fastest, i never tried it though
I think it is, but only the assembly version. The C version's not very fast, and (apparently) doesn't work well on the DC anyway.

A 256-way switch statement would certainly be the best way to implement the Z80, but implementing the two-byte instructions would be a pain. As far as I can tell, the first byte usually acts as a modifier, but one of them changes the instruction set for the second byte... Messy. Doing it that way, it'd probably be simpler to implement it in assembly anyway.
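
To show the shape of the problem, here is a small sketch of a nested switch where the CB prefix byte hands decoding over to a second table. It only identifies instructions, it does not execute them, and only a handful of opcodes are filled in (00 = NOP, 07 = RLCA, CB 00 = RLC B, CB 07 = RLC A). The `op_id` enum and the function names are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_UNKNOWN, OP_NOP, OP_RLCA, OP_RLC_B, OP_RLC_A } op_id;

/* Second-level table: opcodes that follow a CB prefix byte. */
static op_id decode_cb(uint8_t op)
{
    switch (op) {
    case 0x00: return OP_RLC_B;
    case 0x07: return OP_RLC_A;
    default:   return OP_UNKNOWN;
    }
}

/* First-level table. Stores the instruction length in *length. */
static op_id decode(const uint8_t *code, int *length)
{
    assert(code != NULL && length != NULL);

    switch (code[0]) {
    case 0x00: *length = 1; return OP_NOP;
    case 0x07: *length = 1; return OP_RLCA;
    case 0xCB: *length = 2; return decode_cb(code[1]);  /* prefix: switch tables */
    default:   *length = 1; return OP_UNKNOWN;
    }
}
```

The same byte (07 here) means different things depending on which table is active, which is why the prefixed instructions cannot simply be folded into the one 256-way switch.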
Stef.D
DCEmu Respected
Posts: 114
Joined: Wed Oct 15, 2003 1:46 am
Has thanked: 0
Been thanked: 0
Contact:

Post by Stef.D »

BlackAura wrote:
i believe MZ80 is the fastest, i never tried it though
I think it is, but only the assembly version. The C version's not very fast, and (apparently) doesn't work well on the DC anyway.

A 256-way switch statement would certainly be the best way to implement the Z80, but implementing the two-byte instructions would be a pain. As far as I can tell, the first byte usually acts as a modifier, but one of them changes the instruction set for the second byte... Messy. Doing it that way, it'd probably be simpler to implement it in assembly anyway.
The 2-byte (or longer) instructions can cause some trouble with nested switch statements ... because prefixes can appear as many times as we want... even if there is no point in doing that!
I guess computed labels will help here, to follow the same implementation i did in my ASM core :)
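
One way to cope with repeated prefixes without nesting switches ever deeper is to consume the prefix run in a loop before dispatching. The sketch below is a plain loop, not the computed-label approach mentioned above, and it is not taken from any existing core. It covers only the DD and FD index prefixes, and it assumes that when several are stacked the last one is the one that takes effect. `prefix_info` and `skip_index_prefixes` are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { USE_HL, USE_IX, USE_IY } index_reg;

typedef struct {
    index_reg reg;      /* register the following opcode should use in place of HL */
    size_t    skipped;  /* number of prefix bytes consumed */
} prefix_info;

/* Walks over any run of DD/FD bytes at the start of `code`.
   Stops at the first non-prefix byte or at the end of the buffer. */
static prefix_info skip_index_prefixes(const uint8_t *code, size_t len)
{
    prefix_info info = { USE_HL, 0 };
    assert(code != NULL);

    while (info.skipped < len) {
        uint8_t b = code[info.skipped];
        if (b == 0xDD)
            info.reg = USE_IX;   /* later prefixes overwrite earlier ones */
        else if (b == 0xFD)
            info.reg = USE_IY;
        else
            break;               /* reached the real opcode */
        info.skipped++;
    }
    return info;
}
```

After the call, `code[info.skipped]` is the opcode to dispatch on, and a real core would also charge cycles for each prefix byte skipped.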
Last edited by Stef.D on Mon Jul 05, 2004 9:40 am, edited 2 times in total.
Stef.D
DCEmu Respected
Posts: 114
Joined: Wed Oct 15, 2003 1:46 am
Has thanked: 0
Been thanked: 0
Contact:

Post by Stef.D »

....