Z80 emulation


Post by BlackAura »

Quote: Unfortunately, nothing is connected to the Z80 I/O ports on the Genesis :-/
Oh bum. That would have made things easier. I guess that's because the I/O ports are used for SMS compatibility mode.

Post by q_006 »

since when did the Genesis have backward compatibility with the Master System? and how come i've never seen a cartridge converter to play SMS games on the Genesis (if what i'm hearing is true)?

(sorry to be offtopic)

Post by az_bont »

As far as I know, it has always been backwards compatible :P. Well, most models of it are - apparently the Genesis 3 and Nomad consoles lack a vital piece of hardware.

There were a couple of adaptors made, some still available at stores like Lik-Sang. There was also a limited edition cartridge of Phantasy Star I for the Mega Drive (Genesis) in Japan, which was basically the Master System game in a Mega Drive cartridge that took advantage of the compatibility mode.

It would have made a lot of sense to allow MS carts to be plugged in from the beginning, but then Sega were never really any good at making intelligent decisions...
Sick of sub-par Dreamcast web browsers that fail to impress? Visit Psilocybin Dreams!

Post by speud »

q_006 wrote:how come i've never seen a cartridge converter to play SMS games on the Genesis
http://cgi.ebay.co.uk/ws/eBayISAPI.dll? ... 94084&rd=1
http://blueswirl.fr.st - DC Online Tools and Downloads

thx to Wack0 for the avatar ;)

Post by Heliophobe »

q_006 wrote:since when did the Genesis have backward compatibility with the Master System? and how come i've never seen a cartridge converter to play SMS games on the Genesis (if what i'm hearing is true)?

(sorry to be offtopic)
An adaptor was sold as the 'Power Base Converter'

I'm not sure exactly why it wasn't supported out of the box, but I think they were trying to avoid the perception that the Genesis was just an upgraded Master System instead of a whole new system. The Japanese Mark III and Master System were cartridge-compatible with the previous SG-1000 system, which Sega may have thought hurt their sales, as the Master System might have been perceived as more of an "SG-1000+" than a major improvement over the SG-1000.

By offering the Power Base Converter they had it covered both ways: Master System fans could play their old carts on the new system, but the public wouldn't think of it as a "Master System +". The Power Base Converter was large and looked like it would contain a lot of circuitry, perhaps a whole embedded SMS, but really it was just a simple pin converter.

(sorry to continue to be off topic)

Post by Stef.D »

Rev. Layle wrote:edit: just read your post again, i guess you would want to test for RAM and the banked area first because they are going to be hit the most. and (again, edit) come to think of it: a few "if"s may do the trick better and faster than the stupid shifts, ands, and switches i used.

is there anything in the 0x50XX range of memory?
Here's how i see that :

Code:

if (adr < 0x4000) return ram[adr & 0x1FFF];
if (adr >= 0x8000) return read_byte_68000(adr + bank);

// IO PORT
...
When you have only 1, 2 or 3 tests, it's generally better to use "if" than "switch" (the compiler may build a small jump table for a switch, which adds an indirect jump).

As far as I remember, there is nothing in the 0x5xxx area...
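
To make the idea above concrete, here is a minimal sketch of a complete read handler built from plain "if" tests, ordered by how often each area is expected to be hit. All names (z80_ram, bank_base, read_io, the stand-in 68000 memory) are hypothetical and only for illustration, and the bank handling assumes bank_base holds the 68000 address of the start of the 32 KB window, which is a slightly different convention than "adr + bank" above.

```c
#include <stdint.h>

/* Hypothetical emulator state; these names are illustrative only. */
static uint8_t  z80_ram[0x2000];   /* 8 KB of Z80 RAM, mirrored up to 0x3FFF */
static uint32_t bank_base;         /* assumed: 68000 address of the bank window */

/* Stand-ins for the real 68000 bus and I/O handlers. */
static uint8_t fake_68k_mem[0x10000];
static uint8_t read_byte_68000(uint32_t adr) { return fake_68k_mem[adr & 0xFFFF]; }
static uint8_t read_io(uint16_t adr) { (void)adr; return 0xFF; }

uint8_t z80_read_byte(uint16_t adr)
{
    if (adr < 0x4000)                  /* most frequent: RAM and its mirror */
        return z80_ram[adr & 0x1FFF];
    if (adr >= 0x8000)                 /* next: banked window into 68000 space */
        return read_byte_68000(bank_base + (adr & 0x7FFF));
    return read_io(adr);               /* rare: everything in 0x4000-0x7FFF */
}
```

With only two comparisons on the hot paths, the compiler has nothing to turn into a jump table, which is the point of preferring "if" here.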

Post by Rev. Layle »

yeah just realized that after i wrote all that - lol

Post by DcSteve »

hey warmtoe - any breakthroughs or progress on the sound?
Check out the beats of rage community at http://borrevolution.vg-network.com/

Post by speud »

you know he probably won't make any progress if he has to report everything he's doing to you in real time :? why not wait and be patient? i bet he will post any relevant news in time.

Post by DcSteve »

well then i apologize - source releases and then compilations just flew by so frequently before, and that's when everyone started to work everything out fast. It seems like this working together has narrowed, unless all of them are still sending updated sources back and forth privately.

Post by speud »

or maybe the parts they are working on require time, don't you think? anyways, i can understand your haste, this project is very exciting, but to me it would be better to let them work on it calmly and not ask for news every 2 days. but if blackaura/warmtoe/stef don't find it annoying they are free to correct me.

Post by BlackAura »

speud wrote:or maybe the parts they are working on require time, don't you think?
They certainly do. I don't really want to release what I have until it works much better than it does now.
DcSteve wrote:unless all of them are still sending updated sources back and forth privately.
Mostly, yes. Or just working on whatever-it-is independently. I did have another couple of source releases, but they weren't worth compiling and releasing.

Post by Warmtoe »

Stef.D wrote:
Warmtoe wrote:Stef,

I made an initial stab at doing z80 work with a jump table - it's working but I have only implemented a fraction of the opcodes at the moment .

How do you remove the call for each fetch though? I will carry on with my hack - I'm sure yours will be much better - but I don't see how you can eliminate the fetch. One thought I had is to use the much-maligned cache to speed things up - by pointing it at the location that represents the current PC for the z80 - will that help?

Anyway - any insight!
As I did in C68K : in a standard CPU emulator, you store the current PC of the emulated CPU in a variable (a register is better) that we can just call "PC" :)
Then when you need to fetch the next opcode (and its operands), you have to read data at this address.
Since we execute code from this area, we can see it as a large memory space (RAM / ROM); this is the "fetch area".

Imagine we define fetch area as follow (for a Z80 CPU) :
0x0000-0x7FFF = rom_data
0xE000-0xFFFF = ram_data
(code can't be executed from IO port, that doesn't make sense.)

Well, the trick is that PC is equal to (CPU PC + fetch base).
Imagine we have the following instruction : bra 0x450 (a 68000 mnemonic; on the Z80 the equivalent would be jp 0x450)

with a conventional CPU core we only need to do :
...
NewPC = FetchWord;
PC = NewPC;
...

but here we'll do :

...
NewPC = FetchWord;
PC = FetchBase[NewPC >> X] + NewPC;
...

where FetchBase is a table of (0x10000 >> X) entries containing the fetch base for each area (X can be 4-12, depending on what we need).

Then when you need to fetch data, you only have to do :
data = *(u8*)PC; // for a byte
data = *(u16*)PC; // for a word
....

That makes things a bit more complex, since PC doesn't contain the real PC value, but fetches are a lot faster then ;)

My English is really limited, I hope you can understand my explanations.

Edit : That trick with PC can also be done on SP (stack pointer) since we *normally* use it only to store data in memory areas (no ports), but imo it isn't worth the effort.

OK - I want to have a play with this - can you explain a little further? I'm not sure I understand it - but I want to :?

Post by Quzar »

The PC variable of the emulated CPU seems to hold the actual fetch address. So every time you do a fetch, it only has to advance the PC value by a small amount to reach the data you are looking for. At least that is what I gathered from his explanation above and from converting NeoCD from Musashi to C68K.
"When you post fewer lines of text than your signature, consider not posting at all." - A Wise Man

Post by Stef.D »

Actually the trick is that normally in a CPU emulator, if the current PC = 0x200 then your PC variable will be 0x200.
Here, PC = (0x200 + memory address of the fetch area).

For instance, from 0x0000 to 0x1FFF we have the ROM (stored in rom_data), so PC = 0x200 + rom_data = &(rom_data[0x200])

Then instead of doing ReadByte(PC) to fetch the next instruction, we can just do *PC, which is a lot faster :)
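
For anyone who wants to play with it, the following is a rough sketch of the "based PC" trick under some assumptions: the memory layout is the example one from the earlier explanation (ROM at 0x0000-0x7FFF, RAM at 0xE000-0xFFFF), X is 12, and all names (fetch_base, set_pc, get_pc, open_bus and so on) are made up for illustration rather than taken from C68K. The bases are kept as uintptr_t so that "buffer minus offset" never forms an out-of-bounds C pointer, and words are assembled from two byte reads because Z80 operands are often unaligned.

```c
#include <stdint.h>

#define FETCH_SHIFT 12                        /* "X" in the post: 4 KB pages */
#define FETCH_PAGES (0x10000 >> FETCH_SHIFT)

static uint8_t rom_data[0x8000];              /* mapped at 0x0000-0x7FFF */
static uint8_t ram_data[0x2000];              /* mapped at 0xE000-0xFFFF */
static uint8_t open_bus[1 << FETCH_SHIFT];    /* dummy page for unmapped areas */

/* fetch_base[page] + z80_address == host address of that byte */
static uintptr_t fetch_base[FETCH_PAGES];
static uintptr_t pc;                          /* based PC: really a host address */
static uintptr_t pc_base;                     /* base currently folded into pc */

static void fetch_init(void)
{
    for (unsigned page = 0; page < FETCH_PAGES; page++) {
        unsigned adr = page << FETCH_SHIFT;
        if (adr < 0x8000)
            fetch_base[page] = (uintptr_t)rom_data;
        else if (adr >= 0xE000)
            fetch_base[page] = (uintptr_t)ram_data - 0xE000;
        else
            fetch_base[page] = (uintptr_t)open_bus - adr;
    }
}

/* Every jump, call or return pays for one table lookup here... */
static void set_pc(uint16_t z80_pc)
{
    pc_base = fetch_base[z80_pc >> FETCH_SHIFT];
    pc = pc_base + z80_pc;
}

/* ...and the real Z80 PC is only reconstructed when it is needed. */
static uint16_t get_pc(void) { return (uint16_t)(pc - pc_base); }

/* Sequential fetches are then a plain pointer dereference. */
static uint8_t fetch_byte(void)
{
    uint8_t value = *(const uint8_t *)pc;
    pc++;
    return value;
}

static uint16_t fetch_word(void)              /* little-endian, alignment-safe */
{
    uint16_t lo = fetch_byte();
    uint16_t hi = fetch_byte();
    return (uint16_t)(lo | (hi << 8));
}
```

This sketch does not handle code that runs off the end of one area into another with a different base; a real core would need to re-base PC in that case.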

CPU Emulator Optimization Techniques

Post by blargg »

I saw a post about optimizing a Z80 core. I'm working on one and if I did my timing correctly, it's fairly fast.

I have written an emulator for the GameBoy CPU (which is largely a subset of the Z80, with a few instructions of its own) as a part of a sound emulation library. I timed it on my 120 MHz PowerMac computer and it executes 3,472,608 instructions per second (assuming I didn't mis-time it). The core compiles to 3700 bytes of PowerPC code and 2200 bytes of data.

The most significant technique is aggressive sharing of common instruction behavior, which greatly reduces code size and thus cache impact. There are a couple of more optimizations I haven't applied yet; I detail a few of them on a page about a 6502 emulator I wrote: http://www.slack.net/~ant/nes-emu/6502.html

One optimization I haven't implemented yet is to defer status flag determination until the flag is actually needed (I did implement it in an 8085 emulator many years ago). For example in a full Z80 core I'd have a new 8-bit variable "parity" to which I'd assign the result of any instruction which modified the parity bit, and only do the actual parity determination if a branch on parity or push flags instruction were encountered. The half-carry is another example; in the 8085 emulator I had two variables holding the previous and new value of an instruction which set the half-carry, and calculated it only for PUSH AF (PUSH PSW in 8085 mnemonics), DAA, and the end of an emulation run.
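
As an illustration of the deferred-parity idea, here is a small sketch. It is not code from the 8085 emulator mentioned above; the struct layout and the names (cpu_t, parity_src, materialize_flags) are assumptions made up for this example, and it only covers logical operations. On a real Z80 the same flag bit also serves as the overflow flag for arithmetic, so a full core would have to track which meaning is current.

```c
#include <stdint.h>

#define FLAG_P 0x04          /* Z80 parity/overflow bit in the F register */

typedef struct {
    uint8_t a;               /* accumulator */
    uint8_t f;               /* flags that are kept up to date eagerly */
    uint8_t parity_src;      /* last result whose parity defines the P bit */
} cpu_t;

/* AND only remembers the result; no parity work is done here. */
static void op_and(cpu_t *c, uint8_t v)
{
    c->a &= v;
    c->parity_src = c->a;
    /* other flags omitted for brevity */
}

/* Returns 1 when v has an even number of set bits. */
static int parity_even(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return !(v & 1);
}

/* Called only by instructions that need the real F register:
   a branch on parity, a push of the flags, or the end of a run. */
static uint8_t materialize_flags(const cpu_t *c)
{
    uint8_t f = c->f & (uint8_t)~FLAG_P;
    if (parity_even(c->parity_src))
        f |= FLAG_P;
    return f;
}
```

The saving comes from the fact that most results are overwritten by a later instruction before anything ever looks at the parity bit.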

Until I've released the GameBoy sound emulation library, here are the relevant bits which demonstrate some techniques:

Code:

// all memory accesses go through a function pointer table
// all instruction accesses use a mapping table; no function call
typedef unsigned (*reader_t)( unsigned addr );
typedef void (*writer_t)( unsigned addr, unsigned value );

reader_t data_reader [256];
writer_t data_writer [256];
uint8_t* code_map [256];

#define READ( addr )            (data_reader [(addr) >> 8]( addr ))
#define WRITE( addr, value )    (data_writer [(addr) >> 8]( addr, value ))
#define READ_PROG( addr )       (code_map [(addr) >> 8] [(addr) & 255])

void z80_stop() {
    cycles_remain = 0;
}

void z80_emulate( registers_t& r )
{
    unsigned pc = r.pc;
    unsigned sp = r.sp;
    unsigned flags = r.flags;

    goto loop;
    
inc_pc_loop:
    pc++;
loop:
    
    int cyc = cycles_remain - cycles_per_instruction;
    cycles_remain = cyc;
    
    // in actual emulator these are efficiently read together as a word
    unsigned op = READ_PROG( pc );
    pc++;
    unsigned data = READ_PROG( pc ); // pre-fetch data
    
    if ( cyc <= 0 )
        goto stop;
    
    // 25% of the time is spent stalling in this switch dispatch
    // since the destination address isn't known in advance for prefetch.
    switch ( op ) {

    // ...

    case 0x20: // JR NZ
        if ( flags & z_flag )
            goto inc_pc_loop;
        // fall through
        
    case 0x18: // JR
    jr_taken:
        pc += int8_t (data); // sign-extend
        goto inc_pc_loop;
    
    case 0x28: // JR Z
        if ( flags & z_flag )
            goto jr_taken;
        goto inc_pc_loop;
    
    case 0x30: // JR NC
        if ( !(flags & c_flag) )
            goto jr_taken;
        goto inc_pc_loop;
    
    case 0x38: // JR C
        if ( flags & c_flag )
            goto jr_taken;
        goto inc_pc_loop;
    
    case 0xE9: // JP_HL
        pc = rp.hl;
        goto loop;
    
    // ...
    
    case 0xBE: // CMP (HL)
        data = rp.hl;
        data = READ( data );
        goto cmp_comm;
    case 0xB8: // CMP B
    case 0xB9: // CMP C
    case 0xBA: // CMP D
    case 0xBB: // CMP E
    case 0xBC: // CMP H
    case 0xBD: // CMP L
    case 0xBF: // CMP A
        data = R8( op & 7 ); // indexes b, c, d, e, h, l, -, a
        goto cmp_comm;
    case 0xFE: // CMP IMM
        pc++;
    cmp_comm:
        op = rg.a;
        data = op - data;
    sub_set_flags:
        flags = ((op & 15) - (data & 15)) & h_flag;
        flags |= (data >> 4) & c_flag;
        flags |= n_flag;
        if ( data & 0xff )
            goto loop;
        flags |= z_flag;
        goto loop;

    case 0x96: // SUB (HL)
        data = rp.hl;
        data = READ( data );
        goto sub_comm;
    case 0x90: // SUB B
    case 0x91: // SUB C
    case 0x92: // SUB D
    case 0x93: // SUB E
    case 0x94: // SUB H
    case 0x95: // SUB L
    case 0x97: // SUB A
        data = R8( op & 7 );
        goto sub_comm;
    case 0xD6: // SUB IMM
        pc++;
    sub_comm:
        op = rg.a;
        data = op - data;
        rg.a = data;
        goto sub_set_flags; // share flag-setting code with CMP
    
    // ...
    }
    
stop:
    r.pc = pc;
    r.sp = sp;
    r.flags = flags;
    // ...
}
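
The flag computation at sub_set_flags is compact enough to be worth a worked example. The sketch below pulls it out into a standalone function, assuming the GameBoy flag layout (Z = 0x80, N = 0x40, H = 0x20, C = 0x10), which the excerpt above does not show; the function name sub_flags is made up for illustration.

```c
#include <stdint.h>

/* Assumed GameBoy flag bit positions; not shown in the excerpt. */
enum { z_flag = 0x80, n_flag = 0x40, h_flag = 0x20, c_flag = 0x10 };

/* Flags for "a - operand", computed the same way as sub_set_flags. */
static unsigned sub_flags(unsigned a, unsigned operand)
{
    unsigned result = a - operand;   /* may wrap below zero */

    /* If the low nibble borrowed, (a & 15) - (result & 15) is negative,
       so bit 5 of the two's-complement value is set. */
    unsigned flags = ((a & 15) - (result & 15)) & h_flag;

    /* A borrow out of bit 7 shows up in bit 8 of the wrapped result;
       shifting right by 4 lines it up with the carry flag in bit 4. */
    flags |= (result >> 4) & c_flag;

    flags |= n_flag;                 /* subtraction always sets N */
    if (!(result & 0xff))
        flags |= z_flag;
    return flags;
}
```

For example, 0x10 - 0x01 borrows from the low nibble only, so it yields N and H, while 0x00 - 0x01 borrows all the way out and yields N, H and C.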

Post by Stef.D »

What about benchmarking your Z80 core against the one currently used in Genesis Plus to see how much better it performs?
I was thinking about code size reduction for my new Z80 core, but I don't know how "code stream breaks" (goto) affect execution speed on the Dreamcast... at least on x86, it's often more efficient to have a large amount of code with fewer instructions executed...

Post by Heliophobe »

Stef.D wrote:What about benchmarking your Z80 core against the one currently used in Genesis Plus to see how much better it performs?
I was thinking about code size reduction for my new Z80 core, but I don't know how "code stream breaks" (goto) affect execution speed on the Dreamcast... at least on x86, it's often more efficient to have a large amount of code with fewer instructions executed...
Sayten's core was written with the opposite in mind: code cache misses have a lot more impact on the Dreamcast than they do on a typical desktop computer, so the theory was to cut down on cache misses by sharing more code, even at the expense of executing more instructions (of course, there is likely a break-even point). For x86-based cores, the opposite approach is usually taken because PCs generally have larger instruction caches, a large secondary cache, and probably faster RAM relative to the CPU speed (on this point I am not certain).

It is, however, difficult to say if it would matter, especially in the case of Genesis Plus where you're already emulating a 68000 and your code cache might be shot to hell anyway, especially if you are interleaving the emulation of the 68k, Z80, and other components throughout the frame (scanline by scanline, perhaps) rather than running all the Z80 code for a single frame at once. You might be better off with a large Z80 core that executes a smaller number of instructions if it looks like the cache misses are inevitable. It might even depend on the game: some might use a more diverse set of instructions than others.
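
To show what the two scheduling styles look like, here is a rough sketch with stub cores that only count cycles. Everything in it is an assumption for illustration: the function names, the 262-line NTSC frame, and the approximate per-line budgets of 488 cycles for the 68000 and 228 for the Z80. It is not how Genesis Plus is actually structured.

```c
/* Assumed, approximate timing constants (NTSC). */
#define LINES_PER_FRAME  262
#define M68K_CYCLES_LINE 488
#define Z80_CYCLES_LINE  228

/* Stub cores: a real core would execute instructions here. */
static long m68k_total, z80_total;
static int  z80_entries;     /* how often the Z80 core's code is re-entered */

static void m68k_run(int cycles) { m68k_total += cycles; }
static void z80_run(int cycles)  { z80_total += cycles; z80_entries++; }

/* Interleaved: accurate sync, but each switch can evict the other
   core's code from the instruction cache. */
static void run_frame_interleaved(void)
{
    for (int line = 0; line < LINES_PER_FRAME; line++) {
        m68k_run(M68K_CYCLES_LINE);
        z80_run(Z80_CYCLES_LINE);
    }
}

/* Batched: each core stays resident in the cache for a whole frame,
   at the cost of much coarser synchronisation between the CPUs. */
static void run_frame_batched(void)
{
    m68k_run(M68K_CYCLES_LINE * LINES_PER_FRAME);
    z80_run(Z80_CYCLES_LINE * LINES_PER_FRAME);
}
```

Both versions execute the same number of emulated cycles per frame; the difference is that the interleaved loop enters the Z80 core 262 times per frame instead of once, which is where a compact core would be expected to help most.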

Post by Alexvrb »

Heliophobe wrote:and probably faster ram relative to the CPU speed (on this point I am not certain).
This is OT, but: I think the DC's main memory is actually more balanced than that of many x86 machines. Not every P4 running at ~2-3 GHz has dual DDR 400, and even if they did, is that better than 100 MHz SDRAM on a 200 MHz SH4 when it comes to integer work? They have a large cache though, so it doesn't matter that much - although that's not the only reason they have a large cache... the pipeline is LONG.

GameBoy Z80 CPU core emulator

Post by blargg »

The Gb_Snd_Emu GameBoy sound emulator (with its Z80-subset core) is now available, if anyone would like to examine its performance:

http://www.slack.net/~ant/nes-emu/