Hardware Emulation - Caches

Post by GyroVorbis »

I have never actually written a hardware emulator in software before, but I am very interested in the art. I like looking into open-source emulators and seeing how they did certain things.

One of the things that still remains fuzzy in my mind from a design standpoint is how best to emulate a memory cache.

From a naive standpoint, it almost seems like you don't need to emulate it, because it is transparent to software (for the most part) and the target machine is obviously caching RAM accesses anyway... But there are plenty of cache-specific instructions on certain architectures that could potentially be problematic depending on their use in software...

Adding an extra buffer to the emulator to serve as a "cache" seems like a very poor solution from an efficiency standpoint, as (obviously) you're just adding another layer of memory to be accessed and maintained.

Anyone with expertise in this area care to enlighten me?

Post by BlueCrab »

Most emulators tend to not emulate the caches at all, simply because they're unnecessary overhead for most programs (at least that's been my experience).

That said, an emulator can treat the cache-flushing instructions as advisory hints for when to, say, flush the dynarec cache. If the emulated system flushes the instruction cache, it's probably a good sign that you should kill off your recompiled blocks covering that area.
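
To make the idea concrete, here is a minimal sketch of using a guest instruction-cache flush as an invalidation hint. It is not taken from any real emulator: `BlockCache`, `on_icache_flush`, and the 32-byte default line size are all hypothetical names and values chosen for illustration.

```python
class BlockCache:
    """Hypothetical cache of recompiled blocks, keyed by guest start address."""

    def __init__(self):
        # guest start address -> (guest byte length, translated code object)
        self.blocks = {}

    def add(self, start, length, code):
        self.blocks[start] = (length, code)

    def lookup(self, pc):
        entry = self.blocks.get(pc)
        return entry[1] if entry else None

    def invalidate_range(self, addr, size):
        """Drop every block that overlaps [addr, addr + size)."""
        end = addr + size
        stale = [start for start, (length, _) in self.blocks.items()
                 if start < end and addr < start + length]
        for start in stale:
            del self.blocks[start]
        return len(stale)


def on_icache_flush(cache, addr, line_size=32):
    """Treat a guest instruction-cache flush as a hint to discard translations."""
    line_start = addr & ~(line_size - 1)
    return cache.invalidate_range(line_start, line_size)
```

Nothing here models the cache itself; the flush is only used as a signal that the guest may have rewritten code in that line, so any translation overlapping it can no longer be trusted.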

Post by GyroVorbis »

Thanks for that response. I couldn't actually pinpoint a scenario where a cache flush would require an emulator to do something, but a cache flush on the instruction cache would definitely require some action like you just mentioned.

Actually, Yabause is one of the emulators I have been studying. According to Wikipedia, it uses dynamic recompilation (dynarec). Do you know where in the source code this translation is handled? I am interested in seeing the data structures/abstraction for doing this on different emulated hosts.

I know the SH4 datasheet says it is "instruction compatible" with the SH2. Was this of any use in the DC version of Yabause?

Post by TapamN »

GyroVorbis wrote: I couldn't actually pinpoint a scenario where a cache flush would require an emulator to do something,
A cache flush is required for external hardware to receive the collected writes the CPU has made to a cache line. I have rendering code that uses cache flushes to send commands to the TA, instead of using the SQs or DMA. If the emulator doesn't emulate enough of the cache behavior, the emulated TA would get the wrong data or (more likely) just crash the emulator. Cache flush emulation could also affect what kind of data DMA would use, but it would be odd for this to be done on purpose...
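
A toy model of why this matters, as a sketch only: a write-back cache holds the CPU's writes until the line is flushed, so anything that reads memory from outside the CPU sees stale data in the meantime. The class and function names and the 32-byte `LINE_SIZE` are assumptions for illustration, and the "device" here simply reads a flat RAM array rather than modelling the TA or a DMA controller.

```python
LINE_SIZE = 32  # assumed line size for this sketch


class WriteBackCache:
    """Toy write-back data cache in front of a flat bytearray of 'RAM'."""

    def __init__(self, ram):
        self.ram = ram
        self.lines = {}  # line base address -> bytearray holding the cached copy

    def _line(self, addr):
        base = addr & ~(LINE_SIZE - 1)
        if base not in self.lines:
            # Fill the line from RAM on first touch.
            self.lines[base] = bytearray(self.ram[base:base + LINE_SIZE])
        return base, self.lines[base]

    def write8(self, addr, value):
        base, line = self._line(addr)
        line[addr - base] = value & 0xFF  # RAM is NOT updated yet

    def read8(self, addr):
        base, line = self._line(addr)
        return line[addr - base]

    def flush(self, addr):
        """Write the line containing addr back to RAM and drop it."""
        base = addr & ~(LINE_SIZE - 1)
        line = self.lines.pop(base, None)
        if line is not None:
            self.ram[base:base + LINE_SIZE] = line


def device_read(ram, addr, count):
    """What a DMA engine or other external hardware would see: RAM only."""
    return bytes(ram[addr:addr + count])
```

An emulator that writes straight through to RAM behaves as if every store were followed by a flush, which is harmless for most software but differs from hardware whenever code relies on the delay or on the flush itself.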

I think a few Saturn games require some kind of cache emulation, since the Saturn emulator SSF seems to have some options to emulate the side effects of the cache. But since the emulator is closed source, one can't easily look and see exactly what its cache emulation does.
GyroVorbis wrote: I know the SH4 datasheet says it is "instruction compatible" with the SH2. Was this of any use in the DC version of Yabause?
It could be of use, but it doesn't look like Yabause takes advantage of it. It seems to do dynamic recompilation like a PC-based emulator would, except it recompiles into SuperH instead of X86.

But it should be possible for the SH-4 to directly execute SH-2 code. The SH-4 could use its MMU to simulate the Saturn's memory layout and run the Saturn code in user mode, which would allow the SH-4 to run most normal instructions directly (like addition and memory accesses) and generate exceptions on more dangerous instructions, like messing with the vector base register or accessing Saturn hardware. A few encodings that are invalid on the SH-2 would be executed as the new SH-4 instructions that reuse them (e.g. the SH-2 doesn't have the SHAD/SHLD instructions that the SH-4 does), but that's unlikely to be much of a problem in reality. A bigger problem would be keeping the Saturn's two CPUs in sync...

Post by BlueCrab »

GyroVorbis wrote: Actually, Yabause is one of the emulators I have been studying. According to Wikipedia, it uses dynamic recompilation (dynarec). Do you know where in the source code this translation is handled? I am interested in seeing the data structures/abstraction for doing this on different emulated hosts.
Yabause's dynarec is actually not used, pretty much at all. I don't think any of the versions use it by default, and I don't even know that it works in the current release. Unfortunately, the guy who wrote it disappeared for a while from the IRC channel before the release, and only reappeared afterwards. Yabause generally uses a completely interpreted core for the SH2.
GyroVorbis wrote: I know the SH4 datasheet says it is "instruction compatible" with the SH2. Was this of any use in the DC version of Yabause?
It is indeed "instruction compatible" with the SH2. However, this comes with some caveats. First of all, the DC's memory space is obviously a lot different than that of the Saturn. Some of this could be overcome with the MMU and remapping things, however that can't solve all of the problems due to the fact that the MMU in the SH4 can't map the entire memory space. Also, the SH2s in the Saturn are big endian, and of course the SH4 in the Dreamcast is little endian. Thus, you can't even expect to run the instructions directly even if you could get around all the other problems involved.
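
A small illustration of the endianness point, as a sketch: the same two bytes of Saturn memory decode to different 16-bit instruction words depending on the byte order of the CPU fetching them. The value 0x0009 is the SuperH NOP encoding; the variable and function names are just for this example.

```python
import struct

# Two bytes as they sit in Saturn (big-endian) memory.
# 0x0009 is the SuperH NOP encoding.
saturn_bytes = bytes([0x00, 0x09])

as_big_endian = struct.unpack(">H", saturn_bytes)[0]     # how an SH-2 fetches it
as_little_endian = struct.unpack("<H", saturn_bytes)[0]  # how a little-endian CPU would


def swap16(value):
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)
```

Here `as_big_endian` is 0x0009 while `as_little_endian` is 0x0900, so an unmodified Saturn binary would present the wrong opcodes to a little-endian CPU unless the bytes are swapped somewhere along the way.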

Oh, and you might find this little gem in the Yabause source code, if you really look. That said, before you ask, no it is not used in any released version of the emulator, and it isn't fully complete anyway. At some point, I probably should finish it. :P
TapamN wrote: But it should be possible for the SH-4 to directly execute SH-2 code. The SH-4 could use its MMU to simulate the Saturn's memory layout and run the Saturn code in user mode, which would allow the SH-4 to run most normal instructions directly (like addition and memory accesses) and generate exceptions on more dangerous instructions, like messing with the vector base register or accessing Saturn hardware. A few encodings that are invalid on the SH-2 would be executed as the new SH-4 instructions that reuse them (e.g. the SH-2 doesn't have the SHAD/SHLD instructions that the SH-4 does), but that's unlikely to be much of a problem in reality. A bigger problem would be keeping the Saturn's two CPUs in sync...
Unfortunately, as I outlined above, it isn't quite that easy since the SH4 MMU doesn't allow all areas of memory to be mapped. Plus, the difference in endianness also plays a part.

Oh, and keeping synchronization between the two CPUs isn't nearly as difficult as you make it out to be. Every emulator has to deal with synchronization between different CPUs and other such pieces. Even down to old consoles like the Sega Master System or the NES, you have to deal with synchronization between the CPU, video, and sound chips.

Also, most multi-CPU code in SMP systems tends to not be extremely timing dependent, at least in my experience. Sure, there will always be a couple of games that won't work perfectly without perfect, cycle-accurate synchronization, but that will affect every emulator out there at some point (unless, of course, you're actually doing cycle-accurate emulation).
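
One common way to get this kind of "good enough" synchronization is to run each CPU for a short slice of emulated cycles in turn. The following is only a sketch of that approach under simple assumptions: `ToyCPU`, `run_frame`, and the fixed cost per instruction are hypothetical stand-ins, not how Yabause or any other emulator is structured.

```python
class ToyCPU:
    """Stand-in for an emulated CPU; every 'instruction' costs a fixed number of cycles."""

    def __init__(self, name, cycles_per_instruction):
        self.name = name
        self.cpi = cycles_per_instruction
        self.cycles = 0        # total emulated cycles executed so far
        self.instructions = 0

    def run(self, budget):
        """Execute whole instructions until at least `budget` cycles are used.

        Returns the cycles actually consumed, which may overshoot the budget.
        """
        used = 0
        while used < budget:
            used += self.cpi
            self.instructions += 1
        self.cycles += used
        return used


def run_frame(cpus, frame_cycles, slice_cycles):
    """Advance all CPUs to `frame_cycles`, one short slice at a time."""
    target = 0
    while target < frame_cycles:
        target = min(target + slice_cycles, frame_cycles)
        for cpu in cpus:
            # Only run the cycles this CPU is still short of the target,
            # so overshoot from an earlier slice is paid back here.
            deficit = target - cpu.cycles
            if deficit > 0:
                cpu.run(deficit)
```

With this scheme the CPUs never drift apart by more than roughly one slice plus one instruction, and shrinking `slice_cycles` trades speed for tighter synchronization.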

Post by Ayla »

Can't the SH4 be configured for big endian?

Post by BlueCrab »

Ayla wrote: Can't the SH4 be configured for big endian?
By tying one of the pins of the CPU to +Vcc or to Ground, yes, it can.

It cannot be reconfigured in software.
The SH4 Programming Manual wrote: Big endian or little endian byte order can be selected for the data format. The endian should be set with the MD5 external pin in a power-on reset. Big endian is selected when the MD5 pin is low, and little endian when high. The endian cannot be changed dynamically. Bit positions are numbered left to right from most-significant to least-significant. Thus, in a 32-bit longword, the leftmost bit, bit 31, is the most significant bit and the rightmost bit, bit 0, is the least significant bit.

Post by Ex-Cyber »

Caches are also part of that dark CpE realm that includes things like bus arbitration, pipeline hazards, and clock generation quirks: things that can screw up the emulation timing but are generally too obscure and/or expensive to actually emulate. (IIRC, Tekken 2 on PSX will crash if a particular pipeline hazard doesn't behave like the real hardware; I don't know if emulators actually emulate it or just hack around it somehow.)

Post by GyroVorbis »

Thank you all for the informative responses. I learned quite a bit.
BlueCrab wrote: Oh, and you might find this little gem in the Yabause source code, if you really look. That said, before you ask, no it is not used in any released version of the emulator, and it isn't fully complete anyway. At some point, I probably should finish it. :P
OH MY GOD! I JUST GOT A DEVBONER! You should finish that!

Post by TapamN »

BlueCrab wrote: Some of this could be overcome with the MMU and remapping things, however that can't solve all of the problems due to the fact that the MMU in the SH4 can't map the entire memory space. Also, the SH2s in the Saturn are big endian, and of course the SH4 in the Dreamcast is little endian. Thus, you can't even expect to run the instructions directly even if you could get around all the other problems involved.
Oh, I didn't think about the different endianness. That would be a rather large problem, wouldn't it? :P I guess you could still work around the byte-order mismatch for instruction fetches by keeping Saturn memory byte swapped, but then you couldn't let any memory accesses go through without trapping them, which would significantly decrease how useful running code directly would be.
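
The byte-swapped layout can be sketched as follows, assuming memory is swapped in 16-bit units so that a little-endian 16-bit fetch returns the original big-endian value. All function names are made up for the example. It also shows why other access sizes cannot simply pass through: byte and 32-bit accesses each need a fix-up.

```python
import struct


def to_swapped(guest_image):
    """Swap the bytes of every 16-bit unit of a big-endian guest image."""
    assert len(guest_image) % 2 == 0
    out = bytearray(len(guest_image))
    out[0::2] = guest_image[1::2]
    out[1::2] = guest_image[0::2]
    return out


def read16(mem, addr):
    # A plain little-endian 16-bit load now yields the guest's value directly.
    return struct.unpack_from("<H", mem, addr)[0]


def read8(mem, addr):
    # Byte accesses must flip the low address bit to find the right byte.
    return mem[addr ^ 1]


def read32(mem, addr):
    # A plain little-endian 32-bit load would get the halves in the wrong order,
    # so the access is split into two 16-bit loads instead.
    return (read16(mem, addr) << 16) | read16(mem, addr + 2)
```

For the guest bytes 12 34 56 78, `read16` and the adjusted `read8` and `read32` return the values a big-endian CPU would see, while an unadjusted little-endian 32-bit load of the swapped memory gives 0x56781234.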

I don't think the inability to map certain areas of RAM with the MMU would have been a big deal, since the areas you can't map to would be register accesses you would want to trap anyways.

As for processor synchronization, I was thinking of how the emulated timing would become unusually dependent on real system timing. An interpreter can calculate how long a given block of code takes while it runs it, but when running the code directly, the only estimate of how long the code took is how much real time has passed. A register move and ALU operation could occur in one SH4 cycle, while a hardware register access could take 50+ cycles to emulate. So trying to run one CPU for X cycles, or figuring out how many cycles have elapsed since the start of a time slice, would be harder and less accurate.
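
The interpreter side of that comparison can be sketched like this. The operation names and cycle costs in `CYCLE_COST` are invented for illustration and are not real SH-2 timings; the point is only that elapsed emulated time is computed from a table, independent of how long the host takes.

```python
# Assumed per-operation costs in emulated cycles; real figures would come
# from the CPU's documentation and depend on far more than the opcode.
CYCLE_COST = {
    "mov": 1,
    "add": 1,
    "load": 2,
    "hw_reg_read": 4,
}


def interpret(program, budget):
    """Run operations from `program` until `budget` emulated cycles are spent.

    Returns (index of next operation to run, emulated cycles consumed).
    Elapsed emulated time comes from the table, not from a host clock.
    """
    pc = 0
    elapsed = 0
    while pc < len(program) and elapsed < budget:
        elapsed += CYCLE_COST[program[pc]]
        pc += 1
    return pc, elapsed
```

Emulating the hardware register read might take the host far longer than the plain register moves, yet the emulated cycle count stays the same. Code running natively has no such table to consult, which is the accuracy problem described above.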

Post by GyroVorbis »

Sounds like you need two builds of the emulator... one for a normal DC, and an optimized version for a little-endian -> big-endian pin mod. :lol: