Hardware Emulation - Caches
- GyroVorbis
- Elysian Shadows Developer
- Posts: 1874
- Joined: Mon Mar 22, 2004 4:55 pm
- Location: #%^&*!!!11one Super Sonic
- Has thanked: 80 times
- Been thanked: 62 times
- Contact:
Hardware Emulation - Caches
I have never actually written a hardware emulator in software before, but I am very interested in the art. I like looking into open-source emulators and seeing how they did certain things.
One of the things that still remains fuzzy in my mind from a design standpoint is how best to emulate a memory cache.
From a naive standpoint, it almost seems like you don't need to emulate it, because it is transparent to software (for the most part) and the target machine is obviously caching RAM accesses anyway... But there are plenty of cache-specific instructions on certain architectures that could potentially be problematic depending on their use in software...
Adding an extra buffer to the emulator to serve as a "cache" seems like a very poor solution from an efficiency standpoint, as (obviously) you're just adding another layer of memory to be accessed and maintained.
Anyone with an expertise in the area care to enlighten me?
- BlueCrab
- The Crabby Overlord
- Posts: 5663
- Joined: Mon May 27, 2002 11:31 am
- Location: Sailing the Skies of Arcadia
- Has thanked: 9 times
- Been thanked: 69 times
- Contact:
Re: Hardware Emulation - Caches
Most emulators tend to not emulate the caches at all, simply because they're unnecessary overhead for most programs (at least that's been my experience).
Now, they can use the cache flushing instructions as advisory things to know when to, for instance, flush the dynarec cache. If the emulated system flushes the instruction cache, then it's probably a good sign that you should kill off your recompiled blocks over that area.
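To make the "advisory" idea concrete, here is a minimal sketch in C of dropping recompiled blocks when the guest flushes part of its instruction cache. Everything here is hypothetical (`TranslatedBlock`, `register_block`, `on_guest_icache_flush`, the fixed-size table); it is not taken from Yabause or any other emulator, just one way the bookkeeping could look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 64

/* Hypothetical record for one recompiled block of guest code. */
typedef struct {
    uint32_t guest_start;  /* first guest address covered */
    uint32_t guest_end;    /* one past the last guest address covered */
    bool     valid;        /* false once the translation is stale */
} TranslatedBlock;

static TranslatedBlock block_table[MAX_BLOCKS];
static int block_count = 0;

/* Record a translation for guest range [start, end); returns its index. */
static int register_block(uint32_t start, uint32_t end) {
    assert(block_count < MAX_BLOCKS && start < end);
    block_table[block_count] = (TranslatedBlock){ start, end, true };
    return block_count++;
}

/* Called when the guest executes an instruction-cache flush covering
   [start, end). Marks every overlapping translation as stale and
   returns how many blocks were dropped. */
static int on_guest_icache_flush(uint32_t start, uint32_t end) {
    int dropped = 0;
    for (int i = 0; i < block_count; i++) {
        TranslatedBlock *b = &block_table[i];
        if (b->valid && b->guest_start < end && start < b->guest_end) {
            b->valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

A real dynarec would also free the host code and probably use a faster lookup than a linear scan, but the overlap test is the core of it.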
- GyroVorbis
- Elysian Shadows Developer
- Posts: 1874
- Joined: Mon Mar 22, 2004 4:55 pm
- Location: #%^&*!!!11one Super Sonic
- Has thanked: 80 times
- Been thanked: 62 times
- Contact:
Re: Hardware Emulation - Caches
Thanks for that response. I couldn't actually pinpoint a scenario where a cache flush would require an emulator to do something, but a cache flush on the instruction cache would definitely require some action like you just mentioned.
Actually, Yabause is one of the emulators I have been studying. According to Wikipedia, it uses dynamic recompilation (dynarec). Do you know where in the source code this translation is handled? I am interested in seeing the data structures/abstraction for doing this on different emulated hosts.
I know the SH4 datasheet says it is "instruction compatible" with the SH2. Was this of any use in the DC version of Yabause?
- TapamN
- DC Developer
- Posts: 108
- Joined: Sun Oct 04, 2009 11:13 am
- Has thanked: 2 times
- Been thanked: 90 times
Re: Hardware Emulation - Caches
GyroVorbis wrote: I couldn't actually pinpoint a scenario where a cache flush would require an emulator to do something,
A cache flush is required for external hardware to receive the collected writes the CPU has made to a cache line. I have rendering code that uses cache flushes to send commands to the TA, instead of using the SQs or DMA. If the emulator doesn't emulate enough of the cache behavior, the emulated TA would get the wrong data or (more likely) just crash the emulator. Cache flush emulation could also affect what kind of data DMA would use, but it would be odd for this to be done on purpose...
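As a rough illustration of why that matters, here is a toy write-back cache in C with a single 32-byte line (the SH-4's cache line size). The names (`cpu_write8`, `device_read8`, `cache_writeback`) and the tiny RAM array are made up for the example, and it does not model how the TA or any real emulator works. It only shows that a device reading memory sees stale data until the dirty line is written back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32    /* bytes per cache line */
#define RAM_SIZE  256   /* toy RAM size, an assumption for the sketch */

static uint8_t ram[RAM_SIZE];

/* A one-line write-back cache, kept tiny on purpose. */
typedef struct {
    uint32_t base;             /* RAM address of the cached line */
    bool     valid;
    bool     dirty;
    uint8_t  data[LINE_SIZE];
} CacheLine;

static CacheLine line;

/* Emulates a cache flush: push pending writes out to RAM. */
static void cache_writeback(void) {
    if (line.valid && line.dirty) {
        memcpy(&ram[line.base], line.data, LINE_SIZE);
        line.dirty = false;
    }
}

/* CPU store: lands in the cache line, not in RAM. */
static void cpu_write8(uint32_t addr, uint8_t value) {
    assert(addr < RAM_SIZE);
    uint32_t base = addr & ~(uint32_t)(LINE_SIZE - 1);
    if (!line.valid || line.base != base) {
        cache_writeback();                        /* evict the old line */
        memcpy(line.data, &ram[base], LINE_SIZE); /* fill the new one */
        line.base  = base;
        line.valid = true;
    }
    line.data[addr - base] = value;
    line.dirty = true;
}

/* External hardware (DMA, a GPU) bypasses the cache and sees RAM only. */
static uint8_t device_read8(uint32_t addr) {
    assert(addr < RAM_SIZE);
    return ram[addr];
}
```

After `cpu_write8(0x40, 0xAB)`, `device_read8(0x40)` still returns the old value until `cache_writeback()` runs. An emulator that skips the cache entirely gets the "after flush" behavior for free, which is fine unless the software depends on when the data actually leaves the cache.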
I think a few Saturn games require some kind of cache emulation, since the Saturn emulator SSF seems to have some options to emulate the side effects of the cache. But since the emulator is closed source, one can't easily look and see exactly what its cache emulation does.
GyroVorbis wrote: I know the SH4 datasheet says it is "instruction compatible" with the SH2. Was this of any use in the DC version of Yabause?
It could be of use, but it doesn't look like Yabause takes advantage of it. It seems to do dynamic recompilation like a PC-based emulator would, except it recompiles into SuperH instead of x86.
But it should be possible for the SH-4 to directly execute SH-2 code. The SH-4 could use its MMU to simulate the Saturn's memory layout and run the Saturn code in user mode, which would allow the SH-4 to run most normal instructions directly (like addition and memory accesses) and generate exceptions on more dangerous instructions like messing with the vector base register or accessing Saturn hardware. One side effect is that a few opcodes that are invalid on the SH-2 would execute as the newer SH-4 instructions instead of raising an illegal-instruction exception (e.g. the SH-2 doesn't have the SHAD/SHLD instructions that the SH-4 does), but that's unlikely to be much of a problem in reality. A bigger problem would be keeping the Saturn's two CPUs in sync...
- BlueCrab
- The Crabby Overlord
- Posts: 5663
- Joined: Mon May 27, 2002 11:31 am
- Location: Sailing the Skies of Arcadia
- Has thanked: 9 times
- Been thanked: 69 times
- Contact:
Re: Hardware Emulation - Caches
GyroVorbis wrote: Actually, Yabause is one of the emulators I have been studying. According to Wikipedia, it uses dynamic recompilation (dynarec). Do you know where in the source code this translation is handled? I am interested in seeing the data structures/abstraction for doing this on different emulated hosts.
Yabause's dynarec is actually not used, pretty much at all. I don't think any of the versions use it by default, and I don't even know that it works in the current release. Unfortunately, the guy who wrote it disappeared for a while from the IRC channel before the release, and only reappeared afterwards. Yabause generally uses a completely interpreted core for the SH2.
GyroVorbis wrote: I know the SH4 datasheet says it is "instruction compatible" with the SH2. Was this of any use in the DC version of Yabause?
It is indeed "instruction compatible" with the SH2. However, this comes with some caveats. First of all, the DC's memory space is obviously a lot different than that of the Saturn. Some of this could be overcome with the MMU and remapping things, however that can't solve all of the problems due to the fact that the MMU in the SH4 can't map the entire memory space. Also, the SH2s in the Saturn are big endian, and of course the SH4 in the Dreamcast is little endian. Thus, you can't even expect to run the instructions directly even if you could get around all the other problems involved.
Oh, and you might find this little gem in the Yabause source code, if you really look. That said, before you ask, no it is not used in any released version of the emulator, and it isn't fully complete anyway. At some point, I probably should finish it.
TapamN wrote: But it should be possible for the SH-4 to directly execute SH-2 code. The SH-4 could use its MMU to simulate the Saturn's memory layout and run the Saturn code in user mode, which would allow the SH-4 to run most normal instructions directly (like addition and memory accesses) and generate exceptions on more dangerous instructions like messing with the vector base register or accessing Saturn hardware. One side effect is that a few opcodes that are invalid on the SH-2 would execute as the newer SH-4 instructions instead of raising an illegal-instruction exception (e.g. the SH-2 doesn't have the SHAD/SHLD instructions that the SH-4 does), but that's unlikely to be much of a problem in reality. A bigger problem would be keeping the Saturn's two CPUs in sync...
Unfortunately, as I outlined above, it isn't quite that easy since the SH4 MMU doesn't allow all areas of memory to be mapped. Plus, the difference in endianness also plays a part.
Oh, and keeping synchronization between the two CPUs isn't nearly as difficult as you make it out to be. Every emulator has to deal with synchronization between different CPUs and other such pieces. Even down to old consoles like the Sega Master System or the NES, you have to deal with synchronization between the CPU, video, and sound chips.
Also, most multi-CPU code in SMP systems tends to not be extremely timing dependent, at least in my experience. Sure, there will always be a couple of games that won't work perfectly without perfect, cycle-accurate synchronization, but that will affect every emulator out there at some point (unless, of course, you're actually doing cycle-accurate emulation).
- BlueCrab
- The Crabby Overlord
- Posts: 5663
- Joined: Mon May 27, 2002 11:31 am
- Location: Sailing the Skies of Arcadia
- Has thanked: 9 times
- Been thanked: 69 times
- Contact:
Re: Hardware Emulation - Caches
Ayla wrote: Can't the SH4 be configured for big endian?
By tying one of the pins of the CPU to +Vcc or to Ground, yes, it can.
It cannot be reconfigured in software.
The SH4 Programming Manual wrote: Big endian or little endian byte order can be selected for the data format. The endian should be set with the MD5 external pin in a power-on reset. Big endian is selected when the MD5 pin is low, and little endian when high. The endian cannot be changed dynamically. Bit positions are numbered left to right from most-significant to least-significant. Thus, in a 32-bit longword, the leftmost bit, bit 31, is the most significant bit and the rightmost bit, bit 0, is the least significant bit.
-
- DCEmu User with No Life
- Posts: 3641
- Joined: Sat Feb 16, 2002 1:55 pm
- Has thanked: 0
- Been thanked: 0
Re: Hardware Emulation - Caches
Caches are also part of that dark CpE realm that includes things like bus arbitration, pipeline hazards, and clock generation quirks that can screw up the emulation timing but are generally too obscure and/or expensive to actually emulate (though IIRC Tekken 2 on PSX will crash if a particular pipeline hazard doesn't behave like the real hardware; I don't know if emulators actually emulate it or just hack around it somehow).
"You know, I have a great, wonderful, really original method of teaching antitrust law, and it kept 80 percent of the students awake. They learned things. It was fabulous." -- Justice Stephen Breyer
- GyroVorbis
- Elysian Shadows Developer
- Posts: 1874
- Joined: Mon Mar 22, 2004 4:55 pm
- Location: #%^&*!!!11one Super Sonic
- Has thanked: 80 times
- Been thanked: 62 times
- Contact:
Re: Hardware Emulation - Caches
Thank you all for the informative responses. I learned quite a bit.
BlueCrab wrote: Oh, and you might find this little gem in the Yabause source code, if you really look. That said, before you ask, no it is not used in any released version of the emulator, and it isn't fully complete anyway. At some point, I probably should finish it.
OH MY GOD! I JUST GOT A DEVBONER! You should finish that!
- TapamN
- DC Developer
- Posts: 108
- Joined: Sun Oct 04, 2009 11:13 am
- Has thanked: 2 times
- Been thanked: 90 times
Re: Hardware Emulation - Caches
BlueCrab wrote: Some of this could be overcome with the MMU and remapping things, however that can't solve all of the problems due to the fact that the MMU in the SH4 can't map the entire memory space. Also, the SH2s in the Saturn are big endian, and of course the SH4 in the Dreamcast is little endian. Thus, you can't even expect to run the instructions directly even if you could get around all the other problems involved.
Oh, I didn't think about the different endianness. That would be a rather large problem, wouldn't it? I guess you could still work around the instruction order mismatch by keeping Saturn memory byte swapped, but you wouldn't be able to let any memory accesses go through without trapping them, which would significantly decrease how useful running code directly would be.
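For anyone following along, here is a small C sketch of one way to read "keeping Saturn memory byte swapped". It assumes the guest RAM is stored as host-native 16-bit units, so each 16-bit SuperH opcode reads correctly as-is, while byte and 32-bit accesses need fixing up in software. The array and helper names (`guest_ram`, `load_big_endian_image`, `read8`/`read16`/`read32`) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_RAM_BYTES 16  /* toy size, an assumption for the sketch */

/* Guest RAM held as host-native 16-bit units ("byte swapped" storage). */
static uint16_t guest_ram[GUEST_RAM_BYTES / 2];

/* Convert a big-endian byte image into the swapped storage. */
static void load_big_endian_image(const uint8_t *src, size_t len) {
    assert(len <= GUEST_RAM_BYTES && (len % 2) == 0);
    for (size_t i = 0; i < len; i += 2)
        guest_ram[i / 2] = (uint16_t)((src[i] << 8) | src[i + 1]);
}

/* 16-bit reads (including opcode fetches) need no fix-up. */
static uint16_t read16(uint32_t addr) {
    assert((addr & 1) == 0 && addr + 2 <= GUEST_RAM_BYTES);
    return guest_ram[addr / 2];
}

/* Byte reads must pick the right half of the 16-bit unit. */
static uint8_t read8(uint32_t addr) {
    assert(addr < GUEST_RAM_BYTES);
    uint16_t w = guest_ram[addr / 2];
    return (addr & 1) ? (uint8_t)(w & 0xFF) : (uint8_t)(w >> 8);
}

/* 32-bit reads must reassemble two units, high half first. */
static uint32_t read32(uint32_t addr) {
    assert((addr & 3) == 0 && addr + 4 <= GUEST_RAM_BYTES);
    return ((uint32_t)guest_ram[addr / 2] << 16) | guest_ram[addr / 2 + 1];
}
```

The fix-ups in `read8` and `read32` are exactly the work that native loads and stores would not do on their own, which is why those accesses would have to be trapped or rewritten.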
I don't think the inability to map certain areas of RAM with the MMU would have been a big deal, since the areas you can't map to would be register accesses you would want to trap anyways.
As for processor synchronization, I was thinking of how the emulated timing would become unusually dependent on real system timing. An interpreter can calculate how long a given block of code takes while it runs, but when running the code directly, the only estimate of how long the code took is how much real time has passed. A register move and ALU operation could occur in one SH4 cycle, while a hardware register access could take 50+ cycles to emulate. So trying to run one CPU for X cycles, or figuring out how many cycles have elapsed since the start of one time slice, would be harder and less accurate.
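Here is a minimal C sketch of the interpreter side of that comparison: each guest operation is charged a guest cycle cost as it is interpreted, so "run this CPU for X cycles" is easy no matter how long the host takes to emulate each operation. The `OpKind` values, the costs in `guest_cycles_for`, and the `Cpu` struct are placeholders, not real SH-2 timings.

```c
#include <assert.h>
#include <stdint.h>

/* Two stand-in operation kinds; a real core decodes actual opcodes. */
typedef enum { OP_ALU, OP_HW_REG_ACCESS } OpKind;

/* Placeholder guest cycle costs. The host may spend far more time
   emulating a hardware register access, but only the guest cost counts. */
static int guest_cycles_for(OpKind op) {
    return op == OP_ALU ? 1 : 4;
}

typedef struct {
    const OpKind *program;  /* pre-decoded guest operations */
    int           length;
    int           pc;       /* index of the next operation */
    int64_t       cycles;   /* total guest cycles executed so far */
} Cpu;

/* Interpret until at least `budget` guest cycles have elapsed or the
   program ends. Returns the cycles actually consumed, which can
   overshoot the budget by part of one instruction. */
static int run_slice(Cpu *cpu, int budget) {
    assert(budget > 0);
    int used = 0;
    while (used < budget && cpu->pc < cpu->length) {
        used += guest_cycles_for(cpu->program[cpu->pc]);
        cpu->pc++;
    }
    cpu->cycles += used;
    return used;
}
```

Alternating `run_slice` between a master and a slave `Cpu` with the same budget keeps their cycle counters close together. With direct execution there is no per-instruction hook like this, so the count would have to come from a host timer instead.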
- GyroVorbis
- Elysian Shadows Developer
- Posts: 1874
- Joined: Mon Mar 22, 2004 4:55 pm
- Location: #%^&*!!!11one Super Sonic
- Has thanked: 80 times
- Been thanked: 62 times
- Contact:
Re: Hardware Emulation - Caches
Sounds like you need two builds of the emulator... one for a normal DC, and an optimized version for a little-endian -> big-endian pin mod.