Looking for DC optimization experts to help on emu port

miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Looking for DC optimization experts to help on emu port

Post by miker00lz »

I'm working on porting an x86 PC emu I wrote for Win32/Linux/OSX to the Dreamcast. I've got basics working like video output and controller input, but I've never programmed for the DC before. I'm fumbling my way through with the poor KOS documentation, and lots of trial and error. If anybody is interested in working on a project like this, it could really benefit from some veteran DC coder skills. I think it would be awesome to have a good PC emu for the DC. I tried DOSBOXDC, but it doesn't work very well and it seems to be very unfinished. An on-screen virtual keyboard would be nice (with the ability to custom map PC keys to the DC controller), among other things. :)

Here is a video of what I've got ported so far, playing Prince of Persia, just to show how far along it is.

https://www.youtube.com/watch?v=rUvbQKnvpxw

Plus a screenshot of using it to play Ultima 6:

Image

I'm trying to port my old sound code from SDL to KOS right now. The emu can emulate the Adlib, Sound Blaster, PC speaker, and Disney Sound Source. The normal PC version has good compatibility, and runs most real mode DOS games, and even runs Windows 3.0. If anybody with a lot of DC programming experience is interested in joining, let me know. I think this could end up being a good choice for PC stuff on the DC, but the DC processor is so slow it will need to be as optimized as possible. The main area I think I could use help with is the PowerVR video hardware, and using it to handle as much of the rendering/stretching load as possible to free up the SH4 for the x86 emu code. That is the biggest speed killer right now.
SiZiOUS
DC Developer
Posts: 404
Joined: Fri Mar 05, 2004 2:22 pm
Location: France
Has thanked: 27 times
Been thanked: 19 times

Re: Looking for DC optimization experts to help on emu port

Post by SiZiOUS »

Hello miker00lz,

Unfortunately, I don't have the required skills to answer you, I'm sorry, but:
There are a lot of talented people here. Try contacting JMD, for example; he's the author of the DreamCPC emu port (with an on-screen keyboard).

Just a side question: do you develop using nullDC or on real hardware?
miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by miker00lz »

Thanks for the reply. I just found a DreamCPC screenshot, and that on-screen keyboard looks perfect! :)

I use both nullDC and real hardware to develop. I don't have a coder cable unfortunately, so I mostly use the emu for testing but burn it to a CD-R and run it on my DC just to be safe when I make any major changes. I'm going to try to make a cable soon, that would be much nicer.
SiZiOUS
DC Developer
Posts: 404
Joined: Fri Mar 05, 2004 2:22 pm
Location: France
Has thanked: 27 times
Been thanked: 19 times

Re: Looking for DC optimization experts to help on emu port

Post by SiZiOUS »

I have a Coders Cable and a BBA, so if you need some alpha/beta testing, feel free to PM me! :)

Keep us informed about your project :) !
PH3NOM
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm
Has thanked: 0
Been thanked: 5 times

Re: Looking for DC optimization experts to help on emu port

Post by PH3NOM »

miker00lz wrote:I'm trying to port my old sound code from SDL to KOS right now. The emu can emulate the Adlib, Sound Blaster, PC speaker, and Disney Sound Source. The normal PC version has good compatibility, and runs most real mode DOS games, and even runs Windows 3.0. If anybody with a lot of DC programming experience is interested in joining, let me know. I think this could end up being a good choice for PC stuff on the DC, but the DC processor is so slow it will need to be as optimized as possible. The main area I think I could use help with is the PowerVR video hardware, and using it to handle as much of the rendering/stretching load as possible to free up the SH4 for the x86 emu code. That is the biggest speed killer right now.
Good call not to use SDL_Mixer on DC for audio output; it works, but it is stupid slow compared to interfacing with the AICA using the KOS library.
Can you tell me how the sound is generated by the emulator? How many channels are needed? Are the samples generated as a stream, or as clips?

Also, can you describe how the emulator generates the image in ram, and how you are currently interfacing with the PVR in your code?

My guess is that the video output is not going to be as much of a limitation as the actual core speed of the emulator, which could need assembly optimizations to take advantage of all the SH4 has to offer. In that case, if you post a few bottleneck functions, TapamN or BlueCrab may be able to help.

Also, a link to the code would be useful.
GyroVorbis
Elysian Shadows Developer
Posts: 1873
Joined: Mon Mar 22, 2004 4:55 pm
Location: #%^&*!!!11one Super Sonic
Has thanked: 79 times
Been thanked: 61 times

Re: Looking for DC optimization experts to help on emu port

Post by GyroVorbis »

Am I correct in assuming that any image processing/stretching is being done in software right now? You could definitely take quite a load off of the SH4 by letting the PVR do that.

Do you have a DC keyboard by any chance? It makes debugging a million times easier, and you can still leave in keyboard support for the emu. :grin:
miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by miker00lz »

SiZiOUS wrote:I have a Coders Cable and a BBA, so if you need some alpha/beta test, feel free to PM me! :)

Keep us informed about your project :) !
Awesome, those BBAs are SO damn expensive. I'll definitely keep you guys updated. I've got a sort of make-shift on-screen keyboard working, so even with just a regular controller you can still do anything (slowww text input, but better than nothing). Still need to make the sound output work on the DC hardware, and see what I can do about making it faster overall. Right now it still works fast enough to play any games meant for the original 8088 PCs at full speed. The overall experience feels like you're using an 8 MHz or so turbo XT. If I can end up squeezing high-end 286 speeds out of it, I think I would be pretty happy. It's a reasonable goal for a 200 MHz SH4 host. I'd like to let you guys try out what I've got in the next day or two.
miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by miker00lz »

PH3NOM wrote:Good call not to use SDL_Mixer on DC for audio output; it works, but it is stupid slow compared to interfacing with the AICA using the KOS library.
Can you tell me how the sound is generated by the emulator? How many channels are needed? Are the samples generated as a stream, or as clips?
Well, there are 4 different generating modules (PC speaker, Adlib OPL2, Sound Blaster, and Disney Sound Source emulators) and yes they all just generate sample streams in real-time. From there, all of those are mixed together in software and the final resulting samples get shoved into an output buffer that's passed to the host sound hardware chunk-by-chunk as it does callbacks for more data.
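As a rough illustration of the mixing step described above, here is a minimal sketch in portable C. It assumes each module produces signed 16-bit mono samples at the same rate; the names `mix_streams` and `clamp16` are made up for this example and are not taken from the emulator's source.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate sum to the signed 16-bit sample range. */
static int16_t clamp16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Mix `nsrc` mono streams of `count` samples each into `out`.
 * Summing in 32 bits avoids wrap-around before the clamp. */
void mix_streams(const int16_t *const *src, size_t nsrc,
                 int16_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int32_t acc = 0;
        for (size_t s = 0; s < nsrc; s++)
            acc += src[s][i];
        out[i] = clamp16(acc);
    }
}
```

The host audio callback would then copy one chunk of `out` each time it asks for more data.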


PH3NOM wrote:Also, can you describe how the emulator generates the image in ram, and how you are currently interfacing with the PVR in your code?
Since a PC supports a huge range of video modes, the exact render routines and techniques are different for each, but essentially each one just draws the contents of the emulated video memory to a "prescale" buffer, which is the same size in pixels as the video mode being emulated. That buffer is passed along to a software scaling and blitting function that draws it onto the actual host framebuffer.
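To make the cost of that software path concrete, here is a minimal sketch of a nearest-neighbour scaler working on an 8-bit indexed prescale buffer. It is an assumption about the general shape of such a routine, not the emulator's actual blitter, and `scale_nearest` is a hypothetical name.

```c
#include <stdint.h>

/* Nearest-neighbour scale of an 8-bit indexed image.
 * Each destination pixel picks the source pixel at the proportional position. */
void scale_nearest(const uint8_t *src, int sw, int sh,
                   uint8_t *dst, int dw, int dh)
{
    for (int y = 0; y < dh; y++) {
        const uint8_t *srow = src + (y * sh / dh) * sw;
        uint8_t *drow = dst + y * dw;
        for (int x = 0; x < dw; x++)
            drow[x] = srow[x * sw / dw];
    }
}
```

Every output pixel costs a multiply, a divide, a load and a store on the CPU, which is the work that could be handed to the video hardware instead.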


PH3NOM wrote:My guess is that the video output is not going to be as much of a limitation as the actual core speed of the emulator, which could need assembly optimizations to take advantage of all the SH4 has to offer. In that case, if you post a few bottleneck functions, TapamN or BlueCrab may be able to help.
You might be right, it'll be interesting to see just how much it's hurting it. The SH4 still needs all the help it can get. We're dealing with 200 furious MHz. :) My actual CPU emu code is a straightforward pure interpreter engine, but it's really not that slow. The PC version pulls off over 2 million emulated instructions per second on my old 400 MHz Pentium 2, even as it's software-scaling the video output. I'd have written a dynamic recompiler, but I'd have to do a lot of research on it. I've never really looked into it much, and this is the first CPU core I've ever written.


PH3NOM wrote:Also, a link to the code would be useful.
Definitely. I'm just cleaning up some ugly rough edges first, but I'll post the DC port code tomorrow for sure. Until then, the regular version's code is on SourceForge, which will still give you a good look at how it does things.

Link to the CPU engine: http://sourceforge.net/p/fake86/code/ci ... ke86/cpu.c
The rest of it: http://sourceforge.net/p/fake86/code/ci ... rc/fake86/

You'll probably need to click "download this file" at the top of the pages, the shitty SF.net online viewer cuts off a lot of code.

Thanks for the help! Some parts of the source are a little sloppy, I just finally started using C a couple years ago after being a BASIC guy since the 80's! I have some leftover bad coding habits I'm trying to drop. (globals and externs everywhere!) :lol:
Last edited by miker00lz on Wed Mar 06, 2013 2:24 am, edited 1 time in total.
miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by miker00lz »

GyroVorbis wrote:Am I correct in assuming that any image processing/stretching is being done in software right now? You could definitely take quite a load off of the SH4 by letting the PVR do that.

Do you have a DC keyboard by any chance? It makes debugging a million times easier, and you can still leave in keyboard support for the emu. :grin:
Yep, it's all done in software. Since it was meant to be a portable SDL app, I figured it was the best way to go, but the DC is just too weak, so it will definitely need to be done in hardware. The stretching code runs in its own thread, and any remotely modern home computer has at least 2 cores, so there is zero performance impact there. Can you even buy single core chips anymore these days? Been a while since I've seen one. :)

No I don't have a DC keyboard yet, but placed an order through Amazon for one earlier tonight, they're pretty cheap these days. It was about $15 USD with shipping.
Ayla
DC Developer
Posts: 142
Joined: Thu Apr 03, 2008 7:01 am
Has thanked: 0
Been thanked: 4 times

Re: Looking for DC optimization experts to help on emu port

Post by Ayla »

The software mixing routine could be moved to hardware as well; the AICA can do mixing.
Ex-Cyber
DCEmu User with No Life
Posts: 3641
Joined: Sat Feb 16, 2002 1:55 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by Ex-Cyber »

miker00lz wrote:My actual CPU emu code is a straightforward pure interpreter engine, but it's really not that slow. The PC version pulls off over 2 million emulated instructions per second on my old 400 MHz Pentium 2, even as it's software-scaling the video output. I'd have written a dynamic recompiler, but I'd have to do a lot of research on it. I've never really looked into it much, and this is the first CPU core I've ever written.
I'd definitely look into optimizing the interpreter before switching to a recompiler. Usually there is some way to slim down or even eliminate the central dispatch loop and the overhead of function calls to instruction handlers. I also wonder whether it would be worthwhile to reduce the use of conditionals in your flag code (for branch prediction reasons), i.e. replacing code like

Code: Select all

if (dst & 0xFF00) {
    cf = 1;
}
else {
    cf = 0;
}
with code like

Code: Select all

cf = (dst & 0xFF00) != 0;
For all I know, though, those might generate identical code after optimization.
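Extending the same idea to the other arithmetic flags, here is a minimal sketch of an 8-bit ADD where every flag comes from a mask and a comparison instead of an if/else. The `struct flags8` layout and the name `add8_flags` are hypothetical and chosen only for this example; the emulator may store its flags differently.

```c
#include <stdint.h>

/* Hypothetical flag container; not the emulator's actual layout. */
struct flags8 { uint8_t cf, zf, sf, of, af; };

/* 8-bit ADD with every flag derived from comparisons and bit masks. */
uint8_t add8_flags(uint8_t a, uint8_t b, struct flags8 *f)
{
    uint16_t res = (uint16_t)a + (uint16_t)b;    /* bit 8 holds the carry */
    f->cf = (res & 0xFF00) != 0;
    f->zf = (res & 0x00FF) == 0;
    f->sf = (res & 0x0080) != 0;
    f->of = ((res ^ a) & (res ^ b) & 0x80) != 0; /* operands share a sign, result differs */
    f->af = ((a ^ b ^ res) & 0x10) != 0;         /* carry out of bit 3 */
    return (uint8_t)res;
}
```

Whether this beats the branching version on the SH4 is something only the compiler's assembly output can settle.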
"You know, I have a great, wonderful, really original method of teaching antitrust law, and it kept 80 percent of the students awake. They learned things. It was fabulous." -- Justice Stephen Breyer
miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by miker00lz »

Ex-Cyber wrote:I'd definitely look into optimizing the interpreter before switching to a recompiler. Usually there is some way to slim down or even eliminate the central dispatch loop and the overhead of function calls to instruction handlers. I also wonder whether it would be worthwhile to reduce the use of conditionals in your flag code (for branch prediction reasons), i.e. replacing code like

Code: Select all

if (dst & 0xFF00) {
    cf = 1;
}
else {
    cf = 0;
}
with code like

Code: Select all

cf = (dst & 0xFF00) != 0;
For all I know, though, those might generate identical code after optimization.
Hmm, good point about the flag calcs. I'll play around with that idea and compare the assembly output.
miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by miker00lz »

I managed to squeeze an extra 5-10% performance out of it, depending on what it's emulating. I added a cache that holds all of the important data from the time-expensive parsing of each addressing mode byte (mod/reg/rm) the first time it gets emulated. Subsequent attempts to parse the same byte first check that the bytes there weren't overwritten since they were cached; if nothing changed, it just pulls the info it needs from the cache.

Every little bit helps, but I really expected to get a much larger gain than that. :?
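A minimal sketch of that kind of decode cache, assuming a direct-mapped table keyed by the address of the mod/reg/rm byte. The table size, the struct and the name `modrm_lookup` are all hypothetical; a real version would also cache the displacement and effective-address setup, which is where most of the parsing time goes.

```c
#include <stdint.h>

#define MODRM_CACHE_SIZE 4096u   /* hypothetical size; must be a power of two */

struct modrm_entry {
    uint32_t addr;     /* address of the cached mod/reg/rm byte */
    uint8_t  raw;      /* byte value seen when the entry was filled */
    uint8_t  valid;
    uint8_t  mod, reg, rm;
};

static struct modrm_entry modrm_cache[MODRM_CACHE_SIZE];

/* Return decoded fields for the byte at `addr` in `mem`, re-decoding only on a
 * miss or when the byte in memory no longer matches the cached copy. */
const struct modrm_entry *modrm_lookup(const uint8_t *mem, uint32_t addr)
{
    struct modrm_entry *e = &modrm_cache[addr & (MODRM_CACHE_SIZE - 1)];
    uint8_t cur = mem[addr];

    if (!e->valid || e->addr != addr || e->raw != cur) {
        e->addr  = addr;
        e->raw   = cur;
        e->valid = 1;
        e->mod   = cur >> 6;
        e->reg   = (cur >> 3) & 7;
        e->rm    = cur & 7;
    }
    return e;
}
```

The validity check still reads memory on every lookup, which may be part of why the gain is modest.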
miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by miker00lz »

BTW, I came up with a cheap solution for emulating a hard drive from a file on a read-only CD without reporting write errors to the emulated programs. I dedicated 1 MB of the DC's RAM to hold the sectors that programs write, then on a read request it just checks an index to see whether it should take the wanted sector from the CD or from RAM. :)

I know 1 MB isn't a lot, but since the whole project here is aimed at running vintage real-mode games it should be fine. They usually do little more than write some small save game files or update a high scores file. If it actually does run out of write RAM, it just fails and tells the calling program the disk is write-protected.
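A minimal sketch of such a write overlay, assuming 512-byte sectors (so 1 MB gives 2048 slots) and a simple linear index. All names here are hypothetical and the emulator's real bookkeeping may look quite different.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE   512u
#define OVERLAY_SLOTS 2048u   /* 2048 * 512 bytes = 1 MB */

static uint8_t  overlay_data[OVERLAY_SLOTS][SECTOR_SIZE];
static uint32_t overlay_lba[OVERLAY_SLOTS];   /* which sector each slot holds */
static uint32_t overlay_used;

/* Linear search keeps the sketch short; a hash or sorted index would be faster. */
static int overlay_find(uint32_t lba)
{
    for (uint32_t i = 0; i < overlay_used; i++)
        if (overlay_lba[i] == lba)
            return (int)i;
    return -1;
}

/* Returns 0 on success, -1 when the overlay is full (report write-protect). */
int overlay_write(uint32_t lba, const uint8_t *buf)
{
    int slot = overlay_find(lba);
    if (slot < 0) {
        if (overlay_used == OVERLAY_SLOTS)
            return -1;
        slot = (int)overlay_used++;
        overlay_lba[slot] = lba;
    }
    memcpy(overlay_data[slot], buf, SECTOR_SIZE);
    return 0;
}

/* Returns 1 and fills `buf` if the sector was written earlier; 0 means the
 * caller should read it from the disk image on the CD instead. */
int overlay_read(uint32_t lba, uint8_t *buf)
{
    int slot = overlay_find(lba);
    if (slot < 0)
        return 0;
    memcpy(buf, overlay_data[slot], SECTOR_SIZE);
    return 1;
}
```

Rewriting a sector that is already in the overlay reuses its slot, so repeated saves to the same file do not eat extra RAM.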
SiZiOUS
DC Developer
Posts: 404
Joined: Fri Mar 05, 2004 2:22 pm
Location: France
Has thanked: 27 times
Been thanked: 19 times

Re: Looking for DC optimization experts to help on emu port

Post by SiZiOUS »

Maybe you could save the data to the VMU, compressed with LZMA for example (it has a very high compression ratio), or is this data completely useless?
miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by miker00lz »

SiZiOUS wrote:Maybe you could save the data to the VMU, compressed with LZMA for example (it has a very high compression ratio), or is this data completely useless?
That's a pretty good idea actually. Another thing besides scores and savegames that would probably be written to disk a lot is game settings/configurations. It would definitely be good to have real save support. Nobody likes spending a few hours playing a game only to find they can't save their progress. :lol:

I'm going to try to put together a bootable .CDI of where it's at so far and upload it somewhere tonight or tomorrow for input. I'll make the code available at the same time.
PH3NOM
DC Developer
Posts: 576
Joined: Fri Jun 18, 2010 9:29 pm
Has thanked: 0
Been thanked: 5 times

Re: Looking for DC optimization experts to help on emu port

Post by PH3NOM »

miker00lz wrote:
PH3NOM wrote:Good call not to use SDL_Mixer on DC for audio output; it works, but it is stupid slow compared to interfacing with the AICA using the KOS library.
Can you tell me how the sound is generated by the emulator? How many channels are needed? Are the samples generated as a stream, or as clips?
Well, there are 4 different generating modules (PC speaker, Adlib OPL2, Sound Blaster, and Disney Sound Source emulators) and yes they all just generate sample streams in real-time. From there, all of those are mixed together in software and the final resulting samples get shoved into an output buffer that's passed to the host sound hardware chunk-by-chunk as it does callbacks for more data.
That should adapt quite easily to the sound driver I was working on, which is built on the KOS AICA stream functions. It supports mono/stereo streams, although in theory it should handle more than 1 stream.

Here is the source, with an example that streams a wave file from the /cd/
snddrv.rar
SNDDRV - KOS Sound Driver (C) PH3NOM
(3.19 KiB) Downloaded 133 times
BlueCrab
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia
Has thanked: 9 times
Been thanked: 69 times

Re: Looking for DC optimization experts to help on emu port

Post by BlueCrab »

PH3NOM wrote:
miker00lz wrote:
PH3NOM wrote:Good call not to use SDL_Mixer on DC for audio output; it works, but it is stupid slow compared to interfacing with the AICA using the KOS library.
Can you tell me how the sound is generated by the emulator? How many channels are needed? Are the samples generated as a stream, or as clips?
Well, there are 4 different generating modules (PC speaker, Adlib OPL2, Sound Blaster, and Disney Sound Source emulators) and yes they all just generate sample streams in real-time. From there, all of those are mixed together in software and the final resulting samples get shoved into an output buffer that's passed to the host sound hardware chunk-by-chunk as it does callbacks for more data.
That should adapt quite easily to the sound driver I was working on, which is built on the KOS AICA stream functions. It supports mono/stereo streams, although in theory it should handle more than 1 stream.
CrabEmu's source is another piece of code that could be referenced. Here's the relevant file. It should be relatively easy to understand. It is a little bit more low-level than KOS' normal sound streaming functionality, but it works well for what I use it for in the emulator. :)

It also assumes that each channel is in its own separate buffer when it is fed into the code, which is much nicer than having to separate out interleaved stereo streams, especially if you're generating the sample data at runtime anyway (as you would be doing in an emulator, of course).
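For comparison, this is roughly the extra pass that interleaved data would need before it could be fed in as per-channel buffers. It is a generic sketch with a hypothetical name (`deinterleave_stereo`), not code from CrabEmu or KOS.

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved stereo (L0 R0 L1 R1 ...) into two per-channel buffers.
 * `frames` is the number of left/right pairs. */
void deinterleave_stereo(const int16_t *in, size_t frames,
                         int16_t *left, int16_t *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i]  = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}
```

An emulator that writes each channel straight into its own buffer never has to run this loop.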
miker00lz
DCEmu Cool Newbie
Posts: 16
Joined: Sat Mar 02, 2013 7:47 pm
Has thanked: 0
Been thanked: 0

Re: Looking for DC optimization experts to help on emu port

Post by miker00lz »

Excellent, thanks guys. Sorry I've been away a bit. My back has been killing me, so I've been lying down a lot. I'm going to have a look at both of your guys' code. The DC keyboard I ordered came in today, so I'd like to hurry up and get back into working on this. :)
TapamN
DC Developer
Posts: 104
Joined: Sun Oct 04, 2009 11:13 am
Has thanked: 2 times
Been thanked: 88 times

Re: Looking for DC optimization experts to help on emu port

Post by TapamN »

I had worked a bit on making a PVR accelerated renderer for Genesis emulation. One trick I came up with for it was a way of using 2, 4, or 8 bit palettized textures on the PVR without having to do an expensive twiddling operation on them first. It does this by (ab)using the compressed texture format of the PVR. I've used this to make sort of a library for using palettized frame buffers on the DC, which should help your emulator. The attached file contains the library and a demo drawing program (with source and precompiled ELF) to show how it's used.

It's not really a separate library that gets linked, it's just some C files to add to the program. The important files are the ones that start with "fb_", the rest are part of the drawing program. It should be simple to use; look at main() to see how. The general idea for how to use it looks like this:

At initialization, create some textures that the framebuffer gets fed into, and convert the palette into a VQ codebook.
Then each frame, update the codebook if the palette has changed, use copy_fb to copy the framebuffer and codebook from main RAM to video RAM, and render it. Because of the way the routines work, the source framebuffer and codebook MUST be aligned to an 8-byte boundary.
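One way to satisfy that alignment requirement for a statically allocated source buffer, sketched with the C11 alignment specifier. The buffer name, its size and the `is_aligned8` helper are assumptions for illustration and are not part of the attached library.

```c
#include <stdint.h>

#define FB_WIDTH  320
#define FB_HEIGHT 200

/* C11 alignment specifier; GCC also accepts __attribute__((aligned(8))). */
static _Alignas(8) uint8_t framebuffer[FB_WIDTH * FB_HEIGHT];

/* Returns nonzero when `p` sits on an 8-byte boundary. */
int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}
```

A debug build could check `is_aligned8()` on both buffers once at startup, before the first copy.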

The demo program also double buffers the textures to prevent tearing.

This also uses the PVR's vertical scaling feature. It might not work in an emulator, but on real hardware it allows for much sharper looking scaling when going from 320x200 to 640x480 than what normal bilinear would get. It first uses point sampling to scale from 320x200 to 640x400, then the PVR linearly scales it to 640x480.

I'm not familiar with how PC video modes work on a low level (QBasic always took care of that for me!), but looking at render.c of the emulator, it looks like the framebuffer is interleaved somehow. The library expects a normal, linear, packed/chunky framebuffer, so it can't read the framebuffer directly out of the emulated video RAM. You can either deinterleave to a temporary buffer (which it already seems to do), or modify the emulator to store video RAM pre-deinterleaved (which would probably be faster overall, although more complicated).
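As one concrete case of the deinterleaving step, here is a minimal sketch for the CGA 320x200 4-colour mode, where even scanlines live in the first 8 KB bank and odd scanlines in a second bank at offset 0x2000, packed four pixels per byte. Treating this as the mode in question is an assumption, and `cga_to_chunky` is a hypothetical name, not a function from the emulator.

```c
#include <stdint.h>

#define CGA_WIDTH       320
#define CGA_HEIGHT      200
#define CGA_ROW_BYTES   80       /* 320 pixels at 4 pixels per byte */
#define CGA_ODD_OFFSET  0x2000   /* odd scanlines start 8 KB into video RAM */

/* Expand interleaved 2bpp CGA video RAM into a linear buffer with one
 * byte per pixel (values 0..3), suitable for a chunky framebuffer. */
void cga_to_chunky(const uint8_t *vram, uint8_t *out)
{
    for (int y = 0; y < CGA_HEIGHT; y++) {
        const uint8_t *row = vram + (y & 1) * CGA_ODD_OFFSET
                                  + (y >> 1) * CGA_ROW_BYTES;
        uint8_t *dst = out + y * CGA_WIDTH;
        for (int x = 0; x < CGA_ROW_BYTES; x++) {
            uint8_t b = row[x];
            *dst++ = (b >> 6) & 3;   /* leftmost pixel is in the top two bits */
            *dst++ = (b >> 4) & 3;
            *dst++ = (b >> 2) & 3;
            *dst++ = b & 3;
        }
    }
}
```

This version unpacks to 8 bits per pixel for simplicity; copying whole 80-byte rows into scanline order would keep the data at 2 bits per pixel.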

Here's some timing of the CPU overhead for this:
320x200x2: 0.203108 ms
320x200x4: 0.376277 ms
320x200x8: 0.781437 ms
640x350x4: 1.320551 ms
640x480x4: 1.803126 ms
640x480x8: 2.944636 ms
Attachments
drawdemo.zip
(301.46 KiB) Downloaded 128 times