Restoring profiling to the KOS toolchain

If you have any questions on programming, this is the place to ask them, whether you're a newbie or an experienced programmer. Discussion on programming in general is also welcome. We will help you with programming homework, but we will not do your work for you! Any porting requests must be made in Developmental Ideas.
Post Reply
kazade
Insane DCEmu
Insane DCEmu
Posts: 141
Joined: Tue May 02, 2017 3:11 pm
Has liked: 2 times
Been liked: 19 times

Restoring profiling to the KOS toolchain

Post by kazade » Sun Dec 22, 2019 12:41 am

Hi!

I'm in desperate need of getting some real profiling data from the Dreamcast. It looks like in theory that the -pg flag should work, and we could get gmon.out generated in the /pc directory.

However, the KOS toolchain patches remove all the profiling code from crt1.S

Does anyone know if I can just copy the profiling setup asm from upstream into the custom startup file?

I don't suppose anyone has the time and know-how to have a go at this? Being able to use gprof would be a game changer!
These users liked the author kazade for the post:
ThePerfectK
User avatar
ThePerfectK
Insane DCEmu
Insane DCEmu
Posts: 128
Joined: Thu Apr 27, 2006 10:15 am
Has liked: 21 times
Been liked: 19 times

Re: Restoring profiling to the KOS toolchain

Post by ThePerfectK » Sun Dec 22, 2019 7:39 am

This is something I've been banging my head against trying to solve for a long while. My idea was to implement a poor man's profiler -- if you aren't familiar with the concept, it's a sampling profiler that works by having the user manually raise breakpoints randomly, then using those breakpoints as samples to see where the program is bottle necking. The idea is that, with random sampling, you will naturally stop execution on parts of your program where most time is being spent.

Unfortunately, I can't find any way to manually raise a breakpoint. Moop has pointed me to a method of raising a breakpoint via the SH4 with his great work, but I haven't set aside any time to look at it. What I can say, however, is that you can simulate a poor man's profiler in a rough way without raising a break point. The way I do it is to define a macro that can be placed at the top and bottom of various functions you write. The function will take the current real time and check against a time threshhold you define in memory. If enough time has passed since the last sample, have the two macros work in concert to record the time the function being run takes. This will let you make a simulated model of the execution time and path of your program. When your function ends, via the function, have it record the time, and also print the call stack so you can do a backtrace for more information. You can manually create a firegraph with something like SDL and ImGUI on a PC to format this information into something resembling a standard profiler.

Just a thought. If you come up with a better solution, please share in as much detail as possible, as I'm all ears!
Still Thinking!~~
TapamN
DCEmu Freak
DCEmu Freak
Posts: 54
Joined: Sun Oct 04, 2009 11:13 am
Has liked: 0
Been liked: 11 times

Re: Restoring profiling to the KOS toolchain

Post by TapamN » Mon Dec 23, 2019 4:04 am

I wrote sampling profiler years ago. It uses a second thread to periodically record the PC of the main thread to an array. When you exit, the samples are saved to a text file on the PC, and a script is used to pass the addresses to addr2line and determine what functions are being run, and how much time is being spent in them. I'll upload it later.
These users liked the author TapamN for the post (total 2):
ThePerfectKIan Robinson
User avatar
ThePerfectK
Insane DCEmu
Insane DCEmu
Posts: 128
Joined: Thu Apr 27, 2006 10:15 am
Has liked: 21 times
Been liked: 19 times

Re: Restoring profiling to the KOS toolchain

Post by ThePerfectK » Mon Dec 23, 2019 6:28 am

TapamN wrote:
Mon Dec 23, 2019 4:04 am
I wrote sampling profiler years ago. It uses a second thread to periodically record the PC of the main thread to an array. When you exit, the samples are saved to a text file on the PC, and a script is used to pass the addresses to addr2line and determine what functions are being run, and how much time is being spent in them. I'll upload it later.
this sounds pretty useful. Would there be any way to modify your profiler to also trace the stack so it'd be possible to know what called the current function?
Still Thinking!~~
kazade
Insane DCEmu
Insane DCEmu
Posts: 141
Joined: Tue May 02, 2017 3:11 pm
Has liked: 2 times
Been liked: 19 times

Re: Restoring profiling to the KOS toolchain

Post by kazade » Mon Dec 30, 2019 12:53 am

I've been thinking of taking exactly the same approach! Did you have any luck finding the code?
User avatar
Ian Robinson
DCEmu Fast Newbie
DCEmu Fast Newbie
Posts: 23
Joined: Mon Mar 11, 2019 7:12 am
Has liked: 37 times
Been liked: 5 times

Re: Restoring profiling to the KOS toolchain

Post by Ian Robinson » Wed Jan 01, 2020 2:41 pm

TapamN wrote:
Mon Dec 23, 2019 4:04 am
I wrote sampling profiler years ago. It uses a second thread to periodically record the PC of the main thread to an array. When you exit, the samples are saved to a text file on the PC, and a script is used to pass the addresses to addr2line and determine what functions are being run, and how much time is being spent in them. I'll upload it later.
Do you still have it somewhere ? it's much needed by the scene :o
TapamN
DCEmu Freak
DCEmu Freak
Posts: 54
Joined: Sun Oct 04, 2009 11:13 am
Has liked: 0
Been liked: 11 times

Re: Restoring profiling to the KOS toolchain

Post by TapamN » Thu Jan 02, 2020 5:05 pm

I still have my profiling code, but it needs a bit of cleaning up. Judging from the time stamps on the files, it looks like the oldest parts of it are at least 8 years old, and some of the more recent code is about a year or two old. My coding style has changed over time, so the source looks really inconsistent at the moment. There's also some other issues that need fixing (e.g., hard coded file names and timing values) and the only documentation for it is in my head. I'd like to include a few examples on how to use it. I think I should have it ready this weekend.
These users liked the author TapamN for the post (total 5):
ThePerfectKBB HoodIan RobinsonkazadeSiZiOUS
User avatar
Ian Robinson
DCEmu Fast Newbie
DCEmu Fast Newbie
Posts: 23
Joined: Mon Mar 11, 2019 7:12 am
Has liked: 37 times
Been liked: 5 times

Re: Restoring profiling to the KOS toolchain

Post by Ian Robinson » Sat Jan 04, 2020 6:31 pm

TapamN wrote:
Thu Jan 02, 2020 5:05 pm
I still have my profiling code, but it needs a bit of cleaning up. Judging from the time stamps on the files, it looks like the oldest parts of it are at least 8 years old, and some of the more recent code is about a year or two old. My coding style has changed over time, so the source looks really inconsistent at the moment. There's also some other issues that need fixing (e.g., hard coded file names and timing values) and the only documentation for it is in my head. I'd like to include a few examples on how to use it. I think I should have it ready this weekend.
Thank you so much :)
TapamN
DCEmu Freak
DCEmu Freak
Posts: 54
Joined: Sun Oct 04, 2009 11:13 am
Has liked: 0
Been liked: 11 times

Re: Restoring profiling to the KOS toolchain

Post by TapamN » Sun Jan 05, 2020 6:43 pm

Sorry, but the profiler isn't ready yet. When I was making an example of how to add the profiler to an existing program, I noticed that the profiler was generating incorrect results. After some tweaking, it seems to be working correctly now, but it took longer than I expected. I should have it ready soon.
These users liked the author TapamN for the post (total 2):
ThePerfectKIan Robinson
User avatar
ThePerfectK
Insane DCEmu
Insane DCEmu
Posts: 128
Joined: Thu Apr 27, 2006 10:15 am
Has liked: 21 times
Been liked: 19 times

Re: Restoring profiling to the KOS toolchain

Post by ThePerfectK » Sun Jan 05, 2020 8:29 pm

TapamN wrote:
Sun Jan 05, 2020 6:43 pm
Sorry, but the profiler isn't ready yet. When I was making an example of how to add the profiler to an existing program, I noticed that the profiler was generating incorrect results. After some tweaking, it seems to be working correctly now, but it took longer than I expected. I should have it ready soon.
Take your time and no rush. You're doing everyone a favor and it's not like this is paid work. A big thanks for your effort, but don't feel like you are obligated or owe anybody anything. Real life gets in the way often, and I completely understand.

Thanks again for the effort, though!
Still Thinking!~~
User avatar
Ian Robinson
DCEmu Fast Newbie
DCEmu Fast Newbie
Posts: 23
Joined: Mon Mar 11, 2019 7:12 am
Has liked: 37 times
Been liked: 5 times

Re: Restoring profiling to the KOS toolchain

Post by Ian Robinson » Mon Jan 06, 2020 11:51 am

ThePerfectK wrote:
Sun Jan 05, 2020 8:29 pm
TapamN wrote:
Sun Jan 05, 2020 6:43 pm
Sorry, but the profiler isn't ready yet. When I was making an example of how to add the profiler to an existing program, I noticed that the profiler was generating incorrect results. After some tweaking, it seems to be working correctly now, but it took longer than I expected. I should have it ready soon.
Take your time and no rush. You're doing everyone a favor and it's not like this is paid work. A big thanks for your effort, but don't feel like you are obligated or owe anybody anything. Real life gets in the way often, and I completely understand.

Thanks again for the effort, though!
Yes thank you for all your effort and time
User avatar
lerabot
Insane DCEmu
Insane DCEmu
Posts: 126
Joined: Sun Nov 01, 2015 8:25 pm
Has liked: 0
Been liked: 4 times

Re: Restoring profiling to the KOS toolchain

Post by lerabot » Wed Jan 08, 2020 9:49 am

Please let us know when you have anything ready :)
kazade
Insane DCEmu
Insane DCEmu
Posts: 141
Joined: Tue May 02, 2017 3:11 pm
Has liked: 2 times
Been liked: 19 times

Re: Restoring profiling to the KOS toolchain

Post by kazade » Sun Jan 19, 2020 2:41 pm

So with TapamN's confirmation that it is indeed possible to profile from a background thread, I went ahead and had a go!

The result is called dcprof, and it can be found here: https://gitlab.com/simulant/dcprof

My initial tests seem to show that it's working, but please let me know how you get on with it if you try it. When run under dcload, this code will generate a gmon.out file which is compatible with gprof.

The code is released under the MIT license, feel free to submit patches/improvements!
These users liked the author kazade for the post (total 4):
|darc|ThePerfectKBB Hoodmaslevin
User avatar
ThePerfectK
Insane DCEmu
Insane DCEmu
Posts: 128
Joined: Thu Apr 27, 2006 10:15 am
Has liked: 21 times
Been liked: 19 times

Re: Restoring profiling to the KOS toolchain

Post by ThePerfectK » Sun Jan 19, 2020 7:26 pm

kazade wrote:
Sun Jan 19, 2020 2:41 pm
So with TapamN's confirmation that it is indeed possible to profile from a background thread, I went ahead and had a go!

The result is called dcprof, and it can be found here: https://gitlab.com/simulant/dcprof

My initial tests seem to show that it's working, but please let me know how you get on with it if you try it. When run under dcload, this code will generate a gmon.out file which is compatible with gprof.

The code is released under the MIT license, feel free to submit patches/improvements!
Can't like this enough! I'd still like to see TapamN's solution if he is still working on his, but this is extremely exciting. With a proper gmon.out that means one can use all sorts of profiling visualization tools. Dude I'm so excited now!! Thanks so much!!
Still Thinking!~~
mrneo240
DCEmu Freak
DCEmu Freak
Posts: 83
Joined: Wed Mar 14, 2018 12:22 am
Has liked: 13 times
Been liked: 16 times

Re: Restoring profiling to the KOS toolchain

Post by mrneo240 » Mon Jan 20, 2020 10:16 am

No announcement but I too wrote something but requires recompile to profile, it's very different from kazade's and might give a different set of data to look at.
If interested I'll toss up a mini example.
User avatar
ThePerfectK
Insane DCEmu
Insane DCEmu
Posts: 128
Joined: Thu Apr 27, 2006 10:15 am
Has liked: 21 times
Been liked: 19 times

Re: Restoring profiling to the KOS toolchain

Post by ThePerfectK » Mon Jan 20, 2020 11:06 am

mrneo240 wrote:
Mon Jan 20, 2020 10:16 am
No announcement but I too wrote something but requires recompile to profile, it's very different from kazade's and might give a different set of data to look at.
If interested I'll toss up a mini example.
More implementations can only ever be a good thing. If nothing else, it could help verify results. Please do post your example if you don't mind!
Still Thinking!~~
User avatar
Ian Robinson
DCEmu Fast Newbie
DCEmu Fast Newbie
Posts: 23
Joined: Mon Mar 11, 2019 7:12 am
Has liked: 37 times
Been liked: 5 times

Re: Restoring profiling to the KOS toolchain

Post by Ian Robinson » Mon Jan 27, 2020 7:14 pm

TapamN wrote:
Sun Jan 05, 2020 6:43 pm
Sorry, but the profiler isn't ready yet. When I was making an example of how to add the profiler to an existing program, I noticed that the profiler was generating incorrect results. After some tweaking, it seems to be working correctly now, but it took longer than I expected. I should have it ready soon.
Still very much looking forward to this thanks again
TapamN
DCEmu Freak
DCEmu Freak
Posts: 54
Joined: Sun Oct 04, 2009 11:13 am
Has liked: 0
Been liked: 11 times

Re: Restoring profiling to the KOS toolchain

Post by TapamN » Sun Mar 22, 2020 3:22 am

Sorry for the LOOONG delay. There were things distracting me from this. (Mainly that my 71 year old mom fell, fractured her hip, and needed a partial hip replacement.) I'm not surprised that someone else made a profiler in the meantime, since it's actually pretty simple.

The ZIP contains my profiling and watchdog routines. There are four main components, each namespaced with a prefix:

- pf* : Time measurement using KOS's timer
- pfc* : SH4 performance counter usage
- pfSamp* : Sampling profiler
- wd* : Watchdog thread

The headers have ok documentation in them of what the functions do and how to use them. There are also two examples.

One example (main.c) shows how to profile a specific function to find out it's performance characteristics by wrapping the call with profiling code. You can build it with just "make", which creates "proftest.elf".

The second example (example.c) shows how to add profiling to an existing game or program. This can be built with "make -f Makefile.game", resulting in "game.elf". It's not a real game, but the code is shaped like how a game might be structured, with a main loop, and some do-nothing fake collision, logic, and rendering functions that waste time that the profiler will measure. The code added to the "game" to support the watchdog and profiler have been marked with comments.

It's worth noting that the examples dump the profiling results directly into /pc, the root directory of the PC.

I first wrote the pf* set of timing functions, based off of KOS's timer, which I used to cycle count loops. Later, I wrote a watchdog thread (wd* functions), to abort to dc-load if the main thread hangs up, so I could avoid having to power cycle the console. It usually doesn't recover from hangs caused by memory corruption, but can save you from accidental endless loops. Then I wrote the sampling profiler (pfSamp*) to profile the Bullet physics library. Rather than creating a second thread to gather samples, it's worked into the watchdog thread. A few years ago, I discovered the SH4 performance counter by accidentally running across Linux's driver for it when searching for something related to debugging on the SH4, and wrote my own version (pfc*).

Here's what most people are probably interested in, the sampling profiler:

To add the sampling profiler to existing code, you first need to start the watchdog thread (wdInit();), since that's what gathers the samples. Then you'll need to either add proper watchdog support (which takes a bit of work; you need to properly pet the dog each frame, and pause it before long operations like reading megabytes from disc) or just pause the watchdog immediately after you start it (wdPause();). The sampling profiler can still collect samples even when the watchdog is paused; pausing the watchdog just stops the watchdog from updating, it doesn't actually stop the thread. After the watchdog is set up, you can initialize the sampling profiler (pfSampInit(size);). The size parameter is the number of samples to allocate space for.

More samples allows the profiler to be run for longer, and longer running means more detailed results and less noise, however it also takes up more RAM. By default, one sample is collected about every 10 milliseconds (PROFILER_SAMPLE_RATE_MS constant in watchdog.c), so a 5000 sample buffer lasts for 50 seconds. Each sample is a 3 byte pointer to what the main thread was executing at that moment. Since RAM is 16 megabytes, 24 bits/3 bytes are sufficient to store any position in RAM. Not storing the top 8 bits reduces the size of the sample buffer by 1/4th. I wanted the sample buffer to be as small as possible since RAM spent on the profiler is RAM that can't be used anywhere else.

At this point the profiler is ready to go. Call pfSampGo() to start collecting samples. If you want to NOT collect samples for a certain section of code (like a load screen), you can call pfSampStop() before it, and resume afterwards with another call to pfSampGo(). When you're ready to write the results, make sure the profiler is stopped, then call pfSampSave(fname, append). The fname parameter is the filename the samples will be saved to, something like "/pc/blahblahblah/prof_samples.txt". The append parameter controls whether or not the new samples replace any existing ones (on false), or are appended to them (on true). Samples are written to a text file. Each line contains the program counter value at that time. It's a text file so that a shell script can process the results without any specialized programs.

If the sample buffer runs out of space, the profiler will stop automatically, and print a message to the console. If you want to disable this message, you'll have to edit the code. You can check if the profiler is stopped with pfSampIsRunning(). It will not automatically write the results, since that could take a while with a large buffer at slow DC-PC connection. If you want to do very long profiler runs with a small buffer, start the profiler, then poll once each frame to see if the profiler has stopped (pfSampIsRunning()). If it has stopped, save the results to the PC (in replace mode the first time, and append mode afterwards). Then reset the profiler (pfSampReset()) and start it again with pfSampGo().

To view the results of the sampling profiler, use the prof.sh script. You call it like this: "./prof.sh program.elf prof_samples.txt | less". Obviously, the first parameter is the program the samples belong to, and the second parameter is the samples themselves. The results will look something like this:

Code: Select all

4224
   1430 object_logic at example.c
    985 genwait_wait at genwait.c
    967 render_model at example.c
    538 collide_box_with_box at example.c
    177 frustum_check at example.c
     71 check_collisions at example.c
     33 run_logic at example.c
     19 cull_objects at example.c
      2 dcload_write_buffer at fs_dcload.c
      1 render at example.c
      1 genwait_wait at genwait.c:105
The first line is the number of samples collected. Each line afterwards is the number of samples per function called. You can get the percent time used by dividing the function samples by the total samples. I didn't do this in the script since I'm not good at writing scripts (this is the most advanced data processing script I've ever written...), and this works well enough for me.

This only measures the time spent in a specific function, not the time spent in any called functions. In example, "render" shows up as 1 sample, but actually takes a lot of time since it calls "render_model" (967 samples) and "frustum_check" (177 samples). In the "game" example, major operations are timed with the pf* timing functions, to get an idea how long a entire subsystem takes.

Sometimes the script does weird things, like in the example, there's "985 genwait_wait at genwait.c" and "1 genwait_wait at genwait.c:105" lines when it should probably be "986 genwait_wait at genwait.c". I haven't bothered to debug this. It's probably sh-elf-addr2line outputting something weird.

You'll probably see "genwait_wait" taking a noticeable amount of time. This means that time is spent waiting on a semaphore or something. Usually this is waiting for vblank in pvr_wait_ready, so you can kind of treat it as a "idle time" value.

I mentioned before that my profiler was generating bad results. I fixed this giving the watchdog thread a lower nice value (high priority). It was originally running with the same priority as the main thread. It seems to generate correct looking results, but it kind of reduced my confidence in it. Looking at the code for KOS's scheduler, I think the algorithm is just "run highest priority level not waiting, if that level has multiple threads, run them in round robin". I think it still should have worked, and this change shouldn't have been needed... so... *shrug* Less confident in this.

I remember seeing that someone else wrote a library for the SH4 performance counter. They seemed to be getting odd results, with it looking like the results were being multiplied by the peripheral clock or something (which does not make sense). The library I made does not seem to do this.

ThePerfectK: It's definitely be possible to have each sample be a stack trace. But it wouldn't be easy to process the results with just a script then; you'd need a real program to interpret the results. It would also require a larger sample buffer.
Attachments
profiling.zip
(9.07 KiB) Downloaded 23 times
These users liked the author TapamN for the post:
Ian Robinson
User avatar
Moopthehedgehog
DCEmu Freak
DCEmu Freak
Posts: 85
Joined: Wed Jan 05, 2011 4:25 pm
Has liked: 4 times
Been liked: 36 times

Re: Restoring profiling to the KOS toolchain

Post by Moopthehedgehog » Mon Mar 23, 2020 12:32 am

Awesome! Thanks for this!
TapamN wrote:
Sun Mar 22, 2020 3:22 am
I remember seeing that someone else wrote a library for the SH4 performance counter. They seemed to be getting odd results, with it looking like the results were being multiplied by the peripheral clock or something (which does not make sense). The library I made does not seem to do this.
Oh, this was me.

I did find out what was happening (took a lot of digging and reverse engineering to figure out what was going on there!), but I solved all that--and documented it extensively in the performance counter module in DreamHAL. I'll just say the answer was weird and kind of obscure, and leave it at that so as to not derail this topic. :P

Hope all's well now!
These users liked the author Moopthehedgehog for the post:
Ian Robinson
I'm sure Aleron Ives feels weird with his postcount back to <10668
:D
Post Reply