So if you're up on the latest commit of the master branch, you've probably already seen it, but...
I totally redid the TMU driver to clean it up, improve the documentation, and push the timing precision as high as I possibly could, plus I recruited zcrc/Ayla to help with the performance aspect of the required changes. Here's the commit:
https://github.com/KallistiOS/KallistiO ... c71460cb72.
Some things to note:
1) I tried to enable both rising and falling edge detection for the TMU input clocks, but it did not work. Even with that bit enabled in the control register, it would only count on a single edge. I also checked a few SH4 manuals, and it looks like the dual-edge clocking mechanism only applies when using an external clock source, which isn't connected to anything on the DC. It doesn't look like it'll work for the peripheral clock sources.
2) The TMU driver now uses the /4 peripheral clock divisor as its input, so each counter tick is now at 80ns resolution, which is actually useful for real profiling without necessarily having to reach for the performance counters. With this come the following TMU driver changes:
a. timer_us_gettime() is now genuinely microsecond resolution and is not just faking it
b. timer_ns_gettime() has been added for the NS-resolution timing
3) All C, C++, and POSIX APIs have had their resolutions boosted accordingly:
a. C's clock() is now microsecond resolution
b. C11's timespec_get() is now nanosecond resolution
c. POSIX's gettimeofday() is now microsecond resolution
d. POSIX's clock_gettime() is now nanosecond resolution
e. C++'s std::chrono is now nanosecond resolution
4) The example within kos/examples/dreamcast/cpp/clock has been updated to show off the nanosecond resolution.
5) The horrible race condition that TapamN (thank fuck) found for us is now fixed in master. I ran his test/repro program for over 2 hours straight to ensure we are finally race free.
6) Because of the way the TMU driver has been written (generically to support any peripheral clock divider by changing a single #define), there was a slow FP division required. zcrc/Ayla helped to change it into an integer multiply + bitshift with some magical integral LUT stuff, so there's no big performance concern there.
7) This whole ordeal opened up a can of worms throughout all of KOS's core... I have a PR in the works for increasing the resolution of the spin_sleep routines, then we can increase resolution of standard C/C++/POSIX calls for things like nanosleep, then we can increase the resolution of basically everything down to genwait to be higher... I'm trying to add these things piece-wise to not make a massive PR from hell for BlueCrab, but it's gonna take some time to really get all of this resolution enhancement permeating throughout the codebase.
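To make the numbers in 2) and 6) a bit more concrete, here's a rough sketch of the arithmetic. This is NOT the actual code from the commit, just my illustration of the general idea. It assumes the DC's peripheral clock is 50 MHz (which is where 80ns per tick at /4 comes from), and every name in it (PCLK_HZ, tick_ns, us_scale, ticks_to_us, ticks_to_us_div) is made up for the example. The tick-to-microsecond conversion is done with a precomputed Q32 fixed-point reciprocal, so the hot path is one 64-bit multiply and one shift instead of a division:

```c
#include <stdint.h>

/* Assumption: the Dreamcast SH4 peripheral clock runs at 50 MHz. */
#define PCLK_HZ 50000000u

/* Nanoseconds per counter tick for a given prescaler:
 * 1e9 * div / 50e6 = div * 20, so /4 gives 80 ns. */
uint32_t tick_ns(uint32_t div) {
    return (uint32_t)(1000000000ull * div / PCLK_HZ);
}

/* Hypothetical setup-time helper: Q32 fixed-point scale factor
 * ceil(div * 2^32 / 50), i.e. microseconds per tick times 2^32.
 * This still divides, but only once; the result would normally
 * live in a small per-divisor lookup table. */
uint64_t us_scale(uint32_t div) {
    const uint32_t pclk_mhz = PCLK_HZ / 1000000u; /* 50 */
    return (((uint64_t)div << 32) + pclk_mhz - 1) / pclk_mhz;
}

/* Hot path: ticks -> microseconds with a multiply and a shift.
 * Matches the division below for ticks < 2^32 / 50 (about 85.9
 * million), which covers more than one second of ticks at every
 * prescaler. */
uint32_t ticks_to_us(uint32_t ticks, uint64_t scale) {
    return (uint32_t)(((uint64_t)ticks * scale) >> 32);
}

/* Reference version using a real division, for comparison only. */
uint32_t ticks_to_us_div(uint32_t ticks, uint32_t div) {
    return (uint32_t)((uint64_t)ticks * div / (PCLK_HZ / 1000000u));
}
```

With the /4 prescaler, tick_ns(4) is 80, and one second's worth of ticks (12,500,000) run through ticks_to_us() with us_scale(4) comes out to exactly 1,000,000 us. Rounding the scale factor up rather than down is what keeps the truncated result equal to the true quotient over that range.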
Then here's some pretty flexing of it all working with my own codebase for my libGimbal unit tests... which run on Mac, Win, Linux, iOS, Android, PSP, and PSVita... Used to really piss me off that using standard C/C++ nanosecond resolution timing would result in this BS only for Dreamcast:
* ********* Starting TestSuite [GblErrorTestSuite] *********
* [ INIT ]: GblErrorTestSuite
* [ RUN ]: GblErrorTestSuite::pendingEmpty
* [ PASS ]: GblErrorTestSuite::pendingEmpty (0.000 ms)
* [ RUN ]: GblErrorTestSuite::domainEmpty
* [ PASS ]: GblErrorTestSuite::domainEmpty (0.000 ms)
* [ RUN ]: GblErrorTestSuite::stringEmpty
* [ PASS ]: GblErrorTestSuite::stringEmpty (0.000 ms)
* [ RUN ]: GblErrorTestSuite::clearEmpty
* [ PASS ]: GblErrorTestSuite::clearEmpty (0.000 ms)
* [ RUN ]: GblErrorTestSuite::raiseCode
* [ PASS ]: GblErrorTestSuite::raiseCode (0.000 ms)
* [ RUN ]: GblErrorTestSuite::pending
* [ PASS ]: GblErrorTestSuite::pending (0.000 ms)
* [ RUN ]: GblErrorTestSuite::domain
* [ PASS ]: GblErrorTestSuite::domain (0.000 ms)
* [ RUN ]: GblErrorTestSuite::string
* [ PASS ]: GblErrorTestSuite::string (1.000 ms)
* [ RUN ]: GblErrorTestSuite::reraise
* [ PASS ]: GblErrorTestSuite::reraise (0.000 ms)
* [ RUN ]: GblErrorTestSuite::clear
* [ PASS ]: GblErrorTestSuite::clear (0.000 ms)
* [ RUN ]: GblErrorTestSuite::raiseCustomMessage
* [ PASS ]: GblErrorTestSuite::raiseCustomMessage (0.000 ms)
* [ RUN ]: GblErrorTestSuite::raiseCustomMessageVa
* [ PASS ]: GblErrorTestSuite::raiseCustomMessageVa (0.000 ms)
* [ RUN ]: GblErrorTestSuite::benchmark
* [ PASS ]: GblErrorTestSuite::benchmark (40.000 ms)
* [ FINAL ]: GblErrorTestSuite
Now check it out on the latest KOS without changing a single line of code!
* ********* Starting TestSuite [GblErrorTestSuite] *********
* [ INIT ]: GblErrorTestSuite
* [ RUN ]: GblErrorTestSuite::pendingEmpty
* [ PASS ]: GblErrorTestSuite::pendingEmpty (0.018 ms)
* [ RUN ]: GblErrorTestSuite::domainEmpty
* [ PASS ]: GblErrorTestSuite::domainEmpty (0.017 ms)
* [ RUN ]: GblErrorTestSuite::stringEmpty
* [ PASS ]: GblErrorTestSuite::stringEmpty (0.017 ms)
* [ RUN ]: GblErrorTestSuite::clearEmpty
* [ PASS ]: GblErrorTestSuite::clearEmpty (0.017 ms)
* [ RUN ]: GblErrorTestSuite::raiseCode
* [ PASS ]: GblErrorTestSuite::raiseCode (0.038 ms)
* [ RUN ]: GblErrorTestSuite::pending
* [ PASS ]: GblErrorTestSuite::pending (0.017 ms)
* [ RUN ]: GblErrorTestSuite::domain
* [ PASS ]: GblErrorTestSuite::domain (0.023 ms)
* [ RUN ]: GblErrorTestSuite::string
* [ PASS ]: GblErrorTestSuite::string (0.017 ms)
* [ RUN ]: GblErrorTestSuite::reraise
* [ PASS ]: GblErrorTestSuite::reraise (0.021 ms)
* [ RUN ]: GblErrorTestSuite::clear
* [ PASS ]: GblErrorTestSuite::clear (0.017 ms)
* [ RUN ]: GblErrorTestSuite::raiseCustomMessage
* [ PASS ]: GblErrorTestSuite::raiseCustomMessage (0.045 ms)
* [ RUN ]: GblErrorTestSuite::raiseCustomMessageVa
* [ PASS ]: GblErrorTestSuite::raiseCustomMessageVa (0.048 ms)
* [ RUN ]: GblErrorTestSuite::benchmark
* [ PASS ]: GblErrorTestSuite::benchmark (41.249 ms)
* [ FINAL ]: GblErrorTestSuite
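For anyone wondering what "standard C nanosecond resolution timing" looks like in practice, here's a minimal sketch of the kind of portable code that produces per-test timings like the ones above. It is not lifted from libGimbal; now_ns, time_ms, report, and busy_work are hypothetical names for the example. It only uses C11's timespec_get(), so the same source should build for any target with a C11 libc:

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Current time in nanoseconds, using only C11 timespec_get(). */
int64_t now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: run fn once and return the elapsed time in
 * milliseconds, keeping the sub-millisecond fraction. */
double time_ms(void (*fn)(void)) {
    int64_t start = now_ns();
    fn();
    return (double)(now_ns() - start) / 1e6;
}

/* Print a result line in the same "(x.xxx ms)" style as the logs. */
void report(const char *name, double ms) {
    printf("* [ PASS ]: %s (%.3f ms)\n", name, ms);
}

/* Stand-in workload so there is something to measure. */
void busy_work(void) {
    volatile unsigned sum = 0;
    for (unsigned i = 0; i < 100000u; ++i)
        sum += i;
}
```

Calling report("busy_work", time_ms(busy_work)) prints one line in that format. How many of the printed decimals are meaningful depends entirely on the resolution of the clock behind timespec_get(), which is the part that changed on the KOS side.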