Timer precision

If you have any questions on programming, this is the place to ask them, whether you're a newbie or an experienced programmer. Discussion on programming in general is also welcome. We will help you with programming homework, but we will not do your work for you! Any porting requests must be made in Developmental Ideas.
Post Reply
User avatar
bogglez
Moderator
Moderator
Posts: 578
https://www.artistsworkshop.eu/meble-kuchenne-na-wymiar-warszawa-gdzie-zamowic/
Joined: Sun Apr 20, 2014 9:45 am
Has thanked: 0
Been thanked: 0

Timer precision

Post by bogglez »

I investigated the timing code a bit, and have some questions/possible suggestions:

TMU2 is used for timer_ms_gettime64() which is arguably very important for game programming.
However its frequency is only set to 64 instead of the possible 4.
Its reset value is set to 50000000 / 64 instead of (2**32)-1, since it's supposed to cause interrupts every second.
The interrupt priority for TMU2 is at the maximum. Doesn't this unnecessarily interfere with other interrupts which may be more important?

How about setting the counter reset value to the maximum and the frequency to the minimum?
Then we can have a function u64 timer_getticks() which is also more interesting for relative performance comparisons (100 ticks vs 70 ticks would be a performance improvement, seconds don't matter and just make the result more inaccurate).

Then we could reimplement timer_ms_gettime64() in terms of timer_getticks(). There's no need to separate the seconds from the milliseconds by using an interrupt, or is there? It's also safe to detect an underflow even without an interrupt, if we assume that the getticks function is called frequently enough (which would be the case at (2**32)-1), so there'd be no need for an interrupt routine.

The timer_ms functions are also using some branches, divisions, external memory accesses for returning the value and function calls. Not a biggie, but could be avoided so inlining can take place.

The code in my test application looks like this:

Code: Select all

static inline u64 ak_time_get_ticks() { // in time.h for inlining
	extern u64 ak_time_ticks_since_start; // in time.c
	volatile u32 const * const TCNT2 = (volatile u32*)0xffd80024;
	return ak_time_ticks_since_start + (0xffffffff - *TCNT2);
}

// interrupt routine, could be avoided by comparing against the previous ticks value.
void ak_time_on_timer2_underflow() {
	ak_time_ticks_since_start += 0xffffffff;
	*TCR2 &= ~TCR_UNF;
}
int main() {
	*TSTR = 0; // stop timers
	*TOCR = 0; // use internal clock
	*TCR2 = 0;

	*TCOR2 = *TCNT2 = 0xffffffff;
	*TCR2  = TCR_UNIE | TCR_FREQUENCY_4;
	*TSTR  = TSTR_TMU2;

	*IPRA |= 1 << IPRA_TMU2; // Enable timer 2 interrupts and give them lowest priority


      // game loop
	u64 start = ak_time_get_ticks();
	for(;;) {
		u64 ticks = ak_time_get_ticks() - start;
                printf("%u\n", ticks);
	}

	return 0;
}
This gives me a much higher precision than what kos offers by default, it seems.
Wiki & tutorials: http://dcemulation.org/?title=Development
Wiki feedback: viewtopic.php?f=29&t=103940
My libgl playground (not for production): https://bitbucket.org/bogglez/libgl15
My lxdream fork (with small fixes): https://bitbucket.org/bogglez/lxdream
User avatar
Newbie
Insane DCEmu
Insane DCEmu
Posts: 171
Joined: Sat Jul 27, 2013 1:16 pm
Has thanked: 0
Been thanked: 0

Re: Timer precision

Post by Newbie »

It is interesting piece of code, I may test it but it won't compile :

First problem :

I assume you will put on top of code some define for values you use in your code as :
TSTR, TOCR, TCR2, TCOR2, TCNT2, TCR_FREQUENCY_4, TCR_UNIE, TSTR_TMU2, IPRA_TMU2.

I found some in KOS like those :
#define TSTR 0x04
#define TOCR 0x00
#define TCR2 0x28
#define TCOR2 0x20
#define TCNT2 0x24
I did not find anything about TCR_FREQUENCY_4, TCR_UNIE, TSTR_TMU2, IPRA_TMU2.

Could you correct the code you post about this please.




Second problem :

Watch out those lines in your code that make "my GCC" shouting loud :

Code: Select all

volatile uint32 const * const TCNT2 = (volatile uint32*)0xffd80024;
error: expected unqualified-id before numeric constant
error: expected initializer before numeric constant

Code: Select all

return ak_time_ticks_since_start + (0xffffffff - *TCNT2);
error: invalid type argument of unary ‘*’ (have ‘int’)

Code: Select all

*TSTR = 0; // stop timers
*TOCR = 0; // use internal clock
*TCR2 = 0;

*TCOR2 = *TCNT2 = 0xffffffff;
*TCR2  = TCR_UNIE | TCR_FREQUENCY_4;
*TSTR  = TSTR_TMU2;

*IPRA |= 1 << IPRA_TMU2; // Enable timer 2 interrupts and give them lowest priority
error: invalid type argument of unary ‘*’ (have ‘int’)
Could you correct the code you post about this please.

I will be very pleased to test and use your "inlined improved timer" code.

Regards.
User avatar
bogglez
Moderator
Moderator
Posts: 578
Joined: Sun Apr 20, 2014 9:45 am
Has thanked: 0
Been thanked: 0

Re: Timer precision

Post by bogglez »

Newbie,

Thanks for your interest. The code I posted is missing defines and typedefs. I also didn't hook the interrupt routine. It's basically pseudo code to show what I'm doing in my non-kos project (as I'm trying to learn assembly from scratch).
You can modify the kos time code and change the prime() function to change the interval. In kos the interval is every 64 ticks, but every 4 ticks can be set (the frequency_4 thing in my code is defined to 0 and just there for clarity)
Wiki & tutorials: http://dcemulation.org/?title=Development
Wiki feedback: viewtopic.php?f=29&t=103940
My libgl playground (not for production): https://bitbucket.org/bogglez/libgl15
My lxdream fork (with small fixes): https://bitbucket.org/bogglez/lxdream
User avatar
GyroVorbis
Elysian Shadows Developer
Elysian Shadows Developer
Posts: 1874
Joined: Mon Mar 22, 2004 4:55 pm
Location: #%^&*!!!11one Super Sonic
Has thanked: 80 times
Been thanked: 62 times
Contact:

Re: Timer precision

Post by GyroVorbis »

Guys, ever since we found out that the performance counters were ill-suited to be used as persistent nanosecond timers, I've been on the hunt for giving us more resolution for timing and date/time.

The truth is our current ms resolution is ass compared to what the standard libraries and other indie consoles support. Would the proposed solution in this thread not be an easy way to beef up the resolution of clock(), gettimeofday(), std::chrono, etc?

According to the SH4 docs, it looks like TMU2 is fed by the peripheral clock to the prescalar divider, giving a max frequency of 100Mhz / 4 = 25Mhz. So what, we could have clock/time resolution of up to 40ns? That's a billion times better than what we have right now...
TapamN
DC Developer
DC Developer
Posts: 105
Joined: Sun Oct 04, 2009 11:13 am
Has thanked: 2 times
Been thanked: 90 times

Re: Timer precision

Post by TapamN »

I think you have the right final result, but the calculation is off. The clock counts the 50 MHz peripheral clock, divided by four (which would be 12.5 MHz), but it can be set to count on both the rising and falling edges of the clock, doubling it back up to 25 MHz.

One complaint I have about the timer_*_get functions is that they preform a divide. timer_ms_countdown is used to divide the value read from the hardware timer counter to figure out how many timer ticks per second. It's basically a constant value set during initialization, but since it's technically a variable, GCC can't strength reduce the divide to a reciprocal multiply. This makes all get timer functions fairly slow since they all need integer divides, which the SH4 sucks at. The divides probably adds >70 cycles to every timer*get call. timer_ms_countdown should be removed and replaced with a define that is the appropriate value for whatever the hardware timer gets set to. There's also some assertions that check that timer_ms_countdown has been initialized that could then be removed.
These users thanked the author TapamN for the post (total 3):
BB HoodIan RobinsonGyroVorbis
User avatar
Ian Robinson
DC Developer
DC Developer
Posts: 116
Joined: Mon Mar 11, 2019 7:12 am
Has thanked: 209 times
Been thanked: 41 times

Re: Timer precision

Post by Ian Robinson »

Turned out this is the case can we fix this in kos current is problem
User avatar
GyroVorbis
Elysian Shadows Developer
Elysian Shadows Developer
Posts: 1874
Joined: Mon Mar 22, 2004 4:55 pm
Location: #%^&*!!!11one Super Sonic
Has thanked: 80 times
Been thanked: 62 times
Contact:

Re: Timer precision

Post by GyroVorbis »

TapamN wrote: Thu Oct 26, 2023 4:04 pm I think you have the right final result, but the calculation is off. The clock counts the 50 MHz peripheral clock, divided by four (which would be 12.5 MHz), but it can be set to count on both the rising and falling edges of the clock, doubling it back up to 25 MHz.
OH SHIT, you're right! Good call!!
TapamN wrote: Thu Oct 26, 2023 4:04 pm One complaint I have about the timer_*_get functions is that they preform a divide. timer_ms_countdown is used to divide the value read from the hardware timer counter to figure out how many timer ticks per second. It's basically a constant value set during initialization, but since it's technically a variable, GCC can't strength reduce the divide to a reciprocal multiply. This makes all get timer functions fairly slow since they all need integer divides, which the SH4 sucks at. The divides probably adds >70 cycles to every timer*get call. timer_ms_countdown should be removed and replaced with a define that is the appropriate value for whatever the hardware timer gets set to. There's also some assertions that check that timer_ms_countdown has been initialized that could then be removed.
Thanks for sharing this and good point. I have a bunch of stuff going on at work, but I do have a branch I started on which includes your fix for the race condition, plus I was going to look into this as well. I'll post it here for your opinion when I'm ready to PR, because I agree, we don't want these functions to be slow at all for this stuff.
Ayla
DC Developer
DC Developer
Posts: 142
Joined: Thu Apr 03, 2008 7:01 am
Has thanked: 0
Been thanked: 4 times
Contact:

Re: Timer precision

Post by Ayla »

TapamN wrote: Thu Oct 26, 2023 4:04 pm One complaint I have about the timer_*_get functions is that they preform a divide. timer_ms_countdown is used to divide the value read from the hardware timer counter to figure out how many timer ticks per second. It's basically a constant value set during initialization, but since it's technically a variable, GCC can't strength reduce the divide to a reciprocal multiply. This makes all get timer functions fairly slow since they all need integer divides, which the SH4 sucks at. The divides probably adds >70 cycles to every timer*get call. timer_ms_countdown should be removed and replaced with a define that is the appropriate value for whatever the hardware timer gets set to. There's also some assertions that check that timer_ms_countdown has been initialized that could then be removed.
You can use my "libudiv" for that: https://gist.github.com/pcercuei/83ed3d ... f7224cdc41
These users thanked the author Ayla for the post (total 2):
Ian RobinsonGyroVorbis
User avatar
GyroVorbis
Elysian Shadows Developer
Elysian Shadows Developer
Posts: 1874
Joined: Mon Mar 22, 2004 4:55 pm
Location: #%^&*!!!11one Super Sonic
Has thanked: 80 times
Been thanked: 62 times
Contact:

Re: Timer precision

Post by GyroVorbis »

Ayla wrote: Sun Nov 05, 2023 9:18 am
TapamN wrote: Thu Oct 26, 2023 4:04 pm One complaint I have about the timer_*_get functions is that they preform a divide. timer_ms_countdown is used to divide the value read from the hardware timer counter to figure out how many timer ticks per second. It's basically a constant value set during initialization, but since it's technically a variable, GCC can't strength reduce the divide to a reciprocal multiply. This makes all get timer functions fairly slow since they all need integer divides, which the SH4 sucks at. The divides probably adds >70 cycles to every timer*get call. timer_ms_countdown should be removed and replaced with a define that is the appropriate value for whatever the hardware timer gets set to. There's also some assertions that check that timer_ms_countdown has been initialized that could then be removed.
You can use my "libudiv" for that: https://gist.github.com/pcercuei/83ed3d ... f7224cdc41
Not gonna lie, the thought had crossed my mind. Haven't gotten around to playing with it yet, because of work obligations, but now I will. :mrgreen:
User avatar
Ian Robinson
DC Developer
DC Developer
Posts: 116
Joined: Mon Mar 11, 2019 7:12 am
Has thanked: 209 times
Been thanked: 41 times

Re: Timer precision

Post by Ian Robinson »

TapamN wrote: Thu Oct 26, 2023 4:04 pm I think you have the right final result, but the calculation is off. The clock counts the 50 MHz peripheral clock, divided by four (which would be 12.5 MHz), but it can be set to count on both the rising and falling edges of the clock, doubling it back up to 25 MHz.

One complaint I have about the timer_*_get functions is that they preform a divide. timer_ms_countdown is used to divide the value read from the hardware timer counter to figure out how many timer ticks per second. It's basically a constant value set during initialization, but since it's technically a variable, GCC can't strength reduce the divide to a reciprocal multiply. This makes all get timer functions fairly slow since they all need integer divides, which the SH4 sucks at. The divides probably adds >70 cycles to every timer*get call. timer_ms_countdown should be removed and replaced with a define that is the appropriate value for whatever the hardware timer gets set to. There's also some assertions that check that timer_ms_countdown has been initialized that could then be removed.
Hi,
Do you have example of doing this The clock counts the 50 MHz peripheral clock, divided by four (which would be 12.5 MHz), but it can be set to count on both the rising and falling edges of the clock, doubling it back up to 25 MHz. i have not been able to get this to work..
nymus
DC Developer
DC Developer
Posts: 968
Joined: Tue Feb 11, 2003 4:12 pm
Location: In a Dream
Has thanked: 5 times
Been thanked: 6 times

Re: Timer precision

Post by nymus »

Is it not OK/faster to use the FPU for division and convert back to integer? I guess the #define method works if there are just a few common values being used.
These users thanked the author nymus for the post:
Ian Robinson
behold the mind
inspired by Dreamcast
User avatar
Ian Robinson
DC Developer
DC Developer
Posts: 116
Joined: Mon Mar 11, 2019 7:12 am
Has thanked: 209 times
Been thanked: 41 times

Re: Timer precision

Post by Ian Robinson »

nymus wrote: Wed Dec 20, 2023 9:42 am Is it not OK/faster to use the FPU for division and convert back to integer? I guess the #define method works if there are just a few common values being used.
always fast to never use any division even fpu version is still slow
These users thanked the author Ian Robinson for the post:
nymus
TapamN
DC Developer
DC Developer
Posts: 105
Joined: Sun Oct 04, 2009 11:13 am
Has thanked: 2 times
Been thanked: 90 times

Re: Timer precision

Post by TapamN »

Ian Robinson wrote: Mon Dec 18, 2023 12:42 am Do you have example of doing this The clock counts the 50 MHz peripheral clock, divided by four (which would be 12.5 MHz), but it can be set to count on both the rising and falling edges of the clock, doubling it back up to 25 MHz. i have not been able to get this to work..
I haven't tried this, but I think this should work:

Code: Select all

int timer_prime(int which, uint32 speed, int interrupts) {
    if(interrupts)
        TIMER16(tcrs[which]) = 32 | 0x10;
    else
        TIMER16(tcrs[which]) = 0x10;
    TIMER32(tcnts[which]) = 50000000 / (speed * 2);
    TIMER32(tcors[which]) = 50000000 / (speed * 2);
    if(interrupts)
        timer_enable_ints(which);
    return 0;
}
If the lowest three bits of TCR are 0, the clock will run at 1/4 the peripheral clock (50 MHz/4) and setting bits 4 and 3 to 0b10 (0x10 after shifting) will tick the clock on both rising and falling edges of the clock signal, getting 25 MHz.
nymus wrote: Wed Dec 20, 2023 9:42 am Is it not OK/faster to use the FPU for division and convert back to integer?
That would be faster than the original code, and GCC has an option to preform 32-bit divides with the FPU (but I think it's off in KOS since GCC's implementation doesn't work well with how the XMTRX bank is used in games), but it's still not faster than replacing the divide completely. A 32-bit divide only works if the value fits in 24-bits (maybe 25-bits if signed?). Working over the full range requires doubles, which requires a bit of work to change FPSCR to put the SH4 into double mode. But even 32-bit FDIV is slower than DMUL by a reciprocal.
These users thanked the author TapamN for the post (total 2):
nymusIan Robinson
User avatar
GyroVorbis
Elysian Shadows Developer
Elysian Shadows Developer
Posts: 1874
Joined: Mon Mar 22, 2004 4:55 pm
Location: #%^&*!!!11one Super Sonic
Has thanked: 80 times
Been thanked: 62 times
Contact:

Re: Timer precision

Post by GyroVorbis »

So if you're up on the latest commit of the master branch, you've probably already seen it, but...

I totally redid the TMU driver to clean it up, improve documentation, and increase the precision of this stuff as high as I possibly could, plus I recruited zcrc/Ayla to help with the performance aspect of the required changes. Here's the commit: https://github.com/KallistiOS/KallistiO ... c71460cb72.

Some things to note:

1) I tried to enable both rising and falling edge detection for the TMU input clocks, but it did not work. Even with that bit enabled in the control register, it would only count on a single edge. I also checked with a few SH4 manuals, and it looks like that dual-edge clocking mechanism is only when using an external clock source, which we don't have connected to anything on DC. It doesn't look like it'll work for the peripheral clock sources. :(

2) The TMU driver now uses the /4 peripheral clock divisor as its input, so each counter tick is now at 80ns resolution, which is actually useful for real profiling without necessarily having to reach for the performance counters. With this, comes the following TMU driver changes:
a. timer_us_gettime() is actually forreal microsecond resolution and is not just faking it
b. timer_ns_gettime() has been added for the NS-resolution timing

3) All C, C++, and POSIX APIs have had their resolutions boosted accordingly:
a. C's clock() is now microsecond resolution
b. C11's timespec_get() is now nanosecond resolution
c. POSIX's gettimeofday() is now microsecond resolution
d. POSIX's clock_gettime() is now nanosecond resolution
e. C++'s std::chrono is now nanosecond resolution

4) The example within kos/examples/dreamcast/cpp/clock has been updated to add nanosecond resolution to the example to show it off:
Screenshot from 2023-11-16 00-03-34.png
5) The horrible race condition that TapamN (thank fuck) found for us is now fixed in master. I ran his test/repro program for over 2 hours straight to ensure we are finally race free.

6) Because of the way the TMU driver has been written (generically to support any peripheral clock divider by changing a single #define), there was a slow FP division required. zcrc/Ayla helped to change it into an integer multiply + bitshift with some magical integral LUT stuff, so there's no big performance concern there.

7) This whole ordeal opened up a can of worms throughout all of KOS's core... I have a PR in the works for increasing the resolution of the spin_sleep routines, then we can increase resolution of standard C/C++/POSIX calls for things like nanosleep, then we can increase the resolution of basically everything down to genwait to be higher... I'm trying to add these things piece-wise to not make a massive PR from hell for BlueCrab, but it's gonna take some time to really get all of this resolution enhancement permeating throughout the codebase.

Then here's some pretty flexing of it all working with my own codebase for my libGimbal unit tests... which run on Mac, Win, Linux, iOS, Android, PSP, and PSVita... Used to really piss me off that using standard C/C++ nanosecond resolution timing would result in this BS only for Dreamcast:

Code: Select all

* ********* Starting TestSuite [GblErrorTestSuite] *********
* [ INIT      ]: GblErrorTestSuite
* [ RUN       ]: GblErrorTestSuite::pendingEmpty
* [      PASS ]: GblErrorTestSuite::pendingEmpty (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::domainEmpty
* [      PASS ]: GblErrorTestSuite::domainEmpty (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::stringEmpty
* [      PASS ]: GblErrorTestSuite::stringEmpty (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::clearEmpty
* [      PASS ]: GblErrorTestSuite::clearEmpty (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::raiseCode
* [      PASS ]: GblErrorTestSuite::raiseCode (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::pending
* [      PASS ]: GblErrorTestSuite::pending (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::domain
* [      PASS ]: GblErrorTestSuite::domain (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::string
* [      PASS ]: GblErrorTestSuite::string (1.000 ms)
* [ RUN       ]: GblErrorTestSuite::reraise
* [      PASS ]: GblErrorTestSuite::reraise (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::clear
* [      PASS ]: GblErrorTestSuite::clear (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::raiseCustomMessage
* [      PASS ]: GblErrorTestSuite::raiseCustomMessage (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::raiseCustomMessageVa
* [      PASS ]: GblErrorTestSuite::raiseCustomMessageVa (0.000 ms)
* [ RUN       ]: GblErrorTestSuite::benchmark
* [      PASS ]: GblErrorTestSuite::benchmark (40.000 ms)
* [ FINAL     ]: GblErrorTestSuite
Now check it out on the latest KOS without changing a single line of code!

Code: Select all

* ********* Starting TestSuite [GblErrorTestSuite] *********
* [ INIT      ]: GblErrorTestSuite
* [ RUN       ]: GblErrorTestSuite::pendingEmpty
* [      PASS ]: GblErrorTestSuite::pendingEmpty (0.018 ms)
* [ RUN       ]: GblErrorTestSuite::domainEmpty
* [      PASS ]: GblErrorTestSuite::domainEmpty (0.017 ms)
* [ RUN       ]: GblErrorTestSuite::stringEmpty
* [      PASS ]: GblErrorTestSuite::stringEmpty (0.017 ms)
* [ RUN       ]: GblErrorTestSuite::clearEmpty
* [      PASS ]: GblErrorTestSuite::clearEmpty (0.017 ms)
* [ RUN       ]: GblErrorTestSuite::raiseCode
* [      PASS ]: GblErrorTestSuite::raiseCode (0.038 ms)
* [ RUN       ]: GblErrorTestSuite::pending
* [      PASS ]: GblErrorTestSuite::pending (0.017 ms)
* [ RUN       ]: GblErrorTestSuite::domain
* [      PASS ]: GblErrorTestSuite::domain (0.023 ms)
* [ RUN       ]: GblErrorTestSuite::string
* [      PASS ]: GblErrorTestSuite::string (0.017 ms)
* [ RUN       ]: GblErrorTestSuite::reraise
* [      PASS ]: GblErrorTestSuite::reraise (0.021 ms)
* [ RUN       ]: GblErrorTestSuite::clear
* [      PASS ]: GblErrorTestSuite::clear (0.017 ms)
* [ RUN       ]: GblErrorTestSuite::raiseCustomMessage
* [      PASS ]: GblErrorTestSuite::raiseCustomMessage (0.045 ms)
* [ RUN       ]: GblErrorTestSuite::raiseCustomMessageVa
* [      PASS ]: GblErrorTestSuite::raiseCustomMessageVa (0.048 ms)
* [ RUN       ]: GblErrorTestSuite::benchmark
* [      PASS ]: GblErrorTestSuite::benchmark (41.249 ms)
* [ FINAL     ]: GblErrorTestSuite
These users thanked the author GyroVorbis for the post (total 2):
BB HoodIan Robinson
Post Reply