Detecting cache misses?

ThePerfectK
Insane DCEmu
Posts: 147
Joined: Thu Apr 27, 2006 10:15 am
Has thanked: 27 times
Been thanked: 35 times

Detecting cache misses?

Post by ThePerfectK »

I want to be able to detect cache misses so I can better profile some code. I see DreamHAL has a perfctr module that looks like it can count cache misses, but I'm not sure how to read the results correctly. It appears that you can dedicate one of the two perf counters inside the SH4 to various tasks using DreamHAL by setting a mode when you init the counter. Looking through the perfctr header file, there are a number of modes, including:

#define PMCR_OPERAND_CACHE_READ_MISS_MODE 0x04 // Quantity
#define PMCR_OPERAND_CACHE_WRITE_MISS_MODE 0x05 // Quantity

I would think starting a counter with PMCR_OPERAND_CACHE_READ_MISS_MODE would turn one of the perfctrs into a cache miss counter. However, the results I'm getting don't seem right.

Currently, I have created two 32-byte variables at specific memory addresses: myvar is at 0x8C200000, and myvar2 is at 0x8C204000, exactly 16 KB away in memory. The idea is that reading myvar, then myvar2, then myvar again should incur a cache miss, as the two addresses map to the same line of the direct-mapped cache.
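
To make the aliasing argument concrete, here is a minimal sketch of how an address maps to a line in a 16 KB direct-mapped cache with 32-byte lines. The helper names (oc_line_index, oc_conflicts) are made up for illustration and are not part of DreamHAL. It also assumes the operand cache is in its plain configuration (no index mode, no half of the cache used as on-chip RAM), which I have not checked for this setup.

```c
#include <stdint.h>

/* Assumed operand cache geometry: 16 KB, direct-mapped, 32-byte lines. */
#define OC_SIZE_BYTES 16384u
#define OC_LINE_BYTES 32u
#define OC_NUM_LINES  (OC_SIZE_BYTES / OC_LINE_BYTES)  /* 512 lines */

/* Hypothetical helper: cache line index an address maps to (address bits 13..5). */
static inline uint32_t oc_line_index(uint32_t addr)
{
    return (addr / OC_LINE_BYTES) % OC_NUM_LINES;
}

/* Hypothetical helper: two addresses evict each other when they share a
   line index but lie in different 16 KB blocks (simplified tag compare). */
static inline int oc_conflicts(uint32_t a, uint32_t b)
{
    return oc_line_index(a) == oc_line_index(b)
        && (a / OC_SIZE_BYTES) != (b / OC_SIZE_BYTES);
}
```

With these definitions, oc_conflicts(0x8C200000u, 0x8C204000u) is true, while two addresses 32 bytes apart land on neighbouring lines and do not conflict.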

So what I've done is init PMCR counter 1 in operand cache read miss mode, like so:

Code:

PMCR_Init(1, PMCR_OPERAND_CACHE_READ_MISS_MODE, PMCR_COUNT_CPU_CYCLES);

Then I have a frameloop that runs over and over again. myvar and myvar2 are global variables (hence their absolute memory locations), and in the frameloop function there is a local variable T, which I'm reading myvar and myvar2 into. First, I read myvar into T, which should pull memory location 0x8C200000 into the cache, then print the result. Then I restart the PMCR counter, which resets it to 0 from the last read and starts it counting again.

Next, I read myvar2 into T, which accesses memory location 0x8C204000. That should replace the cache line myvar was in, evicting it and causing a cache read miss. After doing this, I read the counter into a 64-bit ValueRead variable, then print it out. All done like so:

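Code:

int T;
T = myvar[0]; // touch 0x8C200000 so its line is in the cache
printf("Prior to PMCR restart part 1: t = %d\n", T);
// reset counter 1 to 0 and start counting operand cache read misses
PMCR_Restart(1, PMCR_OPERAND_CACHE_READ_MISS_MODE, PMCR_COUNT_CPU_CYCLES);
T = myvar2[0]; // touch 0x8C204000, which maps to the same cache line
ValueRead = PMCR_Read(1); // 64-bit counter value
printf("PerfCount Cache Miss result: %llu\n", ValueRead);
printf("Post-PMCR read: t = %d\n", T);
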
The problem is that ValueRead, the result of the counter, fluctuates wildly and doesn't appear to be counting cache misses this way. If I let it run in the loop, ValueRead alternates between 2 and 3. If I change the read inside the loop to T = myvar[0], which shouldn't evict the cache line, ValueRead alternates between 1 and 2, which to me looks like the cache line is indeed being used. But, again, the problem appears to be that the counter isn't actually counting the cache misses. I'm not sure what exactly it is counting, but it appears to just be counting CPU cycles since PMCR_Restart(1, PMCR_OPERAND_CACHE_READ_MISS_MODE, PMCR_COUNT_CPU_CYCLES), not the actual number of cache misses. If I stall for time inside the loop to make it eat up more time, then the ValueRead result of the counter will increase, even if I'm not reading or writing any other memory locations and thus can't be thrashing my cache.
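
For comparison with the numbers above, here is a rough host-side model of the read misses a direct-mapped cache would report for the two variants of the test. It is only a sketch: dm_cache, dm_reset and dm_read are names invented for this example, the 16 KB / 32-byte-line geometry is assumed, and the model knows nothing about stack accesses, the counter read itself, or anything else the compiled loop might load.

```c
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u   /* assumed cache line size */
#define NUM_LINES  512u  /* assumed 16 KB / 32 bytes */

/* Hypothetical model of a direct-mapped cache that only tracks tags. */
typedef struct {
    uint32_t tag[NUM_LINES];
    uint8_t  valid[NUM_LINES];
    unsigned read_misses;
} dm_cache;

/* Invalidate every line and zero the miss count. */
static void dm_reset(dm_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Model one read: returns 1 on a miss (and fills the line), 0 on a hit. */
static int dm_read(dm_cache *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t idx  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;

    if (c->valid[idx] && c->tag[idx] == tag)
        return 0;

    c->valid[idx] = 1;
    c->tag[idx]   = tag;
    c->read_misses++;
    return 1;
}
```

In this model, reading 0x8C200000 and then 0x8C204000 misses on the second read, while reading 0x8C200000 twice hits on the second read. So the model predicts the myvar2 variant reports exactly one more read miss than the myvar variant; any constant offset on top of that would have to come from loads the model does not cover.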

Anyone ever tried to count cache misses using DreamHAL, and if so, any insight on how to do it properly?

Re: Detecting cache misses?

Post by ThePerfectK »

So just a heads up, following up on this: using sh-elf-objdump and looking at the disassembled code that gcc spits out after compilation, it seems the counter is indeed behaving correctly. -O3 optimization makes judging the order and count of cache misses (and other countable occurrences from the perf counter, like JMP/BRA issued/taken count) a bit difficult if you gauge just from the C/C++ code as DreamHAL recommends. That said, by writing some code that put variables in different compilation units to ensure they wouldn't be optimized out in the assembly, I was indeed able to write a few tests and see the counter working correctly. I can even see how prefetches will eliminate cache misses. There are a few funny things, though. It seems reading the perf counter itself can cause a cache miss.
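
A different way to keep a test read from being optimized away, not the separate-compilation-unit approach described above, is to perform the load through a volatile-qualified pointer. This is only a sketch; load_once is a made-up helper name, and the disassembly is still worth checking, since volatile pins this one access but says nothing about the loads the compiler emits around it.

```c
#include <stdint.h>

/* Hypothetical helper: perform exactly one real load from *p.
   Accessing the object through a volatile-qualified lvalue stops the
   compiler from dropping the read or reusing a previously loaded value. */
static inline int32_t load_once(const int32_t *p)
{
    return *(const volatile int32_t *)p;
}

/* Intended use on the target, with the calls shown earlier in the thread:
 *
 *   PMCR_Restart(1, PMCR_OPERAND_CACHE_READ_MISS_MODE, PMCR_COUNT_CPU_CYCLES);
 *   T = load_once(&myvar2[0]);
 *   ValueRead = PMCR_Read(1);
 */
```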

An aside, but using the perf counter just shows how messy printf (and similar functions) really is; people should avoid using it at all costs. I counted around 140 JMP/BRA instructions issued from a single printf call using the perf counter.
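
One way to keep printf out of the measured code is to stash counter readings in a small buffer and print them once the run is over. The sketch below assumes that is acceptable for the test at hand. sample_log, sample_log_push and sample_log_dump are invented names, and the push itself writes to memory, so it belongs outside the counted window as well.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_SAMPLES 64u

/* Hypothetical fixed-size log of counter readings. */
typedef struct {
    uint64_t values[MAX_SAMPLES];
    unsigned count;
} sample_log;

/* Store one reading; returns 1 on success, 0 if the log is full. */
static int sample_log_push(sample_log *log, uint64_t value)
{
    if (log->count >= MAX_SAMPLES)
        return 0;
    log->values[log->count++] = value;
    return 1;
}

/* Print every stored reading; call this after all measurements are done. */
static void sample_log_dump(const sample_log *log)
{
    for (unsigned i = 0; i < log->count; i++)
        printf("sample %u: %llu\n", i, (unsigned long long)log->values[i]);
}
```

On the target, the value pushed would be whatever PMCR_Read(1) returned into a local variable, with sample_log_dump called after the frameloop has finished its measured iterations.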
Ian Robinson
DC Developer
Posts: 116
Joined: Mon Mar 11, 2019 7:12 am
Has thanked: 209 times
Been thanked: 41 times

Re: Detecting cache misses?

Post by Ian Robinson »

ThePerfectK wrote: Wed Feb 09, 2022 12:27 am so just a heads up, following up on this [...]
Great work. Yes, printf (or anything like it), if left on, will degrade things a lot. Good work confirming that. I knew it was happening; it's just good to know it's a real thing.