I want to detect cache misses so I can better profile some code. DreamHAL has a perfctr module that looks like it can count cache misses, but I'm not sure how to read the results correctly. It appears that you can dedicate either of the two performance counters inside the SH4 to a particular task by setting a mode when you init the counter. Looking through the perfctr header file, there are a number of modes, including:
#define PMCR_OPERAND_CACHE_READ_MISS_MODE 0x04 // Quantity
#define PMCR_OPERAND_CACHE_WRITE_MISS_MODE 0x05 // Quantity
I would think starting a counter with PMCR_OPERAND_CACHE_READ_MISS_MODE would turn one of the perfctrs into a cache miss counter. However, the results I'm getting don't seem right.
Currently, I have created two 32-byte variables at specific memory addresses: myvar is at 0x8C200000, and myvar2 is at 0x8C204000, exactly 16 KB away in memory. The idea is that reading myvar, then myvar2, then myvar again should incur a cache miss, since the two addresses map to the same line in the direct-mapped cache.
So what I've done is init PMCR counter 1 in operand cache read miss mode, like so:
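To make the overlap concrete, here is a minimal sketch of how the two addresses above would map into the cache. It assumes the SH4 operand cache is 16 KB, direct-mapped, with 32-byte lines (512 entries), and that the index mode (OIX) and RAM mode are not enabled; the names oc_entry and oc_conflict are made up for illustration and are not part of DreamHAL.

```c
#include <stdint.h>

#define OC_LINE_BYTES 32u   /* assumed bytes per operand cache line */
#define OC_ENTRIES    512u  /* assumed entry count: 16 KB / 32 bytes */

/* Entry (line slot) an address maps to, i.e. address bits 13..5. */
unsigned oc_entry(uint32_t addr)
{
    return (addr / OC_LINE_BYTES) % OC_ENTRIES;
}

/* Nonzero if the two addresses are different lines that compete for the same entry. */
int oc_conflict(uint32_t a, uint32_t b)
{
    return oc_entry(a) == oc_entry(b)
        && (a / OC_LINE_BYTES) != (b / OC_LINE_BYTES);
}
```

Under those assumptions, oc_conflict(0x8C200000, 0x8C204000) is nonzero because both addresses land in entry 0, while an address only 32 bytes away from myvar lands in entry 1 and does not conflict.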
PMCR_Init(1, PMCR_OPERAND_CACHE_READ_MISS_MODE, PMCR_COUNT_CPU_CYCLES);
Then I have a frame loop that runs over and over again. myvar and myvar2 are global variables (hence their absolute memory locations), and the frame loop function has a local variable T that I read myvar and myvar2 into. First, I read myvar into T, which should pull memory location 0x8C200000 into the cache, then print out the result. Then I restart the PMCR counter, which resets it to 0 and starts it counting again.
Next, I read myvar2 into T, which accesses memory location 0x8C204000. That should overwrite the cache line myvar was in, evicting it and causing a cache read miss. After doing this, I read the counter into a 64-bit ValueRead variable, then print it out. All done like so:
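int T;
T = myvar[0]; // read 0x8C200000 so its line should now be in the operand cache
printf("Prior to PMCR restart part 1: t = %d\n", T);
// reset counter 1 to 0 and start it counting in operand cache read miss mode
PMCR_Restart(1, PMCR_OPERAND_CACHE_READ_MISS_MODE, PMCR_COUNT_CPU_CYCLES);
T = myvar2[0]; // read 0x8C204000, which should map to the same cache line as myvar
ValueRead = PMCR_Read(1); // 64-bit counter value
printf("PerfCount Cache Miss result: %llu\n", ValueRead);
printf("Post-PMCR read: t = %d\n", T);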
The problem is that ValueRead, the result of the counter, fluctuates from run to run and doesn't appear to be counting cache misses. If I let it run in the loop, ValueRead regularly jumps between 2 and 3. If I change the read inside the loop to T = myvar[0], which shouldn't evict the cache line, ValueRead regularly jumps between 1 and 2, which to me looks like the cache line is indeed being used. But, again, the counter doesn't seem to be counting the cache misses themselves. I'm not sure exactly what it is counting, but it appears to just be counting CPU cycles since the PMCR_Restart(1, PMCR_OPERAND_CACHE_READ_MISS_MODE, PMCR_COUNT_CPU_CYCLES) call, not the actual number of cache misses. If I stall for time inside the loop, the ValueRead result increases, even though I'm not reading or writing any other memory locations and so can't be thrashing my cache.
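One way to compare the two cases more systematically would be to total the counter over many runs and subtract what an empty measured section reports, so the cost of the restart/read calls themselves drops out. The following is only a sketch of that idea: the event_counter struct, count_events, net_events, and the fake_* functions are all hypothetical names, and the fake counter merely simulates a counter on a host machine. On the Dreamcast, the two hooks would presumably wrap PMCR_Restart(1, ...) and PMCR_Read(1) as used above.

```c
#include <stdint.h>

/* Hypothetical abstraction over an event counter (not a DreamHAL type). */
typedef struct {
    void     (*restart)(void); /* reset the counter to 0 and start counting */
    uint64_t (*read)(void);    /* current counter value */
} event_counter;

/* Run body() `runs` times, restarting the counter before each run,
   and return the total counted across all runs. */
uint64_t count_events(const event_counter *ctr, void (*body)(void), unsigned runs)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < runs; i++) {
        ctr->restart();
        body();
        total += ctr->read();
    }
    return total;
}

/* Total for body() minus the total for an empty section, clamped at 0. */
uint64_t net_events(const event_counter *ctr, void (*body)(void),
                    void (*empty)(void), unsigned runs)
{
    uint64_t with_body = count_events(ctr, body, runs);
    uint64_t baseline  = count_events(ctr, empty, runs);
    return with_body > baseline ? with_body - baseline : 0;
}

/* Simulated counter for trying the helpers on a host machine:
   every read reports 1 event of overhead, and fake_body adds 2 events. */
static uint64_t fake_count;
void     fake_restart(void) { fake_count = 0; }
uint64_t fake_read(void)    { return fake_count + 1; }
void     fake_body(void)    { fake_count += 2; }
void     fake_empty(void)   { }
```

With the simulated counter, net_events over 10 runs reports only the events added by fake_body, with the per-read overhead removed. If a real counter's net result still grows when the measured section only stalls for time, that would support the suspicion that it is counting cycles rather than misses.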
Has anyone ever tried to count cache misses using DreamHAL, and if so, any insight on how to do it properly?