So I'm working through the sq_cpy function in sq.c and I have some questions regarding the actual loop. Let me post what I'm pretty sure the function is doing, line by line, and then I'll ask my question:
Code: Select all
unsigned int *d = (unsigned int *)(void *)
    (0xe0000000 | (((unsigned long)dest) & 0x03ffffe0));
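To check my reading of that cast, here is a minimal host-side sketch of the same arithmetic. The helper name sq_region_addr is made up for illustration and is not part of sq.c; it only mirrors the mask-and-OR above using fixed-width integers.

```c
#include <stdint.h>

/* Hypothetical helper (not in sq.c): maps a destination address to the
 * corresponding address in the store queue region, as the cast does. */
uint32_t sq_region_addr(uint32_t dest)
{
    /* Keep bits 25..5 of dest and force the top bits to 0xe0000000. */
    return 0xe0000000u | (dest & 0x03ffffe0u);
}
```

For example, a destination of 0x11000020 would map to 0xe1000020: the top bits are replaced and the low five bits are dropped, which is why dest has to be 32-byte aligned.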
Code: Select all
const unsigned int *s = src;
Code: Select all
/* Set store queue memory area as desired */
QACR0 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
QACR1 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
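As a sanity check on the QACR lines, here is a small sketch of the same expression on plain integers. Both function names are hypothetical. The second function is my assumption about how the hardware recombines the QACR area bits (as address bits 28..26) with bits 25..5 of the store queue address, based on my reading of the SH4 manual.

```c
#include <stdint.h>

/* Same expression as in sq_cpy: dest bits 28..26 end up in bits 4..2. */
uint32_t qacr_value(uint32_t dest)
{
    return ((dest >> 26) << 2) & 0x1cu;
}

/* Assumed recombination: QACR bits 4..2 supply external address bits
 * 28..26, and the store queue address supplies bits 25..5. */
uint32_t sq_external_addr(uint32_t qacr, uint32_t sq_addr)
{
    return ((qacr & 0x1cu) << 24) | (sq_addr & 0x03ffffe0u);
}
```

With dest = 0x11000020, qacr_value gives 0x10, and combining that with the store queue address 0xe1000020 gives back 0x11000020.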
Code: Select all
/* fill/write queues as many times necessary */
n >>= 5;
Everything so far has been really straightforward. Now comes the confusion with the actual loop:
Code: Select all
while(n--) {
    asm("pref @%0" : : "r"(s + 8)); /* prefetch 32 bytes for next loop */
    d[0] = *(s++);
    d[1] = *(s++);
    d[2] = *(s++);
    d[3] = *(s++);
    d[4] = *(s++);
    d[5] = *(s++);
    d[6] = *(s++);
    d[7] = *(s++);
    asm("pref @%0" : : "r"(d));
    d += 8;
}
After the prefetch, the next 8 statements each store one 32-bit longword (not a single byte, since d is an unsigned int pointer) starting at pointer d, each taken from the next element s points to. Straightforward, I guess? Our d pointer in this case is specially formatted to land in the "store queue region", correct?
Then the final asm statement is another pref, this time one that signals the hardware to actually perform the store queue transfer, right?
So my question: where exactly are we loading values into the store queue here? I don't see where we ever specified what is in the store queue. What determines what is and isn't in the store queue? I see that the memory area from 0xe0000000 to 0xe3ffffff is called the "store queue region". Is it that writing data to an address in there, as d points to, is what loads data into the store queue?
Just want to make sure I have all this correct for my own benefit.
EDIT: A bit more reading and I think I understand it now. I saw the store queue write section of the SH4 manual, which specifies how to write to the store queues by address:
Code: Select all
A write to the SQs can be performed using a store instruction on P4 area
0xE000 0000 to 0xE3FF FFFC. A longword or quadword access size can be used.
The meaning of the address bits is as follows:
[31:26]  111000            Store queue specification
[25:6]   Don’t care        Used for external memory transfer/access right
[5]      0/1               0: SQ0 specification, 1: SQ1 specification
[4:2]    LW specification  Specifies longword position in SQ0/SQ1
[1:0]    00                Fixed at 0
The middle 20 bits are the destination address, which, coupled with our QACR0/1 settings, tells the hardware which 64 MB area of memory to transfer to. That frees up a few more bits for selecting which store queue slot we're writing to. Bits 2, 3, and 4 (three bits, so 8 positions) select which longword within the 32-byte store queue we're writing to, and bit 5 selects the store queue itself (either 0 or 1, depending on bit 5 of the address).
We set up d so that d[0] points to the first longword of a store queue (store queue 0 when bit 5 of d is clear). The following then fills all eight longwords of that queue:
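To make the bit layout concrete, here is a rough sketch that splits a store queue address into the fields from the manual excerpt. The struct and function names are hypothetical, chosen just for this illustration.

```c
#include <stdint.h>

/* Hypothetical struct holding the fields of a store queue write address. */
struct sq_fields {
    unsigned region_ok; /* 1 if bits [31:26] are 111000 */
    unsigned queue;     /* bit [5]: 0 = SQ0, 1 = SQ1 */
    unsigned longword;  /* bits [4:2]: longword position 0..7 */
    unsigned low_bits;  /* bits [1:0]: expected to be 0 */
};

struct sq_fields sq_decode(uint32_t addr)
{
    struct sq_fields f;
    f.region_ok = ((addr >> 26) == 0x38u); /* 0x38 is binary 111000 */
    f.queue     = (addr >> 5) & 1u;
    f.longword  = (addr >> 2) & 7u;
    f.low_bits  = addr & 3u;
    return f;
}
```

Decoding 0xe0000024, for instance, gives queue 1 and longword position 1, while 0xe000001c gives queue 0 and longword position 7.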
Code: Select all
d[0] = *(s++);
d[1] = *(s++);
d[2] = *(s++);
d[3] = *(s++);
d[4] = *(s++);
d[5] = *(s++);
d[6] = *(s++);
d[7] = *(s++);
This makes more sense if you look at sq_clr, since you can see the priming of the two store queues more easily:
Code: Select all
/* clears n bytes at dest, dest must be 32-byte aligned */
void sq_clr(void *dest, int n) {
    unsigned int *d = (unsigned int *)(void *)
        (0xe0000000 | (((unsigned long)dest) & 0x03ffffe0));

    /* Set store queue memory area as desired */
    QACR0 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;
    QACR1 = ((((unsigned int)dest) >> 26) << 2) & 0x1c;

    /* Fill both store queues with zeroes */
    d[0] = d[1] = d[2] = d[3] = d[4] = d[5] = d[6] = d[7] =
    d[8] = d[9] = d[10] = d[11] = d[12] = d[13] = d[14] = d[15] = 0;

    /* Write them as many times necessary */
    n >>= 5;
    while(n--) {
        __asm__("pref @%0" : : "r"(d));
        d += 8;
    }

    /* Wait for both store queues to complete */
    d = (unsigned int *)0xe0000000;
    d[0] = d[8] = 0;
}
Makes much more sense, but I'm a bit confused because the SH4 manual says bits 0 and 1 of the store queue region address must remain 0, which would seem to make counting forward through 16 entries not work correctly. What am I missing that makes this all fit together?
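One possible way to check the arithmetic: a sketch of where d[i] lands, assuming unsigned int is 4 bytes on this toolchain so that each index step moves the address by 4 bytes rather than 1. The function name sq_elem_addr is hypothetical.

```c
#include <stdint.h>

/* Address of d[i] when d is a pointer to 4-byte unsigned ints at `base`.
 * Assumes sizeof(unsigned int) == 4 on the target. */
uint32_t sq_elem_addr(uint32_t base, uint32_t i)
{
    return base + i * 4u;
}
```

Under that assumption, d[1] is at 0xe0000004 and d[15] at 0xe000003c, so bits 0 and 1 never change, and d[8] is at 0xe0000020, which is the first address with bit 5 set.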