Bug in sem_init

If you have any questions on programming, this is the place to ask them, whether you're a newbie or an experienced programmer. Discussion on programming in general is also welcome. We will help you with programming homework, but we will not do your work for you! Any porting requests must be made in Developmental Ideas.
Post Reply
TapamN
DC Developer
DC Developer
Posts: 104
https://www.artistsworkshop.eu/meble-kuchenne-na-wymiar-warszawa-gdzie-zamowic/
Joined: Sun Oct 04, 2009 11:13 am
Has thanked: 2 times
Been thanked: 88 times

Bug in sem_init

Post by TapamN »

This was frustrating to figure out. I'm working on a new PVR driver, and I ran into a bug in sem_init. I was initializing a semaphore as part of a malloc-ed struct, and the semaphore appeared to be getting corrupted or not initialized (When I went to wait on the semaphore, I sometimes found the count to be a random negative number, even through I always initialized it to a constant, positive value). It wasn't consistent from run-to-run. It might work from a cold-boot, but then the exact same ELF would not work if I ran it after something else.

It didn't seem to be caused by a race condition (if it worked/didn't work, it would consistently work/not work until you made a change to the code, but after you made that change, the original binary would start behaving differently) so I figured it had to involve uninitialized memory being involved somehow. I kept checking over and over that I was initializing everything, even disassembling my code to make sure the compiler wasn't miscompiling my initialization code. I eventually added logging around the semaphore initialization. When that didn't help, I added even more paranoid logging. It looked like this:

Code: Select all

npvr_scene_buffer_t * npvr_sb_create(const npvr_pass_t *passes, int pass_count, int width_tiles, int height_tiles) {
	int p, l;
	
	assert(width_tiles <= 40);
	assert(width_tiles > 0);
	assert(height_tiles <= 15);
	assert(height_tiles > 0);
	assert(pass_count > 0);
	assert(passes != NULL);
	
	npvr_scene_buffer_t * sb = malloc(sizeof(npvr_scene_buffer_t));
	npvr_pass_internal_t * intpasses = calloc(pass_count, sizeof(npvr_pass_internal_t));
	
	assert(sb != NULL);
	assert(intpasses != NULL);
	
	NPVR_LOG("npvr_sb_create:\n");
	NPVR_LOG("  SB: %p\n", sb);
	
	sb->passes = intpasses;
	sb->pass_count = pass_count;
	sb->width_tiles = width_tiles;
	sb->height_tiles = height_tiles;
	
	NPVR_LOG("  PreBanks available: %i\n", sem_count(&sb->banks_available));
	NPVR_LOG("  PreBanks available: 0x%08x\n", sem_count(&sb->banks_available));
	NPVR_LOG("  PreBanks available init: %i\n", sb->banks_available.initialized);
	errno = 0;
	int ret = sem_init(&sb->banks_available, 2);
	int semerrno = errno;
	if (ret)
		NPVR_LOG("  ****sem_init ERROR****\n");
	NPVR_LOG("  PostErrno: %i\n", semerrno);
	NPVR_LOG("  PostBanks available: %i\n", sem_count(&sb->banks_available));
	NPVR_LOG("  PostBanks available: 0x%08x\n", sem_count(&sb->banks_available));
	NPVR_LOG("  Banks available init: %i\n", sb->banks_available.initialized);
	
	//The rest is unimportant for this
After I managed to convince it to not work, I got this from the logs when making three consecutive calls to npvr_sb_create:

Code: Select all

npvr_sb_create:
  SB: 0x8c12f4a0
  PreBanks available: 757101421
  PreBanks available: 0x2d20736d
  PreBanks available init: 540618798
  PostErrno: 0
  PostBanks available: 2
  PostBanks available: 0x00000002
  Banks available init: 1
npvr_sb_create:
  SB: 0x8c12f528
  PreBanks available: -42008063
  PreBanks available: 0xfd7f0201
  PreBanks available init: 33292289
  ****sem_init ERROR****
  PostErrno: 22
  PostBanks available: -42008063
  PostBanks available: 0xfd7f0201
  Banks available init: 33292289
npvr_sb_create:
  SB: 0x8c12f5b0
  PreBanks available: 537539104
  PreBanks available: 0x200a3220
  PreBanks available init: 979725410
  PostErrno: 0
  PostBanks available: 2
  PostBanks available: 0x00000002
  Banks available init: 1
Hey, all the previous times the semaphore count was bad, it happened to be a random negative value. The only time sem_init failed, the uninitialized memory happened to contain a negative value...

Code: Select all

int sem_init(semaphore_t *sm, int count) {
    if(sm->count < 0) {
        errno = EINVAL;
        return -1;
    }

    sm->count = count;
    sm->initialized = 1;
    return 0;
}
Yep. Someone had a thinko or made a mistake doing a copy-and-paste. The second line is clearly meant to error check the count value passed in, but accidentally checks garbage in the uninitialized semaphore. So

Code: Select all

    if(sm->count < 0) {
should be

Code: Select all

    if(count < 0) {
There's never a problem with a semaphores in the BSS section or statically initialized, but any stack or heap semaphores would randomly fail. The deprecated sem_create does not have this bug, and correctly error checks the initial count value.

If you want to be extra careful, it might be worth setting sm->initialized to zero, to make it more clear that the semaphore was not initialized correctly to people who don't check the return value of sem_init:

Code: Select all

int sem_init(semaphore_t *sm, int count) {
    if(count < 0) {
        errno = EINVAL;
        sm->initialized = 0;    /* Mark semaphore as invalid! */
        return -1;
    }

    sm->count = count;
    sm->initialized = 1;
    return 0;
}
These users thanked the author TapamN for the post:
Ian Robinson
User avatar
BlueCrab
The Crabby Overlord
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia
Has thanked: 9 times
Been thanked: 69 times
Contact:

Re: Bug in sem_init

Post by BlueCrab »

Welp, that would be a bug I wrote. Thanks for finding it, I'll patch it as soon as I get a chance.

That said, usually things such as that that initialize structures don't modify them if they fail. That's why it doesn't do that now... That said, it probably isn't a bad idea to do so either.
User avatar
BlueCrab
The Crabby Overlord
The Crabby Overlord
Posts: 5652
Joined: Mon May 27, 2002 11:31 am
Location: Sailing the Skies of Arcadia
Has thanked: 9 times
Been thanked: 69 times
Contact:

Re: Bug in sem_init

Post by BlueCrab »

Change committed in commit a2964c.
These users thanked the author BlueCrab for the post:
Ian Robinson
Post Reply