https://github.com/KallistiOS/KallistiOS/pull/111
This is for when you want to give each thread its own individual copy of some piece of data. I see it used frequently for things like giving each thread its own error code or execution context, its own local version of some buffer, its own instance of a custom allocator, or even its own Lua thread or other scripting state. It works in both C and C++.
Using it is also extremely simple and convenient. You just use the thread_local specifier in C++11 and beyond, the _Thread_local specifier in C11 and beyond, or the GCC __thread extension if you're using a dinosaur revision of either language that belongs in a museum:
static thread_local uint32_t per_thread_counter = 0;
static thread_local std::string per_thread_error = "success";
static thread_local uint8_t per_thread_buffer[256] = { 0 };
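To make the "each thread gets its own copy" part concrete, here is a minimal sketch. It assumes your toolchain gives you a working std::thread; the worker and run_workers names are made up for the example and are not part of KOS. Every thread bumps its own counter with no locking, and the copy belonging to the calling thread is never touched.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Each thread that touches this variable gets its own zero-initialized copy.
static thread_local uint32_t per_thread_counter = 0;

// Hypothetical worker: increments only the calling thread's copy, so no
// mutex or atomic is needed, then reports the final value.
static void worker(uint32_t iterations, uint32_t *result) {
    for (uint32_t i = 0; i < iterations; ++i)
        ++per_thread_counter;
    *result = per_thread_counter;
}

// Hypothetical helper: starts one thread per entry in `iterations` and
// returns the counter value each thread ended up with.
std::vector<uint32_t> run_workers(const std::vector<uint32_t> &iterations) {
    std::vector<uint32_t> results(iterations.size(), 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < iterations.size(); ++i)
        threads.emplace_back(worker, iterations[i], &results[i]);
    for (auto &t : threads)
        t.join();
    return results;
}
```

If the counter were an ordinary static instead, the workers would race on a single shared value and the per-thread results would be unpredictable.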
The way compiler-level TLS is handled is extremely fast, especially when everything is linked statically, as we do, because every variable's size and offset can be calculated up front and stored within the ELF file.
What winds up happening is that, to access a thread's copy of a variable, the compiler simply emits reads and writes at a fixed offset from the GBR register, with each variable's offset known at build time. GBR is our "thread pointer", which will now point to the current thread's TLS data, and that data is allocated just once, as a single block, whenever a thread gets created.
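If it helps to picture it, here is a toy model of that addressing scheme in plain C++. To be clear, this is not the actual KOS or compiler implementation: the TlsBlock layout, the thread_pointer variable standing in for GBR, and the helper names are all invented for illustration. The point is only that an access boils down to "base pointer plus constant offset", and that switching threads just means changing the base pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout of one thread's TLS block. In a real build the
// toolchain derives this from the TLS segment recorded in the ELF file.
struct TlsBlock {
    uint32_t counter;
    uint8_t  buffer[256];
};

// The variable's offset within the block is a constant fixed at build time.
constexpr std::size_t kCounterOffset = offsetof(TlsBlock, counter);

// Stand-in for the thread pointer (GBR in the post). In this model a
// context switch repoints it at the incoming thread's block.
static unsigned char *thread_pointer = nullptr;

// Model of thread creation: the whole block is allocated once, zeroed.
std::vector<unsigned char> make_tls_block() {
    return std::vector<unsigned char>(sizeof(TlsBlock), 0);
}

// What a read of `counter` reduces to: load from base + constant offset.
uint32_t load_counter() {
    uint32_t value;
    std::memcpy(&value, thread_pointer + kCounterOffset, sizeof value);
    return value;
}

// What a write of `counter` reduces to: store to base + constant offset.
void store_counter(uint32_t value) {
    std::memcpy(thread_pointer + kCounterOffset, &value, sizeof value);
}
```

Pointing thread_pointer at a different block and calling the same load_counter gives a different value, which is essentially all that per-thread storage is at this level. There is no lookup table and no lock on the access path.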
I'm not sure how much concurrency everyone is doing, but if you're doing some advanced things, or find yourself using any data pattern where a structure or variable is duplicated so that each thread gets its own copy, you should be able to leverage this for much better performance and ease of use.