Interesting!<p>I knew that thread-local variables can be accessed pretty fast via a dedicated segment register, but I was not clear how one can make this work for dynamically loaded PIC code, like most .so files.<p>Turns out that, by default, you can't. You only get the fast register-relative access for variables defined in the main executable. Code in a .so has to call a special function (__tls_get_addr on Linux), which does several memory lookups to find the actual location, probably hurting performance in hot code.<p>(And this is another case where a seemingly simple operation, reading a variable's value, gets translated internally into many instructions and a function call.)
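A minimal sketch in C to make this concrete, assuming Linux with GCC or Clang and pthreads (compile with `-pthread`). The names `counter`, `worker`, `run_threads` and `main_thread_counter` are made up for illustration. Each thread gets its own copy of the thread-local `counter`. The `tls_model("initial-exec")` attribute is a real GCC/Clang attribute: it requests the register-relative access pattern even when the file is built with `-fPIC` for a shared library. The trade-off is that such a library should be loaded at program startup (or fit in the loader's small static TLS reserve if it is `dlopen`ed). Without the attribute, a `-fPIC` library gets the default general-dynamic model, which goes through `__tls_get_addr`.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical example variable: every thread sees its own independent copy.
   initial-exec: access is a fixed offset from the thread pointer (%fs on
   x86-64 Linux), no __tls_get_addr call. In a .so this assumes the library
   is loaded at startup or fits in the static TLS reserve. */
static _Thread_local int counter __attribute__((tls_model("initial-exec")));

struct job {
    int iters;  /* how many times this thread increments its own counter */
    int result; /* this thread's final counter value */
};

static void *worker(void *arg) {
    struct job *j = arg;
    for (int i = 0; i < j->iters; i++)
        counter++; /* no lock needed: no other thread can see this copy */
    j->result = counter;
    return NULL;
}

/* Starts n threads (0 <= n <= 64); each increments its own copy of counter
   iters times and reports its final value in results[i].
   Returns 0 on success, -1 on bad n or a pthread_create/join failure. */
int run_threads(int n, int iters, int results[]) {
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    int started = 0, rc = 0;

    if (n < 0 || n > MAX_THREADS)
        return -1;
    for (; started < n; started++) {
        jobs[started].iters = iters;
        jobs[started].result = -1;
        if (pthread_create(&tids[started], NULL, worker, &jobs[started]) != 0) {
            rc = -1;
            break;
        }
    }
    /* Join every thread that actually started, even after a failure. */
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) != 0)
            rc = -1;
        results[i] = jobs[i].result;
    }
    return rc;
}

/* The calling thread's copy, untouched by the workers. */
int main_thread_counter(void) {
    return counter;
}
```

Calling `run_threads(4, 1000, results)` should leave every `results[i]` at 1000 and `main_thread_counter()` at 0, since no thread ever touches another thread's copy. To see the difference between models, you can drop the attribute, build the file as a shared library with `-fPIC -shared`, and look for a call to `__tls_get_addr` in the disassembly.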
<a href="https://www.akkadia.org/drepper/tls.pdf" rel="nofollow">https://www.akkadia.org/drepper/tls.pdf</a> is also a great write-up
I have done plenty with threads but never used thread-local storage except when forced to by some other library using it. To me it seems like a bolted-on monstrosity that provides thread safety to thread-naive code after the fact. Am I missing something? Is this a good solution to some problem I haven't encountered? Are there situations where the performance of TLS is better than some other solution?