Interesting!<p>I had known that thread-local variables can be accessed pretty fast via a dedicated segment register, but it was not clear to me how one could make this work for dynamically loaded PIC code, like most .so files.<p>Turns out you can't. You only get fast access via the dedicated register if you are using a variable declared in the main program. The .so files have to call a special function (__tls_get_addr on glibc) that does multiple memory lookups to get the actual location, probably reducing performance noticeably.<p>(And this is another case where a seemingly simple operation, getting a variable's value, is internally translated into dozens of operations and a function call.)
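<p>If anyone wants to see the difference for themselves, here is a minimal sketch, assuming GCC or Clang on x86-64 Linux with C11 support. The names counter, bump and bump_in_new_thread are made up for illustration; nothing here comes from the article.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical thread-local counter: every thread gets its own copy. */
static _Thread_local int counter = 0;

/* Increment the calling thread's copy and return the new value.
 * This one access is what compiles differently in an executable
 * versus a shared object built with -fPIC. */
int bump(void) {
    return ++counter;
}

/* Thread body: bump the new thread's own copy *arg times. */
static void *worker(void *arg) {
    int times = *(int *)arg;
    for (int i = 0; i < times; i++) {
        bump();
    }
    /* Hand the thread's private count back through the exit value. */
    return (void *)(intptr_t)counter;
}

/* Run bump() `times` times in a fresh thread and return what that
 * thread saw. Returns -1 if the thread could not be created. */
int bump_in_new_thread(int times) {
    pthread_t thread;
    void *result = NULL;

    if (pthread_create(&thread, NULL, worker, &times) != 0) {
        return -1;
    }
    int rc = pthread_join(thread, &result);
    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG is defined */
    return (int)(intptr_t)result;
}
```

<p>Compile it once as part of a normal executable and once with -fPIC -shared, then compare the disassembly of bump (objdump -d works). On my understanding of the default x86-64 setup, the executable version should read the variable through an %fs-relative address, while the shared-object version should contain a call to __tls_get_addr. The exact instructions depend on compiler version, optimization level and TLS dialect, so treat that as what to look for rather than a guarantee.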