I guess it comes down to the misconception mentioned in the follow-up article:<p>> For short critical sections, spinlocks perform better<p>While (system) mutexes have some overhead, the cost of crossing from userland into the OS typically becomes <i>evident</i> only when benchmarking code that guards very small critical sections.<p>The knee-jerk reaction is then: let's stay in userland, use a simple spinlock, and move on. But the critical section is so small that it's actually hard to benchmark and demonstrate the alleged improvement, which often turns out not to be an improvement at all, just a premature optimization.
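For concreteness, here is a rough sketch of the two options being compared: a naive test-and-set spinlock built on C11 `atomic_flag`, and a plain `pthread_mutex_t`, each guarding a one-instruction critical section. The names `tas_lock`, `worker`, `run_counter` and `counter` are hypothetical, made up for illustration. Note that on Linux/glibc the mutex is built on futexes, so its uncontended lock/unlock path is itself just an atomic operation in userland; it only enters the kernel when a thread actually has to wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical naive test-and-set spinlock: waiters burn CPU in
   userland instead of asking the kernel to put them to sleep. */
typedef struct {
    atomic_flag held;
} tas_lock;

static void tas_lock_acquire(tas_lock *l) {
    /* Spin until this thread is the one that flips the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait: no syscall, but also no yielding to the lock holder */
    }
}

static void tas_lock_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static tas_lock spin = { ATOMIC_FLAG_INIT };
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static long counter;

struct job {
    bool use_spin;
    long iters;
};

static void *worker(void *arg) {
    const struct job *job = arg;
    for (long i = 0; i < job->iters; i++) {
        if (job->use_spin) {
            tas_lock_acquire(&spin);
            counter++; /* the "short critical section" */
            tas_lock_release(&spin);
        } else {
            pthread_mutex_lock(&mtx);
            counter++;
            pthread_mutex_unlock(&mtx);
        }
    }
    return NULL;
}

/* Runs `nthreads` workers that each increment `counter` `iters` times
   under the chosen lock, then returns the final count. */
static long run_counter(bool use_spin, int nthreads, long iters) {
    pthread_t tids[64];
    struct job job = { use_spin, iters };
    assert(nthreads > 0 && nthreads <= 64);
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &job);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Both variants produce the same count; timing `run_counter(true, ...)` against `run_counter(false, ...)` is exactly the kind of microbenchmark where the difference is hard to measure meaningfully. Results depend heavily on core count and contention, and the spinning variant can degrade badly when threads outnumber cores, because a waiter may keep spinning while the lock holder has been descheduled.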