I believe the author may have re-invented RCU: <a href="https://en.wikipedia.org/wiki/Read-copy-update" rel="nofollow">https://en.wikipedia.org/wiki/Read-copy-update</a><p>This code follows the basic structure of RCU. The traditional RCU primitives correspond to DataProtector primitives like so:<p><pre><code> - prot.use() is rcu_read_lock()
- prot.unUse() is rcu_read_unlock()
- prot.scan() is synchronize_rcu()
</code></pre>
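<p>To make the correspondence concrete, here is a minimal C++ sketch of the pattern. It is not the article's code: SketchProtector, Config, g_config, readValue, and update are hypothetical names, the caller passes its slot explicitly, and everything uses sequentially consistent atomics for simplicity rather than speed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for the article's DataProtector; NOT the author's code.
template <std::size_t N>
class SketchProtector {
public:
    SketchProtector() {
        for (auto& c : counts_) c.store(0);
    }

    // Plays the role of rcu_read_lock(): announce a reader in this slot.
    void use(std::size_t slot) {
        assert(slot < N);
        counts_[slot].fetch_add(1, std::memory_order_seq_cst);
    }

    // Plays the role of rcu_read_unlock(): the reader is done.
    void unUse(std::size_t slot) {
        assert(slot < N);
        counts_[slot].fetch_sub(1, std::memory_order_seq_cst);
    }

    // Plays the role of synchronize_rcu(): return once every slot has been
    // observed empty at least once since the call began.
    void scan() const {
        for (std::size_t i = 0; i < N; ++i) {
            while (counts_[i].load(std::memory_order_seq_cst) != 0) {
                std::this_thread::yield();
            }
        }
    }

private:
    std::atomic<int> counts_[N];  // a real version would pad these to avoid false sharing
};

struct Config { int value; };

std::atomic<Config*> g_config{new Config{1}};  // the final object is never freed in this sketch
SketchProtector<8> g_prot;

// Reader: the pointer is only dereferenced between use() and unUse().
int readValue(std::size_t slot) {
    g_prot.use(slot);
    int v = g_config.load()->value;
    g_prot.unUse(slot);
    return v;
}

// Writer: publish the new object, wait out old readers, then free the old one.
void update(int value) {
    Config* old = g_config.exchange(new Config{value});
    g_prot.scan();
    delete old;
}
```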
RCU was originally a kernel-side concept, but it has since expanded to user space as well; see:<p><a href="http://liburcu.org/" rel="nofollow">http://liburcu.org/</a><p><a href="https://lwn.net/Articles/573424/" rel="nofollow">https://lwn.net/Articles/573424/</a><p>liburcu is clearly not as simple as what is described in the article. One reason is that it is written in pre-C11 C, which has no standardized atomics or barriers. It is also much more aggressive about reducing overheads: this article's code incurs a test plus a (possibly contended) atomic increment/decrement pair for every read critical section. liburcu offers several variants of the implementation, but none of them is as expensive as this.<p>I said "possibly contended" even though the article's code endeavors not to have contention. It can still run into contention when there are more threads than the compile-time template parameter of the DataProtector class. In that case the id space wraps around and multiple threads are assigned the same slot, leading to contention. The further you exceed that limit, the more contention you will get. This is an unfortunate drawback of this code's simplicity.<p>Also, it seems like there is a bug in the code. _mySlot is thread-local, but _last is an instance variable. It seems like two DataProtectors (which will have independent _last values) could assign the same _mySlot to different threads: one thread could get its slot from the first instance's _last while another thread gets the same value from the second instance's _last. This would cause unnecessary contention even if you don't exceed the thread limit. It seems like _last should be static (shared by all instances).<p>I don't mean this commentary to come off negatively. I think there is a <i>lot</i> of value in the way they have managed to factor this so that the DataProtector class is short and simple. Lots of lock-free algorithms have been known for a while, but in many cases they aren't practical because they aren't factored in such a way that they have convenient APIs. SMR/Hazard Pointers is a great example of this.
I would love to see improvements to this class to address these problems while still remaining simple.
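<p>As one possible direction for the _last issue, here is a rough sketch of handing out ids from a single process-wide counter so that every protector instance sees the same id for a given thread. threadId and slotFor are hypothetical helpers, not part of the article's class.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper: one counter for the whole process, so a thread's id
// does not depend on which protector instance it happened to touch first.
inline std::size_t threadId() {
    static std::atomic<std::size_t> last{0};
    // Initialized once per thread, on that thread's first call.
    thread_local const std::size_t id =
        last.fetch_add(1, std::memory_order_relaxed);
    return id;
}

// Map the id onto one of N slots. Ids still wrap once more than N threads
// have asked for one, so this removes the cross-instance collisions only.
template <std::size_t N>
std::size_t slotFor() {
    std::size_t slot = threadId() % N;
    assert(slot < N);
    return slot;
}
```

<p>Two threads can then only share a slot when their ids are congruent modulo N, which is the wrap-around case rather than the accidental one.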
A nice simplification would be to use the current CPU number as your ID. That eliminates the dependence on thread-local storage, and with high probability avoids collisions between threads whose IDs are equivalent modulo N.<p>You could use an instruction like `RDTSCP` to extract the CPU number (plain `RDTSC` returns only the timestamp; `RDTSCP` also returns a per-CPU value that Linux sets up to identify the CPU). glibc also exposes the CPU number through sched_getcpu().
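<p>A minimal sketch of the CPU-number idea, assuming Linux and glibc's sched_getcpu(). cpuSlot is a hypothetical helper name. Note that a thread can migrate to another CPU at any time, so the slot returned here would have to be remembered and reused for the matching decrement rather than recomputed.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for sched_getcpu on some toolchains
#endif
#include <sched.h>   // sched_getcpu (glibc, Linux-specific)

#include <cassert>
#include <cstddef>

// Hypothetical helper: pick a slot from the CPU the caller is running on
// right now. Falls back to slot 0 if the CPU number is unavailable.
template <std::size_t N>
std::size_t cpuSlot() {
    int cpu = sched_getcpu();  // returns -1 on failure
    std::size_t slot = cpu < 0 ? 0 : static_cast<std::size_t>(cpu) % N;
    assert(slot < N);
    return slot;
}
```

<p>A reader would call cpuSlot once, increment that slot, and later decrement the same slot, even if it has been moved to a different CPU in between.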