Nice work! I have some suggestions for future improvements:<p>- Your CPUCycle class doesn't actually do backoff. atomic_thread_fence(memory_order_relaxed) is a no-op, and at least one compiler (recent clang versions) will optimize the loop away entirely. You can fix this by reading from a volatile variable inside the loop.<p>- Similarly, nanosleep on Linux will busy-loop instead of returning to the scheduler for small delays; this seems not to be the goal for some of the backoff schemes.<p>- The BoundedQueue assumes that a futex call functions as a "full memory fence". I'm not terribly familiar with the Linux kernel internals, but I don't believe this to be the case. On Power, for instance, it appears to not always generate a sync instruction, instead using lwsync or isync. I haven't looked at the queue closely enough to tell if this poses a correctness issue, but it definitely looks weaker than atomic_thread_fence(memory_order_seq_cst).<p>- The SpinLock uses a test-and-set loop; test-and-test-and-set will often scale much better because of the reduced cache coherency traffic required.<p>- Similarly, using NoBackoff as the default backoff strategy seems to impose a likely performance penalty on users who aren't familiar with the need for backoff.<p>- The FutexLock code looks a little screwy to me. Why back off if you're likely going to do a futex operation anyway? Why is it safe for Lock() to unconditionally discard the exchanged value? It seems like you have the potential for missed wakeups. But it's late and I'm tired, so I'm probably just missing something.<p>I don't mean to be critical; it's just that code reviews only ever contain nits. I'm excited that somebody is working on locking primitives that are more efficient than glibc pthreads (whose performance I'm not a huge fan of).
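To make the spin-lock and backoff points concrete, here is a rough sketch of a test-and-test-and-set lock whose backoff loop reads a volatile variable so the compiler cannot delete it. All names here (TtasSpinLock, backoff, kMaxSpins, contended_count) are made up for illustration and are not the library's actual classes; the doubling schedule and the cap of 1024 are arbitrary assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical sketch; not the reviewed library's SpinLock or CPUCycle.
class TtasSpinLock {
public:
    void lock() {
        std::uint32_t spins = 1;
        for (;;) {
            // "Test": spin on a plain load so the cache line can stay shared
            // between waiters instead of bouncing on every iteration.
            while (locked_.load(std::memory_order_relaxed)) {
                backoff(spins);
                if (spins < kMaxSpins) spins *= 2;  // exponential, capped
            }
            // "Test-and-set": only attempt the write once the lock looked free.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    static constexpr std::uint32_t kMaxSpins = 1024;  // arbitrary cap

    static void backoff(std::uint32_t n) {
        // Each iteration performs a volatile read, which the compiler must
        // keep, so the delay loop is not optimized away.
        static volatile std::uint32_t sink = 0;
        for (std::uint32_t i = 0; i < n; ++i) {
            std::uint32_t tmp = sink;
            (void)tmp;
        }
    }

    std::atomic<bool> locked_{false};
};

// Helper for trying the lock out: several threads bump a plain counter.
long contended_count(int threads, int iters) {
    TtasSpinLock lock;
    long counter = 0;  // protected by lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the lock provides mutual exclusion, contended_count(4, 10000) should come back as exactly 40000. A real implementation would probably also want a CPU pause/yield hint inside the backoff loop, which I left out to keep this portable.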
One option for compatibility on non-futex platforms is to write your own futex emulation using striped std::mutex or pthread_mutex_t. The emulated futex API still has the same slow-path-only goodness as a real futex, but is fully portable.<p>For an example of this technique, check out <a href="https://github.com/facebook/folly/blob/master/folly/detail/Futex.cpp#L133" rel="nofollow">https://github.com/facebook/folly/blob/master/folly/detail/F...</a> . That code only implements wake and wait, but it would be straightforward to extend it to the other futex operations.
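As a rough illustration of the striped-mutex idea (not folly's actual code), here is a minimal sketch. The names emu::futex_wait, emu::futex_wake_all, Stripe and kStripes are hypothetical, and it makes two simplifying assumptions: a wake always wakes every waiter on the stripe (no wake count), and waiters on a different address that hashes to the same stripe may wake spuriously, so callers must re-check their condition in a loop, as they would with a real futex.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>

namespace emu {  // hypothetical namespace for this sketch

struct Stripe {
    std::mutex m;
    std::condition_variable cv;
    std::uint64_t generation = 0;  // bumped on every wake, guarded by m
};

constexpr std::size_t kStripes = 64;  // arbitrary stripe count

inline Stripe& stripe_for(const void* addr) {
    static Stripe stripes[kStripes];
    return stripes[std::hash<const void*>{}(addr) % kStripes];
}

// Returns false without sleeping if *addr != expected; otherwise blocks
// until some wake hits this stripe and returns true.
inline bool futex_wait(const std::atomic<std::uint32_t>* addr,
                       std::uint32_t expected) {
    Stripe& s = stripe_for(addr);
    std::unique_lock<std::mutex> lk(s.m);
    // Checking the value under the stripe lock closes the window in which
    // a wake could slip in between the check and the sleep.
    if (addr->load(std::memory_order_seq_cst) != expected) return false;
    const std::uint64_t seen = s.generation;
    s.cv.wait(lk, [&] { return s.generation != seen; });
    return true;
}

// Wakes all waiters on the stripe that addr hashes to.
inline void futex_wake_all(const std::atomic<std::uint32_t>* addr) {
    Stripe& s = stripe_for(addr);
    {
        std::lock_guard<std::mutex> lk(s.m);
        ++s.generation;
    }
    s.cv.notify_all();
}

// Small round trip: one thread waits for word to become nonzero.
inline bool roundtrip() {
    std::atomic<std::uint32_t> word{0};
    std::thread waiter([&] {
        while (word.load() == 0) futex_wait(&word, 0);  // re-check in a loop
    });
    word.store(1);
    futex_wake_all(&word);
    waiter.join();
    return word.load() == 1;
}

}  // namespace emu
```

The mutex and condition variable are only touched when a thread actually has to sleep or wake someone, so the uncontended fast path of a lock built on top of this stays purely in user-space atomics.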
For those interested in a C++-based concurrency library, check out lthread_cpp. It's a wrapper around lthread.<p><a href="http://lthread-cpp.readthedocs.org" rel="nofollow">http://lthread-cpp.readthedocs.org</a>