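A conceptually simpler way to do this is to give the head and tail indices one (or more) extra bits beyond what is needed to address the ring. For example, for a 512-entry ring, use 16-bit indices. Whenever you index the ring, AND the index with 511 (the size minus one) first, which requires the ring size to be a power of two. The "full" test compares the difference of the two indices modulo 2^16, so it keeps working after the counters wrap around.<p><pre><code> Write to ring: ring[511 & (head++)] = data
Read from ring: data = ring[511 & (tail++)]
Ring is empty:  head == tail
Ring is full:   (uint16_t)(head - tail) == 512</code></pre>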
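Here is a minimal C sketch of that scheme. It assumes a byte buffer and a caller that handles any concurrency, and the `ring_t` type and `ring_*` names are hypothetical. It relies on 65536 being a multiple of 512, so the masked slot index stays continuous when the 16-bit counters wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: ring_t and the ring_* functions are hypothetical names.
   RING_SIZE must be a power of two that divides 65536. */
#define RING_SIZE 512u
#define RING_MASK (RING_SIZE - 1u) /* 511: keeps the low 9 bits */

typedef struct {
    uint8_t  buf[RING_SIZE];
    uint16_t head; /* total writes, wraps at 65536 */
    uint16_t tail; /* total reads, wraps at 65536 */
} ring_t;

bool ring_empty(const ring_t *r) {
    return r->head == r->tail;
}

bool ring_full(const ring_t *r) {
    /* uint16_t operands are promoted to int, so cast the difference back
       to 16 bits to get the fill level modulo 2^16. */
    return (uint16_t)(r->head - r->tail) == RING_SIZE;
}

bool ring_put(ring_t *r, uint8_t byte) {
    if (ring_full(r)) return false;
    r->buf[r->head & RING_MASK] = byte; /* mask selects the slot */
    r->head++;                          /* wraps to 0 after 65535 */
    return true;
}

bool ring_get(ring_t *r, uint8_t *out) {
    if (ring_empty(r)) return false;
    *out = r->buf[r->tail & RING_MASK];
    r->tail++;
    return true;
}
```

All 512 slots are usable, and no separate "full" flag is needed because the extra index bits distinguish empty (difference 0) from full (difference 512).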
This post is literally about saving one byte, at the cost of being slower and not producer-consumer friendly. Not very interesting.<p>The flag could have been hidden as a bit in some other field. Then it could at least be masked with a simple AND operation, which is usually faster than branching, especially on pipelined CPUs.<p>Update: Quick implementation: <a href="https://gist.github.com/dpc/a194b7784adfa150a450" rel="nofollow">https://gist.github.com/dpc/a194b7784adfa150a450</a><p>This fix for the concurrency issue is an ugly hack. I'm not sure it's even correct in this particular scenario, and it's definitely not appropriate for anything that aspires to be good reusable code. I'd advise pushing the atomicity requirement onto the caller: IRQs should be disabled by the calling code.<p>The "register" keyword is obsolete. Modern compilers do their own register allocation, so there's no point in using it.
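A rough sketch of hiding the flag as a bit, not the post's code: the "full" flag lives in the top bit of the head field instead of a separate byte. It assumes a ring of at most 32768 entries so bit 15 of the index is never needed; `flag_ring_t`, `FR_SIZE`, `FULL_BIT` and the `fr_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 15 of head is the "full" flag,
   bits 0..14 are the write index. Requires FR_SIZE <= 32768. */
#define FR_SIZE  100u
#define FULL_BIT 0x8000u
#define IDX_MASK 0x7FFFu

typedef struct {
    uint8_t  buf[FR_SIZE];
    uint16_t head; /* write index plus full flag in bit 15 */
    uint16_t tail; /* read index, always < FR_SIZE */
} flag_ring_t;

bool fr_full(const flag_ring_t *r)  { return (r->head & FULL_BIT) != 0; }

/* When the flag is set, head >= 0x8000 can never equal tail,
   so a single compare answers "empty". */
bool fr_empty(const flag_ring_t *r) { return r->head == r->tail; }

bool fr_put(flag_ring_t *r, uint8_t byte) {
    if (fr_full(r)) return false;
    uint16_t idx = r->head; /* flag is clear here, so this is the index */
    r->buf[idx] = byte;
    idx = (idx + 1u == FR_SIZE) ? 0u : (uint16_t)(idx + 1u);
    /* Writer caught up with reader: store the index with the flag set. */
    r->head = (idx == r->tail) ? (uint16_t)(idx | FULL_BIT) : idx;
    return true;
}

bool fr_get(flag_ring_t *r, uint8_t *out) {
    if (fr_empty(r)) return false;
    *out = r->buf[r->tail];
    r->tail = (r->tail + 1u == FR_SIZE) ? 0u : (uint16_t)(r->tail + 1u);
    r->head &= IDX_MASK; /* any read clears the flag with one AND, no branch */
    return true;
}
```

This still doesn't make the ring safe for a concurrent producer and consumer, since `fr_get` writes to `head`; as argued above, that's better left to the caller.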
I did something similar back in the DOS era for a serial port library: <a href="https://github.com/kstenerud/DOS-Serial-Library/blob/master/serial.c" rel="nofollow">https://github.com/kstenerud/DOS-Serial-Library/blob/master/...</a>