While this provides perfect ordering information, it does so at the cost of bouncing a cache line (the head of the queue) around the entire machine. That's fine if perfect ordering information is really, really important, but if a lack of probe effect is more important, try the DTrace approach: each thread writes into a private buffer, with a best-effort timestamp (say, the CPU's TSC). Drains merge-sort all the threads' buffers by timestamp.
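To make the per-thread-buffer idea concrete, here is a rough sketch in Python. It is not how DTrace itself is implemented; the class name `ThreadLocalTrace` and its methods are made up for illustration, and `time.perf_counter_ns()` stands in for reading the TSC. The point is only that the hot path touches nothing shared, and ordering is reconstructed at drain time.

```python
import heapq
import operator
import threading
import time


class ThreadLocalTrace:
    """Hypothetical sketch: private per-thread buffers, merged by timestamp on drain."""

    def __init__(self):
        self._local = threading.local()
        self._buffers = []                      # one list per thread that has recorded
        self._register_lock = threading.Lock()  # taken once per thread, not per event

    def record(self, event):
        buf = getattr(self._local, "buf", None)
        if buf is None:
            # First event from this thread: create and register its private buffer.
            buf = self._local.buf = []
            with self._register_lock:
                self._buffers.append(buf)
        # Hot path: append to this thread's own list only.
        # perf_counter_ns() is an assumption standing in for the CPU's TSC.
        buf.append((time.perf_counter_ns(), threading.current_thread().name, event))

    def drain(self):
        # Snapshot each buffer, then do a k-way merge keyed on the timestamp.
        # Each buffer is already in timestamp order because its thread appended
        # readings from a monotonic clock.
        with self._register_lock:
            snapshots = [list(b) for b in self._buffers]
        return list(heapq.merge(*snapshots, key=operator.itemgetter(0)))
```

Usage is just `trace.record(...)` from any thread and `trace.drain()` from whoever wants the merged view. The ordering across threads is only as good as the clock, which is the trade-off being described: you give up perfect ordering to avoid the shared cache line.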
I don't get it. At the beginning of the article he talks about the dangers of locking IPC (inter-process communication) calls, then to prevent this, he describes a lock-free mechanism using the CAS primitive, which can be used only in-process, not between processes.<p>The TransferString function he proposes seems overly complex to me; using locks would make it simpler and even faster. It would almost make the code look like "lock(); memcpy(); unlock();", which is not prone to deadlocking.<p>I even downloaded the source code to the article, but he uses threads to test it. Anyone care to explain this thing to me?
I did something similar at Arbor; single producer, multiple consumers, high-volume message buffer (individual TCP connections off a monitored ISP core network). When I got in the door, it was SYSV semaphores. Don't ever use SYSV IPC. We needed an event loop, so we could do fine-grained timers. Instead of using locks, we did a distributed commit scheme (using an atomic increment), just as if we were synchronizing over the network.