Building a lock free continuous ring buffer in Rust

200 points | by Argorak | almost 6 years ago

9 comments

dragontamer · almost 6 years ago
Looking at this blog post more carefully... I'm not convinced that this ring buffer is actually correct, unfortunately.

> buffer.write.store(buffer.write.load() + write_len)

This is an atomic load, followed by an add, followed by an atomic store.

A TRUE lock-free queue would be buffer.write.AtomicAdd(write_len) (which is a singular, atomic add).

There are many other issues here, but this is the most egregious issue I was able to find. The whole thing doesn't work; it's completely non-safe and incorrect from a concurrency point of view. Once this particular race condition is solved, there are at least 3 or 4 others that I was able to find that also need to be solved.

EDIT: Here's my counter-example:

    write = 100 (at the start)
    write_len = 10 for both threads.

    |-----------------------------------|
    | Thread 1        | Thread 2        |
    |-----------------------------------|
    | write.load (100)| write.load (100)|
    | 100+write_len   |                 |
    | write.store(110)| 100+write_len   |
    |                 | write.store(110)|
    |-----------------------------------|

    write = 110 after the two threads "added" 10 bytes each

Two items of size 10 were written to the queue, but only +10 bytes happened to the queue. The implementation is completely busted and broken. Just because you're using atomics doesn't mean that you've created an atomic transaction.

It takes great effort and study to actually build atomic transactions out of atomic parts. I would argue that this thread should be a lesson in how easy it is to get multithreaded programming dead wrong.

----------

For a talk that actually gets these details right, I have made a recent submission: https://news.ycombinator.com/item?id=20096907 . Mr. Pikus breaks down how atomics and lock-free programming need to be done. It takes him roughly 3.5 hours to describe a lock-free concurrent queue. He's not messing around either: it's a dense talk on a difficult subject.

Yeah, it's not easy. But this is NOT an easy subject by any stretch of the imagination.
ohazi · almost 6 years ago
Although dragontamer's complaints are mostly correct, I think he's being a little uncharitable and is largely missing the point.

Yes, writing lock-free code that's generic, that supports many-to-many reads/writes, and that's correct on all architectures is hilariously hard, and most people should not try to implement these from scratch at work. Other approaches can be more performant, and can have acceptable trade-offs for most use cases.

Are there issues with this one? Maybe. As dragontamer stated multiple times, this stuff is pretty hard to reason about.

HOWEVER, this ring buffer was designed to run on embedded systems. As usual, the constraints are often a little bit different when you're writing software for a microcontroller.

As an example, let's imagine you have periodic bursts of data coming in on a serial port at 2 Mbps, and your microcontroller is running at 32 MHz. Also, your serial port only has a 4-byte hardware buffer, and the cost of missing a byte is... I don't know, sending an endmill through the wall.

You can absolutely make this work, and you can even formally verify that you'll always meet timing, but you're going to have a really hard time doing this if you also have to analyze every other piece of code that will ever try to acquire that lock. A single-producer, single-consumer, lock-free queue with fixed message sizes can be implemented correctly in about 20 lines of C. It has a lot of annoying constraints, and you still have to use atomics correctly, and you still need a memory barrier, and don't even think about resetting the queue from either thread, and... (etc).

But if you're careful, a queue like this can make an otherwise impossible task relatively manageable. I can no longer count the number of times a construct like this has saved a project I was involved with.

Would it automatically run correctly and at full performance on a Xeon running Linux? Fuck no. But that's not the point.

The desirable quality of lock-free queues for embedded systems is correctness, not performance.
0xffff2 · almost 6 years ago
> This is the story of how Andrea Lattuada (PhD student at ETH Zurich) and James Munns (from Ferrous Systems) designed and implemented (two versions!) of an high-perf lock-free ring-buffer for cross-thread communication. *If any of those words look scary to you, don't fret, we'll explain everything from the basics.*

This is not the message I want to see introducing such a complex topic. Writing correct lock-free code is *incredibly* hard, and you're *not* going to get it right or even understand it from a single post. I've done a bit of reading on this subject, and I'm not even sure that it's possible to write a correct lock-free queue (of which this is a variant) in a non-garbage-collected language without some pretty substantial trade-offs regarding memory management.
kazinator · almost 6 years ago
I implemented such a buffer in 2006. It was used for fast message passing from user space to the kernel. When a message was too large to fit at the end of the circular buffer, a zero-length message was placed into the remaining space; that indicated "wrap to the beginning". It's so obvious, it must have been implemented numerous times by others before me.

If I were to implement the same thing again, I would just split the messages and use scattered writes (*struct iovec*) to process them. I cannot remember exactly, but I think the only reason the messages were linear was just for the kernel thread to be able to write them out in a single call.

Oh wait, now I remember; I think the buffers were linear due to the production side: sprintf being used to produce some of the message payloads, and that requiring a linear buffer. But of course it's possible to support both wrapped messages and the "tail space not used" indicator.
CUViper · almost 6 years ago
You can make wrapping writes look contiguous by mapping the same memory twice, one after the other. This is done in slice-deque such that the entire buffer can be viewed as a contiguous slice, regardless of the start/end positions.

https://crates.io/crates/slice-deque
banachtarski · almost 6 years ago
Also important: not using a power-of-two-sized ring leaves a lot of optimization on the table.
_Codemonkeyism · almost 6 years ago
Reminds me of the LMAX architecture.

Slides of a talk I gave about LMAX:

https://www.slideshare.net/Stephan.Schmidt/lmax-architecture-jax-conference
person_of_color · almost 6 years ago
What is considered the bible on lock-free programming?
amelius · almost 6 years ago
I'm guessing this still uses locks, but at the CPU/cache level instead of inside the algorithm.