Adventures in message queues

350 points by fcambus about 10 years ago

24 comments

ChuckMcM about 10 years ago

In my experience there are three things that will break here:

1) At-most-once is a bridge to an elementary school which has an inter-dimensional connection to a universe filled with pit vipers. Kids will die, and there is nothing you can do to stop it.
2) Messages are removed when acknowledged or when memory pressure forces them to be kicked out. Black Perl messages, those that sail in from out of nowhere, and lonely widows (processes that never find out their loved ones are dead) will result.
3) Messages are ordered using wall clock millisecond time. This will leave your messages struggling to find their place in line, and messages that should be dead will not be dead (missing fragment problem).

Obviously all these are simply probabilistic trade-offs based on most likely scenarios which result in arbitrarily small windows of vulnerability. No window is small enough at scale over time.

Often when these things have bitten me it has been non-programming stuff. For example, a clock that wouldn't follow NTP because it was too far ahead of what NTP thought the time was; an operator fixed that by turning time back 8 seconds. A client library that was told messages arrive at most one time, and so made a file deletion call on the arrival of a message; a restored node holding that message managed to shoot it out before the operator could tell it that it was coming back from a crash, and poof, damaged file. And one of my favorites in ordering: a system that rebooted after an initial crash (resetting its sequence count) and got messages back into flight with the wrong sequence number but with legitimate sequence values.

FWIW, these sorts of things are especially challenging for distributed storage systems because files are, at their most abstract, little finite state machines that walk through a very specific sequence of mutations, the order of which is critical for correct operation.

My advice for folks building such systems is: never depend on the 'time', always assume at-least-once, and build in-band error detection and correction to allow for computing the correct result from message stream 'n' where two or more invariants in your message protocol have been violated.

Good luck!
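
A minimal Python sketch of the "always assume at-least-once" advice, under stated assumptions: every message carries a unique ID, and the consumer remembers which IDs it has already applied. The class name `IdempotentConsumer`, the in-memory `set`, and the file paths are hypothetical illustrations, not part of Disque or any client library; a real system would need to persist the set of seen IDs so it survives a crash.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered many times."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: in-memory only; real systems need durable storage

    def receive(self, message_id, payload):
        """Return True if the handler ran, False if this delivery was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failure inside the
        # handler leads to a retry on redelivery rather than a silently lost message.
        self._seen.add(message_id)
        return True


deleted = []  # stands in for the side effect, e.g. a file deletion call
consumer = IdempotentConsumer(deleted.append)
first = consumer.receive("msg-1", "/tmp/a")
second = consumer.receive("msg-1", "/tmp/a")  # redelivery, e.g. from a restored node
```

With this in place, the redelivered message in the file-deletion story would be recognized as a duplicate instead of triggering a second deletion.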

antirez about 10 years ago

I'm very sorry, credit for the questions goes to Jacques Chester, see <a href="https://news.ycombinator.com/item?id=8709146" rel="nofollow">https://news.ycombinator.com/item?id=8709146</a>. I made an error and cut&pasted the wrong name, Adrian's (Hi Adrian, sorry for misquoting you!). Never blog and go to bed I guess, your post may magically be top news on HN...

pixelmonkey about 10 years ago

Seems like a similar design to Apache Kafka, <a href="http://kafka.apache.org" rel="nofollow">http://kafka.apache.org</a>. AP, partial ordering (Kafka does ordering within "partitions", but not topics).

One difference is that Disque "garbage collects" data once delivery semantics are achieved (client acks) whereas Kafka holds onto all messages within an SLA/TTL, allowing reprocessing. Disque tries to handle at-most-once in the server whereas Kafka leaves it to the client.

Will be good to have some fresh ideas in this space, I think. A Redis approach to message queues will be interesting because the speed and client library support are bound to be pretty good.

andrea_s about 10 years ago

Maybe I'm missing something, but if it is important to guarantee that a certain message will be dispatched and processed by a worker, why wouldn't an RDBMS with appropriate transactional logic be the best solution?
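
For readers wondering what "appropriate transactional logic" might look like, here is a rough sketch using Python's built-in `sqlite3` module. The table name `jobs`, the `state` column, and the helper functions are all assumptions made for illustration. The claim step uses a conditional `UPDATE` and checks `rowcount`, so that if two workers race for the same row only one of them wins.

```python
import sqlite3


def make_queue(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'queued')"
    )
    conn.commit()


def enqueue(conn, payload):
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def claim(conn):
    """Mark the oldest queued job as running; return (id, payload) or None."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE state = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE jobs SET state = 'running' WHERE id = ? AND state = 'queued'",
            (row[0],),
        )
        if cur.rowcount != 1:
            return None  # another worker claimed it first; caller may retry
        return row


def complete(conn, job_id):
    with conn:
        conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))


conn = sqlite3.connect(":memory:")
make_queue(conn)
enqueue(conn, "resize-image-1")
enqueue(conn, "resize-image-2")
job = claim(conn)
```

A job that is claimed but never completed stays in the `running` state, so a real deployment would also need a timeout or lease to requeue it; that part is left out here.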

acolyer about 10 years ago

Credit for the questions is due to jacques_chester, not me! See <a href="https://news.ycombinator.com/item?id=8709146" rel="nofollow">https://news.ycombinator.com/item?id=8709146</a>

turingbook about 10 years ago

>a few months ago I saw a comment in Hacker News, written by Adrian Colyer...was commenting how different messaging systems have very different set of features, properties, and without the details it is almost impossible to evaluate the different choices, and to evaluate if one is faster than the other because it has a better implementation, or simple offers a lot less guarantees. So he wrote a set of questions one should ask when evaluating a messaging system.

I can not find the comment by @acolyer on HN. Who can help me?

caf about 10 years ago

I wonder what the point is in having "best effort FIFO"? If the application has to be able to deal with unordered messages anyway, you might as well not bother to try to maintain any kind of order.

It's as well to be hung for a sheep as for a lamb.

mappu about 10 years ago

Ask HN: I'm in the market for a distributed message queue, for scheduling background tasks.

Does anything support "regional" priorities, where jobs are popped based on a combination of age + geographic/latency factors?

Also, what are recommended solutions for distributing job injection? My job injection is basically solely dependent on time, and so I envisage one node (raft consensus?) determining all jobs to inject into the queue.

My queue volume is about 50 items/sec and nodes will be up to 400ms apart.

isb about 10 years ago

This looks very cool. At-least-once semantics are the way to go because most tasks require idempotence anyway and that helps in dealing with multiple delivery. Strict FIFO ordering is not always needed either as long as you avoid starvation - most of the time you need "reliable" deferred execution ("durable threads").

I started prototyping something along these lines on top of riak (it is incomplete - missing leases etc. but that should be straightforward to add): <a href="https://github.com/isbo/carousel" rel="nofollow">https://github.com/isbo/carousel</a> It is a durable and loosely FIFO queue. It is AP because of Riak+CRDTs. It is a proof of concept - would be nice to build it on top of riak_core instead of as a client library.

jtchang about 10 years ago

When I first installed Redis years ago I was astounded at how easy it was to get up and running. Compare this to the plethora of message brokers out there: with the vast majority you will spend the better half of the day trying to figure out how to configure the damn thing.

My overall impression of message brokers is that RabbitMQ is a pain in the ass to set up, celery is my go-to these days, with beanstalkd being a close second if I don't want too many of celery's features.

sylvinus about 10 years ago

FYI, Salvatore will speak at dotScale in Paris about Disque on June 8: <a href="http://dotscale.io" rel="nofollow">http://dotscale.io</a>

rdoherty about 10 years ago

This has me excited for many reasons. Redis is an amazingly powerful, robust and reliable piece of technology. Also I love reading antirez's blog posts about the decisions behind Redis so I can't wait to learn more about queueing systems from him when discussing Disque.

bcg1 about 10 years ago

This looks like a good effort, congratulations.

Personally I'm torn on the usefulness of generic brokers for all circumstances... there are obvious advantages, but at the same time every messaging problem scales and evolves differently so a broker can quickly become just one more tail trying to wag the dog.

I am also interested in the architecture of tools like ZeroMQ and nanomsg, where they provide messaging "primitives" and patterns that can easily be used to compose larger systems, including having central brokers if that floats your boat.

jraedisch about 10 years ago

We recently switched from RabbitMQ to Redis queuing because we were not able to implement a good enough priority queue with highly irregular workloads. Prefetch would not work since 2 minute workloads would block all following messages. Timeout queues would somewhat rebalance msgs, but large blocks of messages would be queued at the same time and therefore be processed as large blocks. Now our workers are listening to 10 queues/lists with different priorities with BRPOP and so far everything seems to work.
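
A sketch of the multi-list BRPOP pattern described here. Redis checks the keys passed to BRPOP in the order given and pops from the first non-empty list, so listing keys from highest to lowest priority gives a simple priority queue. The key names and the `next_job` helper are hypothetical. `FakeRedis` is a tiny in-memory stand-in for the one call used, included only so the example runs without a server; with redis-py you would pass a real client instead, and values come back as bytes unless `decode_responses=True` is set.

```python
PRIORITY_QUEUES = ["jobs:p0", "jobs:p1", "jobs:p2"]  # hypothetical keys, highest priority first


def next_job(client, queues=PRIORITY_QUEUES, timeout=5):
    """Wait for a job on any queue, preferring keys listed earlier."""
    result = client.brpop(queues, timeout=timeout)
    if result is None:
        return None  # timed out with every list empty
    queue_name, payload = result
    return queue_name, payload


class FakeRedis:
    """In-memory stand-in (an assumption for this sketch), not a real client."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)

    def brpop(self, keys, timeout=0):
        for key in keys:  # mirrors Redis: keys are checked in the order given
            if self.lists.get(key):
                return key, self.lists[key].pop()
        return None  # a real server would block for up to `timeout` seconds first


client = FakeRedis()
client.lpush("jobs:p2", "slow-report")
client.lpush("jobs:p0", "send-email")
job = next_job(client)
```

Note that strict priority like this can starve the low-priority lists if the high-priority ones never drain.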

latch about 10 years ago

Unordered, in-memory queues shouldn't be anyone's go-to solution. I think there's a time and place for these, and having at-least-once delivery is a huge win over just using Redis, so I'm excited.

Still, unless you know exactly what you're doing, you should pick something with strong ordering guarantees that won't reject messages under memory pressure (although rejecting new messages under memory pressure is A LOT easier/better to handle than dropping old messages).
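
To make the parenthetical concrete, here is a small Python sketch contrasting the two overflow policies. `BoundedQueue` and `QueueFull` are hypothetical names for illustration. Rejecting at `put` time tells the producer immediately that the message was not accepted, so it can retry or back off. A `collections.deque` with `maxlen`, by contrast, silently discards the oldest item when a new one is appended.

```python
from collections import deque


class QueueFull(Exception):
    """Raised to the producer when the queue cannot accept another message."""


class BoundedQueue:
    def __init__(self, max_items):
        self._items = deque()
        self._max_items = max_items

    def put(self, message):
        if len(self._items) >= self._max_items:
            raise QueueFull("queue is at capacity; retry later")
        self._items.append(message)

    def get(self):
        return self._items.popleft()


# Policy 1: reject new messages. The producer finds out right away.
rejecting = BoundedQueue(max_items=2)
rejecting.put("a")
rejecting.put("b")
try:
    rejecting.put("c")
    rejected = False
except QueueFull:
    rejected = True

# Policy 2: drop old messages. Nobody is told that "a" is gone.
dropping = deque(maxlen=2)
for message in ("a", "b", "c"):
    dropping.append(message)
```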

jpfr about 10 years ago

Some big projects are currently making the switch to DDS-based pub/sub. [1,2]

Now that everybody is making QoS guarantees in pub/sub and message queues, is there a real difference to the 10 year old tech deployed in boats, trains and tanks?

[1] <a href="http://www.omg.org/spec/DDS/1.2/" rel="nofollow">http://www.omg.org/spec/DDS/1.2/</a>
[2] <a href="http://design.ros2.org/articles/ros_on_dds.html" rel="nofollow">http://design.ros2.org/articles/ros_on_dds.html</a>

arunoda about 10 years ago

I think this has a lot of roots in NSQ. But NSQ has no replication support.

I think built-in replication is very nice to have. Would like to try it once this arrives.

jacques_chester about 10 years ago

I wish to assure all and sundry that Adrian Colyer is not my secret crime-fighting identity, and vice versa :)

Lx1oG-AWb6h_ZG0 about 10 years ago

Will there be any way to set up machine affinity? I think Azure Service Bus uses this mechanism (by specifying a partition key for a message) to enable strict FIFO for a given partition.
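
The general idea of a partition key can be sketched in a few lines of Python. This is an illustration of the technique only, not the Azure Service Bus API or anything Disque provides; `partition_for` and `PartitionedQueue` are hypothetical names. Every message with the same key hashes to the same partition, and each partition is an ordinary FIFO list, so messages sharing a key keep their relative order.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedQueue:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, message):
        index = partition_for(key, len(self.partitions))
        self.partitions[index].append((key, message))  # FIFO within the partition
        return index


queue = PartitionedQueue(num_partitions=4)
i1 = queue.send("order-42", "created")
queue.send("order-7", "created")
i2 = queue.send("order-42", "paid")
i3 = queue.send("order-42", "shipped")
```

Ordering holds only within a partition; messages with different keys have no guaranteed order relative to each other.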

andrewstuart about 10 years ago

I didn't see any mention of dead letter queues. Does it support dead letters? This is an extremely useful feature of Amazon SQS.
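
For anyone unfamiliar with the term, a dead letter queue holds messages that repeatedly fail processing so they stop blocking or cycling through the main queue. A rough Python sketch of the idea follows; the function name, the `max_attempts` parameter, and the handler are assumptions for illustration and do not reflect the SQS or Disque APIs.

```python
def process_with_dead_letter(messages, handler, max_attempts=3):
    """Run handler on each message; return messages that failed every attempt."""
    dead_letters = []
    for message in messages:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                handler(message)
                break  # success, move on to the next message
            except Exception as exc:
                last_error = exc
        else:
            # The loop finished without a break: every attempt failed.
            dead_letters.append((message, repr(last_error)))
    return dead_letters


attempts = {}
processed = []


def flaky_handler(message):
    attempts[message] = attempts.get(message, 0) + 1
    if message.startswith("bad"):
        raise ValueError("cannot parse " + message)
    processed.append(message)


dead = process_with_dead_letter(["ok-1", "bad-1", "ok-2"], flaky_handler)
```

Keeping the error alongside the dead message makes it easier to inspect later why it was set aside.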

X-Istence about 10 years ago

This reminds me of a talk at SCALE13x about NATS: <a href="http://nats.io" rel="nofollow">http://nats.io</a> It's fast and scalable.

aaa667 about 10 years ago

Is guaranteed at-most-once delivery impossible?

JohnLen about 10 years ago

How does ZeroMQ stand up?

andrewstuart about 10 years ago

It would be nice to have a message queue system not built in Erlang or Java.
