What If We Could Rebuild Kafka from Scratch?

254 points by mpweiher 20 days ago

35 comments

Agreed. The head-of-line problem is worth solving for certain use cases.

But today, all streaming systems (or workarounds) with per-message-key acknowledgements incur O(n^2) costs in either computation, bandwidth, or storage per n messages. This applies to Pulsar, for example, which is often used for this feature.

Now, this degenerate time/space complexity might not show up every day, but when it does, you’re toast, and you have to wait it out.

My colleagues and I have studied this problem in depth for years, and our conclusion is that a fundamental architectural change is needed to support scalable per-message-key acknowledgements. Furthermore, the architecture will fundamentally require a sorted index, meaning that any such queuing / streaming system will process n messages in O(n log n).

We’ve wanted to blog about this for a while, but never found the time. I hope this comment helps if you’re thinking of relying on per-message-key acknowledgements; you should expect sporadic outages / delays.

vim-guru 20 days ago

https://nats.io is easier to use than Kafka and already solves several of the points in this post I believe, like removing partitions, supporting key-based streams, and having flexible topic hierarchies.

Ozzie_osman 20 days ago

I feel like everyone's journey with Kafka ends up being pretty similar. Initially, you think "oh, an append-only log that can scale, brilliant and simple", then you try it out and realize it is far, far from being simple.

nitwit005 20 days ago

> When producing a record to a topic and then using that record for materializing some derived data view on some downstream data store, there’s no way for the producer to know when it will be able to "see" that downstream update. For certain use cases it would be helpful to be able to guarantee that derived data views have been updated when a produce request gets acknowledged, allowing Kafka to act as a log for a true database with strong read-your-own-writes semantics.

Just don't use Kafka.

Write to the downstream datastore directly. Then you know your data is committed and you have a database to query.

Spivak 20 days ago

Once you start asking to query the log by keys, multi-tenancy trees of topics, synchronous-ish commits, and schemas, aren't we just in normal db territory, where the Kafka log becomes the query log? I think you need to go backwards and ask what feature an RDBMS/NoSQL db can't provide, and go from there. Because the wishlist is looking like CQRS with the front queue being durable, but with events removed once persisted in the backing db, where the clients query events from the db.

The backing db in this wishlist would be something in the vein of Aurora to achieve the storage/compute split.

peanut-walrus 20 days ago

Object storage for Kafka? Wouldn't this 10x the latency and cost?

I feel like Kafka is a victim of its own success. It's excellent for what it was designed for, but since the design is simple and elegant, people have been using it for all sorts of things for which it was not designed. And well, of course it's not perfect for these use cases.

debadyutirc 20 days ago

This is a question we asked 6 years ago.

What if we wrote it in Rust and leveraged WASM?

We have been at it for the past 6 years. https://github.com/infinyon/fluvio

For the past 2 years we have also been building Flink using Rust and WASM. https://github.com/infinyon/stateful-dataflow-examples/

fintler 20 days ago

Keep an eye out for Northguard. It's the name of LinkedIn's rewrite of Kafka that was announced at a stream processing meetup about a week ago.

supermatt 20 days ago

> "Do away with partitions"

> "Key-level streams (... of events)"

When you are leaning on the storage backend for physical partitioning (as per the cloud example, where they would literally partition based on keys), doesn't this effectively just boil down to renaming partitions to keys, and keys to events?

olavgg 20 days ago

How many of the Apache Kafka issues are addressed by switching to Apache Pulsar?

I skipped learning Kafka and jumped right into Pulsar. It works great for our use case. No complaints. But I wonder why so few use it?

vermon 20 days ago

Interesting, if partitioning is not a useful concept of Kafka, what are some of the better alternatives for controlling consumer concurrency?

frklem 20 days ago

"Faced with such a marked defensive negative attitude on the part of a biased culture, men who have knowledge of technical objects and appreciate their significance try to justify their judgment by giving to the technical object the only status that today has any stability apart from that granted to aesthetic objects, the status of something sacred. This, of course, gives rise to an intemperate technicism that is nothing other than idolatry of the machine and, through such idolatry, by way of identification, it leads to a technocratic yearning for unconditional power. The desire for power confirms the machine as a way to supremacy and makes of it the modern philtre (love-potion)." Gilbert Simondon, On the mode of existence of technical objects.

This is exactly what I take from this kind of article: engineering just for the sake of engineering. I am not saying we should not investigate how to improve our engineered artifacts, or that we should not improve them. But I see a generalized lack of reflection on why we should do it, and I think it is related to a detachment from the domains we create software for. The article suggests uses of the technology that come from such different ways of using it that it loses coherence as a technical item.

redditor98654 18 days ago

I agree on the head-of-line blocking problem and that not everyone needs per-partition ordering. For that I have started to use SQS FIFO with the message grouping key being the logical key for the event/resource. This gives me ordering within the key and no extra ordering across keys. So I don’t have the head-of-line blocking problem.

If I need multiple independent consumers, I instead publish to SNS FIFO and let my consumers create their own SQS FIFO queues that are subscribed to the topic. The ordering is maintained across SNS and SQS. I also get native DLQ support for poison pills, and an SQS consumer is dead simple to operate vs a Kafka consumer.

It does not solve all of the mentioned problems, like being able to see what the keys are in the queue or look up by a given key, but as a messaging solution that offers ordering for a key, this is hard to beat.
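
To illustrate the per-key ordering idea described above, here is a minimal in-memory sketch in Python. It is a toy model, not the SQS API: `GroupedQueue` and its `send`, `receive`, and `ack` methods are hypothetical names, and the assumption is simply that a group with an unacknowledged message holds back only its own later messages.

```python
from collections import OrderedDict, deque


class GroupedQueue:
    """Toy FIFO queue with ordering per group key (hypothetical, not SQS)."""

    def __init__(self):
        self._groups = OrderedDict()  # group key -> deque of pending bodies
        self._in_flight = set()       # groups with an unacknowledged message

    def send(self, group, body):
        self._groups.setdefault(group, deque()).append(body)

    def receive(self):
        """Return (group, body) from the first group not in flight, else None."""
        for group, pending in self._groups.items():
            if pending and group not in self._in_flight:
                self._in_flight.add(group)
                return group, pending[0]
        return None

    def ack(self, group):
        """Delete the in-flight message of `group`, releasing its next one."""
        self._groups[group].popleft()
        self._in_flight.discard(group)


q = GroupedQueue()
q.send("order-1", "created")
q.send("order-1", "paid")
q.send("order-2", "created")

first = q.receive()   # ("order-1", "created")
second = q.receive()  # ("order-2", "created"): order-1 is busy, order-2 is not blocked
third = q.receive()   # None: "paid" waits until order-1 is acknowledged
```

In this model a slow or failing message for `order-1` delays only `order-1`, which is the property the comment relies on; a partition-ordered log would instead hold back every key that hashes to the same partition.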

elvircrn 20 days ago

Surprised there's no mention of Redpanda here.

smittywerben 17 days ago

I don't understand how everyone hates Kafka. I use it as a typed write-ahead JSON log with library support for most languages. Yes, the systems I've built with this were overengineered, but it worked and was reliable. I just bought a larger disk instead of using whatever remains of the great battle of the ZooKeeper. I just assumed the fact that it has any integration support with standard RDBMSs must be a byproduct of being Java, purely an accident.

selkin 20 days ago

This is a useful Gedankenexperiment, but I think the replies suggesting that the conclusion is that we should replace Kafka with something new are quiet about what seems obvious to me:

Kafka's biggest strength is the wide and useful ecosystem built on top of it.

It is also a weakness, as we have to keep some (but not all) of the design decisions we wouldn't have made had we started from scratch today. Or we could drop backwards compatibility, at the cost of having to recreate the ecosystem we already have.

mgaunard 20 days ago

I can't count the number of bad message queues and buses I've seen in my career.

While it would be useful to just blame Kafka for being bad technology, it seems many other people get it wrong, too.

tyingq 20 days ago

He mentions AutoMQ right in the opener. And if I follow the link, they pitch it in a way that sounds very "too good to be true".

Anyone here have some real-world experience with it?

bjornsing 20 days ago

> Key-centric access: instead of partition-based access, efficient access and replay of all the messages with one and the same key would be desirable.

I’ve been working on a datastore that’s perfect for this [1], but I’m getting very little traction. Does anyone have any ideas why that is? Is my marketing just bad, or is this feature just not very useful after all?

[1] https://www.haystackdb.dev/

YetAnotherNick 20 days ago

I wish there were a global file system with node-local disks, which has rule-driven affinity to nodes for data. We have two extremes: one like EFS or S3 Express, which doesn't have any affinity to the processing system, and the other like Kafka etc., which have tightly integrated logic for this, making systems more complicated.

lewdwig 20 days ago

Ah, the siren call of the ground-up rewrite. I didn’t know how deep the assumption of hard disks underpinning everything is baked into its design.

But don’t public cloud providers already all have cloud-native event sourcing? If that’s what you need, just use that instead of Kafka.

oulipo 20 days ago

There are a few interesting projects to replace Kafka: Redpanda / Pulsar / AutoMQ.

Do any of you have experience with those, and can you give pros/cons?

dangoodmanUT 20 days ago

I think this is missing a key point about partitions: write visibility ordering.

The problem with guaranteed order is that you have to have some agreed-upon counter/clock for ordering, otherwise a slow write from one producer to S3 could result in consumers already passing that offset before it was written, thus the write is effectively lost unless the consumers wind back.

Having partitions means we can assign a dedicated writer for that partition that guarantees that writes are in order. With S3-direct writing, you lose that benefit, even with a timestamp/offset oracle. You'd need some metadata system that can do serializable isolation to guarantee that segments are written (visible to consumers) in the order of their timestamps. Doing that transactional system directly against S3 would be super slow (and you still need either bounded-error clocks, or a timestamp oracle service).
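
The "consumer passes the offset before it was written" failure can be shown with a small Python simulation. This is a toy model under stated assumptions: offsets come from some oracle and are dense, the set `visible` stands in for whatever segments a consumer can currently see, and `naive_poll` / `watermark_poll` are hypothetical names, not part of any real client.

```python
def naive_poll(visible, position):
    """Read every visible offset >= position and jump past the highest one."""
    batch = sorted(o for o in visible if o >= position)
    new_position = batch[-1] + 1 if batch else position
    return batch, new_position


def watermark_poll(visible, position):
    """Read only the gap-free run of offsets that starts at position."""
    batch = []
    while position in visible:
        batch.append(position)
        position += 1
    return batch, position


# Two producers were assigned offsets 0 and 1; the write for 0 is slow,
# so offset 1 becomes visible first.
visible = {1}
naive_first = naive_poll(visible, 0)          # ([1], 2)
safe_first = watermark_poll(visible, 0)       # ([], 0): waits for offset 0

visible.add(0)                                # the slow write finally lands
naive_second = naive_poll(visible, naive_first[1])      # ([], 2): offset 0 is lost
safe_second = watermark_poll(visible, safe_first[1])    # ([0, 1], 2)
```

The gap-free variant only works if something can tell consumers which prefix is complete and what to do when a writer never finishes; in this sketch that knowledge is assumed, which is roughly the job the comment assigns to a dedicated partition writer or a serializable metadata system.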

iwontberude 20 days ago

We could (and will repeatedly) rebuild Kafka from scratch. Solved the question and I didn’t even need to read the article.

0x445442 20 days ago

How about logging the logs so I can shell into the server to search the messages.

bionhoward 20 days ago

step 1: don’t use the JVM

Mistletoe 20 days ago

I know it’s not what the article is about but I really wish we could rebuild Franz Kafka and hear what he thought about the tech dystopia we are in.

> I cannot make you understand. I cannot make anyone understand what is happening inside me. I cannot even explain it to myself. -Franz Kafka, The Metamorphosis

imcritic 20 days ago

Since we are dreaming - add ETL there as well!

derefr 20 days ago

> You either want to have global ordering of all messages on a given topic, or (more commonly) ordering of all messages with the same key. In contrast, defined ordering of otherwise unrelated messages whose key happens to yield the same partition after hashing isn’t that valuable, so there’s not much point in exposing partitions as a concept to users.

The user gets global ordering when

1. you-the-MQ assign both messages and partitions stable + unique + order + user-exposed identifiers;

2. the user constructs a "globally-collatable ID" from the (perPartitionMsgSequenceNumber, partitionID) tuple;

3. the user does a client-side streaming merge-sort of messages received by the partitions, sorting by this collation ID. (Where even in an ACK-on-receive design, messages don't actually get ACKed until they exit the client-side per-partition sort buffer and enter the linearized stream.)

The definition of "exposed to users" is a bit interesting here, as you might think you could do this merge-sort on the backend, just exposing a pre-linearized stream to the client.

But one of the key points/benefits of Kafka-like systems, under high throughput load (which is their domain of comparative advantage, and so should be assumed to be the deployed use-case), is that you can parallelize consumption cheaply, by just assigning your consumer-workers partitions of the topic to consume.

And this still works under global ordering, under some provisos:

• your workload can be structured as a map/reduce, and you don't need global ordering for the map step, only the reduce step;

• it's not impractical for you to materialize+embed the original intended input collation-ordering into the transform workers' output (because otherwise it will be lost in all but very specific situations.)

Plenty of systems fit these constraints, and happily rely on doing this kind of post-linearized map/reduce parallelized Kafka partition consumption.

And if you "hide" this on an API level, this parallelization becomes impossible.

Note, however, that "on an API level" bit. This is only a problem insofar as your system design is protocol-centered, with the expectation of "cheap, easy" third-party client implementations.

If your MQ is not just a backend, but also a fat client SDK library, then you can put the partition-collation into the fat client, and it will still end up being "transparent" to the user. (Save for the user possibly wondering why the client library opens O(K) TCP connections to the broker to consume certain topics under certain configurations.)

See also: why Google's Colossus has a fat client SDK library.
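
The client-side merge in steps 1 to 3 above can be sketched in a few lines of Python. This is an illustration under assumptions, not any real client SDK: `Message` and `linearize` are hypothetical names, each partition is modelled as an iterable already ordered by its per-partition sequence number, and the collation ID is the `(seq, partition)` tuple from the comment.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Message(NamedTuple):
    seq: int        # per-partition message sequence number
    partition: int  # partition ID
    payload: str


def linearize(partitions: Iterable[Iterable[Message]]) -> Iterator[Message]:
    """Lazily merge per-partition streams by the (seq, partition) collation ID."""
    return heapq.merge(*partitions, key=lambda m: (m.seq, m.partition))


p0 = [Message(0, 0, "a"), Message(1, 0, "c")]
p1 = [Message(0, 1, "b"), Message(1, 1, "d")]

ordered = [m.payload for m in linearize([p0, p1])]  # ["a", "b", "c", "d"]
```

`heapq.merge` holds one pending message per input stream, which plays the role of the per-partition sort buffer: nothing is emitted until every partition has offered its next message, so a stalled partition stalls the linearized stream.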

eluusive 20 days ago

This is basically NATS.io

hardwaresofton 20 days ago

See also: Warpstream, which was so good it got acquired by Confluent.

Feels like there is another squeeze in that idea if someone “just” took all their docs and replicated the feature set. But maybe that’s what S2 is already aiming at.

Wonder how long Warpstream docs, marketing materials and useful blogs will stay up.

gitroom 20 days ago

honestly, kafka always felt like way more moving parts than my brain wants to track, but at the same time, it's kinda impressive how the ecosystem just keeps growing. you think the reason people stick to it is hype, laziness, or just not enough real pain yet to switch?

tezza 20 days ago

Now that really would be Kafka-esque

ghuntley 20 days ago

I'm going to get downvoted for this, but you can literally rebuild Kafka via AI right now in record time using the steps detailed at https://ghuntley.com/z80.

I'm currently building a full workload scheduler/orchestrator. I'm sick of Kubernetes. The world needs better -> https://x.com/GeoffreyHuntley/status/1915677858867105862

rvz 20 days ago

Every time another startup falls for the Java + Kafka arguments, it keeps the AWS consultants happier.

Fast forward to 2025: there are many performant, efficient and less complex alternatives to Kafka that save you money, instead of burning millions in operational costs "to scale".

Unless you are at a hundred-million-dollar-revenue company, choosing Kafka in 2025 doesn't make sense anymore.
