Kafka Is Costing You Years of Engineering Time

3 points by galeaspablo 8 months ago

3 comments

deniscoady 8 months ago
Disclaimer: I work for Redpanda and formerly Cloudera.

I've worked with Apache Kafka at massive (50+ Gbps) scales. It's a proper nightmare. When it breaks, it breaks fast and violently.

But the problem is that Apache Kafka (and more modern Kafka-compatible alternatives like Redpanda, obligatory mention) solves a need for a durable streaming log that other systems cannot offer. The access patterns, requirements, use cases, ecosystem, etc., are different from those of traditional databases and require a proper streaming solution.

Streaming from a traditional database is kind of a solved problem. Why not just use a managed Kafka provider with change data capture (CDC) capability if you don't want to deal with Kafka yourself? At least then you get to use all of the tools in the vibrant Kafka ecosystem.
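The suggestion above keeps the operational burden with a managed provider and leaves your side as ordinary consumer code reading the change events. Here is a minimal sketch of that consumer using the standard Java Kafka client; the broker address, group id, and the `orders.cdc` topic name are placeholders for whatever the provider and CDC connector actually give you.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CdcConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder connection settings; a managed provider supplies these
        // along with authentication configuration.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "cdc-reader");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Hypothetical topic that a CDC connector writes database changes to.
            consumer.subscribe(List.of("orders.cdc"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each record is one change event; in practice you would
                    // deserialize the payload rather than print it.
                    System.out.printf("key=%s offset=%d value=%s%n",
                            record.key(), record.offset(), record.value());
                }
            }
        }
    }
}
```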
Sphax 8 months ago
I've been using Kafka professionally for more than 10 years, since 0.8, when consumer groups didn't even exist yet. In my opinion this post exaggerates a lot of things to promote their product. We don't have giant clusters, but we routinely do more than a million messages produced/s, so it's not a completely trivial load.

Configuration complexity: there are a couple of things we had to tune over the years, mainly regarding the log cleaner once we started leveraging compacted topics, but other than that it's pretty much the default config. Is it the most optimal? No, but it's fast enough. Hardware choice in my opinion is not really an issue: we started on HDDs, switched to SSDs later on, and the cluster continued working just fine with the same configuration.

Scaling, I'll grant, can be a pain. We had to scale our clusters mainly for two reasons: 1) more services want to use Kafka, therefore there are more topics and more data to serve. This is not that hard to scale: just add brokers to have more capacity. 2) is when you need more partitions for a topic; we had to do this a couple of times over the years, and it's annoying because the default tooling for data redistribution is bad. We ended up using a third-party tool (today Cruise Control does this nicely).

Maintenance: yes, you need to monitor your stuff, just like any other system you deploy on your own hardware. Thankfully, monitoring Kafka is not _that_ hard; there are ready-made solutions to export the JMX monitoring data. We've used Prometheus (prometheus-jmx-exporter and node_exporter) almost since the beginning and it works fine. We're still using ZooKeeper, but thankfully that's no longer necessary; I just have to say our ZooKeeper clusters have been rock solid over the years.

Development overheads: I really can't agree with that. Yes, the "main" ecosystem is Java-based, but it's not like librdkafka doesn't exist, and third-party libraries are not all "sub par"; that's just a mischaracterization. We've used Go with sarama since 2014 and recently switched to franz-go: both work great. You do need to properly evaluate your options, though (but that's part of your job). With that said, if I were to start from scratch I would absolutely suggest starting with Kafka Streams, even if your team doesn't have Java experience (learning Java isn't that hard), just because it makes building a data pipeline super straightforward and handles a lot of the complexities mentioned.
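For context on the Kafka Streams recommendation above, here is a minimal sketch of the kind of pipeline the library makes straightforward: read a topic, filter records, write the result to another topic, with consumer-group management and rebalancing handled by the framework. The broker address and the `events.raw` / `events.filtered` topic names are illustrative placeholders, not anything from the thread.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FilterPipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The application id doubles as the consumer group; the broker address
        // is a placeholder for your cluster.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-pipeline");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> raw = builder.stream("events.raw");
        // Drop empty payloads and forward the rest; offsets, retries, and
        // partition rebalancing are managed by the Streams runtime.
        raw.filter((key, value) -> value != null && !value.isEmpty())
           .to("events.filtered");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```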
taylodl 8 months ago
Conspicuously missing from this article is any mention of an alternative. Kafka, bad. Alternative, what alternative?