It’s Okay to Store Data in Apache Kafka (2017)

84 points by ooooak over 5 years ago

6 comments

vore over 5 years ago
I think what the article greatly skims over is data migrations: what do you do if you need to change the format of your data? If you retain logs in Kafka indefinitely as the source of truth for your data, then if you need to migrate materialized data to a new format, you'll also need to either 1) support all the previous forms of materialized data so operations from the log are guaranteed to be safely replayable on it, or 2) keep only one form of materialized data and hope you have enough test coverage to make sure some unexpectedly old data doesn't silently corrupt it.

Event sourcing is useful, but using it as a source-of-truth data store in itself instead of, e.g., an occasional journalling mechanism seems pretty fraught.
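One common way to approach the replay problem described above is to tag every event with a schema version and "upcast" old events to the latest shape before applying them. This is a swapped-in technique for illustration, not something the comment or the article prescribes. The following is a minimal sketch in plain Python with no Kafka client involved; the event fields (`user_id`, `name`, `first_name`, `last_name`, `email`), the version numbers, and the helper names are all hypothetical.

```python
# Minimal sketch: replaying a log whose events were written in several
# schema versions. All field names and versions here are hypothetical.

def _v1_to_v2(event):
    # Assumed change: v1 stored a single "name"; v2 splits it in two.
    first, _, last = event["name"].partition(" ")
    return {
        "version": 2,
        "user_id": event["user_id"],
        "first_name": first,
        "last_name": last,
    }

def _v2_to_v3(event):
    # Assumed change: v3 adds "email", which is unknown for older events.
    return {**event, "version": 3, "email": None}

# One upcaster per historical version; each lifts an event one step.
UPCASTERS = {1: _v1_to_v2, 2: _v2_to_v3}
LATEST_VERSION = 3

def upcast(event):
    """Lift an event step by step until it matches the latest schema."""
    while event["version"] < LATEST_VERSION:
        event = UPCASTERS[event["version"]](event)
    return event

def replay(log):
    """Rebuild materialized state (latest record per user) from the log."""
    state = {}
    for raw in log:
        event = upcast(raw)
        state[event["user_id"]] = {
            k: v for k, v in event.items() if k not in ("version", "user_id")
        }
    return state

log = [
    {"version": 1, "user_id": "u1", "name": "Ada Lovelace"},
    {"version": 3, "user_id": "u2", "first_name": "Alan",
     "last_name": "Turing", "email": "alan@example.com"},
]
state = replay(log)
```

The cost the comment points at is visible here: every historical version needs an upcaster that is kept and tested for as long as events of that version remain in the log.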
epistasis over 5 years ago
Kreps has a gift for writing: this is so clear, well organized, and far more fun to read than the topic has any right to be. Hopefully he'll retire after Confluent and finally start writing novels.
mvitorino over 5 years ago
IMO using Kafka for long term storage is not the greatest idea. It is expensive to keep CPU and RAM constantly on top of data that is going to be cold most of the time. There is no DML, which means mistakes are expensive (from an engineering pov). And while the whole event sourcing paradigm can work quite well in narrow domains with teams fully aware of the implications of what they are doing, in practice, in large orgs, it is hard to scale (from a people perspective).
ryanthedev over 5 years ago
Well, maybe for non-critical data. Multi-region Kafka clustering is not easy. There are much better and cheaper data storage options that can provide eventual consistency.
KaiserPro over 5 years ago
you _can_ do it, but you shouldn't

treating your message bus as an infinite storage system is going to give you a bad time.
zbentley over 5 years ago
This article seems to propose log compaction as the answer to the question of size (i.e. how much historical data is going to have to be kept around and how much is that going to cost). However, log compaction is not well suited to many use cases: storing partial updates or diffs in the log, storing many trillions of tiny entries (as keys), multiple messages on the log corresponding to related (contingent) updates, and so on.

Those are tractable but hard to solve; log compaction is not a silver bullet and unless you think really hard about how your data changes over time, you may end up storing more of it than you expect if you use the log as an eternal source of truth--compaction or not.
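The partial-update case can be illustrated with a toy model of compaction. The sketch below is not Kafka code: it assumes a log is a list of `(key, value)` pairs and models compaction as "keep only the most recent record per key", which is the guarantee Kafka's compacted topics are documented to provide. The keys, fields, and function names are hypothetical.

```python
# Toy model of log compaction: keep the last record for each key.
# This only illustrates the semantics; it is not how a broker implements it.

def compact(log):
    """Return the log with every record dropped except the latest per key."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]

def materialize(log):
    """Fold records into per-key state by merging each value's fields."""
    state = {}
    for key, value in log:
        state.setdefault(key, {}).update(value)
    return state

# Each record carries the full state for its key: compaction is lossless.
full_snapshots = [
    ("u1", {"name": "Ada", "city": "London"}),
    ("u1", {"name": "Ada", "city": "Paris"}),
]

# Each record carries only the changed fields: compaction drops "name".
partial_updates = [
    ("u1", {"name": "Ada", "city": "London"}),
    ("u1", {"city": "Paris"}),
]

snapshots_ok = materialize(compact(full_snapshots)) == materialize(full_snapshots)
partials_ok = materialize(compact(partial_updates)) == materialize(partial_updates)
```

In this model `snapshots_ok` is true and `partials_ok` is false: once the earlier record is compacted away, the fields it alone carried cannot be recovered. Avoiding that means writing full state per key, or leaving compaction off and keeping the whole history.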