Using logs to build a solid data infrastructure

120 points by martinkl, about 10 years ago

11 comments

I've been really interested in this architecture since Jay Kreps' blog post on it. One part that I'm less clear on is how this fits in with request-response style communication between, say, a Web browser and a Web server.

In a simple Web-app-writes-to-DB scenario, it's easy to read my writes, but with an async log processing system, how am I supposed to organize my code so I can read my writes and respond with useful information?

Maybe the solution is to eschew request-response entirely and have all requests return 200, then poll or use two-way communication?

Alternatively, I could have my log-appending operation return a value indicating the position in the totally ordered log, which I could pass to the query interfaces as a way of indicating "don't return until you've processed at least to here." Does anyone do that?

Am I totally off base here? I'd love to hear from anyone who is using these kinds of systems today.
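The "return the log position and pass it to the query interface" idea can be sketched in a few lines. This is a toy, in-memory illustration and not the API of Kafka or any other real system: the `Log` and `MaterializedView` classes, the `min_offset` parameter, and the timeout are all assumptions made up for the example.

```python
import threading


class Log:
    """A toy in-memory, totally ordered, append-only log (hypothetical)."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        with self._lock:
            self._entries.append(record)
            return len(self._entries) - 1

    def read(self, offset):
        """Return the record at offset, or None if nothing is there yet."""
        with self._lock:
            return self._entries[offset] if offset < len(self._entries) else None


class MaterializedView:
    """A derived key-value store that applies log records asynchronously."""

    def __init__(self, log):
        self._log = log
        self._state = {}
        self._next_offset = 0  # first offset that has not been applied yet
        self._cond = threading.Condition()

    def apply_next(self):
        """Apply one record from the log if one is available."""
        record = self._log.read(self._next_offset)
        if record is None:
            return False
        key, value = record
        with self._cond:
            self._state[key] = value
            self._next_offset += 1
            self._cond.notify_all()  # wake readers waiting on an offset
        return True

    def get(self, key, min_offset=None, timeout=1.0):
        """Read key, waiting first until the record at min_offset is applied."""
        with self._cond:
            if min_offset is not None:
                caught_up = self._cond.wait_for(
                    lambda: self._next_offset > min_offset, timeout=timeout
                )
                if not caught_up:
                    raise TimeoutError("view has not reached offset %d" % min_offset)
            return self._state.get(key)


# The write path returns an offset; the read path waits for that offset.
log = Log()
view = MaterializedView(log)
offset = log.append(("user:1", "alice"))

consumer = threading.Thread(target=view.apply_next)  # stands in for an async consumer
consumer.start()
value = view.get("user:1", min_offset=offset)
consumer.join()
```

The web handler would hand `offset` back to the client (or keep it in the session) and attach it to later reads. A read that cannot catch up within the timeout fails loudly instead of returning stale data, which is one possible policy among several.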

robbles, about 10 years ago

I've really learned a lot from these Confluent posts about building log-based architectures, but I feel like they're rehashing the same high-level architectural ideas again and again.

I'm already sold on this idea, and would really love to see more posts that get into the nitty-gritty details of how to integrate Kafka, how to migrate an existing infrastructure, case studies, sample code, etc. It all seems very hand-wavy otherwise.

sridca, about 10 years ago

Has anyone thought about how the distributed commit log could be extended to client-side FRP?

Elm has the notion of a signal (http://elm-lang.org/learn/What-is-FRP.elm), which is really a stream of changing values that is used to construct a varying model and its rendered view (virtual DOM and all that).

I am wondering if we can merge these two notions, signals and commit logs. This would replace the traditional "request-response" model of a REST API with nothing but signals, thus leveraging the simplicity of FRP for the whole application. Client-side Elm code does a "send" on a signal that is connected to the server-side commit log, and also reads from another signal connected to another log that receives new data (added from various places).

tristanz, about 10 years ago

Martin's talks and blog posts are always awesome. I'm really excited to see how this plays out for real applications.

The one thing I'm always somewhat confused by, though, is how a "totally ordered log" intersects with the reality of a partitioned log. The simplicity of a log seems to break down a bit when you partition.

For instance, imagine I want to implement multi-key transactions on top of a distributed datastore. With a totally ordered log this is easy. But with a partitioned log, it becomes much harder.

Alternatively, imagine I want to implement a collaborative editing app like Google Docs or something like Slack. A natural design would be to have millions of independent logs. I can then replay logs to get the current state and watch logs to keep it updated. But as far as I'm aware, partitioned logs like Kafka do not actually support millions of topics. So there's no way to replay a log for something like a channel or document.
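To make the ordering question concrete, here is a toy model of a key-partitioned log. It is a sketch under stated assumptions, not Kafka's API: the `PartitionedLog` class, its method names, and the CRC32-based partitioner are all hypothetical. Records with the same key land in the same partition and keep their relative order, while offsets from different partitions are not comparable, which is why a multi-key transaction loses the "just append it" simplicity.

```python
import zlib


class PartitionedLog:
    """A toy log split into a fixed number of partitions by key hash."""

    def __init__(self, num_partitions):
        self._partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self._partitions)

    def append(self, key, value):
        """Append to the key's partition; return (partition, offset)."""
        p = self.partition_for(key)
        self._partitions[p].append((key, value))
        return p, len(self._partitions[p]) - 1

    def replay(self, key):
        """Return every value written for key, in write order.

        This scans the whole partition and filters by key, so many
        documents can share a handful of partitions.
        """
        p = self.partition_for(key)
        return [v for k, v in self._partitions[p] if k == key]


log = PartitionedLog(num_partitions=4)
for doc, edit in [
    ("doc-a", "insert 'h'"),
    ("doc-b", "insert 'x'"),
    ("doc-a", "insert 'i'"),
]:
    log.append(doc, edit)

history = log.replay("doc-a")  # ["insert 'h'", "insert 'i'"]
```

In this model a document does not need its own log to be replayable, but the replay cost grows with everything else stored in the same partition, so it does not fully answer the concern about millions of independent logs.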

moatra, about 10 years ago

"if you want to build a new derived datastore, you can just start a new consumer at the beginning of the log, and churn through the history of the log, applying all the writes to your datastore."

For high-throughput environments with lots of appends to the log, how do you get around the ever-increasing size of your log file? I know the traditional answer is to take a periodic snapshot and compact the previous data, but is that built in to tools like Kafka?
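Kafka does ship a log compaction feature for keyed topics, which retains at least the most recent record for each key. The snippet below is only a toy model of that idea, not Kafka's implementation: the `compact` and `rebuild` functions are hypothetical, and treating a `None` value as a delete marker that is dropped immediately is a simplifying assumption.

```python
def compact(log):
    """Keep only the most recent record per key, preserving log order.

    `log` is a list of (key, value) pairs. A value of None is treated as
    a delete marker, and the key is dropped from the compacted log.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


def rebuild(log):
    """Replay a log from the beginning into a key-value store."""
    state = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


full_log = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 4)]
compacted = compact(full_log)  # [("a", 3), ("c", 4)]

# A new consumer reaches the same state from either log.
same_state = rebuild(full_log) == rebuild(compacted)  # True
```

The trade-off is that a compacted log still lets a new consumer rebuild current state from the beginning, but it no longer holds the full history of every key.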

gbrits, about 10 years ago

The proposed architecture really works well for me. I've used it for a couple of projects now.

To throw around some terms for those interested in reading up on the background:

- The separation of writes (through the log) from reads (through any of the consumers) is sometimes called CQRS (Command Query Responsibility Segregation).
- Having a centralized log as the defining store for updates/change events is sometimes called event sourcing.
- As mentioned in the article, Elasticsearch as a consumer of the log, which only gets updates through the log, is an example of an Eager Read Derivation.

All of these are defined on Fowler's site.

Glad to see this getting more attention. I asked about using Kafka as an event source here some time ago; the thread includes an insightful answer from a Kafka author. http://stackoverflow.com/questions/17708489/using-kafka-as-a-cqrs-eventstore-good-idea
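A minimal sketch of how these three terms fit together, assuming a single in-memory event list. Every name here (`events`, `handle_rename`, `build_profiles`, `build_search_index`, the `user_renamed` event type) is hypothetical and only serves the illustration; the search index stands in for something like Elasticsearch.

```python
from collections import defaultdict

events = []  # the append-only log is the only place writes go (event sourcing)


def handle_rename(user_id, name):
    """Command side: validate, then append an event. No read model is touched."""
    if not name.strip():
        raise ValueError("name must not be empty")
    events.append({"type": "user_renamed", "user_id": user_id, "name": name})


def build_profiles(log):
    """Query side, read model 1: current name per user."""
    profiles = {}
    for event in log:
        if event["type"] == "user_renamed":
            profiles[event["user_id"]] = event["name"]
    return profiles


def build_search_index(log):
    """Query side, read model 2: word -> set of user ids.

    It is computed ahead of any query, purely from the log, which is the
    sense in which it is an eagerly derived read model.
    """
    index = defaultdict(set)
    for user_id, name in build_profiles(log).items():
        for token in name.lower().split():
            index[token].add(user_id)
    return dict(index)


handle_rename(1, "Ada Lovelace")
handle_rename(2, "Ada Byron")
handle_rename(2, "Grace Hopper")

profiles = build_profiles(events)    # {1: "Ada Lovelace", 2: "Grace Hopper"}
index = build_search_index(events)   # index["ada"] == {1}
```

Both read models can be thrown away and rebuilt from `events` at any time, which is the property the article relies on when it adds a new derived datastore.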

Sphax, about 10 years ago

That talk was great. We introduced Kafka at my work probably 3-4 months ago, at first only to track events from our web services, but eventually it became the backbone of communication between our services.

The Java library for the consumer part, still based on the Scala code, is not that great though. They're rewriting it as a Java-only library, which is much nicer to use, but I'm not sure when it'll be stable.

saganus, about 10 years ago

I have actually found myself thinking a lot about logs lately: how I can end up using them for a lot of problems, and how they are sometimes very simple to implement. But I always wondered if I was actually using the right tool for the job...

I had no idea they had such far-reaching implications, or that there were so many jobs where this is the right tool.

Neat!

andrewchambers, about 10 years ago

Isn't this just what Datomic does automatically?

benjarrell, about 10 years ago

What is used to create those images? They look like whiteboard pictures but not quite.

itistoday2, about 10 years ago

Could someone compare Apache Kafka to Eris Industries' centralized blockchain model?
