科技回声

7 comments

haberman, nearly 10 years ago

Skimming the paper, I found it hard to get a handle on exactly what kind of abstraction and guarantees this system implements (I am a co-author on one of the cited papers, so I'm reasonably familiar with this space). Maybe it is easier to understand for people who are already familiar with Storm.

Most notably, I couldn't determine whether the system is stateful, or what kind of guarantees (if any) are provided regarding stateful processing.

For example, the paper says Heron is used to compute real-time active user counts. That implies that the system needs some way to keep track of and "remember" how many unique users it has seen in the last N hours or whatever. How does Heron model this state and how does it guarantee (if it does) that a crashing node will not lose its accumulated state?

In my experience this is the hardest part, by far, of stream processing, so when I see any work in this area it's the first thing I am curious to learn about. A system that guarantees strong consistency (i.e. accurate counts) even in the presence of node crashes is way, way harder to get right (and a lot more expensive, resource-wise) than one that assumes it's ok to lose a little bit of data.

It looks like Heron implements only at-most-once and at-least-once semantics, so maybe that is my answer there. You need exactly-once semantics to get robust and reliable answers, and you need to guarantee that state changes are atomic with the exactly-once semantics.

Of course some systems are ok with their output degrading a little when nodes crash. It's not the end of the world if the active user count is a little off. But beware of tolerating this too much -- the bad thing about allowing data loss is that it tends to come in storms (no pun intended). Once something is going wrong, the answers can be way off. The error is not bounded in most cases I've encountered.
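To make the point about atomic state changes concrete, here is a minimal sketch in Python. It is not Heron's or Storm's code; the function name `count_with_crash` and its parameters are hypothetical, and the "durable storage" is just local variables. It counts events per user from a replayable log, simulates a crash after the last checkpoint, and compares recovery when the saved state and the input offset roll back together (atomic) with recovery when only the offset rolls back while the counts survive.

```python
def count_with_crash(log, checkpoint_at, crash_at, atomic):
    """Count events per user, with one simulated crash and replay.

    Hypothetical illustration only: `log` is a replayable list of user ids,
    a checkpoint is taken after `checkpoint_at` events, and the worker
    crashes after `crash_at` events.
    """
    counts = {}

    # Process events up to the checkpoint, then save offset and state.
    for user in log[:checkpoint_at]:
        counts[user] = counts.get(user, 0) + 1
    saved_offset = checkpoint_at
    saved_counts = dict(counts)

    # Keep processing until the crash; this work is not checkpointed.
    for user in log[checkpoint_at:crash_at]:
        counts[user] = counts.get(user, 0) + 1

    # Recovery: the input is always replayed from the saved offset.
    if atomic:
        # State rolls back together with the offset.
        counts = dict(saved_counts)
    # Otherwise the counts are assumed to have been written through to
    # storage, so replayed events are applied a second time.

    for user in log[saved_offset:]:
        counts[user] = counts.get(user, 0) + 1
    return counts


log = ["a", "b", "a", "c", "a"]
exact = count_with_crash(log, checkpoint_at=3, crash_at=4, atomic=True)
inflated = count_with_crash(log, checkpoint_at=3, crash_at=4, atomic=False)
```

In this run the atomic variant ends with the true counts, while the non-atomic variant counts the event for user "c" twice, which is the kind of drift that at-least-once replay produces when state updates are not tied to the input position.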

Comment #9649176 not loaded

djb_hackernews, nearly 10 years ago

I wonder what Nathan Marz thinks about this, as he is the guy who created Storm (which, IMO, is some of the best OSS code out there).

filereaper, nearly 10 years ago

It would be interesting to see how Heron compares to Spark with respect to performance. I keep hearing Storm is slower than Spark; does Heron now catch up with or exceed Spark in terms of performance?

Comment #9648839 not loaded

Comment #9650100 not loaded

vicaya, nearly 10 years ago

Storm is not even 1.0 yet. Since Heron is API-compatible with Storm, why can't Heron simply be a code name for Storm 2.0?

Storm is a much better name than Heron, IMO.

Comment #9648634 not loaded

ptgoetz, nearly 10 years ago

(DISCLAIMER: I am an Apache Storm PMC Member)

This could be a very good thing for Apache Storm, depending on how Twitter handles it.

Just to clarify, the Storm version mentioned in the paper and blog post is not an official Apache release and doesn't include many of the performance improvements in the newer releases of Apache Storm. There are a lot, and many more on the horizon.

That being said, the performance numbers look impressive, even though there is no way to confirm them since no code or benchmarks have been published. IMHO, until that happens, there's not much to see here (not that I doubt it -- I'd just like to see proof/code).

My hope is that Twitter is dedicated to the projects it has open-sourced, and this is not a case of NIH, but rather an honest effort on Twitter's part to contribute back to the open source community.

@haberman: Storm implements exactly-once processing through a higher-level API called Trident, which I like to call Storm's "Streams API" since it's not unlike Java 8's Streams API (and largely inspired by Cascading). Trident processes data in configurable micro-batches, as opposed to one-at-a-time, which gives it an advantage in terms of throughput, but at the cost of latency. Trident topologies "compile" down to Core Spout/Bolt topologies (the Trident API has a planner implementation that figures that out -- not unlike an SQL query planner).

The Storm Core API provides at-least-once semantics through an acking mechanism described here [1]. The Trident API builds on top of that to support exactly-once semantics by essentially doing a de-dupe [2].

I'm not sure exactly why they don't claim to support this, since Trident is built on top of Storm's Core API.

[1] https://storm.apache.org/documentation/Acking-framework-implementation.html
[2] https://storm.apache.org/documentation/Trident-state.html

@filereaper: Assuming you are referring to Spark Streaming, forget about any benchmarks you may have seen. Either can be faster than the other depending on how you configure it and what your use case is. See my presentation on the subject here [3]. With either, you can configure yourself into a corner and screw up your performance.

Performance tuning distributed systems is a mysterious art. As is benchmarking. Unfortunately, that fact is frequently exploited for "benchmarketing" purposes. Don't trust any benchmark but your own unless it is fully open-sourced (including configuration).

[3] http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming

@vicaya: Version numbers don't necessarily equate to code quality, performance, or stability. I've seen many projects bump to 1.0 only for marketing purposes.
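The de-dupe idea mentioned above can be sketched roughly as follows. This is plain Python, not the Trident API; the class `TransactionalCount` and its method names are hypothetical. It assumes each micro-batch carries a transaction id, that a replayed batch has the same id and the same contents, and that batches are applied to state in order. Under those assumptions, storing the last applied id next to the value lets the state skip a batch it has already applied.

```python
class TransactionalCount:
    """Hypothetical sketch of a count that ignores replayed micro-batches.

    The value and the id of the last applied batch are kept together, so a
    replay of that batch (same id, same contents) can be detected and skipped.
    """

    def __init__(self):
        self.value = 0
        self.last_txid = None  # id of the most recently applied batch

    def apply_batch(self, txid, batch):
        """Apply a batch once; return False if it was a replay."""
        if txid == self.last_txid:
            # This batch was already applied before a failure; skip it.
            return False
        # In a real store the value and the txid would be written atomically.
        self.value += len(batch)
        self.last_txid = txid
        return True


state = TransactionalCount()
state.apply_batch(1, ["a", "b"])
state.apply_batch(1, ["a", "b"])  # replay after a simulated failure
state.apply_batch(2, ["c"])
```

After these three calls the count reflects three events rather than five, because the second call is recognized as a replay. The sketch only covers the simplest case; it says nothing about what happens when a replayed batch can differ from the original.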

jhugg, nearly 10 years ago

Open source? No?

Comment #9649740 not loaded

Comment #9648204 not loaded

shit_parade2, nearly 10 years ago

Developing with Twitter's open source code and API is essentially asking to be robbed by them if anything you make is successful.

Comment #9648297 not loaded

Comment #9648194 not loaded

Comment #9648432 not loaded

Comment #9649129 not loaded


Flying faster with Twitter Heron
