Flying faster with Twitter Heron

117 points by Rifu, nearly 10 years ago

7 comments

haberman, nearly 10 years ago
Skimming the paper, I found it hard to get a handle on exactly what kind of abstraction and guarantees this system implements (I am a co-author on one of the cited papers, so I'm reasonably familiar with this space). Maybe it is easier to understand for people who are already familiar with Storm.

Most notably, I couldn't determine whether the system is stateful, or what kind of guarantees (if any) are provided regarding stateful processing.

For example, the paper says Heron is used to compute real-time active user counts. That implies that the system needs some way to keep track of and "remember" how many unique users it has seen in the last N hours or whatever. How does Heron model this state, and how does it guarantee (if it does) that a crashing node will not lose its accumulated state?

In my experience this is the hardest part, by far, of stream processing, so when I see any work in this area it's the first thing I am curious to learn about. A system that guarantees strong consistency (i.e. accurate counts) even in the presence of node crashes is way, way harder to get right (and a lot more expensive, resource-wise) than one that assumes it's ok to lose a little bit of data.

It looks like Heron implements only at-most-once and at-least-once semantics, so maybe that is my answer there. You need exactly-once semantics to get robust and reliable answers, and you need to guarantee that state changes are atomic with the exactly-once semantics.

Of course some systems are ok with their output degrading a little when nodes crash. It's not the end of the world if the active user count is a little off. But beware of tolerating this too much -- the bad thing about allowing data loss is that it tends to come in storms (no pun intended). Once something is going wrong, the answers can be way off. The error is not bounded in most cases I've encountered.
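To make the difference between the two delivery semantics mentioned above concrete, here is a toy Python sketch. It is not Heron or Storm code; `count_events`, `crash_indices`, and the crash model are all hypothetical, and the sketch assumes the counter itself survives a crash (the harder case raised in the comment is when it does not).

```python
def count_events(events, crash_indices, mode):
    """Count events under a toy delivery model (illustrative assumption only).

    crash_indices: positions at which the worker crashes while handling
    the event at that position.
    """
    count = 0
    for i, _event in enumerate(events):
        crashed = i in crash_indices
        if mode == "at_most_once":
            # The source never replays: an event in flight during a
            # crash is simply lost, so the count drifts low.
            if not crashed:
                count += 1
        elif mode == "at_least_once":
            # The update is applied, but the ack is lost in the crash,
            # so the source replays the event and it is counted twice.
            count += 1
            if crashed:
                count += 1
        else:
            raise ValueError(f"unknown mode: {mode}")
    return count


events = list(range(10))
crashes = {2, 5}
print(count_events(events, crashes, "at_most_once"))   # 8
print(count_events(events, crashes, "at_least_once"))  # 12
```

With 10 events and 2 crashes, neither mode returns the true count of 10, and the size of the error grows with the number of crashes.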
djb_hackernews, nearly 10 years ago
I wonder what Nathan Marz thinks about this, as he is the guy who created Storm (which, IMO, is some of the best OSS code out there).
filereaper, nearly 10 years ago
It would be interesting to see how Heron compares to Spark with respect to performance. I keep hearing that Storm is slower than Spark; does Heron now catch up with or exceed Spark in terms of performance?
vicaya, nearly 10 years ago
Storm is not even 1.0 yet. Since Heron is API compatible with Storm, why can't Heron simply be a code name for Storm 2.0?

Storm is a much better name than Heron, IMO.
ptgoetz, nearly 10 years ago
(DISCLAIMER: I am an Apache Storm PMC Member)

This could be a very good thing for Apache Storm, depending on how Twitter handles it.

Just to clarify, the Storm version mentioned in the paper and blog post is not an official Apache release and doesn't include many performance improvements included in the newer releases of Apache Storm. There are a lot, and many more on the horizon.

That being said, the performance numbers look impressive, even though there is no way to confirm them since no code or benchmarks have been published. IMHO, until that happens, there's not much to see here (not that I doubt it -- I'd just like to see proof/code).

My hope is that Twitter is dedicated to the projects it has open-sourced, and this is not a case of NIH, but rather an honest effort on Twitter's part to contribute back to the open source community.

@haberman:

Storm implements exactly-once processing through a higher-level API called Trident, which I like to call Storm's "Streams API" since it's not unlike Java 8's Streams API (and largely inspired by Cascading). Trident processes data in configurable micro-batches, as opposed to one-at-a-time, which gives it an advantage in terms of throughput, but at the cost of latency. Trident topologies "compile" down to Core Spout/Bolt topologies (the Trident API has a planner implementation that figures that out -- not unlike an SQL query planner).

The Storm Core API provides at-least-once semantics through an acking mechanism described here [1]. The Trident API builds on top of that to support exactly-once semantics by essentially doing a de-dupe [2].

I'm not sure exactly why they don't claim to support this, since Trident is built on top of Storm's Core API.

[1] https://storm.apache.org/documentation/Acking-framework-implementation.html
[2] https://storm.apache.org/documentation/Trident-state.html

@filereaper:

Assuming you are referring to Spark Streaming, forget about any benchmarks you may have seen. Either can be faster than the other depending on how you configure it and what your use case is. See my presentation on the subject here [3]. With either, you can configure yourself into a corner and screw your performance.

Performance tuning distributed systems is a mysterious art. As is benchmarking. Unfortunately, that fact is frequently exploited for "benchmarketing" purposes. Don't trust any benchmark but your own unless it is fully open-sourced (including configuration).

[3] http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming

@vicaya:

Version numbers don't necessarily equate to code quality, performance, or stability. I've seen many projects bump to 1.0 only for marketing purposes.
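The "de-dupe on top of at-least-once" idea described in the comment above can be sketched in a few lines of Python. This is a simplified illustration, not Trident's actual API: `TxCounter`, `apply_batch`, and `deliver` are hypothetical names, and the sketch assumes batches arrive in transaction-id order and that a replayed batch has exactly the same contents as the original.

```python
class TxCounter:
    """Counter that remembers the id of the last batch it applied (hypothetical)."""

    def __init__(self):
        self.count = 0
        self.last_txid = None

    def apply_batch(self, txid, batch):
        # A batch whose id matches the stored id was already applied,
        # so a replay of it is skipped instead of being counted again.
        if txid == self.last_txid:
            return False
        self.count += len(batch)
        self.last_txid = txid
        return True


def deliver(batches, replayed_txids):
    """At-least-once source: yields some batches twice to mimic replays."""
    for txid, batch in enumerate(batches):
        yield txid, batch
        if txid in replayed_txids:
            yield txid, batch


batches = [["a", "b"], ["c"], ["d", "e", "f"]]
naive = 0
state = TxCounter()
for txid, batch in deliver(batches, {1}):
    naive += len(batch)            # counts the replayed batch twice
    state.apply_batch(txid, batch)  # skips the replayed batch
print(naive, state.count)  # 7 6
```

In a real system the count and the stored transaction id would have to be written atomically, which is the requirement haberman raises about state changes being atomic with the exactly-once semantics.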
jhugg, nearly 10 years ago
Open source? No?
shit_parade2, nearly 10 years ago
Developing with Twitter's open source code and API is essentially asking to be robbed by them if anything you make is successful.