TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

Storm - the Hadoop of realtime processing

119 points by lzimm about 14 years ago

12 comments

vannevar about 14 years ago
Storm sounds great, but this post probably should have waited until it was actually open-sourced. As it is, it just comes across as naked self-promotion based on a technology that could for all we know be vaporware.
bmohlenhoff about 14 years ago
It sounds like a neat project, but I think describing it as "real time" is misleading if you're not also providing information on latency. The majority of the provided use cases seem to indicate a high level of scalability and durability, as well as a high level of throughput, but these are not necessary characteristics of a true real time system.

It's a common misconception. A real time system doesn't have to be fast, efficient, or fault tolerant. A real time system must guarantee with 100% certainty that in all cases it will respond to input X within a time period of Y.

I would be interested to learn the timing issues driving the development of this system and how you've guaranteed such a response time, especially given that it's running on top of the JVM and must therefore deal with a non-deterministic garbage collection process.
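
The distinction drawn here between throughput and a bounded response time can be illustrated with a minimal Python sketch. The latency samples, the 100 ms deadline, and the `meets_deadline` helper are all made up for illustration; none of them come from Storm or the original post. Passing a check like this on observed samples is only evidence, not the 100% guarantee the comment describes, which would need worst-case analysis.

```python
from statistics import mean
from typing import Iterable


def meets_deadline(latencies_ms: Iterable[float], deadline_ms: float) -> bool:
    """Return True only if every observed response finished within the deadline."""
    return all(latency <= deadline_ms for latency in latencies_ms)


# Hypothetical samples: low average latency, but one long pause
# (for example, a garbage collection stall).
fast_but_unbounded = [2.0, 3.0, 2.5, 250.0]

# Hypothetical samples: slower on average, but every response is bounded.
slow_but_bounded = [80.0, 85.0, 90.0, 88.0]

DEADLINE_MS = 100.0  # assumed deadline Y for input X

for name, samples in [("fast_but_unbounded", fast_but_unbounded),
                      ("slow_but_bounded", slow_but_bounded)]:
    print(name, "mean:", mean(samples),
          "meets deadline:", meets_deadline(samples, DEADLINE_MS))
```

In this toy data the system with the better average misses the deadline once, while the slower one never does, which is the sense in which "real time" is about the bound rather than the speed.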
sigil about 14 years ago
This looks interesting. Questions:

(1) What do you mean by a processing topology -- is this a data dependency graph?

(2) How does one define a topology? Is this specified at deployment time via the jar file, or can it be configured separately and on the fly?

(3) Must records be processed in time order, or can they be sorted and aggregated on some other key?
ScottBurson about 14 years ago
TW;DR!!

For a variety of reasons, I keep my browser windows about 900 pixels wide. Your site requires a honking 1280 to get rid of the horizontal scrollbar -- and can't be read in 900 without scrolling horizontally for every line (i.e. the menu on the left is much too wide).

(OT, I know, but it's a pet peeve of mine. It's been known for years how to use CSS to make pages stretch or squish, within reason, to the user's window width. 900 is not too narrow!)

EDITED to add: yeah, I'm willing to spend some karma points on this, if that's what happens. Wide sites are getting more common, and this is one of the worst I've seen.
jbellis about 14 years ago
How is this different from a "traditional" CEP system like Esper?

(I mean on the actual processing front, rather than architecturally -- sounds like Storm is a bunch of building blocks instead of a unified system.)
scott_s about 14 years ago
I work on a similar system that was previously discussed on HN: http://news.ycombinator.com/item?id=2442977
maxdemarzi about 14 years ago
"To compute reach, you need to get all the people who tweeted the URL, get all the followers of all those people, unique that set of followers, and then count the number of uniques. It's an intense computation that potentially involves thousands of database calls and tens of millions of follower records."

Or you could use a Graph DB to solve a Graph problem.

URL -> tweeted_by -> users -> followed_by -> users

Try that on Neo4j.
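
The reach computation quoted in the comment above (tweeters of a URL, then their followers, then deduplicate, then count) can be sketched in a few lines of Python. This is a single-machine toy, assuming two hypothetical in-memory lookup tables (`TWEETERS_BY_URL`, `FOLLOWERS_BY_USER`) in place of the database calls the quote mentions. It is not Storm's implementation, nor a graph database query.

```python
from typing import Dict, Set

# Hypothetical stand-ins for the two lookups described in the quote.
TWEETERS_BY_URL: Dict[str, Set[str]] = {
    "http://example.com/post": {"alice", "bob"},
}
FOLLOWERS_BY_USER: Dict[str, Set[str]] = {
    "alice": {"carol", "dave"},
    "bob": {"dave", "erin"},
}


def reach(url: str) -> int:
    """Count the distinct followers of everyone who tweeted `url`."""
    unique_followers: Set[str] = set()
    for user in TWEETERS_BY_URL.get(url, set()):
        # Set union deduplicates followers shared between tweeters.
        unique_followers |= FOLLOWERS_BY_USER.get(user, set())
    return len(unique_followers)


print(reach("http://example.com/post"))  # 3: carol, dave, erin
```

The traversal is the same two-hop walk as `URL -> tweeted_by -> users -> followed_by -> users`. The cost the quote describes comes from the follower lookups, which here are dictionary reads but in practice would each be a database call.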
herdrick about 14 years ago
This sounds great.

"This is the traditional realtime processing use case: process messages and update a variety of databases."

Question: I typically think of real-time as a need for user-facing things, i.e. handling a user's requests before he gets bored and goes away. Is Storm set up for that? Or is it mostly meant to update a database with results rather than return them to a waiting process?
Maro about 14 years ago
I'm not sure if this is the same thing, but there's also a new company called Hadapt (the commercialization of HadoopDB). It's about adapting Hadoop for real-time analytic SQL queries by putting local SQL dbs on the Hadoop nodes and then using the Hadoop plumbing. It's based on Daniel Abadi's research; he's a really smart guy.
pnathan about 14 years ago
Can you comment on distributing non-JAR software?

Also, this sounds faintly like the old SunGridEngine.
earl about 14 years ago
How is this different from / better than Yahoo S4 [1], which does have code on GitHub [2]? Why did you choose to build this, or did you start before S4 became public?

[1] http://docs.s4.io/
[2] https://github.com/s4/core
tedjdziuba about 14 years ago
This sounds like something that's been painfully over-engineered.

One of the main problems they solve is "distributed RPC", from TFA: "There are a lot of queries that are both hard to precompute and too intense to compute on the fly on a single machine."

That's generally a sign that you've made a mistake somewhere in your application design. Pain is a response that tells you "stop doing that".