It's barely similar; it's a fault-tolerant system for scaling computation. Storm provides real-time streaming computation. Spouts emit infinite streams of tuples (small serializable objects), and bolts consume those tuples, emitting zero or more new tuples for each one they process.<p>You could liken it to a streaming MapReduce whose components you can rearrange into directed graphs of data flows, called topologies.<p>Re: Spark, it's a totally different paradigm: a MapReduce-like system that takes advantage of memory locality where Hadoop takes advantage of disk locality. Hive on Spark is a pretty beastly system.
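The spout/bolt tuple flow can be sketched with ordinary generators, purely as an illustration of the dataflow model — this is not the real Storm API (which is Java/Clojure, wired up via a TopologyBuilder), and the word-count example here is just the canonical demo:

```python
# Toy sketch of Storm's spout/bolt model -- illustrative only, not Storm's API.

def sentence_spout():
    """A spout: a stream of tuples (a real spout would emit forever)."""
    for s in ["the quick brown fox", "jumped over the lazy dog"]:
        yield (s,)

def split_bolt(tup):
    """A bolt: consumes one tuple, emits zero or more tuples."""
    (sentence,) = tup
    for word in sentence.split():
        yield (word,)

def count_bolt(stream):
    """A terminal bolt that aggregates word counts from its input stream."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Wire spout -> split -> count into a tiny linear "topology".
words = (w for tup in sentence_spout() for w in split_bolt(tup))
counts = count_bolt(words)
print(counts["the"])  # "the" appears once in each sentence
```

In real Storm the same shape is declared once as a topology and the framework runs the spouts and bolts in parallel across a cluster, replaying tuples on failure.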
Storm is actually not similar to Hadoop at all. I think this title resulted from a misreading of the README, which states: "Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation."<p>/nitpick
So which is better, Storm[1] or Spark[2]?<p>1. <a href="http://storm-project.net/" rel="nofollow">http://storm-project.net/</a>
2. <a href="http://www.spark-project.org/" rel="nofollow">http://www.spark-project.org/</a>
JRuby DSL and Integration for Storm here: <a href="https://github.com/colinsurprenant/redstorm" rel="nofollow">https://github.com/colinsurprenant/redstorm</a>