It's barely similar; it's a fault-tolerant system for scaling computation. Storm provides real-time streaming computation. Spouts emit infinite streams of tuples (small serializable objects), and bolts consume those tuples, emitting zero or more new tuples for each one they process.<p>You could liken it to a streaming MapReduce whose components you can rearrange into directed graphs of data flows, called topologies.<p>Re: Spark, it's a totally different paradigm: a MapReduce-like system that takes advantage of memory locality where Hadoop takes advantage of disk locality. Hive on Spark is a pretty beastly system.
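The spout/bolt tuple flow can be sketched with ordinary generators, purely as an illustration of the dataflow model — this is not the real Storm API (which is Java/Clojure, wired up via a TopologyBuilder), and the word-count example here is just the canonical demo:

```python
# Toy sketch of Storm's spout/bolt model -- illustrative only, not Storm's API.

def sentence_spout():
    """A spout: a stream of tuples (a real spout would emit forever)."""
    for s in ["the quick brown fox", "jumped over the lazy dog"]:
        yield (s,)

def split_bolt(tup):
    """A bolt: consumes one tuple, emits zero or more tuples."""
    (sentence,) = tup
    for word in sentence.split():
        yield (word,)

def count_bolt(stream):
    """A terminal bolt that aggregates word counts from its input stream."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Wire spout -> split -> count into a tiny linear "topology".
words = (w for tup in sentence_spout() for w in split_bolt(tup))
counts = count_bolt(words)
print(counts["the"])  # "the" appears once in each sentence
```

In real Storm the same shape is declared once as a topology and the framework runs the spouts and bolts in parallel across a cluster, replaying tuples on failure.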
Storm is actually not similar to Hadoop at all. I think this title resulted from a misreading of the README, which states: "Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation."<p>/nitpick
So which is better, Storm[1] or Spark[2]?<p>1. <a href="http://storm-project.net/" rel="nofollow">http://storm-project.net/</a>
2. <a href="http://www.spark-project.org/" rel="nofollow">http://www.spark-project.org/</a>
JRuby DSL and Integration for Storm here: <a href="https://github.com/colinsurprenant/redstorm" rel="nofollow">https://github.com/colinsurprenant/redstorm</a>