(DISCLAIMER: I am an Apache Storm PMC Member)<p>This could be a very good thing for Apache Storm, depending on how Twitter handles it.<p>Just to clarify, the Storm version mentioned in the paper and blog post is not an official Apache release and doesn't include many of the performance improvements found in newer releases of Apache Storm. There are a lot, and many more on the horizon.<p>That being said, the performance numbers look impressive, even though there is no way to confirm them since no code or benchmarks have been published. IMHO, until that happens, there's not much to see here (not that I doubt it -- I'd just like to see proof/code).<p>My hope is that Twitter is dedicated to the projects it has open-sourced, and this is not a case of NIH, but rather an honest effort on Twitter's part to contribute back to the open source community.<p>@haberman:<p>Storm implements exactly-once processing through a higher-level API called Trident, which I like to call Storm's "Streams API" since it's not unlike Java 8's Streams API (and largely inspired by Cascading). Trident processes data in configurable micro-batches, as opposed to one tuple at a time, which gives it an advantage in terms of throughput, but at the cost of latency. Trident topologies "compile" down to Core Spout/Bolt topologies (the Trident API has a planner implementation that figures that out -- not unlike an SQL query planner).<p>The Storm Core API provides at-least-once semantics through an acking mechanism described here [1]. The Trident API builds on top of that to support exactly-once semantics by essentially doing a de-dupe [2].<p>I'm not sure exactly why they don't claim to support this, since Trident is built on top of Storm's Core API.<p>[1] <a href="https://storm.apache.org/documentation/Acking-framework-implementation.html" rel="nofollow">https://storm.apache.org/documentation/Acking-framework-impl...</a>
[2] <a href="https://storm.apache.org/documentation/Trident-state.html" rel="nofollow">https://storm.apache.org/documentation/Trident-state.html</a><p>@filereaper:<p>Assuming you are referring to Spark Streaming, forget about any benchmarks you may have seen. Either can be faster than the other depending on how you configure it and what your use case is. See my presentation on the subject here [3]. With either, you can configure yourself into a corner and screw your performance.<p>Performance tuning distributed systems is a mysterious art, as is benchmarking. Unfortunately, that fact is frequently exploited for "benchmarketing" purposes. Don't trust any benchmark but your own unless it is fully open-sourced (including configuration).<p>[3] <a href="http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming" rel="nofollow">http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-stre...</a><p>@vicaya:<p>Version numbers don't necessarily equate to code quality, performance, or stability. I've seen many projects bump to 1.0 purely for marketing purposes.
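<p>For anyone wondering what "essentially doing a de-dupe" means in the reply to @haberman: a rough sketch of the idea, not actual Storm or Trident code. It assumes each micro-batch carries a transaction id, that a replayed batch reuses the same id with the same contents, and that batches are committed in order. The class and method names (TransactionalCounts, applyBatch, count) are made up for illustration. The store keeps the last applied batch id next to each value, so a batch replayed by the at-least-once layer is recognized and skipped.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative only: this is NOT the Storm/Trident API. It mimics the idea of
 * storing the id of the last applied batch next to each value so that a
 * replayed batch can be detected and ignored.
 */
class TransactionalCounts {

    /** A count plus the id of the last batch that updated it. */
    private static final class Entry {
        long count;
        long lastTxId;

        Entry(long count, long lastTxId) {
            this.count = count;
            this.lastTxId = lastTxId;
        }
    }

    // Stand-in for an external store such as a database or key-value cache.
    private final Map<String, Entry> store = new HashMap<>();

    /**
     * Adds a batch's partial count for a key.
     * Returns false (and changes nothing) if this batch id was already applied.
     */
    boolean applyBatch(long txId, String key, long delta) {
        Entry entry = store.get(key);
        if (entry == null) {
            store.put(key, new Entry(delta, txId));
            return true;
        }
        if (entry.lastTxId == txId) {
            return false; // same batch seen again: a replay, so skip the update
        }
        entry.count += delta;
        entry.lastTxId = txId;
        return true;
    }

    /** Returns the current count for a key, or 0 if the key is unknown. */
    long count(String key) {
        Entry entry = store.get(key);
        return entry == null ? 0L : entry.count;
    }

    public static void main(String[] args) {
        TransactionalCounts counts = new TransactionalCounts();
        counts.applyBatch(1L, "storm", 3);
        counts.applyBatch(2L, "storm", 2);
        counts.applyBatch(2L, "storm", 2); // batch 2 replayed after a simulated failure
        System.out.println(counts.count("storm")); // prints 5
    }
}
```

Without the stored batch id, the replay of batch 2 would be counted twice, which is exactly the at-least-once behavior of the core API. The Trident state docs [2] describe the real mechanism and its variants.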