Note that this proposal is being backed not only by Google, but also by Cloudera, Data Artisans, Talend, Cask, PayPal, ...<p>Some other posts on the announcement:<p><a href="http://googlecloudplatform.blogspot.com/2016/01/Dataflow-and-open-source-proposal-to-join-the-Apache-Incubator.html" rel="nofollow">http://googlecloudplatform.blogspot.com/2016/01/Dataflow-and...</a><p><a href="http://blog.cloudera.com/blog/2016/01/spark-dataflow-joins-googles-dataflow-sdk/" rel="nofollow">http://blog.cloudera.com/blog/2016/01/spark-dataflow-joins-g...</a><p><a href="http://data-artisans.com/dataflow-proposed-as-apache-incubator-project/" rel="nofollow">http://data-artisans.com/dataflow-proposed-as-apache-incubat...</a><p><a href="http://blog.cask.co/2016/01/cask-anticipates-googles-dataflow-to-flourish-in-apache/" rel="nofollow">http://blog.cask.co/2016/01/cask-anticipates-googles-dataflo...</a>
> While Google has previously published papers describing some of its technologies, Google decided to take a different approach with Dataflow. Google open-sourced the SDK and model alongside commercialization of the idea and ahead of publishing papers on the topic.<p>A large number of ASF projects in the Big Data space are inspired by Google's publications. Good to see Google finally taking the lead and coming out with code.
Seems like this would duplicate a rather large chunk of Apache Crunch, which implements Google Flume nearly exactly as far as the public API is concerned. As far as I can tell, Google Dataflow is also a variation on Google Flume. It would be helpful if they could explain why this project would not be redundant under the Apache umbrella.
An O'Reilly post, also released today, references the Apache Dataflow submission:
<a href="https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102" rel="nofollow">https://www.oreilly.com/ideas/the-world-beyond-batch-streami...</a>