Google proposes its Dataflow batch/stream tech to the Apache Incubator

191 points by crb over 9 years ago

9 comments

fhoffa over 9 years ago
Note that this proposal is being backed not only by Google, but also by Cloudera, Data Artisans, Talend, Cask, PayPal, ...

Some other posts on the announcement:

http://googlecloudplatform.blogspot.com/2016/01/Dataflow-and-open-source-proposal-to-join-the-Apache-Incubator.html
http://blog.cloudera.com/blog/2016/01/spark-dataflow-joins-googles-dataflow-sdk/
http://data-artisans.com/dataflow-proposed-as-apache-incubator-project/
http://blog.cask.co/2016/01/cask-anticipates-googles-dataflow-to-flourish-in-apache/
mindprince over 9 years ago
> While Google has previously published papers describing some of its technologies, Google decided to take a different approach with Dataflow. Google open-sourced the SDK and model alongside commercialization of the idea and ahead of publishing papers on the topic.

A large number of ASF projects in the Big Data space are inspired by Google's publications. Good to see Google finally taking the lead and coming out with code.
melted over 9 years ago
Seems like this would duplicate a rather large chunk of Apache Crunch, which implements Google Flume nearly exactly as far as public API is concerned. As far as I can tell, Google Dataflow is also a variation on top of Google Flume. It would be helpful if they could elucidate why this project would not be redundant under the Apache umbrella.
sysk over 9 years ago
Can anyone ELI5 what it means for an open source project to become an Apache project? Why doesn't Google just push the code on Github?
Wonnk13 over 9 years ago
What are the best resources to learn about streaming, dataflow, etc.? Not necessarily the Google implementations, but the core concepts backing them.
xcelq over 9 years ago
Can we hope to see a Google-like search engine open-sourced? I'm just waiting for this day to happen.
ericand over 9 years ago
O'Reilly post also released today references the Apache Dataflow submission: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
obulpathi over 9 years ago
It would be awesome to have the code portable across various big data engines.
BenoitP over 9 years ago
Where does Dataflow stand? Is it only a wrapper, trying to define a standard API for combining stream producers, datastores, and stream engines?