MapReduce and Spark

65 points by rxin, over 11 years ago

5 comments

lmm, over 11 years ago
I like Spark over Hadoop just from an interface point of view, particularly the ability to just start up a (Scala) shell and start playing around. Hadoop can be very effective, but even getting "hello world" to run requires an intimidating array of setup.
hobbyist, over 11 years ago
I often read that Spark avoids the costly synchronization required in MapReduce because it uses DAGs. Can someone explain how that is achieved? If the application allows jobs to be launched together, that can be done even with Hadoop/MapReduce. If one job requires the output of another, then the job has to wait for synchronization whether it's MapReduce or a DAG.
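A toy sketch of the distinction being asked about, assuming the usual explanation: a DAG scheduler can fuse consecutive narrow (per-record, per-partition) steps into one stage, so tasks only wait at shuffle boundaries, whereas a chain of separate MapReduce jobs puts a barrier, plus a full materialization of the intermediate output, after every job. The functions `run_as_job_chain` and `run_as_pipelined_stage` are hypothetical and only count synchronization points; they are not Hadoop or Spark APIs.

```python
from typing import Callable, List, Tuple

Step = Callable[[int], int]
Partitions = List[List[int]]


def run_as_job_chain(partitions: Partitions, steps: List[Step]) -> Tuple[Partitions, int]:
    """Each step is its own 'job' (hypothetical model): every partition finishes
    step i, and the whole intermediate result is built, before step i+1 starts."""
    data = partitions
    barriers = 0
    for step in steps:
        data = [[step(x) for x in part] for part in data]  # full intermediate materialized
        barriers += 1  # job boundary: wait for every task of this job
    return data, barriers


def run_as_pipelined_stage(partitions: Partitions, steps: List[Step]) -> Tuple[Partitions, int]:
    """Narrow steps fused into one stage: each record flows through all steps
    back to back inside its own partition, with one barrier at the stage end."""
    def fused(x: int) -> int:
        for step in steps:
            x = step(x)
        return x

    data = [[fused(x) for x in part] for part in partitions]
    return data, 1  # single synchronization point for the whole stage


parts = [[1, 2], [3, 4]]
steps = [lambda x: x + 1, lambda x: x * 2]
print(run_as_job_chain(parts, steps))
print(run_as_pipelined_stage(parts, steps))
```

Both variants produce the same records; only the number of synchronization points differs. A step that needs data from other partitions (a shuffle, such as a reduce by key) still forces a barrier in either model, which is the case the question points out.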
justinkestelyn, over 11 years ago
Some interesting use cases are also described on Cloudera's developer blog, at http://blog.cloudera.com/blog/2013/11/putting-spark-to-use-fast-in-memory-computing-for-your-big-data-applications/.
fintler, over 11 years ago
Although Spark is nice, I'm also looking forward to MPI/orted integration with Hadoop...

"Performance: Launches ~1000x faster, runs ~10x faster"

"Launch scaling: Hadoop (~N), MR+ (~logN)"

"Wireup: Hadoop (~N^2), MR+ (~logN)"

http://slurm.schedmd.com/slurm_ug_2012/MapRedSLURM.pdf
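The launch-scaling line can be made concrete with a back-of-the-envelope count. This assumes a simple model, not necessarily the one in the slides: a central launcher that starts every daemon itself, versus a tree launch where every running daemon starts `fanout` more daemons each round. Both functions and the `fanout` parameter are hypothetical.

```python
def linear_launch_rounds(n: int) -> int:
    # Hypothetical model: a single launcher (already running) starts the other
    # n - 1 daemons itself, one after another, so the cost grows ~N.
    return max(0, n - 1)


def tree_launch_rounds(n: int, fanout: int = 2) -> int:
    # Hypothetical model: in each round, every daemon that is already running
    # starts `fanout` new ones, so the running count is multiplied by
    # (fanout + 1) per round and n daemons need ~log_(fanout+1)(n) rounds.
    rounds, running = 0, 1  # the launcher itself is the first running daemon
    while running < n:
        running *= fanout + 1
        rounds += 1
    return rounds


for n in (10, 1000, 100000):
    print(n, linear_launch_rounds(n), tree_launch_rounds(n))
```

Under these assumptions, with the default fanout of 2, 1,000 daemons come up in 7 rounds instead of 999 sequential launches, which illustrates the ~N vs ~logN gap quoted above.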
wheaties, over 11 years ago
What I would love to know is whether Mahout works out of the box with Spark, or whether there's a third-party library that bridges the two.