MapReduce: A major step backwards (2008)

4 points by li4ick, almost 3 years ago

1 comment

dekhn, almost 3 years ago
This complaint shows a huge misunderstanding of MapReduce. It was never, ever intended to replace an RDBMS. It was literally written so that Jeff and Sanjay could get the Google indexing process completed, when the previous version required complete run-throughs of every step, with no failures. MapReduce doesn't have indices because... that's not the point. It's for doing large sequential scans through the data. See the SSTable format for more details.

After this document was written, Hadoop became popular on the outside but had no end of problems. Eventually, most systems were replaced with more advanced ones; for example, MapReduce was replaced by Flume (Java/C++/whatever). And people often run jobs in these systems against storage systems that do have indexing.

Most importantly, there was no RDBMS that could build the Google index at the time, and Google only succeeded because they could build large indices fast. It literally was a technology that made or broke the company. (I was hired around 2008 to help run a system that *was* mission critical and *did* run on an RDBMS, but it played a very different role from MapReduce. I also worked on one of the hairiest MapReduces, used to do something it really was not well designed for: large-scale machine learning.)

Note that DeWitt is the guy that the DeWitt clause was written for. And Stonebraker invented modern RDBMSs. Why they chose this hill to die on (and there was a whole saga that happened after this paper was written) is mystifying to me.