TechEcho

1 comment

dekhn, almost 3 years ago

This complaint shows a huge misunderstanding of MapReduce. It was never, ever intended to replace an RDBMS. It was literally written so that Jeff and Sanjay could get the Google index process completed, when the previous version required a complete run-through of every step with no failures. MapReduce doesn't have indices because... that's not the point. It's for doing large scans through the data sequentially. See the SSTable format for more details.

After this document was written, Hadoop became popular on the outside but had no end of problems. Eventually, most systems were replaced with more advanced ones; for example, MapReduce was replaced by Flume (Java/C++/whatever). And often, people run jobs in these systems against storage systems that do have indexing.

Most importantly, there was no RDBMS that could build the Google index at the time, and Google only succeeded because they could build large indices fast. It literally was a technology that made or broke the company. (I was hired around 2008 to help run a system that was mission critical and did run on an RDBMS, but it played a very different role from MapReduce.) I also worked on one of the hairiest MapReduces, used to do something it really was not well designed for: large-scale machine learning.

Note that DeWitt is the guy that the DeWitt clause was written for. And Stonebraker invented modern RDBMSs. Why they chose this hill to die on (and there was a whole saga that happened after this paper was written) is mystifying to me.
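To make the "large sequential scans, no indices" point concrete, here is a toy single-process sketch of the map/reduce pattern in Python. It is only an illustration, not Google's or Hadoop's implementation: the names map_reduce, word_count_mapper and sum_reducer are made up for this example, and a real system would shard the input, shuffle across machines, and retry failed tasks.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy MapReduce: one sequential pass over the input, then a reduce per key."""
    groups = defaultdict(list)
    for record in records:                 # full scan; no index is ever consulted
        for key, value in mapper(record):  # map phase emits (key, value) pairs
            groups[key].append(value)      # in-memory stand-in for the shuffle
    # Reduce phase: combine all values that share a key.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

def word_count_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in a line.
    for word in line.lower().split():
        yield word, 1

def sum_reducer(key, values):
    # Hypothetical reducer: total the counts for one word.
    return sum(values)

docs = ["the index is big", "the scan is sequential"]
counts = map_reduce(docs, word_count_mapper, sum_reducer)
print(counts)
```

Every record is read exactly once, in order, which is why the model has no use for an index: there is no point lookup to speed up.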

MapReduce: A major step backwards (2008)

1 comment
