Keep in mind that the article itself admits that loading the data into an RDBMS took *much longer* than loading it onto a Hadoop cluster.

That's the main crux: if your data isn't relational to start with (e.g. messages in a message queue being written to files in HDFS), you're better off with Map/Reduce. If your data is relational to start with, you're better off doing traditional OLAP on an RDBMS cluster.

When I worked for a start-up ad network, we couldn't afford the latency/contention implications of writing to a database each time we served an ad. Initially, we moved to a model of "log impressions to text files, rsync the text files over, run a cron job to load them into an RDBMS". The problem is that processing the data (on a single node) was taking so long that it would no longer be meaningful by the time it landed in the RDBMS (and the OLAP queries finished running).

I thought about how I could ensure that a specific machine only processes a specific log file, or a specific section of a log file -- but then I realized that's exactly what the map part of map/reduce does (roughly what the sketch below shows). In the end I set up a Hadoop cluster, which made both "soft real time" (hourly/bi-hourly) data processing and long-term queries (running on multi-month, multi-terabyte datasets) much faster and easier.

Theoretically, yes: an RDBMS is the most efficient way to run complex analytics queries. Practically, we have to handle issues such as asynchronicity, contention, ease of scaling, and getting the data into a relational format in the first place (which is itself something Hadoop can do: one of the tasks we used it for was transforming all sorts of text data -- crawled webpages, log files -- into CSV that could be loaded onto the OLAP/OLTP clusters with "LOAD DATA INFILE").
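
To make the "map" point concrete: with Hadoop Streaming, the framework splits the input logs and hands each split to a mapper on some node, so nobody has to hand-assign log files to machines. This is only a minimal sketch of the kind of hourly impression count we ran -- the log format (tab-separated ISO-8601 timestamp, then ad id) and field positions are assumptions for illustration, not our actual schema:

    #!/usr/bin/env python
    # mapper.py -- one raw impression log line in, one "ad_id,hour<TAB>1" pair out.
    # Assumes tab-separated lines starting with an ISO-8601 timestamp and an ad id;
    # adjust the parsing to whatever your logs actually look like.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines
        timestamp, ad_id = fields[0], fields[1]
        hour = timestamp[:13]  # "2009-06-15T17" -> bucket impressions by hour
        print("%s,%s\t1" % (ad_id, hour))

    #!/usr/bin/env python
    # reducer.py -- sum the 1s for each "ad_id,hour" key.
    # Hadoop Streaming sorts mapper output, so identical keys arrive contiguously.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, n = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(n)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

You run the pair through the streaming jar ("hadoop jar hadoop-streaming.jar -input ... -output ... -mapper mapper.py -reducer reducer.py", paths elided), and the cron job that used to grind through everything on one box becomes an hourly job spread across the cluster.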
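
The "turn raw text into CSV for LOAD DATA INFILE" step is usually just a map-only job of the same shape. Again a hedged sketch -- the input fields and output columns here are made up for illustration:

    #!/usr/bin/env python
    # csv_mapper.py -- map-only job: turn one raw log line into one CSV row
    # that MySQL's LOAD DATA INFILE can bulk-load. Field positions are hypothetical.
    import csv
    import sys

    writer = csv.writer(sys.stdout)
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # drop unparseable lines rather than poison the load
        timestamp, ad_id, publisher_id, user_agent = fields[:4]
        writer.writerow([timestamp, ad_id, publisher_id, user_agent])

Run it with no reducer (Streaming accepts "-reducer NONE"), pull the part files out of HDFS, and point LOAD DATA INFILE at them -- orders of magnitude faster than row-by-row INSERTs.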