Really interesting article. I can't help but notice that he used Ruby in his example code. Does anyone know if this is simply for illustration purposes, or is it already possible to write Hadoop map/reduce jobs in Ruby?
We've sometimes used a hack that is a minor variant of micro-batch processing in MapReduce. We: a) map the latest batch of data; b) in the reducer, join it with a cache from a previous reduction; and c) reduce in part, save a new cache, and proceed with further reductions. (We use a homegrown MapReduce implementation that allows multiple rounds of reduction and access to the filesystem, so I'm not sure this would work in Hadoop.)
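Roughly, the pattern looks like this (a toy sketch in Python, with plain word counts standing in for the real job and a JSON file standing in for however the cache actually gets stored; the names are made up and this is not our production code):

```python
from collections import defaultdict
import json
import os

CACHE_PATH = "reduce_cache.json"  # hypothetical location for the saved cache

def map_batch(records):
    """Map step: emit (key, value) pairs for the latest batch only."""
    for record in records:
        for word in record.split():
            yield word, 1

def load_cache():
    """Load the partial reduction saved by the previous run (empty on first run)."""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return defaultdict(int, json.load(f))
    return defaultdict(int)

def reduce_batch(mapped_pairs):
    """Reduce step: join the newly mapped batch with the cached totals,
    fold the new values in, and persist the result as the next cache."""
    totals = load_cache()
    for key, value in mapped_pairs:
        totals[key] += value
    with open(CACHE_PATH, "w") as f:
        json.dump(totals, f)
    return totals

if __name__ == "__main__":
    # Each run processes only the newest micro-batch; earlier work is
    # carried forward through the cache rather than being re-mapped.
    batch = ["the quick brown fox", "the lazy dog"]
    print(reduce_batch(map_batch(batch)))
```

The point is just that the reducer is the only place that ever sees history, and it sees it through the saved cache rather than by rescanning old input.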