My approach to this is to take the complicated bits of the MapReduce and put them in a separate class. Then I do a combination of two things, as appropriate:<p>1. Hook it up to a debug server that fetches from the same datastore as the MapReduce, then test it on some keys that I'm interested in.<p>2. Test it like any other class.<p>The only awkward part of this is abstracting out the output calls, which I usually do by passing in a "handle some data" callback that writes output in the MapReduce job and dumps some pretty HTML in the debug server.<p>The great part about this is that if the MapReduce ends up being something important, you already have the tools to introspect its internals on whatever data you're interested in.
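<p>As a rough sketch of that split (the class, method names, and word-count logic here are made up purely for illustration, not from any particular job): the complicated logic lives in its own class and takes an output callback, so the same code can feed the MapReduce job, the debug server, or a plain unit test.
<pre><code>import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// The "complicated bits", pulled out of the MapReduce job itself.
// All output goes through a callback, so this class doesn't care whether it's
// writing to the job's output, a debug server's HTML page, or a test's list.
class WordCountLogic {
    private final BiConsumer&lt;String, Integer&gt; emit;

    WordCountLogic(BiConsumer&lt;String, Integer&gt; emit) {
        this.emit = emit;
    }

    void handle(String line) {
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                emit.accept(token, 1);
            }
        }
    }
}

// "Test it like any other class": no MapReduce machinery involved.
class WordCountLogicDemo {
    public static void main(String[] args) {
        List&lt;String&gt; collected = new ArrayList&lt;&gt;();
        WordCountLogic logic =
            new WordCountLogic((word, count) -&gt; collected.add(word + "=" + count));
        logic.handle("the quick brown fox the");
        System.out.println(collected); // [the=1, quick=1, brown=1, fox=1, the=1]
    }
}
</code></pre>
<p>In the actual job the callback just forwards to the framework's output call (e.g. Hadoop's context.write), and in the debug server it renders HTML instead.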
<i>The Cascalog abstraction layer fixes this issue by separating logic from data, allowing you to play creatively at massive scale.</i><p>I just checked out Casacalog and I like what I see, although I have yet to try it out myself. Does anyone know of something similar that would work with Scala as well?
Nice.<p>Check out another similar clojure library called "MR-Kluj" that you can use to write Hadoop MapReduce jobs in Clojure: <a href="https://github.com/cheddar/mr-kluj" rel="nofollow">https://github.com/cheddar/mr-kluj</a>