Nice overview. Off topic, but for complicated workflows I find it useful to use a diagramming tool like OmniGraffle to maintain a one-page overview of the data sources (with sample data), each MapReduce job and its output (with data examples), etc. It's a small amount of overhead, but worth it to help keep track of everything. As I add or remove jobs and change code, I keep the diagram up to date.