Our startup is doing some heavy lifting on financial transactions: categorizing them and warehousing them into our BI-like star schema.

We're on Rails + Postgres, and it looks like we'll need to get our hands dirty with MapReduce sooner rather than later. Given that we're on EC2, should we try Amazon's Elastic MapReduce, or should we just host our own Cloudera instance?
Also, it seems like Elastic MapReduce processes files stored in S3. What would my workflow look like if the data I'm processing lives in a database?
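As a rough illustration of the database case, one common pattern is to dump the relevant rows from Postgres as tab-separated text (for example with `COPY ... TO STDOUT`), upload that file to S3, and point a Hadoop Streaming job at it. The sketch below simulates the map, shuffle/sort, and reduce phases in a single process so the shape of a categorization job is visible. The `id<TAB>description<TAB>amount` column layout and the keyword rules in `CATEGORY_RULES` are hypothetical assumptions, not anything from a real schema. Under Hadoop Streaming, the mapper and reducer would instead be separate scripts reading stdin and writing tab-separated key/value lines.

```python
import sys
from decimal import Decimal
from itertools import groupby

# Hypothetical keyword -> category rules; a real job would use its own logic.
CATEGORY_RULES = {
    "starbucks": "coffee",
    "shell": "fuel",
    "amazon": "shopping",
}


def categorize(description):
    """Return the category of the first keyword found in the description."""
    text = description.lower()
    for keyword, category in CATEGORY_RULES.items():
        if keyword in text:
            return category
    return "uncategorized"


def mapper(lines):
    """Map step: each line is assumed to be 'id<TAB>description<TAB>amount'.
    Emits (category, amount) pairs; Decimal avoids float rounding on money."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed rows
        _txn_id, description, amount = fields
        yield categorize(description), Decimal(amount)


def reducer(pairs):
    """Reduce step: pairs must arrive sorted by key (Hadoop's shuffle does
    this); sums the amounts for each category."""
    for category, group in groupby(pairs, key=lambda kv: kv[0]):
        yield category, sum(amount for _, amount in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    mapped = sorted(mapper(lines), key=lambda kv: kv[0])
    return dict(reducer(mapped))


if __name__ == "__main__":
    # Example (hypothetical table name):
    # psql -c "COPY (SELECT id, description, amount FROM transactions) TO STDOUT" | python job.py
    for category, total in run_local(sys.stdin).items():
        print(f"{category}\t{total:.2f}")
```

Keeping the mapper and reducer as plain functions over lines means the same logic can be tested locally against a small export before it is split into streaming scripts for either Elastic MapReduce or a self-hosted cluster.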