We have been using this architecture for containerized batch processing for more than a year: http://tech.adroll.com/blog/data/2015/09/22/data-pipelines-docker.html

It is very convenient to be able to define jobs as Docker containers, using the best language for each task, and to declare the dependencies between their inputs and outputs explicitly, without being tied to any particular paradigm like MapReduce.

I am more than happy to let AWS handle the plumbing for this instead of having to maintain and operate our custom solution.
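For illustration, here is a minimal sketch of how dependent container jobs might be wired up with boto3 and AWS Batch; the job names, queue, and job definitions are hypothetical, not taken from our actual pipeline:

    import boto3

    # Rough sketch: assumes an existing job queue and registered
    # container-based job definitions (all names here are made up).
    batch = boto3.client("batch", region_name="us-west-2")

    # Upstream job: a Docker container that produces some output (e.g. to S3).
    extract = batch.submit_job(
        jobName="extract-logs",
        jobQueue="example-queue",
        jobDefinition="extract-logs-container",
    )

    # Downstream job: consumes that output. AWS Batch will hold it until
    # the job listed in dependsOn has completed successfully.
    batch.submit_job(
        jobName="aggregate-logs",
        jobQueue="example-queue",
        jobDefinition="aggregate-logs-container",
        dependsOn=[{"jobId": extract["jobId"]}],
    )

The dependency graph lives in the job submissions rather than in any framework-specific code, which is essentially the property we liked about our custom setup.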
"AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted."<p><i></i>Optimal<i></i> is a pretty strong claim, given that this is an NP-Hard problem. Or do we just toss the word "optimal" around now?
This seems like a step backwards to me in some ways. I'd prefer to see them evolve Lambda to support containers, longer jobs, and better workflows instead. I thought we were moving away from EC2 with its slow provisioning, spot bidding, and per-hour billing.