TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.
Petabyte-Scale Data Pipelines with Docker, Luigi and Elastic Spot Instances

51 points by vtuulos over 9 years ago

3 comments

ngould over 9 years ago
Cool stuff. Curious to know what systems these containerized tasks are pulling data from. Does AdRoll let those containers access production database instances, or are they backed by non-production systems? (EDIT: I understand that it's S3 for intermediate steps. But I'm curious where the data comes from initially.)
[Comment #10263241 not loaded]
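The pattern the title describes — Luigi tasks that declare dependencies and targets, with each task's work running in a container and intermediate results landing in S3 — can be sketched without the real library. This is a minimal, stdlib-only illustration of the requires/output/run idiom, not AdRoll's actual code: the task names are invented, and local temp files stand in for S3 targets and for the containerized work.

```python
import os
import tempfile

class Task:
    """Bare-bones stand-in for a Luigi-style task."""
    def requires(self):
        return []
    def output(self):
        raise NotImplementedError
    def run(self):
        raise NotImplementedError
    def complete(self):
        # A task is done when its target exists (Luigi uses the same idea).
        return os.path.exists(self.output())

class Extract(Task):
    """Stand-in for a containerized task that pulls the source data."""
    def __init__(self, workdir):
        self.workdir = workdir
    def output(self):
        return os.path.join(self.workdir, "raw.txt")
    def run(self):
        with open(self.output(), "w") as f:
            f.write("raw data")

class Transform(Task):
    """Depends on Extract; reads its target, writes an intermediate result."""
    def __init__(self, workdir):
        self.workdir = workdir
    def requires(self):
        return [Extract(self.workdir)]
    def output(self):
        return os.path.join(self.workdir, "clean.txt")
    def run(self):
        with open(Extract(self.workdir).output()) as f:
            data = f.read()
        with open(self.output(), "w") as f:
            f.write(data.upper())

def build(task):
    """Depth-first scheduler: satisfy dependencies, then run the task."""
    for dep in task.requires():
        build(dep)
    if not task.complete():
        task.run()

workdir = tempfile.mkdtemp()
build(Transform(workdir))
with open(os.path.join(workdir, "clean.txt")) as f:
    print(f.read())  # -> RAW DATA
```

Because completeness is just "does the target exist", a re-run skips finished tasks, which is what makes S3-backed intermediate steps cheap to resume after spot-instance loss.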
samkone over 9 years ago
Interesting, we're doing similar things with Spark, Cassandra, and Mesos, scaling out Mesos with spot instances for its agents.
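Both setups share the same elastic-scaling idea: add cheap spot capacity when the backlog grows and release it when the queue drains. A hypothetical sketch of such a sizing policy, with purely illustrative numbers (neither AdRoll's nor samkone's actual logic):

```python
def desired_agents(pending_tasks, tasks_per_agent=4, max_agents=50):
    """How many spot-instance agents to keep for the current backlog.

    Ceiling-divides the backlog by per-agent capacity, capped at a budget
    limit. All parameters here are illustrative placeholders.
    """
    needed = -(-pending_tasks // tasks_per_agent)  # ceiling division
    return min(needed, max_agents)

print(desired_agents(0))     # -> 0   (idle: keep no spot agents)
print(desired_agents(10))    # -> 3   (10 tasks / 4 per agent, rounded up)
print(desired_agents(1000))  # -> 50  (capped at the budget limit)
```

In practice the result of such a policy would drive spot-instance requests, with the task graph's completeness checks (as in Luigi) making work restartable when a spot instance is reclaimed.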
dkroy over 9 years ago
What are you using for scheduling your jobs?
[Comment #10266593 not loaded]