TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.
Petabyte-Scale Data Pipelines with Docker, Luigi and Elastic Spot Instances

51 points by vtuulos over 9 years ago

3 comments

ngould over 9 years ago
Cool stuff. Curious to know what systems these containerized tasks are pulling data from. Does AdRoll let those containers access production database instances, or are they backed by non-production systems? (EDIT: I understand that it's S3 for intermediate steps. But I'm curious where the data comes from initially.)
[Comment #10263241 not loaded]
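The pattern the title describes — Luigi tasks that declare dependencies and targets, with each task's work running in a container and intermediate results landing in S3 — can be sketched without the real library. This is a minimal, stdlib-only illustration of the requires/output/run idiom, not AdRoll's actual code: the task names are invented, and local temp files stand in for S3 targets and for the containerized work.

```python
import os
import tempfile

class Task:
    """Bare-bones stand-in for a Luigi-style task."""
    def requires(self):
        return []
    def output(self):
        raise NotImplementedError
    def run(self):
        raise NotImplementedError
    def complete(self):
        # A task is done when its target exists (Luigi uses the same idea).
        return os.path.exists(self.output())

class Extract(Task):
    """Stand-in for a containerized task that pulls the source data."""
    def __init__(self, workdir):
        self.workdir = workdir
    def output(self):
        return os.path.join(self.workdir, "raw.txt")
    def run(self):
        with open(self.output(), "w") as f:
            f.write("raw data")

class Transform(Task):
    """Depends on Extract; reads its target, writes an intermediate result."""
    def __init__(self, workdir):
        self.workdir = workdir
    def requires(self):
        return [Extract(self.workdir)]
    def output(self):
        return os.path.join(self.workdir, "clean.txt")
    def run(self):
        with open(Extract(self.workdir).output()) as f:
            data = f.read()
        with open(self.output(), "w") as f:
            f.write(data.upper())

def build(task):
    """Depth-first scheduler: satisfy dependencies, then run the task."""
    for dep in task.requires():
        build(dep)
    if not task.complete():
        task.run()

workdir = tempfile.mkdtemp()
build(Transform(workdir))
with open(os.path.join(workdir, "clean.txt")) as f:
    print(f.read())  # -> RAW DATA
```

Because completeness is just "does the target exist", a re-run skips finished tasks, which is what makes S3-backed intermediate steps cheap to resume after spot-instance loss.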
samkone over 9 years ago
Interesting, we're doing similar things with Spark, Cassandra, and Mesos, scaling out Mesos with spot instances for its agents.
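Both setups share the same elastic-scaling idea: add cheap spot capacity when the backlog grows and release it when the queue drains. A hypothetical sketch of such a sizing policy, with purely illustrative numbers (neither AdRoll's nor samkone's actual logic):

```python
def desired_agents(pending_tasks, tasks_per_agent=4, max_agents=50):
    """How many spot-instance agents to keep for the current backlog.

    Ceiling-divides the backlog by per-agent capacity, capped at a budget
    limit. All parameters here are illustrative placeholders.
    """
    needed = -(-pending_tasks // tasks_per_agent)  # ceiling division
    return min(needed, max_agents)

print(desired_agents(0))     # -> 0   (idle: keep no spot agents)
print(desired_agents(10))    # -> 3   (10 tasks / 4 per agent, rounded up)
print(desired_agents(1000))  # -> 50  (capped at the budget limit)
```

In practice the result of such a policy would drive spot-instance requests, with the task graph's completeness checks (as in Luigi) making work restartable when a spot instance is reclaimed.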
dkroy over 9 years ago
What are you using for scheduling your jobs?
[Comment #10266593 not loaded]