
Effective Airflow Development

58 points by llambda over 4 years ago

6 comments

theboat over 4 years ago
Making use of Airflow's plugin architecture by writing custom hooks and operators is essential for well-maintained, well-developed data pipelines (that use Airflow). A little upfront investment in writing components (and the most painful part, writing tests) will go a long way to helping data engineers sleep at night.

That said, I make a point of using ETL-as-a-service whenever it's available, because there's no use solving a problem someone else has solved already.
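[Editor's sketch] A minimal example of that kind of hook/operator pair, assuming Airflow 2-style imports; the ExampleApiHook/ExampleFetchOperator names, the connection id, and the endpoint path are all hypothetical:

    # Hypothetical custom hook/operator pair: the hook owns connection
    # handling, the operator owns one pipeline step, and each can be
    # unit-tested in isolation with a mocked connection.
    import requests
    from airflow.hooks.base import BaseHook
    from airflow.models import BaseOperator

    class ExampleApiHook(BaseHook):
        """Thin wrapper around a (hypothetical) HTTP API."""

        def __init__(self, conn_id="example_api_default"):
            super().__init__()
            self.conn_id = conn_id

        def fetch(self, path):
            conn = self.get_connection(self.conn_id)  # Airflow connection lookup
            resp = requests.get(f"{conn.host}{path}")
            resp.raise_for_status()
            return resp.json()

    class ExampleFetchOperator(BaseOperator):
        """Fetches one resource via the hook and returns it through XCom."""

        def __init__(self, path, conn_id="example_api_default", **kwargs):
            super().__init__(**kwargs)
            self.path = path
            self.conn_id = conn_id

        def execute(self, context):
            return ExampleApiHook(self.conn_id).fetch(self.path)

With the logic factored into plain classes like these, the painful part (tests) mostly reduces to ordinary pytest with get_connection mocked out.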
julee04 over 4 years ago
Seeing as this article is from 2019, would people still recommend Airflow to ETL data from APIs to data warehouses/data lakes, or is there something better on the market?
walrus01 over 4 years ago
Based on the title I clicked this thinking it was something related to OSI layer 1, for hot aisle/cold aisle separation in high-density datacenters, compartmentalized-per-cabinet cooling, or something.

As a side note, how do you effectively google for a piece of software or product with a name as generic as "airflow"?
shankysingh over 4 years ago
Testing pipelines in Airflow is a bit of a pain, but Great Expectations makes runtime pipeline validation much easier: https://greatexpectations.io

For unit/integration tests we ended up doing a lot of Docker-in-Docker setup.
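[Editor's sketch] For reference, runtime validation inline in a task can look roughly like this, using the classic pandas-backed Great Expectations API (the file path and column name are made up, and the exact result shape varies across GE versions):

    # Hypothetical validation step: read an upstream CSV and fail the task
    # (and therefore the DAG run) if the data violates an expectation.
    import great_expectations as ge
    from airflow.operators.python import PythonOperator

    def validate_orders(**context):
        df = ge.read_csv("/tmp/orders.csv")  # hypothetical upstream output
        result = df.expect_column_values_to_not_be_null("order_id")
        if not result.success:
            raise ValueError(f"Validation failed: {result}")

    validate = PythonOperator(
        task_id="validate_orders",
        python_callable=validate_orders,
        dag=dag,  # assumes a DAG object defined elsewhere
    )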
james_woods over 4 years ago
I have been using Apache Airflow for a couple of years now, and the biggest improvement was the addition of the Kubernetes operator. You basically keep a vanilla Airflow installation, and the custom code is encapsulated and tested in containers. This simplifies things a lot.
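[Editor's sketch] That pattern, assuming the cncf.kubernetes provider (the module path and some operator arguments vary across provider versions; the image, namespace, and registry are hypothetical):

    # The DAG only schedules a container; all pipeline logic lives in the image.
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    transform = KubernetesPodOperator(
        task_id="transform_orders",
        name="transform-orders",
        namespace="data-pipelines",                        # hypothetical namespace
        image="registry.example.com/etl/transform:1.4.2",  # hypothetical image
        arguments=["--date", "{{ ds }}"],                  # templated run date
        get_logs=True,                 # stream container logs into the Airflow UI
        is_delete_operator_pod=True,   # clean up the pod after it finishes
    )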
rllin over 4 years ago
You only need two operators: the KubernetesPodOperator and the Dataflow operator.

Then every task is just a standalone, vertically scalable service on k8s or a giant horizontally scalable compute job.
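[Editor's sketch] A rough version of that two-operator layout — operator module paths and arguments depend on provider versions, and the image, bucket, and job names are hypothetical:

    # Container task for the service-shaped work, Dataflow for the big compute.
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )
    from airflow.providers.google.cloud.operators.dataflow import (
        DataflowCreatePythonJobOperator,
    )

    extract = KubernetesPodOperator(
        task_id="extract",
        name="extract",
        namespace="data-pipelines",
        image="registry.example.com/etl/extract:latest",  # hypothetical image
    )

    crunch = DataflowCreatePythonJobOperator(
        task_id="crunch",
        py_file="gs://example-bucket/pipelines/crunch.py",  # hypothetical Beam job
        job_name="crunch-{{ ds_nodash }}",
        options={"region": "us-central1"},
    )

    extract >> crunch  # run the horizontally scalable job after the container task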