
Effective Airflow Development

58 points | by llambda | over 4 years ago

6 comments

theboat · over 4 years ago
Making use of airflow's plugin architecture by writing custom hooks and operators is essential for well-maintained, well-developed data pipelines (that use airflow). A little upfront investment in writing components (and the most painful part, writing tests) will go a long way to helping data engineers sleep at night.

That said, I make a point of using ETL-as-a-service whenever it's available, because there's no use solving a problem someone else has solved already.
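As a concrete illustration of that pattern (not taken from the article), a small custom operator in the Airflow 1.10-era style might look like the sketch below; the PostgresHook, connection id, and table name are placeholder choices:

```python
# Minimal sketch of a custom operator; the hook, connection id and table
# name are placeholders, not anything from the article or thread.
from airflow.hooks.postgres_hook import PostgresHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class AuditRowCountOperator(BaseOperator):
    """Count rows in a table and push the result to XCom."""

    @apply_defaults
    def __init__(self, table, postgres_conn_id="postgres_default", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.table = table
        self.postgres_conn_id = postgres_conn_id

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.postgres_conn_id)
        count = hook.get_first(f"SELECT COUNT(*) FROM {self.table}")[0]
        self.log.info("Table %s has %s rows", self.table, count)
        return count  # return value is pushed to XCom for downstream tasks
```

Because the logic lives in `execute()`, a unit test can instantiate the operator and call it directly with a mocked hook, which addresses the testing pain the comment mentions.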
julee04 · over 4 years ago
Seeing as this article is from 2019, would people still recommend Airflow for ETL-ing data from APIs to data warehouses/data lakes, or is there something better on the market?
Replies not loaded: #24206957, #24206912, #24206701, #24206880, #24207704, #24210746, #24209521
walrus01 · over 4 years ago
Based on the title I clicked this thinking it was something related to OSI layer 1, for hot aisle/cold aisle separation in high density datacenters, compartmentalized-per-cabinet cooling or something.

As a side note, how do you effectively google for a piece of software or product with a name as generic as "airflow"?
Replies not loaded: #24206775, #24209020
shankysingh · over 4 years ago
Testing pipelines in Airflow is a bit of a pain, but Great Expectations makes runtime pipeline validation much easier: https://greatexpectations.io

For unit/integration tests we ended up doing a lot of Docker-in-Docker setup.
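For reference, a runtime check with Great Expectations' classic pandas-style API might look roughly like this; the column names, CSV path, and thresholds are invented, and the exact result object varies between Great Expectations versions:

```python
# Rough sketch of a runtime validation step; column names, the CSV path and
# the thresholds are made up for illustration.
import great_expectations as ge
import pandas as pd


def validate_orders(csv_path):
    # Wrap a pandas DataFrame so the expect_* methods become available.
    df = ge.from_pandas(pd.read_csv(csv_path))
    df.expect_column_values_to_not_be_null("order_id")
    df.expect_column_values_to_be_between("amount", min_value=0, max_value=100000)
    results = df.validate()
    if not results["success"]:  # result shape differs slightly by GE version
        raise ValueError("Data validation failed: {}".format(results))
    return results
```

Wrapped in a PythonOperator, a failed validation fails the task and keeps bad data from flowing downstream.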
james_woods · over 4 years ago
I have been using Apache Airflow for a couple of years now, and the biggest improvement was the addition of the Kubernetes operator. You basically keep a vanilla Airflow installation, and the custom code is encapsulated and tested in containers. This simplifies things a lot.
Reply not loaded: #24215215
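A minimal sketch of that setup, using the KubernetesPodOperator from the Airflow 1.10 contrib package (it later moved to the cncf.kubernetes provider in Airflow 2); the image, namespace, and DAG settings are placeholders:

```python
# Sketch of the "vanilla Airflow + containerized custom code" pattern;
# the image, namespace and schedule are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

with DAG(
    "containerized_etl",
    start_date=datetime(2020, 8, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = KubernetesPodOperator(
        task_id="extract_orders",
        name="extract-orders",
        namespace="data-pipelines",
        image="registry.example.com/etl/extract-orders:1.4.2",  # placeholder image
        arguments=["--date", "{{ ds }}"],  # arguments are templated by Airflow
        get_logs=True,
        is_delete_operator_pod=True,
    )
```

Airflow itself only schedules the pod; the ETL logic and its tests live in the container image, which is the encapsulation the comment describes.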
rllin · over 4 years ago
you only need two operators: the kube operator and the dataflow operator

and then every task is just a standalone, vertically scalable service on k8s, or a giant horizontally scalable compute job
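Read literally, that two-operator pattern could look something like the following sketch, using the Airflow 1.10 contrib import paths; the bucket, image, namespace, and pipeline options are invented for illustration:

```python
# Hypothetical sketch of the "two operators" pattern: containers on k8s for
# ordinary tasks, Dataflow for the big horizontally scalable transforms.
# Paths, images and options are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

with DAG(
    "two_operator_pattern",
    start_date=datetime(2020, 8, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = KubernetesPodOperator(
        task_id="ingest",
        name="ingest",
        namespace="data-pipelines",
        image="registry.example.com/etl/ingest:latest",  # placeholder image
    )
    transform = DataFlowPythonOperator(
        task_id="transform",
        py_file="gs://example-bucket/pipelines/transform.py",  # placeholder path
        options={"input": "gs://example-bucket/raw/{{ ds }}/*"},
    )
    ingest >> transform
```

The appeal of this design is that Airflow stays a thin scheduler: scaling decisions live in Kubernetes and Dataflow rather than in the DAG code.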