Making use of Airflow's plugin architecture by writing custom hooks and operators is essential for well-maintained, well-developed Airflow pipelines. A little upfront investment in writing components (and the most painful part, writing tests) goes a long way toward helping data engineers sleep at night.

That said, I make a point of using ETL-as-a-service whenever it's available, because there's no use solving a problem someone else has already solved.
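As a rough illustration of that custom-hook-plus-operator pattern, here is a minimal sketch assuming Airflow 2.x import paths; the vendor API, `VendorHook`, and `VendorToLocalFileOperator` are all hypothetical names, not anything from the article:

```python
import json

import requests
from airflow.hooks.base import BaseHook
from airflow.models import BaseOperator


class VendorHook(BaseHook):
    """Thin wrapper around a (hypothetical) vendor REST API."""

    def __init__(self, conn_id="vendor_default"):
        super().__init__()
        self.conn_id = conn_id

    def get_records(self, endpoint):
        # Credentials live in an Airflow Connection, not in DAG code.
        conn = self.get_connection(self.conn_id)
        resp = requests.get(
            f"https://{conn.host}/{endpoint}",
            headers={"Authorization": f"Bearer {conn.password}"},
        )
        resp.raise_for_status()
        return resp.json()


class VendorToLocalFileOperator(BaseOperator):
    """Pull records from the vendor API and land them as JSON on disk."""

    def __init__(self, endpoint, output_path, conn_id="vendor_default", **kwargs):
        super().__init__(**kwargs)
        self.endpoint = endpoint
        self.output_path = output_path
        self.conn_id = conn_id

    def execute(self, context):
        records = VendorHook(self.conn_id).get_records(self.endpoint)
        with open(self.output_path, "w") as f:
            json.dump(records, f)
        self.log.info("Wrote records to %s", self.output_path)
```

The payoff is that the hook and operator can be unit-tested once and then reused across every DAG that touches that source.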
Seeing as this article is from 2019, would people still recommend Airflow to ETL data from APIs into a data warehouse/data lake, or is there something better on the market?
Based on the title I clicked this thinking it was something related to OSI layer 1: hot aisle/cold aisle separation in high-density datacenters, compartmentalized per-cabinet cooling, or something along those lines.

As a side note, how do you effectively Google for a piece of software or product with a name as generic as "airflow"?
Testing pipelines in Airflow is a bit of a pain, but Great Expectations makes runtime pipeline validation much easier: https://greatexpectations.io

For unit/integration tests we ended up doing a lot of Docker-in-Docker setup.
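For context, runtime validation with Great Expectations can be as simple as a checkpoint function called from a task. A minimal sketch using the classic Pandas-backed API (`ge.from_pandas`); the column names are hypothetical, and exact result access varies across GE versions:

```python
import great_expectations as ge
import pandas as pd
from airflow.exceptions import AirflowException


def validate_orders(csv_path):
    # Wrap the DataFrame so expectation methods are available on it.
    df = ge.from_pandas(pd.read_csv(csv_path))

    # Each call both checks the data and records the expectation.
    df.expect_column_values_to_not_be_null("order_id")
    df.expect_column_values_to_be_between("amount", min_value=0)

    # Fail the task so downstream loads never see bad data.
    result = df.validate()
    if not result["success"]:
        raise AirflowException(f"Validation failed for {csv_path}")
```

Wiring `validate_orders` into a PythonOperator between extract and load gives you the runtime validation step without touching the rest of the DAG.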
I have been using Apache Airflow for a couple of years now, and the biggest improvement was the addition of the Kubernetes operator. You keep a vanilla Airflow installation, and the custom code is encapsulated and tested in containers. This simplifies things a lot.
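Concretely, a task in that style looks something like the sketch below, using the Airflow 2.x provider import path; the namespace and image name are hypothetical:

```python
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

extract = KubernetesPodOperator(
    task_id="extract_orders",
    name="extract-orders",
    namespace="data-pipelines",
    # The image carries all custom code and its dependencies, tested in CI.
    image="registry.example.com/etl/extract-orders:1.4.2",
    arguments=["--date", "{{ ds }}"],  # Airflow templates the run date in
    get_logs=True,
)
```

Since the pod image pins its own dependencies, the Airflow installation itself never needs the pipeline's libraries installed.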
You really only need two operators: the KubernetesPodOperator and the Dataflow operator.

Then every task is either a standalone, vertically scalable service on k8s or a giant horizontally scalable compute job.
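A sketch of that two-operator pattern, using operators from the cncf.kubernetes and Google provider packages; the image, template path, and parameters are hypothetical:

```python
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

# A standalone containerized task on k8s.
prepare = KubernetesPodOperator(
    task_id="prepare",
    name="prepare",
    namespace="data-pipelines",
    image="registry.example.com/etl/prepare:latest",
)

# A horizontally scalable compute job handed off to Dataflow.
crunch = DataflowTemplatedJobStartOperator(
    task_id="crunch",
    template="gs://example-bucket/templates/crunch",
    parameters={"inputDate": "{{ ds }}"},
    location="us-central1",
)

prepare >> crunch  # container task first, then the big compute job
```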