Making use of Airflow's plugin architecture by writing custom hooks and operators is essential for well-maintained, well-developed Airflow pipelines. A little upfront investment in writing components (and the most painful part, writing tests) goes a long way toward helping data engineers sleep at night.

That said, I make a point of using ETL-as-a-service whenever it's available, because there's no use solving a problem someone else has already solved.
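As a rough illustration of that custom-hook-plus-operator pattern, here is a minimal sketch assuming Airflow 2.x import paths; the vendor API, `VendorHook`, and `VendorToLocalFileOperator` are all hypothetical names, not anything from the article:

```python
import json

import requests
from airflow.hooks.base import BaseHook
from airflow.models import BaseOperator


class VendorHook(BaseHook):
    """Thin wrapper around a (hypothetical) vendor REST API."""

    def __init__(self, conn_id="vendor_default"):
        super().__init__()
        self.conn_id = conn_id

    def get_records(self, endpoint):
        # Credentials live in an Airflow Connection, not in DAG code.
        conn = self.get_connection(self.conn_id)
        resp = requests.get(
            f"https://{conn.host}/{endpoint}",
            headers={"Authorization": f"Bearer {conn.password}"},
        )
        resp.raise_for_status()
        return resp.json()


class VendorToLocalFileOperator(BaseOperator):
    """Pull records from the vendor API and land them as JSON on disk."""

    def __init__(self, endpoint, output_path, conn_id="vendor_default", **kwargs):
        super().__init__(**kwargs)
        self.endpoint = endpoint
        self.output_path = output_path
        self.conn_id = conn_id

    def execute(self, context):
        records = VendorHook(self.conn_id).get_records(self.endpoint)
        with open(self.output_path, "w") as f:
            json.dump(records, f)
        self.log.info("Wrote records to %s", self.output_path)
```

The payoff is that the hook and operator can be unit-tested once and then reused across every DAG that touches that source.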
Seeing as this article is from 2019, would people still recommend Airflow to ETL data from APIs into a data warehouse/data lake, or is there something better on the market?
Based on the title I clicked this thinking it was something related to OSI layer 1: hot aisle/cold aisle separation in high-density datacenters, compartmentalized per-cabinet cooling, or something along those lines.

As a side note, how do you effectively Google for a piece of software or product with a name as generic as "airflow"?
Testing pipelines in Airflow is a bit of a pain, but Great Expectations makes runtime pipeline validation much easier: https://greatexpectations.io

For unit/integration tests we ended up doing a lot of Docker-in-Docker setup.
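For context, runtime validation with Great Expectations can be as simple as a checkpoint function called from a task. A minimal sketch using the classic Pandas-backed API (`ge.from_pandas`); the column names are hypothetical, and exact result access varies across GE versions:

```python
import great_expectations as ge
import pandas as pd
from airflow.exceptions import AirflowException


def validate_orders(csv_path):
    # Wrap the DataFrame so expectation methods are available on it.
    df = ge.from_pandas(pd.read_csv(csv_path))

    # Each call both checks the data and records the expectation.
    df.expect_column_values_to_not_be_null("order_id")
    df.expect_column_values_to_be_between("amount", min_value=0)

    # Fail the task so downstream loads never see bad data.
    result = df.validate()
    if not result["success"]:
        raise AirflowException(f"Validation failed for {csv_path}")
```

Wiring `validate_orders` into a PythonOperator between extract and load gives you the runtime validation step without touching the rest of the DAG.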
I have been using Apache Airflow for a couple of years now, and the biggest improvement was the addition of the Kubernetes operator. You keep a vanilla Airflow installation, and the custom code is encapsulated and tested in containers. This simplifies things a lot.
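Concretely, a task in that style looks something like the sketch below, using the Airflow 2.x provider import path; the namespace and image name are hypothetical:

```python
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

extract = KubernetesPodOperator(
    task_id="extract_orders",
    name="extract-orders",
    namespace="data-pipelines",
    # The image carries all custom code and its dependencies, tested in CI.
    image="registry.example.com/etl/extract-orders:1.4.2",
    arguments=["--date", "{{ ds }}"],  # Airflow templates the run date in
    get_logs=True,
)
```

Since the pod image pins its own dependencies, the Airflow installation itself never needs the pipeline's libraries installed.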
You really only need two operators: the KubernetesPodOperator and the Dataflow operator.

Then every task is either a standalone, vertically scalable service on k8s or a giant horizontally scalable compute job.
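A sketch of that two-operator pattern, using operators from the cncf.kubernetes and Google provider packages; the image, template path, and parameters are hypothetical:

```python
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

# A standalone containerized task on k8s.
prepare = KubernetesPodOperator(
    task_id="prepare",
    name="prepare",
    namespace="data-pipelines",
    image="registry.example.com/etl/prepare:latest",
)

# A horizontally scalable compute job handed off to Dataflow.
crunch = DataflowTemplatedJobStartOperator(
    task_id="crunch",
    template="gs://example-bucket/templates/crunch",
    parameters={"inputDate": "{{ ds }}"},
    location="us-central1",
)

prepare >> crunch  # container task first, then the big compute job
```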