Now you can use Airbyte source connectors to process data in memory with Python.

We integrated Airbyte connectors with Pathway, a Python stream processing framework, using the airbyte-serverless project. We believe ETL pipelines are making a comeback, with many use cases in AI (RAG pipelines), ETL for unstructured data, and pipelines that handle PII. In this article, we show how to stream data from GitHub using Airbyte and remove PII with Pathway. We would love your feedback on the implementation, and on any other use cases you can think of that would benefit from decoupling the extract and load steps.
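To make the idea of transforming records in memory before loading them more concrete, here is a minimal plain-Python sketch that masks email addresses in a stream of records. It does not use Pathway or Airbyte and is not the implementation from the article. The field names (`author_email`, `body`), the `[REDACTED_EMAIL]` placeholder, and the simple email regex are all hypothetical assumptions, and real PII removal would need much broader detection than one pattern.

```python
import re
from typing import Iterable, Iterator, Sequence

# Hypothetical, deliberately simple email pattern; real PII detection needs more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical field names for records coming from a GitHub-like source.
DEFAULT_FIELDS = ("author_email", "body")


def redact_record(record: dict, fields: Sequence[str] = DEFAULT_FIELDS) -> dict:
    """Return a copy of `record` with email addresses masked in the given text fields."""
    cleaned = dict(record)  # shallow copy so the input record is not mutated
    for field in fields:
        value = cleaned.get(field)
        if isinstance(value, str):
            cleaned[field] = EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    return cleaned


def redact_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Lazily redact each record as it arrives, keeping memory use per record."""
    for record in records:
        yield redact_record(record)


if __name__ == "__main__":
    commits = [
        {"sha": "abc123", "author_email": "jane@example.com",
         "body": "Contact me at jane@example.com"},
    ]
    for cleaned in redact_stream(commits):
        print(cleaned)
```

Because `redact_stream` is a generator, records are cleaned one at a time as they are extracted, which is the property that lets the transform sit between the extract and load steps without staging raw data in a warehouse.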
Interesting implementation! For complex stream and text processing, I also prefer processing data in memory with Python (ETL) rather than SQL in the warehouse (ELT).