TechEcho

A quote from the article I would object to is "for large datasets and complex transformations this architecture is far from ideal. This is far from the world of open-source code on Git & CI/CD that data engineering offers - again locking you into proprietary formats, and archaic development processes."

No one is forcing you to use those tools on top of something like Snowflake (which is just a SQL interface). These days we have great open source tools (such as https://www.getdbt.com/) that let you write plain SQL, deploy it to multiple environments, run automated tests and deployments, and do fun scripting. At the same time, dealing with large datasets in a Spark world is full of lower-level details, whereas in a SQL database it's the exact same query you would run on a smaller dataset.

The reality is that the ETL model is fading in favour of ELT (load the data, then transform it in the warehouse), because maintaining complex data pipelines and Spark clusters makes little sense when you can spin up a cloud data warehouse. In this world we don't just need less developer time; those developers also don't have to be engineers who can write and maintain Spark workloads and clusters. They can be analysts who do the transformations themselves and get something valuable out to the business faster than the equivalent Spark data pipeline can be built.
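To make the ELT point concrete, here is a minimal sketch of "load first, then transform with plain SQL". It is only an illustration under stated assumptions: SQLite from the Python standard library stands in for a cloud warehouse, and the table names (`raw_orders`, `customer_revenue`), the columns, and the sample rows are all made up for the example. None of this is Snowflake or dbt code.

```python
import sqlite3


def run_elt(rows):
    """Load raw rows untouched, then transform them with plain SQL in the database."""
    # Assumption: an in-memory SQLite database stands in for the warehouse.
    conn = sqlite3.connect(":memory:")

    # "L": land the raw data as-is, with no cleaning in application code.
    conn.execute(
        "CREATE TABLE raw_orders (customer TEXT, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # "T": the transformation is a SQL statement that runs inside the database.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        WHERE status = 'completed'
        GROUP BY customer
        """
    )

    result = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data, invented for this sketch.
sample_rows = [
    ("alice", 1250, "completed"),
    ("alice", 750, "completed"),
    ("bob", 500, "refunded"),
    ("bob", 300, "completed"),
]

if __name__ == "__main__":
    for customer, revenue in run_elt(sample_rows):
        print(customer, revenue)
```

The transformation step is just a query, so the same statement would apply whether `raw_orders` holds four rows or far more; scaling it is the database's job rather than the author's.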

Would love to get perspectives from the HN community: did you have to decide between Snowflake and Spark for data engineering? Which one did you pick, and why?

Spark vs. Snowflake: The Cloud Data Engineering (ETL) Debate

2 comments
