Spark vs. Snowflake: The Cloud Data Engineering (ETL) Debate

15 points by ibains almost 5 years ago

2 comments

dalailambda almost 5 years ago
A quote from the article I would object to is "for large datasets and complex transformations this architecture is far from ideal. This is far from the world of open-source code on Git & CI/CD that data engineering offers - again locking you into proprietary formats, and archaic development processes."

No one is forcing you to use those tools on top of something like Snowflake (which is just a SQL interface). These days we have great open source tools (such as https://www.getdbt.com/) which let you write plain SQL, deploy it to multiple environments, automate testing and deployment, and do fun scripting. At the same time, dealing with large datasets in a Spark world is full of lower-level details, whereas in a SQL database it's the exact same query you would run on a smaller dataset.

The reality is that the ETL model is fading in favour of ELT (load data, then transform it in the warehouse) because maintaining complex data pipelines and Spark clusters makes little sense when you can spin up a cloud data warehouse. In this world we don't just need less developer time; those developers don't have to be engineers who can write and maintain Spark workloads and clusters. They can be analysts who are able to do transformations and get something valuable out to the business faster than the equivalent Spark data pipeline can be built.
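For readers unfamiliar with the ELT pattern described in this comment, here is a minimal sketch of the idea. It assumes Python's built-in `sqlite3` module as a stand-in for a cloud warehouse such as Snowflake; the table names, columns, and the `run_elt` helper are hypothetical and are not taken from the article, dbt, or Snowflake. The point is only that the raw data is loaded untouched and the transformation is plain SQL run by the database.

```python
import sqlite3


def run_elt(rows):
    """Load raw rows as-is, then transform them with SQL inside the database."""
    # Assumption: an in-memory SQLite database stands in for the warehouse.
    conn = sqlite3.connect(":memory:")

    # Extract + Load: land the raw data without cleaning it in application code.
    conn.execute(
        "CREATE TABLE raw_orders (customer TEXT, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # Transform: a plain SQL statement executed by the database engine.
    # The same statement applies whether raw_orders holds four rows or many more.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        WHERE status = 'complete'
        GROUP BY customer
        """
    )

    result = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data for illustration only.
sample = [
    ("alice", 1200, "complete"),
    ("alice", 800, "complete"),
    ("bob", 500, "cancelled"),
    ("bob", 2500, "complete"),
]
print(run_elt(sample))
```

In a real setup the transform step would typically live in a version-controlled SQL file (for example a dbt model) rather than in a Python string; the sketch keeps everything in one file only so it can be run directly.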
ibains almost 5 years ago
Would love to get perspectives from the HN community: did you have to decide between Snowflake and Spark for data engineering? Which one did you pick, and why?