Bauplan – Git-for-data pipelines on object storage

83 points by barabbababoon about 1 month ago

10 comments

jtagliabuetooso about 1 month ago
Looking to get feedback for a code-first platform for data: instead of custom frameworks, GUIs, notebooks on a cron, bauplan runs SQL / Python functions from your IDE, in the cloud, backed by your object storage. Everything is versioned and composable: time-travel, git-like branches, scriptable meta-logic.

Perhaps surprisingly, we decided to co-design the abstractions and the runtime, which allowed novel optimizations at the intersection of FaaS and data - e.g. rebuilding functions can be 15x faster than the corresponding AWS stack (https://arxiv.org/pdf/2410.17465). All capabilities are available to humans (CLI) and machines (SDK) through simple APIs.

Would love to hear the community’s thoughts on moving data engineering workflows closer to software abstractions: tables, functions, branches, CI/CD, etc.
buremba about 1 month ago
Looks interesting! Bauplan seems like a mix of an orchestration engine and a data warehouse. It's similar to Motherduck as it runs DuckDB on managed EC2, with more data engineer-focused branching and Python support similar to SQLMesh.

It's interesting that most vendors compute in their own managed account instead of BYOC though. I understand it's hard to manage compute on the customer cloud for vendors, but I was under the impression that it's a no-go for most enterprise companies. Maybe I'm wrong?
tech_ken about 1 month ago
The Git-like approach to data versioning seems *really* promising to me, but I'm wondering what those merge operations are expected to look like in practice. In a coding environment, I'd review the PR basically line-by-line to check for code quality, engineering soundness, etc. But in the data case it's not clear to me that a line-by-line review would be possible, or even useful; and I'm also curious about what (if any) tooling is provided to support it?

For example: I saw the YouTube video demo someone linked here where they had an example of a quarterly report pipeline. Say that I'm one of two analysts tasked with producing that report, and my coworker would like to land a bunch of changes. Say in their data branch, the topline report numbers are different from `main` by X%. Clearly it's due to *some* change in the pipeline, but it seems like I will still have to fire up a notebook and copy+paste chunks of the pipeline to see step-by-step where things are different. Is there another recommended workflow (or even better: provided tooling) for determining which deltas in the pipeline contributed to the X% difference?
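To make the step-by-step comparison described above concrete, here is a minimal sketch in plain Python. It is not Bauplan's API and not a recommended workflow from the thread: the step names, the `amount` column, and the idea of representing each branch as a dict of step outputs are all assumptions made for illustration. It walks the pipeline in order and reports the first step whose column total differs between `main` and the data branch.

```python
# Hypothetical sketch: NOT Bauplan's API. Each "branch" is just a dict that
# maps a pipeline step name to the rows that step produced on that branch.
from typing import Dict, List, Optional

Rows = List[dict]


def total(rows: Rows, column: str) -> float:
    """Sum one numeric column of a step's output."""
    return sum(row[column] for row in rows)


def first_divergent_step(main: Dict[str, Rows], branch: Dict[str, Rows],
                         steps: List[str], column: str,
                         tolerance: float = 1e-9) -> Optional[str]:
    """Walk the steps in pipeline order and return the first one whose
    column total differs between the two branches, or None if all match."""
    for step in steps:
        if abs(total(main[step], column) - total(branch[step], column)) > tolerance:
            return step
    return None


# Made-up data: the branch's cleaning step forgets to drop a refund row.
steps = ["raw_orders", "cleaned_orders", "quarterly_report"]
main = {
    "raw_orders": [{"amount": 100.0}, {"amount": 50.0}, {"amount": -5.0}],
    "cleaned_orders": [{"amount": 100.0}, {"amount": 50.0}],
    "quarterly_report": [{"amount": 150.0}],
}
branch = {
    "raw_orders": [{"amount": 100.0}, {"amount": 50.0}, {"amount": -5.0}],
    "cleaned_orders": [{"amount": 100.0}, {"amount": 50.0}, {"amount": -5.0}],
    "quarterly_report": [{"amount": 145.0}],
}

print(first_divergent_step(main, branch, steps, "amount"))
```

In this toy data the raw inputs agree, so the walk points at the cleaning step rather than at the final report. A real comparison would likely need per-key row diffs rather than a single column total, which can hide offsetting changes.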
whinvik about 1 month ago
How do you compare with DVC and LakeFS?
russellthehippo about 1 month ago
Congrats on the more official launch! Super promising: the first product that pairs dbt-type data organization/orchestration capabilities with a compute layer worthy of replacing existing data warehouses/Python environments.
rustyconover about 1 month ago
I'd love to see a 10-minute YouTube video of the capabilities of this product.
vira28 about 1 month ago
For someone like me (who is not an ML expert, but can write Python fluently) Bauplan looks like an ideal fit. Looking forward to taking a deeper look and building something in production.
gigatexal about 1 month ago
I’m intrigued, but what’s the pricing going to be? What am I paying for? Something to make FaaS easier? What’s the magic behind the scenes?
davistreybig about 1 month ago
From first principles, this is where data infrastructure should go in terms of developer ergonomics.
redskyluan about 1 month ago
Amazing, I've been looking for a similar service for years.