Looking to get feedback on a code-first platform for data: instead of custom frameworks, GUIs, and notebooks on a cron, bauplan runs SQL / Python functions from your IDE, in the cloud, backed by your object storage. Everything is versioned and composable: time-travel, git-like branches, scriptable meta-logic.

Perhaps surprisingly, we decided to co-design the abstractions and the runtime, which allowed novel optimizations at the intersection of FaaS and data - e.g. rebuilding functions can be 15x faster than the corresponding AWS stack (https://arxiv.org/pdf/2410.17465). All capabilities are available to humans (CLI) and machines (SDK) through simple APIs.

Would love to hear the community's thoughts on moving data engineering workflows closer to software abstractions: tables, functions, branches, CI/CD, etc.
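To give a concrete sense of the "machines (SDK)" side, here is a minimal sketch of the branch / run / merge pattern from Python. The method names (`create_branch`, `run`, `merge_branch`) and arguments are my assumptions for illustration based on the post's description; check the bauplan docs for the actual client API.

```python
# Hypothetical sketch of the branch -> run -> merge workflow via the SDK.
# Method names and signatures are illustrative assumptions, not the
# confirmed API surface.
import bauplan

client = bauplan.Client()

# Work on an isolated, git-like branch of the data catalog
client.create_branch("alice.quarterly_fix", from_ref="main")

# Run the pipeline in the cloud against that branch
client.run(project_dir="./my_pipeline", ref="alice.quarterly_fix")

# If the results look right, merge the branch back into main
client.merge_branch(source_ref="alice.quarterly_fix", into_branch="main")
```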
Looks interesting! Bauplan seems like a mix of an orchestration engine and a data warehouse. It's similar to Motherduck in that it runs DuckDB on managed EC2, with more data-engineer-focused branching and Python support similar to SQLMesh.

It's interesting that most vendors run compute in their own managed account instead of offering BYOC, though. I understand it's hard for vendors to manage compute in the customer's cloud, but I was under the impression that's a no-go for most enterprise companies. Maybe I'm wrong?
The Git-like approach to data versioning seems *really* promising to me, but I'm wondering what merge operations are expected to look like in practice. In a coding environment, I'd review a PR basically line-by-line to check for code quality, engineering soundness, etc. But in the data case it's not clear to me that a line-by-line review would be possible, or even useful; I'm also curious about what tooling (if any) is provided to support it.

For example: I saw the YouTube video demo someone linked here that showed a quarterly report pipeline. Say I'm one of two analysts tasked with producing that report, and my coworker would like to land a bunch of changes. Suppose that in their data branch, the topline report numbers differ from `main` by X%. Clearly it's due to *some* change in the pipeline, but it seems like I would still have to fire up a notebook and copy-paste chunks of the pipeline to see step-by-step where things diverge. Is there another recommended workflow (or, even better, provided tooling) for determining which deltas in the pipeline contributed to the X% difference?
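One workflow that seems natural here, given the branch model the post describes, is to run the same aggregate query against both refs and diff the results. A sketch under stated assumptions: the `client.query(sql, ref=...)` call and the branch name `coworker.q3_changes` are hypothetical, used only to illustrate the pattern.

```python
# Hypothetical sketch: compare topline numbers between a branch and main
# by running the same query on both refs. query(sql, ref=...) is an
# assumption based on the post's branch/SDK description.
import bauplan

client = bauplan.Client()

SQL = """
SELECT region, SUM(revenue) AS revenue
FROM quarterly_report
GROUP BY region
"""

main_df = client.query(SQL, ref="main").to_pandas()
branch_df = client.query(SQL, ref="coworker.q3_changes").to_pandas()

# Join on the grouping key and surface per-region deltas
diff = main_df.merge(branch_df, on="region", suffixes=("_main", "_branch"))
diff["delta_pct"] = 100 * (diff["revenue_branch"] - diff["revenue_main"]) / diff["revenue_main"]
print(diff.sort_values("delta_pct", ascending=False))
```

This doesn't pinpoint *which* pipeline step caused the drift, but pushing the comparison down to per-step output tables on both refs would narrow it without copy-pasting code into a notebook.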
Congrats on the more official launch! Super promising: the first product that pairs dbt-style data organization/orchestration with a compute layer worthy of replacing existing data warehouses and Python environments.
For someone like me (not an ML expert, but fluent in Python), Bauplan looks like an ideal fit. Looking forward to taking a deeper look and building something in production.