Show HN: Splitgraph - Build and share data with Postgres, inspired by Docker/Git

81 points by mildbyte almost 5 years ago

6 comments

chatmasta almost 5 years ago
Hey HN!

I’m Miles, co-founder of Splitgraph along with Artjoms (mildbyte). We met back in 2018, when I reached out to him after reading his blog on HN and realizing we lived right next to each other. Neither of us had a “real job,” and we both wanted to build something truly innovative and cool. We tossed around a few ideas, but ultimately we couldn’t resist the idea of building “GitHub for data,” which seemed like an obvious gap in the market. After nearly two years of development, we are finally ready, and extremely excited, to share it with the world.

We are not the first to notice this gap or try to build this product, so we wanted to make sure we did it right. We started from “first principles” and really analyzed the problem space. We ended up realizing that it’s not strictly Git or GitHub that people want “for data.” Rather, people just want to be able to work with data as easily as they can work with code. They want to experiment with, build, and maintain data without needless overhead.

Tools like Git and Docker are ubiquitous in any software engineer’s workflow, and we took a lot of inspiration from them when designing Splitgraph. We thought about why people like and use these tools, and tried to translate their benefits to the domain of data science. Our core philosophy is to stay out of the way and work with existing abstractions instead of introducing new ones. You can version your code with Git without switching filesystems. You can build Docker images without changing your code to work in Docker. Our goal with Splitgraph is to provide an easy path to incremental adoption, so you can introduce it into your existing workflows where and when it makes sense.

Splitgraph is powered by Postgres and provides an easy way to build and share versioned datasets, along with a whole bunch of other benefits. We encourage you to read the landing page, which (hopefully) explains it well. The documentation goes into much more detail, and if you have ten minutes and Docker installed, you can try Splitgraph for yourself [0]. If you work with data, we really hope you’ll give Splitgraph a try.

We’re here to answer any questions, and we’ve also created a Discord server [1] to hopefully build a bit of a community around Splitgraph.

[0] https://www.splitgraph.com/docs/getting-started/five-minute-demo

[1] https://discord.gg/eFEFRKm
ahnick almost 5 years ago
Personally, I think I'm more drawn to the dotmesh approach (https://docs.dotmesh.com/concepts/architecture/), but the one problem with data is that as it gets massive, it becomes really hard to move around, and I guess that's where layering Git-like workflows on top of it becomes intractable. It's like data has its own gravity, and oftentimes it is just easier to bring other things to the data rather than the other way around. IIRC Bryan Cantrill said something similar about data when Joyent was developing their object storage system Manta (https://www.youtube.com/watch?v=79fvDDPaIoY); ergo, perhaps the Splitgraph approach will meet with better success.
ishcheklein almost 5 years ago
One of the DVC maintainers here :) Congrats! It's great to see more tools for codifying data in different scenarios.

To be honest, since you introduce a new workflow and a few new concepts, it's not that easy to get the right perspective in 5 minutes (I know the same problem exists with DVC, and we've been iterating on our docs a lot). Mind a few questions?

Do I understand correctly that it is mostly focused on tabular data? Kind of like git checkout for an SQL table?
philips almost 5 years ago
This is so cool!

I have been looking around for databases that have any sort of cryptographic digest of data to ensure integrity, and this is the first time I have seen something do that.

Could the snapshots and content addressability be used for regular backups of application databases?
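To illustrate the general idea behind philips's question about digests and content addressability, here is a minimal sketch in Python. It is not Splitgraph's actual hashing scheme, which this thread doesn't describe. A snapshot is identified by a SHA-256 digest of its contents, so identical data always maps to the same ID and is stored once, and any later change to the stored rows is caught because the recomputed digest no longer matches. It assumes rows are JSON-serializable dicts; `row_digest`, `snapshot_id`, and `ObjectStore` are hypothetical names.

```python
import hashlib
import json


def row_digest(row: dict) -> bytes:
    # Canonical JSON (sorted keys, no whitespace) so equal rows always serialize identically.
    encoded = json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).digest()


def snapshot_id(rows: list) -> str:
    # Hash the sorted row digests so the ID depends on the rows' contents, not their order.
    h = hashlib.sha256()
    for digest in sorted(row_digest(r) for r in rows):
        h.update(digest)
    return h.hexdigest()


class ObjectStore:
    """Hypothetical content-addressed store: each snapshot is keyed by its own digest."""

    def __init__(self):
        self._objects = {}

    def put(self, rows: list) -> str:
        oid = snapshot_id(rows)
        # Identical content yields the same key, so duplicates are stored only once.
        self._objects.setdefault(oid, [dict(r) for r in rows])
        return oid

    def get(self, oid: str) -> list:
        rows = self._objects[oid]
        # Recompute the digest on read; a mismatch means the stored rows were altered.
        if snapshot_id(rows) != oid:
            raise ValueError(f"integrity check failed for snapshot {oid}")
        return [dict(r) for r in rows]
```

Sorting the row digests makes the ID independent of row order, which suits tables where physical order carries no meaning. For backups, the digest only proves integrity if the ID itself is kept somewhere trustworthy, and the snapshots still need durable, off-host storage, which content addressing alone does not provide.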
zmmmmm almost 5 years ago
I'm probably a bit naive about this, but could it make it unnecessary to explicitly create database dumps as backups in scenarios where you need a rollback? I.e., could I just tag the database and be guaranteed that I could later get that data back (for example, if my upgrade failed and I wanted to restore) simply by checking out the tag?
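The tag-and-checkout rollback zmmmmm describes (a "git checkout for an SQL table," in ishcheklein's words) can be sketched in plain SQL. This is a minimal sketch using Python's built-in `sqlite3` module instead of Postgres, and it is not Splitgraph's mechanism or API. `commit` copies the whole table into a snapshot table, `tag` names a commit, and `checkout` restores the table from the tagged snapshot inside a single transaction. The helper names and the `_commits`, `_tags`, and `_snap_<id>` tables are hypothetical.

```python
import sqlite3


def init_versioning(conn: sqlite3.Connection) -> None:
    # Hypothetical bookkeeping tables: one row per commit, one row per tag.
    conn.execute("CREATE TABLE IF NOT EXISTS _commits "
                 "(id INTEGER PRIMARY KEY AUTOINCREMENT, tbl TEXT NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS _tags "
                 "(name TEXT PRIMARY KEY, commit_id INTEGER NOT NULL REFERENCES _commits(id))")
    conn.commit()


def commit(conn: sqlite3.Connection, table: str) -> int:
    cid = conn.execute("INSERT INTO _commits (tbl) VALUES (?)", (table,)).lastrowid
    # Full copy of the table. Table names are interpolated, so they must come from trusted code.
    conn.execute(f'CREATE TABLE "_snap_{cid}" AS SELECT * FROM "{table}"')
    conn.commit()
    return cid


def tag(conn: sqlite3.Connection, name: str, cid: int) -> None:
    # Re-using a tag name moves it to the new commit, like a lightweight Git tag with -f.
    conn.execute("INSERT OR REPLACE INTO _tags (name, commit_id) VALUES (?, ?)", (name, cid))
    conn.commit()


def checkout(conn: sqlite3.Connection, table: str, name: str) -> None:
    row = conn.execute(
        "SELECT t.commit_id, c.tbl FROM _tags t JOIN _commits c ON c.id = t.commit_id "
        "WHERE t.name = ?", (name,)).fetchone()
    if row is None or row[1] != table:
        raise KeyError(f"no tag {name!r} for table {table!r}")
    # One transaction: either the whole restore is committed or it is rolled back.
    with conn:
        conn.execute(f'DELETE FROM "{table}"')
        conn.execute(f'INSERT INTO "{table}" SELECT * FROM "_snap_{row[0]}"')
```

A full copy per commit is the simplest possible design and wastes space on large tables; systems built for versioning data typically store only changed chunks or deltas. Note also that a tag is only as safe as the storage holding its snapshot: if that database is lost, the tag goes with it, so in a scheme like this it complements rather than replaces off-site dumps.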
username3 almost 5 years ago
How does this compare to Dolt and DoltHub?