TechEcho

Ask HN: Would you use a “git for data”?

14 points by sha1-1b141e about 9 years ago
Dear HN:

Let's say there was a thing that gave you the full git workflow (branch, sync, push, pull, merge, revert, etc.) efficiently for large-scale structured data.

Would such a thing be valuable? Would you use it? Would you pay for it?

Asking for a friend.

7 comments

tixocloud about 9 years ago
From my experience in enterprise business intelligence, absolutely yes, but more for the tracking than for the ability to roll back changes. Data lineage is an important concept and is extremely valuable from a compliance perspective. The key question is whether your system can integrate with existing data sources.

You'll also likely be going up against Informatica if you want to play in the enterprise space. That said, I'm also interested in a solution like yours, as we're rolling out our own automation system but need to keep track of things for compliance reasons.

Would be happy to have a chat.
wtracy about 9 years ago
What sort of structured data are we talking about?

If you format an XML or JSON file with one field per line (or just use YAML), git itself should fit the bill perfectly.

Now, I do see lots of room for a git-like tool targeted at specific existing binary file formats. Microsoft Office and Photoshop files come to mind. (I believe that such tools already exist for those particular formats, but they're expensive and currently have low adoption.)
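
The one-field-per-line suggestion above can be illustrated with a minimal Python sketch. It assumes the data fits in a JSON document; the record contents and the helper name `to_diffable_json` are made up for illustration. Sorting keys and indenting puts each field on its own line, so a line-based diff (used here as a stand-in for `git diff`) reports only the field that changed.

```python
import difflib
import json


def to_diffable_json(record: dict) -> str:
    """Serialize with sorted keys and one field per line (hypothetical helper)."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


# Hypothetical record, before and after a single field changes.
before = {"id": 7, "name": "widget", "price": 9.99}
after = {"id": 7, "name": "widget", "price": 10.49}

# Line-based diff with no context lines, similar in spirit to `git diff -U0`.
diff = difflib.unified_diff(
    to_diffable_json(before).splitlines(),
    to_diffable_json(after).splitlines(),
    lineterm="",
    n=0,
)

# Keep only added/removed lines, skipping the "---"/"+++" file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(changed))
```

Stable key ordering matters here: without `sort_keys=True`, two serializations of the same data could differ in field order and produce noisy diffs.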
anton_tarasenko about 9 years ago
Relevant: GitHub Large File Storage https://git-lfs.github.com/

On paying for this: when data operations are built around pipelines, it's often easier to re-run the pipeline or restore a snapshot. That requires a good server, but not a service. So before paying, I'd check why the new tool is better.
tmaly about 9 years ago
I would use it for storing and tracking specifications, if that is a possible use. In regulatory compliance, there is always a need to know what the specification was at some point in time.

However, for most places, any such system would have to be available on a private internal network.
daveloyall about 9 years ago
'dat jawn: git for tabular data': https://github.com/CfABrigadePhiladelphia/jawn
dhogan about 9 years ago
What advantage would this have over using something like a log table? Or a few SQL (or whatever) commands?
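
For comparison, here is a rough sketch of the log-table approach the question refers to, using Python's built-in `sqlite3` module. The table, column, and trigger names are hypothetical, and a real deployment would likely differ. A trigger appends a row to a log table on every update, which gives a linear history and a simple one-step revert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical data table and its change log.
    CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL NOT NULL);
    CREATE TABLE items_log (
        log_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        item_id    INTEGER NOT NULL,
        old_price  REAL,
        new_price  REAL,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Record every price change automatically.
    CREATE TRIGGER items_price_audit AFTER UPDATE OF price ON items
    BEGIN
        INSERT INTO items_log (item_id, old_price, new_price)
        VALUES (OLD.id, OLD.price, NEW.price);
    END;
    """
)

conn.execute("INSERT INTO items (id, price) VALUES (1, 9.99)")
conn.execute("UPDATE items SET price = 10.49 WHERE id = 1")
conn.execute("UPDATE items SET price = 11.00 WHERE id = 1")

# Full change history, oldest first.
history = conn.execute(
    "SELECT item_id, old_price, new_price FROM items_log ORDER BY log_id"
).fetchall()

# "Revert" the most recent change by restoring its old value.
item_id, old_price, _ = history[-1]
conn.execute("UPDATE items SET price = ? WHERE id = ?", (old_price, item_id))
current = conn.execute("SELECT price FROM items WHERE id = 1").fetchone()[0]
print(history, current)
```

Note what this sketch does not cover: the log is a single linear history, so the branch and merge parts of the git workflow described in the original post have no direct equivalent here.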
giuscri about 9 years ago
The Dat project?...