Dear HN:<p>Let's say there was a thing that gave you the full git workflow (branch, sync, push, pull, merge, revert, etc.) efficiently for large-scale structured data.<p>Would such a thing be valuable? Would you use it? Would you pay for it?<p>Asking for a friend.
From my experience in enterprise business intelligence, absolutely yes, but more for the tracking than for the ability to roll back changes. Data lineage is an important concept and is extremely valuable from a compliance perspective. The key question would be whether your system can integrate with existing data sources.<p>You'll also likely be going up against Informatica if you want to play in the enterprise space. That said, I'm interested in a solution like yours: we're rolling out our own automation system but need to keep track of changes for compliance reasons.<p>I'd be happy to have a chat.
What sort of structured data are we talking about?<p>If you format an XML or JSON file with one field per line (or just use YAML), git itself should fit the bill perfectly.<p>Now, I do see lots of room for a git-like tool targeted at specific existing binary file formats. Off the top of my head, Microsoft Office and Photoshop files come to mind. (I believe tools like that already exist for those particular formats, but they're expensive and currently have low adoption.)
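A minimal sketch of the one-field-per-line idea, using Python's standard `json` and `difflib` modules (the record contents and the `record.json@old`/`record.json@new` labels are made up for illustration). Indenting and sorting keys puts each field on its own stable line, so changing one field shows up as a one-line change in a line-oriented diff, which is the kind of comparison git's own diff works with. `difflib` here only stands in for git to show the effect.

```python
import difflib
import json


def to_diffable_json(record: dict) -> str:
    # indent=2 puts every field (and every list element) on its own line;
    # sort_keys=True keeps key order stable, so reordering keys in code
    # does not show up as a spurious change.
    return json.dumps(record, indent=2, sort_keys=True) + "\n"


# Hypothetical example record before and after an edit to one field.
old = {"id": 42, "name": "widget", "price": 9.99, "tags": ["a", "b"]}
new = {"id": 42, "name": "widget", "price": 10.49, "tags": ["a", "b"]}

diff = list(difflib.unified_diff(
    to_diffable_json(old).splitlines(keepends=True),
    to_diffable_json(new).splitlines(keepends=True),
    fromfile="record.json@old",
    tofile="record.json@new",
))
print("".join(diff))
```

Only the `price` line appears as removed and added; every other field is unchanged context. With everything on a single line, the whole record would show up as changed instead.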
Relevant: Git Large File Storage (Git LFS) <a href="https://git-lfs.github.com/" rel="nofollow">https://git-lfs.github.com/</a><p>On paying for this: when data operations are built around pipelines, it's often easier to re-run the pipeline or restore a snapshot. That requires a good server, but not a service. So before paying, I'd want to know why the new tool is better than that.
I would use it for storing and tracking specifications, if that's a supported use case. In regulatory compliance, you always need to know what the specification was at any given point in time.<p>However, for most organizations, any such system would have to be deployable on a private internal network.
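The "what was the specification at some point in time" requirement is essentially an as-of query over an append-only version history. A minimal in-memory sketch in Python, assuming each version carries an effective-from timestamp; `SpecHistory`, `SpecVersion`, and the sample specification text are hypothetical and not part of any existing tool.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SpecVersion:
    effective_from: datetime  # when this version took effect
    content: str              # the specification text (or a hash of it)


class SpecHistory:
    """Hypothetical append-only history of one specification."""

    def __init__(self) -> None:
        self._versions: list[SpecVersion] = []

    def record(self, effective_from: datetime, content: str) -> None:
        # Versions must arrive in chronological order; past entries are never rewritten.
        if self._versions and effective_from <= self._versions[-1].effective_from:
            raise ValueError("versions must be appended in chronological order")
        self._versions.append(SpecVersion(effective_from, content))

    def as_of(self, when: datetime) -> SpecVersion | None:
        # Return the latest version with effective_from <= when, or None
        # if the specification did not exist yet at that time.
        times = [v.effective_from for v in self._versions]
        i = bisect.bisect_right(times, when)
        return self._versions[i - 1] if i > 0 else None


# Usage with made-up versions:
history = SpecHistory()
history.record(datetime(2020, 1, 1), "v1: tolerance 0.5 mm")
history.record(datetime(2021, 6, 1), "v2: tolerance 0.3 mm")
print(history.as_of(datetime(2021, 1, 15)).content)
```

Keeping the history append-only is what makes the answer auditable: a past as-of query returns the same result no matter when it is asked.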