TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Tracking SQLite Database Changes in Git

347 pointsby vldrover 1 year ago

29 comments

simonwover 1 year ago
This approach works by storing the actual SQLite binary files in Git and then using a custom &quot;diff&quot; configuration to dump each file as SQL and compare the result.<p>It&#x27;s a neat trick, but storing binary files like that in Git isn&#x27;t as space efficient as using a plain text format.<p>I built my own tooling to solve this problem: <a href="https:&#x2F;&#x2F;datasette.io&#x2F;tools&#x2F;sqlite-diffable" rel="nofollow noreferrer">https:&#x2F;&#x2F;datasette.io&#x2F;tools&#x2F;sqlite-diffable</a> - which outputs a “diffable” copy of the data in a SQLite database, precisely so you can store it in Git and look at the differences later.<p>I’ve been running that for a couple of years in this repo: <a href="https:&#x2F;&#x2F;github.com&#x2F;simonw&#x2F;simonwillisonblog-backup">https:&#x2F;&#x2F;github.com&#x2F;simonw&#x2F;simonwillisonblog-backup</a> - which provides a backup of my blog’s PostgreSQL Django database (first converted to SQLite and then dumped out using sqlite-diffable).<p>Here’s an example diff: <a href="https:&#x2F;&#x2F;github.com&#x2F;simonw&#x2F;simonwillisonblog-backup&#x2F;commit&#x2F;72e73b2cdd714fb1f3cd87d6a752971fc6398890">https:&#x2F;&#x2F;github.com&#x2F;simonw&#x2F;simonwillisonblog-backup&#x2F;commit&#x2F;72...</a>
评论 #38117287 未加载
评论 #38115609 未加载
评论 #38114323 未加载
评论 #38167008 未加载
评论 #38114708 未加载
olvy0over 1 year ago
Repeating myself from several days ago (<a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=37958006">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=37958006</a>):<p>4 years ago I wrote an internal tool to serialize records of our hierarchical database to a bunch of hierarchical json files, which are then managed in a normal git repo. Each record is serialized to json, &quot;root&quot; entities in the hierarchy are serialized as to contain all their child entities. This is done as the user works - the database is updated and then the correct json file is read, updated and written back. The current SHA is always updated in the database, so there&#x27;s a way to recognize users worked outside our app with git on the repo, we can detect that and automatically the database based on the diff. There&#x27;s also an opposite sync.<p>This was&#x2F;is a very specific solution to our very specific set of problems. So not applicable to the general problem of &quot;versioning a database&quot;.<p>In fact users now complain that this duality of database &lt;-&gt; repo is confusing and we should ditch the database. Our users are unfortunately not very technical and sometimes require a lot of hand holding.<p>In theory they&#x27;re correct but in practice this would mean ditching our ORM and rewriting one from scratch that will work on top of the json files. While keeping the same API outwards. Which is potentially a lot of work. Maybe that will happen, we haven&#x27;t decided yet.<p>I wrote much more about it here:<p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=25005993">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=25005993</a>
评论 #38113948 未加载
redleader55over 1 year ago
The filter[0] attribute with &quot;smudge&quot; and &quot;clean&quot; would be a better solution for storing SQLite into git.<p>[0] - <a href="https:&#x2F;&#x2F;git-scm.com&#x2F;docs&#x2F;gitattributes#_filter" rel="nofollow noreferrer">https:&#x2F;&#x2F;git-scm.com&#x2F;docs&#x2F;gitattributes#_filter</a>
评论 #38111779 未加载
评论 #38115458 未加载
评论 #38111954 未加载
评论 #38111579 未加载
评论 #38112069 未加载
评论 #38111667 未加载
samoxover 1 year ago
Sharing a personal project to display git based CSV :D<p><a href="https:&#x2F;&#x2F;app.data-drift.io&#x2F;41231518&#x2F;samox&#x2F;local-datadrift-repo&#x2F;overview?snapshotDate=2023-10-25&amp;commitSha=105c05f0d9b418cf86e223d59cc4a686b298935f" rel="nofollow noreferrer">https:&#x2F;&#x2F;app.data-drift.io&#x2F;41231518&#x2F;samox&#x2F;local-datadrift-rep...</a><p>Not sure how it could be plugged on this stuff, but sharing anyway :D
评论 #38112598 未加载
danirodover 1 year ago
I used to do this trick a couple of years ago to diff the credentials file in Rails codebases, which is encrypted, to keep track of changes to passwords and tokens.<p>Worth to keep in mind that the text representation is virtual and thus the command is run on every operation like a git-diff or a git-show. Therefore, if it takes a while to convert to text, it will also slowdown Git and other tools depending on it.
评论 #38111478 未加载
bambaxover 1 year ago
This is an excellent and very clever trick; however what it does is store the sql database in binary format, and use the dump only for the diff.<p>Would it not be simpler to only store the text dump each time? (The dump can be used trivially to rebuild the db should the need arise, and it&#x27;s human readable).<p>So maybe a pre-commit or pre-staging hook would be even better for this use case?
评论 #38111516 未加载
Hakkinover 1 year ago
Alternatively: <a href="https:&#x2F;&#x2F;www.sqlite.org&#x2F;sqldiff.html" rel="nofollow noreferrer">https:&#x2F;&#x2F;www.sqlite.org&#x2F;sqldiff.html</a>
评论 #38110828 未加载
michaelmiorover 1 year ago
It&#x27;s rather annoying that part of the config is stored in your local checked out copy of the repository (.git&#x2F;config) and part of it is in a configuration file that is checked into version control (.gitattributes).
评论 #38111868 未加载
评论 #38111730 未加载
tomashubelbauerover 1 year ago
I was also exploring something like this a few years back but for Office files. This exact approach seemed like an absolute win to me, but I ended up not using it, because this won&#x27;t work in the GitHub web UI. This won&#x27;t be a deal-breaker to many, but people should be aware of it still. In the end I ended up doing this: <a href="https:&#x2F;&#x2F;github.com&#x2F;TomasHubelbauer&#x2F;modern-office-git-diff&#x2F;">https:&#x2F;&#x2F;github.com&#x2F;TomasHubelbauer&#x2F;modern-office-git-diff&#x2F;</a>
评论 #38111286 未加载
评论 #38111462 未加载
hiAndrewQuinnover 1 year ago
I had no idea you could do this. Phenomenal!
mrpf1sterover 1 year ago
It would be nice if we finally could have a database that we could use git on to track changes.<p>Dolt looks very promising in this area but only works with MySQL, would be nice to have a SQLite solution.<p><a href="https:&#x2F;&#x2F;www.dolthub.com&#x2F;" rel="nofollow noreferrer">https:&#x2F;&#x2F;www.dolthub.com&#x2F;</a>
sesmover 1 year ago
I wonder if Fossil (SCM used to develop SQLite and written by SQLite devs) has a similar feature built-in
foobarbecueover 1 year ago
The tricky thing about keeping files in git that don&#x27;t diff well (but can be converted into a representation that does) for me has always been lack of support for filter or hooks in the GitHub web interface. This is a frequent problem for me at work with big xml files.
评论 #38112107 未加载
bspammerover 1 year ago
You can also use this trick to diff encrypted files, if you have an automatic way of decrypting them
emaddaover 1 year ago
You can also convert the data to JSON using sqlite_to_json<p><a href="https:&#x2F;&#x2F;github.com&#x2F;emadda&#x2F;transform-x#clis">https:&#x2F;&#x2F;github.com&#x2F;emadda&#x2F;transform-x#clis</a>
trashburgerover 1 year ago
The really neat part about this is that it still uses the original SQLite files, so it will work retroactively once you add the configuration to your repo. Great article!
smilingemojiover 1 year ago
I love these sort of articles. No new tools are needed here, it shows you how to get the most out of the tools you already know and use. Awesome!
devnonymousover 1 year ago
Ooh, that&#x27;s a neat trick! The head fake, so to speak is this is a git trick rather than anything special to sqlite. Thanks for sharing.
nedtover 1 year ago
But it&#x27;s not really a diff no the database itself, just a diff of a full dump that you can use to rebuild the db, but not change an existing one. For example when you do a DELETE the diff does instead have an INSERT less in the dump, which is not exactly a database diff. Depending on the use case that might still be ok.
skadamatover 1 year ago
This is a super neat trick! At work, we&#x27;re scaling git to terabyte scale and I&#x27;m constantly on the search for new ways we can showcase diffs for different file types:<p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=38112345">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=38112345</a>
upon_drumheadover 1 year ago
That’s a really cool trick! I’m going to have to see if I can apply it to other things at work as well. Maybe a image to ascii and then a diff would be useful…
评论 #38110888 未加载
kackielover 1 year ago
Oh wow. This is indeed a very neat use of git config!<p>Although just storing the text output would probably be nicer? You could probably achieve similar with some git hooks?
chrismorganover 1 year ago
Because it’s fun, here’s an alternative spelling of `echo .dump | sqlite3`:<p><pre><code> &lt;&lt;&lt;.dump sqlite3 </code></pre> This avoids invoking echo unnecessarily. Every time you do it you might just save a picowatt-hour or two! (It <i>shouldn’t</i> be more expensive. But be sure to spend a few watt-hours benchmarking it rigorously.)<p>This uses two things that shell users are commonly unfamiliar with:<p>• &lt;&lt;&lt;: as `&gt; filename` redirects stdout to the named file and `&lt; filename` pipes the named file into stdin, `&lt;&lt;&lt; word` pipes the word into stdin. (The whitespace is optional, and word follows normal argument splitting so you can do things like `&lt;&lt;&lt;&quot;Hello, world!&quot;`.)<p>• Ordering: although most commonly written at the end of commands, redirections can be placed at the start as well. So if you do things like `2&gt;&#x2F;dev&#x2F;null &gt;&amp;2` to suppress all output¹, that bit can go at the start or the end. I’ve become increasingly partial to putting redirections at the start of the line, especially in cases where I’m appending to a file for a while, because a leading `&gt;&gt;filename` supports visual alignment better than a trailing.²<p>And since we’re talking about different ways: as given, this is feeding the SQL to sqlite3 via stdin, but you can also pass it on the command line, after the filename. But things like this in Git (aliases, diff textconv, probably more) work by taking the string and appending the filename to the end, so you need a way of reordering the arguments. The solution is an immediately-invoked function:<p><pre><code> f() { sqlite3 &quot;$@&quot; .dump; }; f </code></pre> So when you diff mydb.sqlite, it runs `sh -c &#x27;f() { sqlite3 &quot;$@&quot; .dump }; f mydb.sqlite&#x27;` or equivalent, which winds up executing `sqlite3 mydb.sqlite .dump`, as desired.<p>I use this technique a number of times in my Git aliases, saving the bother of putting them in separate shell scripts somewhere where path management is a bother, at the cost of maintaining a one-liner with sometimes too many semicolons.³<p>—⁂—<p>¹ “Take stderr (2) and redirect it (&gt;) to &#x2F;dev&#x2F;null, then take stdout (default&#x2F;implicit, could also write 1 explicitly) and redirect it (&gt;) to stderr (&amp;2).” There are plenty of other ways of writing this!<p>² Lists can be a better solution for this specific case, allowing you to redirect to the file only once for a whole bunch of commands:<p><pre><code> { &lt;&lt;&lt;&quot;Line one&quot; some-command &lt;&lt;&lt;&quot;End of $thing&quot; } &gt; filename </code></pre> ³ My longest is thirteen lines, though half of them barely count as <i>lines</i>. The line from my ~&#x2F;.config&#x2F;git&#x2F;config, within [alias]:<p><pre><code> # Revise into the commit that last changed File rf = &quot;!f() { if [ $# -eq 0 ]; then REV=\&quot;$(git status --porcelain --untracked-files=no | sed &#x27;&#x2F;^ &#x2F;d;s&#x2F;^.. &#x2F;&#x2F;&#x27; | xargs -n1 git rev-list -1 HEAD -- | uniq)\&quot;; NUM_REVS=\&quot;$(echo \&quot;$REV\&quot; | wc -l)\&quot;; if [ $NUM_REVS -ne 1 ]; then &gt;&amp;2 echo Files in the index were not all last modified in the same commit; exit 1; fi; else REV=\&quot;$(git rev-list -1 HEAD -- \&quot;$1\&quot;)\&quot;; shift; fi; git revise \&quot;$REV\&quot; \&quot;$@\&quot;; }; f&quot;</code></pre>
评论 #38113258 未加载
评论 #38112174 未加载
6LLvveMx2koXfwnover 1 year ago
Are people really adding their database dumps to git? Does that scale? Without this trick, what was the point?
评论 #38111039 未加载
评论 #38111317 未加载
评论 #38112345 未加载
评论 #38111128 未加载
setheronover 1 year ago
Is the dump order preserving ? It technically doesn&#x27;t have to be....
z29LiTp5qUC30nover 1 year ago
does no one know about _A tables anymore? This has been a solved thing since the 1970s
评论 #38112503 未加载
lfconsultover 1 year ago
Awesome...! Thanks for sharing!
vasergenover 1 year ago
is similar approach possible for postgress and MySQL?
评论 #38114248 未加载
0x073over 1 year ago
Is this only visual?