> <i>If it takes a long time to copy a database and it gets updated midway through, rsync may give me an invalid database file. The first half of the file is pre-update, the second half is post-update, and they don’t match. When I try to open the database locally, I get an error</i><p>Of course! You can't copy the file of a running, active db that is receiving updates; that can only result in corruption.<p>For replicating sqlite databases safely there is<p><a href="https://github.com/benbjohnson/litestream">https://github.com/benbjohnson/litestream</a>
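As a rough sketch of how that looks in practice (the paths and bucket URL here are placeholders, not from the article), litestream tails the WAL and streams changes to a replica:<p><pre><code> # continuously replicate a live database (placeholder paths/bucket)
 litestream replicate /path/db.sqlite s3://my-bucket/db
 # restore the latest replicated copy somewhere else
 litestream restore -o /path/restored.sqlite s3://my-bucket/db
</code></pre>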
How to copy databases between computers? Just send a circle and forget about the rest of the owl.<p>As others have mentioned, an incremental rsync would be much faster, but what bothers me the most is that he claims sending SQL statements is faster than sending the database while completely omitting the fact that you have to execute those statements. And then run /optimize/. And then run /vacuum/.<p>Currently I have a scenario in which I have to "incrementally rebuild" a database from CSV files. In my particular case recreating the database from scratch is more optimal, but despite heavy optimization it still takes half an hour just to run the batch inserts on an empty in-memory database, create the indexes, and so on.
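For context, a minimal sketch of that kind of rebuild using the sqlite3 shell's bulk import (table, column, and file names are made up):<p><pre><code> # placeholder table/column/file names; load the CSVs first, build indexes last
 {
   echo '.mode csv'
   echo '.import data.csv my_table'
   echo 'CREATE INDEX idx_my_table_col ON my_table(col);'
 } | sqlite3 rebuilt.db
</code></pre>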
SQLite has an official tool for this, fwiw: <a href="https://www.sqlite.org/rsync.html" rel="nofollow">https://www.sqlite.org/rsync.html</a><p>It works at the page level:<p>> The protocol is for the replica to send a cryptographic hash of each of its pages over to the origin side, then the origin sends back the complete content of any page for which the hash does not match.
The recently released sqlite3_rsync utility uses a version of the rsync algorithm optimized for the internal structure of a SQLite database. It compares the internal data pages efficiently, then only syncs changed or missing pages.<p>Nice tricks in the article, but you can more easily use the built-in utility now :)<p>I blogged about how it works in detail here: <a href="https://nochlin.com/blog/how-the-new-sqlite3_rsync-utility-works" rel="nofollow">https://nochlin.com/blog/how-the-new-sqlite3_rsync-utility-w...</a>
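Usage has the same shape as rsync itself: the first argument is the origin database, the second is the replica that gets updated, and either side can be remote. A sketch with placeholder host/paths:<p><pre><code> sqlite3_rsync username@server:my_remote_database.db my_local_database.db
</code></pre>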
Saving to a text file is inefficient. I save sqlite databases using VACUUM INTO, like this:<p><pre><code> sqlite3 -readonly /path/db.sqlite "VACUUM INTO '/path/backup.sqlite';"
</code></pre>
From <a href="https://sqlite.org/lang_vacuum.html" rel="nofollow">https://sqlite.org/lang_vacuum.html</a> :<p><pre><code> The VACUUM command with an INTO clause is an alternative to the backup API for generating backup copies of a live database. The advantage of using VACUUM INTO is that the resulting backup database is minimal in size and hence the amount of filesystem I/O may be reduced.</code></pre>
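If the goal is transfer rather than just a local backup, a sketch of the combined flow (placeholder paths/host) is to vacuum into a compact copy and rsync that:<p><pre><code> sqlite3 -readonly /path/db.sqlite "VACUUM INTO '/path/backup.sqlite';"
 rsync -avz /path/backup.sqlite username@server:/path/backup.sqlite
</code></pre>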
I was surprised that he didn't try the on-the-fly compression provided by rsync:<p><pre><code> -z, --compress compress file data during the transfer
--compress-level=NUM explicitly set compression level
</code></pre>
It's probably faster to compress with gzip first and transfer the result afterwards, but it's nice to have the option of improving the transfer with just a flag.
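A sketch of what that looks like for pulling a remote database (placeholder names; the compression level is arbitrary):<p><pre><code> rsync -z --compress-level=9 username@server:my_remote_database.db my_local_database.db
</code></pre>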
In DuckDB you can do the same but export to Parquet, this way the data is an order of magnitude smaller than using text-based SQL statements. It's faster to transfer and faster to load.<p><a href="https://duckdb.org/docs/stable/sql/statements/export.html" rel="nofollow">https://duckdb.org/docs/stable/sql/statements/export.html</a>
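A minimal sketch of that round trip with the duckdb CLI (database and directory names are placeholders):<p><pre><code> # export every table as Parquet plus the schema/load scripts
 duckdb my_database.duckdb "EXPORT DATABASE 'dump_dir' (FORMAT PARQUET);"
 # ...transfer dump_dir... then rebuild on the other side
 duckdb restored.duckdb "IMPORT DATABASE 'dump_dir';"
</code></pre>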
SQLite has an session extension, which will track changes to a set of tables and produce a changeset/patchset which can patch previous version of an SQLite database.<p><a href="https://www.sqlite.org/sessionintro.html" rel="nofollow">https://www.sqlite.org/sessionintro.html</a>
If you're regularly syncing from an older version to a new version, you can likely optimize further using gzip with the "--rsyncable" option. It reduces compression by ~1% but makes it so that differences from one version to the next stay localized instead of cascading through the full length of the compressed output.<p>Another alternative is to skip compressing the dump output, let rsync calculate the differences from a previous uncompressed dump to the current dump, then have rsync compress the change sets it sends over the network (rsync -z).
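--rsyncable isn't in every gzip build, but where it is available the flow is roughly (placeholder names):<p><pre><code> sqlite3 my_database.db .dump | gzip --rsyncable > my_database.sql.gz
 rsync -z my_database.sql.gz username@server:my_database.sql.gz
</code></pre>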
I think this could be a single pipeline?<p><pre><code> ssh username@server "sqlite3 my_remote_database.db .dump | gzip -c" | gunzip -c | sqlite3 my_local_database.db
</code></pre>
Great example of Cunningham's Law: <a href="https://en.wikipedia.org/wiki/Ward_Cunningham#:~:text=%22Cunningham's%20Law%22" rel="nofollow">https://en.wikipedia.org/wiki/Ward_Cunningham#:~:text=%22Cun...</a>
Wait... why would you even think about rsyncing a database that can get changed while being copied?<p>Isn't this a case for proper database servers with replication?<p>Or if it's an infrequent process done for dev purposes just shut down the application doing writes on the other side?
Does the author not know that rsync can use compression (rsync -z | --compress | --compress-level=<n>), or does he not think it worthwhile to compare that data point?<p>I just tried some comparisons (albeit with a fairly small sqlite file). The compressed text was only about 84% of the size of the compressed binary database, which isn't negligible, but not necessarily worth fussing over in every situation. (The binary compressed to 7.1%, so the 84% is relative to that.)<p>bzip2 performed better on both formats; its compression of the binary database was better than gzip's compression of the text (91.5%), and bzip2 on the text was better than on the binary (92.5%).<p>bzip2 isn't available inside rsync, but this suggests that if you're going with an external compression solution, gzip may not be the best choice when you care about every percentage point of reduction.<p>If you don't care about every percentage point, maybe just use rsync compression.<p>One thing worth mentioning: if you are updating an existing file, rsync will only compress what it actually sends. To replicate that with the text solution, you would have to retain the text dump on both sides to do the update between them.
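For anyone who wants to reproduce that kind of comparison on their own database, something along these lines works (placeholder filename; sizes are in bytes):<p><pre><code> # compressed size of the binary database vs. the text dump, gzip vs. bzip2
 gzip -c my_database.db | wc -c
 bzip2 -c my_database.db | wc -c
 sqlite3 my_database.db .dump | gzip -c | wc -c
 sqlite3 my_database.db .dump | bzip2 -c | wc -c
</code></pre>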
I am sure you can just pipe all this so you don't have to use an intermediate gzipped file.<p>Just ssh into the machine, dump the SQL, and load it straight into SQLite locally.
I used to work at a company that had a management interface that used sqlite as its database; its multi-node / failover approach was also just... copying the file and rsyncing it. I did wonder about data integrity though: what if the file is edited while it's being copied over? But there are probably safeguards in place.<p>Anyway, I don't think the database file size was really an issue. It was a relatively big schema but without many indices, and performance wasn't a big consideration - hence the backend would concatenate query results into an XML file, then pass it through an xml->json converter, causing 1-2 second response times on most requests. I worked on a rewrite using Go where requests were more like 10-15 milliseconds.<p>But I still used sqlite, because that was actually a pretty good solution for the problem at hand: relatively low concurrency (up to 10 active simultaneous users), no server-side dependencies or installation needed, etc.
SQLite provides the sqlite3_rsync command to safely copy databases: <a href="https://sqlite.org/rsync.html" rel="nofollow">https://sqlite.org/rsync.html</a>
Ignoring for the moment the issues of syncing a database where the source DB may be running inserts/updates:<p>if one dumps tables as separate CSV files/streams and uses DuckDB to convert them to individual Parquet files, the rsync should run faster, since hopefully not every table is modified between syncs. There is an obvious overhead from the back-and-forth conversions, but DuckDB can directly export a database to SQLite.
I have not tested it myself, so this is just brainstorming.<p>Last but not least: when compressing/decompressing text dumps, use igzip or pigz if you want to speed things up. Also benchmark the compression levels.
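A rough sketch of the per-table conversion with the duckdb CLI, plus the parallel-gzip variant for plain text dumps (all names are placeholders):<p><pre><code> # one Parquet file per exported CSV table
 duckdb :memory: "COPY (SELECT * FROM read_csv_auto('my_table.csv')) TO 'my_table.parquet' (FORMAT PARQUET);"
 # parallel gzip for the text-dump route
 sqlite3 my_database.db .dump | pigz > my_database.sql.gz
</code></pre>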
This reminds me of something I did back in 2008. We were using Postgres, not SQLite, and we needed to get a backup from one production machine onto several other machines. Copying saturated the network for an hour, and we needed to copy four times if I remember right. We didn't want to wait four hours.<p>I found this tool laying around, "udpcast", and used it to send the backup over the network just once to all the destinations at the same time.<p><a href="https://kylecordes.com/2008/multicast-your-db-backups-with-udpcast" rel="nofollow">https://kylecordes.com/2008/multicast-your-db-backups-with-u...</a>
Since sqlite is just a simple file-level locking DB, I'm pretty shocked they don't have an option to store the indexes in separate files, for all kinds of obvious and beneficial reasons: you could easily exclude them from backups, and you could force a rebuild just by deleting them. Probably their reason for keeping everything internal has to do with making sure indexes are never out of sync, but that could just as easily be accomplished with hashing algos.
Git works pretty well for syncing binary files. You can even do this:<p><pre><code> $ cat .gitattributes
*.db diff=sqlite3
</code></pre>
And you can see how the sqlite files changed between commits. But yeah, if you make commits while a process is writing new rows or updating data, you'll have data loss if you also make a commit on another machine; better to use something like litestream in that case.
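For the attribute to actually do anything, git also needs a matching diff driver configured. A sketch with a hypothetical helper script (the script name and location are made up):<p><pre><code> # hypothetical helper that renders a .db file as SQL text for diffing
 printf '#!/bin/sh\nsqlite3 "$1" .dump\n' > ~/bin/sqlite3-textconv
 chmod +x ~/bin/sqlite3-textconv
 # point the "sqlite3" diff driver from .gitattributes at that helper
 git config diff.sqlite3.textconv ~/bin/sqlite3-textconv
</code></pre>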
In case someone is looking for the proper way of making a db copy in PostgreSQL rather than SQLite, pg_basebackup is the way: <a href="https://www.postgresql.org/docs/current/app-pgbasebackup.html" rel="nofollow">https://www.postgresql.org/docs/current/app-pgbasebackup.htm...</a>
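A sketch of a typical invocation (host, user, and target directory are placeholders):<p><pre><code> pg_basebackup -h db.example.com -U replication_user -D /var/lib/pgsql/standby -X stream -P
</code></pre>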
It all depends what you’re optimizing for.<p>Clearly, when you don’t transfer indexes, you will not have to transfer as much data.<p>However, the tradeoff is that the database is going to have to do more work to regenerate those indexes when you reconstruct it from the text dump at the destination (as opposed to the case where the indexes were included)
???<p>Why not just compress the whole database using `gzip` or `lz4` before rsyncing it instead? `zstd` works too, but it seems like it had a bug around compressing files with modified content.<p>Better yet, split your sqlite file into smaller pieces. It's not like it needs to contain all the app data in a single sqlite file.
I’ve been looking into a way to replicate a SQLite database and came across the LiteFS project by Fly.io. Seems like a solid drop-in solution backed by FUSE and Consul. Anybody used it in production? My use case is high availability between multiple VMs.
I have recently discovered a tool called mscp which opens multiple scp threads to copy down large files. It works great for speeding up these sorts of downloads.<p><a href="https://github.com/upa/mscp">https://github.com/upa/mscp</a>
zstd would be a better choice. It’s bonkers fast (especially when used with multithreading) and still compresses better than gzip. Alternatively, I’d recommend looking into bzip3, but I’m not sure if it would save time.
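A sketch of the dump pipeline with multithreaded zstd (placeholder filenames; the level is just an example):<p><pre><code> sqlite3 my_database.db .dump | zstd -T0 -19 -o my_database.sql.zst
 # and on the receiving side
 zstd -d -c my_database.sql.zst | sqlite3 restored.db
</code></pre>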
How long does this procedure take in comparison to the network transfer?<p>My first try would've been to copy the db file, gzip it, and then transfer it, but I can't tell whether compression will be that useful on the binary format.
Pretty good point. I just wonder whether databases in general can be perfectly reconstructed from a text dump. For instance, does the insertion order change in any of the operations between dumping and importing?
I guess for me it is obvious that you don't try to copy a running DB, only a backup.<p>So I see that basic stuff needs to be repeated, as people still miss these kinds of things.<p>But I learned that you can easily dump SQLite to a text file - neat!
isn't this rather obvious? doesn't everyone do this when it makes sense? obviously, it applies to other DBs, and you don't even need to store the file (just a single ssh from dumper to remote undumper).<p>if retaining the snapshot file is of value, great.<p>I'd be a tiny bit surprised if rsync could recognize diffs in the dump, but it's certainly possible, assuming the dumper is "stable" (it probably is, because it's walking the tables as trees). the amount of change detected by rsync might actually be a useful thing to monitor.
how well does just the sqlite database gzip? the indexes are a lot of redundant data, so you're going to get some efficiencies there, but probably less locality of data than in the text file, so maybe less?