I'm generally supportive of SQLite's NIH syndrome (which is normally bad, but can work if the trade-off is well researched and the resulting product is of high quality), but this one is not. Specifically, sqlar is a *worse* replacement for ZIP.

It lacks pretty much every feature of modern compressed archive formats: filesystem and custom metadata beyond a simple st_mode, solid compression, metadata compression, encryption, integrity checks, and so on. So it can only legitimately be compared with ZIP, which does support custom metadata, (very bad) encryption, and a partial integrity check (per-entry CRC-32), and only lacks a guaranteed encoding for file names. Even ignoring other formats, it is not without problems: for example, the compression mode (DEFLATE vs. uncompressed) is implicitly indicated by whether `sz = length(data)`, and I don't think that is a good idea. If I were designing sqlar and didn't want to spend an additional field, I would instead have set sz to something negative for the uncompressed case, so it never collides with the compressed case (and given the chance, I would just add a separate field). Pretty disappointing given the quality of other tools from the SQLite ecosystem.
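To make that sz convention concrete, here is a minimal Python sketch of how an sqlar-style table distinguishes compressed from stored entries. The schema follows the documented sqlar layout (name, mode, mtime, sz, data), but this is a toy re-implementation, not the sqlar code itself, and the file name and payload are made up:

    import sqlite3, zlib

    db = sqlite3.connect("example.sqlar")
    db.execute("CREATE TABLE IF NOT EXISTS sqlar("
               "name TEXT PRIMARY KEY, mode INT, mtime INT, sz INT, data BLOB)")

    def store(name, blob):
        # Compress with zlib; keep the result only if it is strictly smaller,
        # otherwise store the original bytes verbatim. "Compressed or not" is
        # then implied by comparing sz against length(data), as discussed above.
        packed = zlib.compress(blob)
        if len(packed) >= len(blob):
            packed = blob
        db.execute("REPLACE INTO sqlar VALUES(?, ?, ?, ?, ?)",
                   (name, 0o644, 0, len(blob), packed))

    def load(name):
        sz, data = db.execute("SELECT sz, data FROM sqlar WHERE name=?",
                              (name,)).fetchone()
        # sz == length(data) means "stored uncompressed"; otherwise it's DEFLATE.
        return data if sz == len(data) else zlib.decompress(data)

    store("hello.txt", b"hello " * 100)
    assert load("hello.txt") == b"hello " * 100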
Given how crazy the ZIP file format is, and the claim that SQLite is faster than the filesystem for small files (https://www.sqlite.org/fasterthanfs.html), this seems pretty reasonable to me.

In particular, development repositories with many, many small source files often have horrendously slow copy/delete behaviour (particularly on Windows), even on fast disks. I wonder if SQLite archive files would be a better way to store them.
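As a rough illustration of that idea (not a benchmark), here is a sketch that packs a source tree into a single SQLite database, so copying or deleting the tree means touching one file instead of thousands. The table name and layout are improvised, not the SQLite shell's archive format, and the paths are hypothetical:

    import os, sqlite3

    def pack_tree(root, db_path):
        # Walk the directory and store each file's relative path and contents as one row.
        db = sqlite3.connect(db_path)
        db.execute("CREATE TABLE IF NOT EXISTS files(path TEXT PRIMARY KEY, data BLOB)")
        with db:
            for dirpath, _dirs, names in os.walk(root):
                for name in names:
                    full = os.path.join(dirpath, name)
                    rel = os.path.relpath(full, root)
                    with open(full, "rb") as f:
                        db.execute("REPLACE INTO files VALUES(?, ?)", (rel, f.read()))
        db.close()

    # e.g. pack_tree("my-repo/src", "src.db")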
Would love something like that for storing large image datasets for computer vision. Storing embeddings, predictions and metadata in a contiguous format with compression support, ANN indexing support and SQL would be amazing.
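For what it's worth, the blob-plus-metadata part of that is already easy to sketch in plain SQLite; ANN indexing would need an extension, which this toy schema doesn't attempt. The table and column names here are made up for illustration:

    import array, sqlite3

    db = sqlite3.connect("dataset.db")
    db.execute("""CREATE TABLE IF NOT EXISTS samples(
        id INTEGER PRIMARY KEY,
        image BLOB,          -- raw image bytes (JPEG/PNG are already compressed)
        embedding BLOB,      -- float32 vector stored as raw bytes
        prediction TEXT,     -- e.g. JSON with class scores
        meta TEXT)""")

    embedding = array.array("f", [0.0] * 512)   # placeholder 512-dim embedding
    db.execute("INSERT INTO samples(image, embedding, prediction, meta) VALUES(?,?,?,?)",
               (b"<jpeg bytes>", embedding.tobytes(), '{"cat": 0.9}', '{"source": "camera_1"}'))
    db.commit()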
Very cool idea! I'm a bit torn on whether the format should concern itself with compression at all. It seems like a useful general blob-container strategy; might it be more prudent to leave compression to the consumer?
For an honest assessment, difficulty of implementation (and, accordingly, the lack of diversity in implementations) should be listed in the "Disadvantages" section.

(It's interesting that anti-censorship applications are brought up. Difficulty of implementation has consequences here, too: to make effective use of SQLite as an archive format, the receiving end needs the SQLite software. By comparison, it's pretty trivial to craft a polyglot file that is both plain text and HTML, is self-extracting, and assumes no software on the other end beyond the ubiquity of commodity web browsers. Always bet on text.)
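As an aside, here is one way such a text/HTML polyglot could be generated; a minimal sketch in Python, where the payload, the output file name, and the Blob-download trick are all illustrative assumptions, not a hardened self-extractor:

    import base64

    payload = b"hello from the archive\n"          # hypothetical payload
    b64 = base64.b64encode(payload).decode("ascii")

    # Read as plain text, the file shows the notice plus the base64 string.
    # Opened in a browser, the script decodes it and offers a download link.
    doc = f"""Readable as plain text: the base64 string inside the script tag is the payload.
    Opened in a web browser, it becomes a download link instead.

    <script>
    var data = atob("{b64}");
    var bytes = new Uint8Array(data.length);
    for (var i = 0; i < data.length; i++) bytes[i] = data.charCodeAt(i);
    var a = document.createElement("a");
    a.href = URL.createObjectURL(new Blob([bytes]));
    a.download = "payload.bin";
    a.textContent = "save payload.bin";
    document.body.appendChild(a);
    </script>
    """

    with open("polyglot.html", "w") as f:
        f.write(doc)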
I wish they would build compression directly into SQLite.
I use SQLite as a log store, mostly dumping JSON data into it.
Due to the lack of compression, the DB is probably 10 times the size it could be.
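Until something like that lands, one workaround is to compress at the application layer before inserting, at the cost of losing the ability to query inside the JSON with SQL. A small sketch of that approach for JSON log rows (the table and column names are made up):

    import json, sqlite3, zlib

    db = sqlite3.connect("logs.db")
    db.execute("CREATE TABLE IF NOT EXISTS logs(ts REAL, body BLOB)")

    def log(ts, record):
        # Serialize the record to JSON and zlib-compress it before storing.
        blob = zlib.compress(json.dumps(record).encode("utf-8"))
        with db:
            db.execute("INSERT INTO logs VALUES(?, ?)", (ts, blob))

    def read_all():
        for ts, blob in db.execute("SELECT ts, body FROM logs ORDER BY ts"):
            yield ts, json.loads(zlib.decompress(blob))

    log(1700000000.0, {"level": "info", "msg": "hello"})
    print(list(read_all()))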
It would be interesting if this were expanded to support more compression formats, e.g. zstd. Gzip is quite an old format, and zstd is a lot quicker to decompress.
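sqlar itself only speaks zlib/DEFLATE, but as a rough illustration of what a zstd-capable variant could look like, here is a sketch using the third-party `zstandard` package. The table layout loosely mirrors the sqlar one above, and the extra `method` column is my own invention, not part of any spec:

    import sqlite3
    import zstandard  # third-party: pip install zstandard

    db = sqlite3.connect("example-zstd.db")
    db.execute("CREATE TABLE IF NOT EXISTS archive("
               "name TEXT PRIMARY KEY, sz INT, method TEXT, data BLOB)")

    def store(name, blob):
        packed = zstandard.ZstdCompressor(level=3).compress(blob)
        if len(packed) < len(blob):
            db.execute("REPLACE INTO archive VALUES(?,?,?,?)", (name, len(blob), "zstd", packed))
        else:
            db.execute("REPLACE INTO archive VALUES(?,?,?,?)", (name, len(blob), "store", blob))

    def load(name):
        sz, method, data = db.execute(
            "SELECT sz, method, data FROM archive WHERE name=?", (name,)).fetchone()
        # An explicit method column avoids inferring the codec from sz vs length(data).
        return zstandard.ZstdDecompressor().decompress(data) if method == "zstd" else data

    store("readme.txt", b"zstd " * 200)
    assert load("readme.txt") == b"zstd " * 200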
When I read "If the input X is incompressible, then a copy of X is returned", I worry that this is broken. If I archive a file and then extract it from the archive, I can't be sure I'll get the same file back: if the file was already compressed to begin with, it will be decompressed at the end.

Maybe I am wrong; I wasn't familiar with this tool before. But my brief look at the documentation leads me to believe there is an obvious problem.
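For what it's worth, here is a toy check of the scenario you describe, using the sz = length(data) rule from the docs (not the sqlar code itself): an already-compressed payload doesn't shrink further when recompressed, so it is stored verbatim with sz equal to its length, and the extraction rule then returns it byte for byte rather than decompressing it.

    import zlib

    # An "already compressed" file: compressing it again does not make it smaller,
    # so under the sz = length(data) rule it would be stored verbatim.
    original = zlib.compress(b"some document " * 1000)
    recompressed = zlib.compress(original)
    print(len(original), len(recompressed))   # recompressed is not smaller

    stored = original if len(recompressed) >= len(original) else recompressed
    sz = len(original)

    # Extraction: decompress only when sz differs from the stored length.
    extracted = stored if sz == len(stored) else zlib.decompress(stored)
    assert extracted == original              # round trip is byte-identical here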