This gets posted a lot here, and I really wonder if anyone has bothered to try to replicate these results or test them with different sizes of data on different filesystems. You can see that the greatest disparity is with NTFS/Windows while Linux is pretty close in performance, but they don't mention whether they've got Ubuntu formatted with ext4 or whatever. I can't seem to find anything looking around, and this article is about 5 years old now. I remember showing this very article to a supervisor at one point, and he brushed it off as unnecessary overhead when all I needed was a plain file store and no relational queries at all.<p>It looks like this is the benchmarking code; I'll have to go over it out of curiosity sometime: <a href="https://www.sqlite.org/src/file/test/kvtest.c" rel="nofollow">https://www.sqlite.org/src/file/test/kvtest.c</a>
Is it a known issue that the filesystem on Windows 10 is so slow? Being 5 times slower than macOS was roughly my experience but I thought there was just something wrong with my Windows laptop. I can't find any benchmark or explanation about this.
Somewhat related: I recently wrote a FUSE-based file system in Rust that uses SQLite as the backing store for file records, though not the file contents. I imagine I could use it for file contents as well, so it's good to know more about its performance.<p><a href="https://amoffat.github.io/supertag/" rel="nofollow">https://amoffat.github.io/supertag/</a>
No idea what's with the sqlite articles making the front page every other day. This one is pretty old as well.<p><pre><code> The measurements in this article were made during the week of 2017-06-05 using a version of SQLite in between 3.19.2 and 3.20.0.</code></pre>
Hmmm.<p>SQLite is 35% faster reading and writing within a large file than the filesystem is at reading and writing small files.<p>Most filesystems I know are very, very slow at reading and writing small files and much, much faster at reading and writing within large files.<p>For example, for my iOS/macOS performance book[1], I measured the difference writing 1GB of data in files of different sizes, ranging from 100 files of 10MB each to 100K files of 10KB each.<p>Overall, the times span about an order of magnitude, and even the final step, from individual files of 100KB to 10KB, made a factor of 3-4 difference.<p>[1] <a href="https://www.amazon.com/gp/product/0321842847" rel="nofollow">https://www.amazon.com/gp/product/0321842847</a>
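A scaled-down sketch of that kind of measurement (in Python, with made-up totals and file sizes rather than the book's actual 1GB parameters): write the same number of bytes as a few large files versus many small ones and compare wall-clock time.

```python
import os
import tempfile
import time

def write_chunked(root, total_bytes, file_size):
    """Write total_bytes of data as total_bytes // file_size separate
    files under root, returning the elapsed wall-clock time."""
    data = b"x" * file_size
    start = time.perf_counter()
    for i in range(total_bytes // file_size):
        with open(os.path.join(root, f"f{i}.bin"), "wb") as f:
            f.write(data)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    total = 16 * 1024 * 1024                             # 16 MB, scaled down
    few_large = write_chunked(d1, total, 1024 * 1024)    # 16 files of 1 MB
    many_small = write_chunked(d2, total, 16 * 1024)     # 1024 files of 16 KB
    print(f"few large: {few_large:.3f}s, many small: {many_small:.3f}s")
```

On most filesystems the many-small-files run is noticeably slower, though the exact ratio depends heavily on the OS, the filesystem, and the page cache.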
Past related threads:<p><i>35% Faster Than The Filesystem (2017)</i> - <a href="https://news.ycombinator.com/item?id=20729930" rel="nofollow">https://news.ycombinator.com/item?id=20729930</a> - Aug 2019 (164 comments)<p><i>SQLite small blob storage: 35% Faster Than the Filesystem</i> - <a href="https://news.ycombinator.com/item?id=14550060" rel="nofollow">https://news.ycombinator.com/item?id=14550060</a> - June 2017 (202 comments)
There's a great software engineering question here: should SQLite be modified so that its code, when compiled with a specific #define flag set, can act as a <i>drop-in filesystem replacement</i> for an OS?<p>I wonder how hard it would be to retool SQLite/Linux to accomplish that, and what would be learned (like what kind of API would exist between Linux/other OSes and SQLite) if it were attempted.<p>Yes, there would be some interesting problems to solve, such as how SQLite would write to its database file if there is no filesystem/filesystem API underneath it.<p>Also, one other feature I'd like: the ability to log up to everything that the filesystem does (if the appropriate compilation switches and/or runtime flags are set), but somewhere else on the filesystem.<p>So maybe two instances of SQLite running at the same time, one acting as the filesystem and the other acting as a second, logging filesystem for that first filesystem...
As much as I love SQLite, if I understand correctly, this is saying that SQLite doing fread on an already-opened file is 35% faster than not-SQLite doing a separate fopen+fread for each individual file.<p>Is this just a blatantly dishonest comparison, or did I miss something?
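A crude way to see how much of such a gap is just open/close overhead (a Python sketch, not the article's actual kvtest benchmark): time reading the same ~10KB payload with a fresh open() per read versus one file held open across reads.

```python
import os
import tempfile
import time

PAYLOAD = os.urandom(10_000)   # roughly the blob size the article benchmarks
N = 500

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "blob.bin")
    with open(path, "wb") as f:
        f.write(PAYLOAD)

    # open + read + close for every access, like one-file-per-blob storage
    start = time.perf_counter()
    for _ in range(N):
        with open(path, "rb") as f:
            data = f.read()
    per_open = time.perf_counter() - start

    # one open, repeated seek + read, like reading inside one big file
    start = time.perf_counter()
    with open(path, "rb") as f:
        for _ in range(N):
            f.seek(0)
            data = f.read()
    amortized = time.perf_counter() - start

    print(f"open per read: {per_open:.4f}s, single open: {amortized:.4f}s")
```

The difference between the two timings is a rough upper bound on what amortizing the open could buy; SQLite still has to pay for its own page cache and B-tree lookups on top of its single open file.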
SQLite also performs well when there is a large number of "files". A simulation I wrote used a large number of files (10k+). Eventually I had to transition away from using the file system as a key-value store because opens, reads, writes, and directory listings were slow when you have that many files in a single directory and that many open file descriptors.
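A minimal sketch of that kind of migration, assuming a simple path-to-bytes mapping (the table and column names here are invented for illustration, not from the commenter's simulation):

```python
import sqlite3

# One indexed table replaces a directory tree of small files.
db = sqlite3.connect(":memory:")   # use a real on-disk path in practice
db.execute("CREATE TABLE blobs (key TEXT PRIMARY KEY, value BLOB)")

def put(key, value):
    db.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)", (key, value))

def get(key):
    row = db.execute("SELECT value FROM blobs WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

def listing(prefix):
    # A "directory listing" becomes an indexed range query instead of readdir.
    rows = db.execute(
        "SELECT key FROM blobs WHERE key GLOB ? ORDER BY key", (prefix + "*",)
    )
    return [r[0] for r in rows]

put("sim/run1/state.bin", b"\x00" * 64)
put("sim/run1/params.json", b"{}")
put("sim/run2/state.bin", b"\x01" * 64)
print(listing("sim/run1/"))  # ['sim/run1/params.json', 'sim/run1/state.bin']
```

There are no per-"file" descriptors to exhaust, and lookups and listings stay fast because they ride on the primary-key index rather than on directory scans.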
I would love for directories of small files to be stored this way under ZFS; like infrequently updated tar or zip (no compression) files. (Though the filesystem layer of compression might operate on the whole file, or maybe as two streams, one for the file and one for the index.)
It's surprising how fast you can get DETS (the persistent storage version of Elixir's in memory kv-store ETS) to act as a database of sorts. Even on a relatively slow SSD.
I wrote an ORM to use SQLite for serializing/persisting objects to disk (i.e. using SQLite DB as a file format).<p>This was one of the reasons why SQLite was an easy choice.
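A toy version of that pattern, assuming objects are flattened to columns (the class and table names are made up for illustration, not the commenter's actual ORM):

```python
import sqlite3
from dataclasses import astuple, dataclass

@dataclass
class Note:
    id: int
    title: str
    body: str

db = sqlite3.connect(":memory:")   # a .db path on disk makes this a file format
db.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")

def save(note):
    db.execute("INSERT OR REPLACE INTO notes VALUES (?, ?, ?)", astuple(note))

def load(note_id):
    row = db.execute("SELECT * FROM notes WHERE id = ?", (note_id,)).fetchone()
    return Note(*row) if row else None

save(Note(1, "todo", "write benchmarks"))
print(load(1))  # Note(id=1, title='todo', body='write benchmarks')
```

Compared with a hand-rolled binary format, you get atomic updates, crash recovery, and an inspectable file for free.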