TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

SQLite: 35% Faster Than the Filesystem

487 点作者 yla9210 个月前

22 条评论

tgtweak10 个月前
No file system attributes or metadata on records which also means no (xattrs&#x2F;fattrs) being written or updated, no checks to see if it&#x27;s a physical file or a pipe&#x2F;symlink, no permission checks, no block size alignment mismatches, single open command.<p>Makes sense when you consider you&#x27;re throwing out functionality and disregarding general purpose design.<p>If you use a fuse mapping to SQLite, mount that directory and access it, you&#x27;d probably be very similar performance (perhaps even slower) and storage use as you&#x27;d need to add additional columns in the table to track these attributes.<p>I have no doubt that you could create a custom tuned file system on a dedicated mount with attributes disabled, minimized file table and correct&#x2F;optimized block size and get very near to this perf.<p>Let&#x27;s not forget the simplicity of being able to use shell commands (like rsync) to browse and manipulate those files without running the application or an SQL client to debug.<p>Makes sense for developers to use SQLite for this use case though for an appliance-type application or for packaged static assets (this is already commonplace in game development - a cab file is essentially the same concept)
评论 #41089165 未加载
评论 #41086772 未加载
评论 #41091676 未加载
lc6410 个月前
That&#x27;s a very rigorously written article.<p>Let&#x27;s also note the 4x speed increase on windows 10, once again underlining just how slow windows filesystem calls are, when compared to direct access, and other (kernel, filesystem) combinations.
评论 #41085767 未加载
评论 #41086909 未加载
评论 #41086799 未加载
评论 #41086577 未加载
评论 #41087980 未加载
robertclaus10 个月前
I did some research in a database research lab, and we had a lot of colleagues working on OS research. It was always interesting to compare the constraints and assumptions across the two systems. I remember one of the big differences was the scale of individual records we expected to be working with, which in turn affected how memory and disk was managed. Most relational databases are very much optimized for small individual records and eventual consistency, which allows them to cache a lot more in memory. On the other hand, performance often drops sharply with the size of your rows.
igammarays10 个月前
This is precisely why I&#x27;m considering appending to a sqlite DB in WAL2 mode instead of plain text log files. Almost no performance penalty for writes but huge advantages for reading&#x2F;analysis. No more Grafana needed.
评论 #41086070 未加载
评论 #41086317 未加载
评论 #41085883 未加载
评论 #41086086 未加载
评论 #41093181 未加载
评论 #41085866 未加载
评论 #41086208 未加载
评论 #41086164 未加载
freedmand10 个月前
I recently had the idea to record every note coming out of my digital piano in real-time. That way if I come up with a good idea when noodling around I don’t have to hope I can remember it later.<p>I was debating what storage layer to use and decided to try SQLite because of its speed claims — essentially a single table where each row is a MIDI event from the piano (note on, note off, control pedal, velocity, timestamp). No transactions, just raw inserts on every possible event. It so far has worked beautifully: it’s performant AND I can do fun analysis later on, e.g. to see what keys I hit more than others or what my average note velocity is.
评论 #41086811 未加载
leni53610 个月前
&gt; The performance difference arises (we believe) because when working from an SQLite database, the open() and close() system calls are invoked only once, whereas open() and close() are invoked once for each blob when using blobs stored in individual files. It appears that the overhead of calling open() and close() is greater than the overhead of using the database<p>I wonder how io_uring compares.
评论 #41086061 未加载
评论 #41085877 未加载
Upvoter3310 个月前
When something built on top of the filesystem is &quot;faster&quot; than the filesystem, it just means &quot;when you use the filesystem in a less-than-optimal manner, it will be slower than an app that uses it in a sophisticated manner.&quot; An interesting point, but perhaps obvious...
评论 #41086729 未加载
Kalanos10 个月前
TLDR; don&#x27;t do it.<p>I&#x27;ve used SQLite blob fields for storing files extensively.<p>Note that there is a 2GB blob maximum: <a href="https:&#x2F;&#x2F;www.sqlite.org&#x2F;limits.html" rel="nofollow">https:&#x2F;&#x2F;www.sqlite.org&#x2F;limits.html</a><p>To read&#x2F;write blobs, you have to serialize&#x2F;deserialize your objects to bytes. This process is not only tedious, but also varies for different objects and it&#x27;s not a first-class citizen in other tools, so serialization kept breaking as my dependencies upgraded.<p>As my app matured, I found that I often wanted hierarchical folder-like functionality. Rather than recreating this mess in db relationships, it was easier to store the path and other folder-level metadata in sqlite so that I could work with it in Python. E.g. `os.listdir(my_folder)`.<p>Also, if you want to interact with other systems&#x2F;services, then you need files. sqlite can&#x27;t be read over NFS (e.g. AWS EFS) and <i>by design</i> it has no server for requests. so i found myself caching files to disk for export&#x2F;import.<p>SQLite has some settings for handling parallel requests from multiple services, but when I experimented with them I always wound up with a locked db due to competing requests.<p>For one reason or another, you will end up with hybrid (blob&#x2F;file) ways of persisting data.
评论 #41086416 未加载
评论 #41086018 未加载
评论 #41085911 未加载
评论 #41086288 未加载
评论 #41085880 未加载
评论 #41087452 未加载
评论 #41086717 未加载
评论 #41086044 未加载
评论 #41085828 未加载
throwaway21110 个月前
I was looking at self hosted RSS readers recently. The application is single user. I don&#x27;t expect it to be doing a lot of DB intensive stuff.<p>It surprised me that almost all required PostgreSQL, and most of those that didn&#x27;t opted for something otherwise complex such as Mongo or MySQL.<p>SQLite, with no dependencies, would have simplified the process no end.
评论 #41086091 未加载
theGeatZhopa10 个月前
Depends, depends.. but just of logic:<p>All fs&#x2F;drive access is managed by the OS. No DB systems have raw access to sectors or direct raw access to files.<p>Having a database file on the disc, offers a &quot;cluster&quot; of successive blocks on the hard drive (if it&#x27;s not fragmented), resulting in relatively short moving distances of the drive head to seek the necessary sectors. There will still be the same sectors occupied, even after vast insert&#x2F;write&#x2F;del operations. Absolutely no change of DB file&#x27;s position on hard drive. It&#x27;s not a problem with SSDs, though.<p>So, the following apply:<p>client -&gt; DB -&gt; OS -&gt; Filesystem<p>I think, you already can see the DB part is an extra layer. So, if one wouldn&#x27;t have this, it would be &quot;faster&quot; in terms of execution time. Always.<p>If it&#x27;s slower, then you use the not-optimal settings for your use case&#x2F;filesystem.<p>My father did this once. He took H2 and made it even more faster :) incredible fast on Windows in direct comparison of H2&#x2F;h2-modificated with same data.<p>So having a DBMS is convenient and made in decisions to serve certain domains and their problems. Having it is convenient, but that doesn&#x27;t mean it&#x27;s the most optimized way of doing it.
评论 #41085933 未加载
评论 #41086081 未加载
RaiausderDose10 个月前
numbers are from 2017, update would be cool
aydgn10 个月前
This discussion never gets old.<p>* <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=14550060">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=14550060</a> 7 years ago<p>* <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=20729930">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=20729930</a> 5 years ago<p>* <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27137834">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27137834</a> 3 years ago<p>* <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27897427">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27897427</a> 3 years ago<p>* <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=34387407">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=34387407</a> 2 years ago
OttoCoddo10 个月前
SQLite can be faster than FileSystem for small files. For big files, it can do more than 1 GB&#x2F;s. On Pack [1], I benchmarked these speeds, and you can go very fast. It can be even 2X faster than tar [2].<p>In my opinion, SQLite can be faster in big reads and writes too, but the team didn&#x27;t optimise it as much (like loading the whole content into memory) as maybe it was not the main use of the project. My hope is that we will see even faster speeds in the future.<p>[1] <a href="https:&#x2F;&#x2F;pack.ac" rel="nofollow">https:&#x2F;&#x2F;pack.ac</a> [2] <a href="https:&#x2F;&#x2F;forum.lazarus.freepascal.org&#x2F;index.php&#x2F;topic,66281.msg509173.html#msg509173" rel="nofollow">https:&#x2F;&#x2F;forum.lazarus.freepascal.org&#x2F;index.php&#x2F;topic,66281.m...</a>
jstummbillig10 个月前
Let&#x27;s assume that filesystems are fairly optimized pieces of software. Let&#x27;s assume that the people building them heard of databases and at some point along the way considered things like the costs of open&#x2F;close calls.<p>What is SQLite not doing that filesystems are?
评论 #41086558 未加载
cedws10 个月前
How much more performance could you get by bypassing the filesystem and writing directly to the block device? Of course, you&#x27;d need to effectively implement your own filesystem, but you&#x27;d be able to optimise it more for the specific workload.
评论 #41087993 未加载
throwaway21110 个月前
i.e. opening and closing many files from disk is slower than opening and closing one file and using memory.<p>It&#x27;s important. But understandable.
me551ah10 个月前
Why hasn’t someone made sqlitefs yet?
评论 #41085529 未加载
评论 #41085749 未加载
评论 #41085545 未加载
评论 #41085559 未加载
评论 #41085744 未加载
评论 #41085627 未加载
评论 #41085602 未加载
评论 #41085587 未加载
评论 #41085554 未加载
评论 #41085523 未加载
评论 #41085573 未加载
alberth10 个月前
Slight OT: does this apply to SQLite on OpenBSD?<p>Because with OpenBSD introduction of pinning all syscalls to libc, doesn’t this block SQLite from making syscall direct.<p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=38579913">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=38579913</a>
vagab0nd10 个月前
Reminds me of this talk: <a href="https:&#x2F;&#x2F;www.destroyallsoftware.com&#x2F;talks&#x2F;the-birth-and-death-of-javascript" rel="nofollow">https:&#x2F;&#x2F;www.destroyallsoftware.com&#x2F;talks&#x2F;the-birth-and-death...</a>
throwaway8152310 个月前
Deleting a lot of rows from an sqlite database can be awfully slow, compared with deleting a file.
efilife10 个月前
&gt; Reading is about an order of magnitude faster than writing<p>not a native speaker, what does it mean?
评论 #41086499 未加载
评论 #41086497 未加载
评论 #41086514 未加载
评论 #41086500 未加载
评论 #41086495 未加载
评论 #41086508 未加载
评论 #41086494 未加载
The_Colonel10 个月前
* for certain operations.<p>Which is a bit d&#x27;oh, since being faster for some things is one of the main motivations for a database in the first place.