科技回声 (TechEcho)

SQLite performance tuning: concurrent reads, multiple GBs and 100k SELECTs/s

219 points · by Maksadbek · about 2 years ago

12 comments

Maksadbek · about 2 years ago
I was benchmarking SQLite3 with Python on my MBP M1, and with default settings could achieve 2,000-4,000 inserts per second. Then I applied the suggested settings:

    pragma journal_mode = WAL;
    pragma synchronous = normal;
    pragma temp_store = memory;
    pragma mmap_size = 30000000000;

As a result, the average number of inserts per second was 80K.
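A minimal Python sketch applying those pragmas before a single-transaction insert loop (the table schema, file name, and row count here are illustrative, not from the comment):

```python
import sqlite3
import time

conn = sqlite3.connect("bench.db")
# Apply the pragmas suggested in the comment above.
conn.executescript("""
    PRAGMA journal_mode = WAL;
    PRAGMA synchronous = normal;
    PRAGMA temp_store = memory;
    PRAGMA mmap_size = 30000000000;
""")
conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)")

N = 100_000
start = time.perf_counter()
with conn:  # one transaction for all rows; per-row commits dominate otherwise
    for i in range(N):
        conn.execute("INSERT INTO t (v) VALUES (?)", (f"row-{i}",))
elapsed = time.perf_counter() - start
print(f"{N / elapsed:,.0f} inserts/sec")
```

Wrapping the whole loop in one transaction matters as much as the pragmas; without it, each `INSERT` pays its own commit.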
zylepe · about 2 years ago
I spent a while optimizing sqlite inserts for planetiler, this is what I came up with:

https://github.com/onthegomap/planetiler/blob/db0ab02263baaae3038de058c4fb6a1bebd81e3c/planetiler-core/src/main/java/com/onthegomap/planetiler/mbtiles/Mbtiles.java#L151-L183

It batches inserts into bulk statements and is able to do writes in the 500k+ per second range, and reads are 300-400k/s using those settings.
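The batching technique described above can be sketched in Python (this is an illustration of the idea, not a port of planetiler's Java code; the table and batch size are assumptions): build one multi-row `INSERT ... VALUES (...),(...),...` statement and bind a whole batch of rows per execute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tiles (z INTEGER, x INTEGER, y INTEGER, data BLOB)")

BATCH = 500  # rows per statement (illustrative)
rows = [(0, x, y, b"\x00") for x in range(100) for y in range(100)]

# One statement with BATCH value groups: INSERT ... VALUES (?,?,?,?),(?,?,?,?),...
sql = "INSERT INTO tiles (z, x, y, data) VALUES " + ",".join(["(?,?,?,?)"] * BATCH)

with conn:
    full = len(rows) // BATCH * BATCH
    for i in range(0, full, BATCH):
        flat = [v for row in rows[i:i + BATCH] for v in row]  # flatten tuples
        conn.execute(sql, flat)
    # leftover rows that don't fill a whole batch
    if rows[full:]:
        conn.executemany("INSERT INTO tiles (z, x, y, data) VALUES (?,?,?,?)", rows[full:])

print(conn.execute("SELECT COUNT(*) FROM tiles").fetchone()[0])
```

Multi-row statements amortize per-statement overhead; the win over plain `executemany` is most visible at high row rates.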
vaughan · about 2 years ago
I hate the way so much of programming is just fiddling knobs on some old systems. It always gets to a point where I would prefer to rewrite the whole thing myself with only the functionality I need and to truly understand what is going on under the hood. I just wish these projects were built in a modular way so you can re-assemble parts of them back together easily.
zackees · about 2 years ago
WAL mode has some issues where depending on the write pattern, the WAL size can grow to infinity, slowing down performance a lot. I think this usually happens when you have lots of writes that lock the table so sqlite never gets to doing wal_autocheckpoint.

I believe that WAL2 fixes this:

> Wal2 mode does not have this problem. In wal2 mode, wal files do not grow indefinitely even if the checkpointer never has a chance to finish uninterrupted

https://sqlite.org/cgi/src/doc/wal2/doc/wal2.md
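In stock WAL mode you can at least bound the WAL yourself: tune `wal_autocheckpoint` and run an explicit `wal_checkpoint(TRUNCATE)` at a quiet moment. A sketch (file name and workload are illustrative):

```python
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode = WAL")
# Auto-checkpoint threshold in WAL pages (the default is 1000).
conn.execute("PRAGMA wal_autocheckpoint = 1000")
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

with conn:
    for i in range(10_000):
        conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (str(i), "x" * 100))

# Force a full checkpoint and truncate the WAL file back to zero bytes.
# Returns (busy, wal_pages, checkpointed_pages); busy == 0 means it completed.
busy, wal_pages, moved = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(busy, wal_pages, moved)
```

If `busy` comes back 1, a reader blocked the checkpoint, which is exactly the failure mode the comment describes and the one WAL2 is designed to avoid.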
freddw · about 2 years ago
It might be helpful to link the docs for some of these configs in the section where you mention them. Some thoughts:

Based on https://www.sqlite.org/wal.html it seems the WAL index is mmapped as a workaround to some edge cases not relevant to many application developers. They say it shouldn't matter, but with the larger page sizes you're using, using the original implementation approach they describe (volatile shared memory) actually might improve performance slightly - do you know if your WAL index ever exceeds 32KiB? Not sure as to the difficulty of resurrecting that old functionality though.

Also, this case seems like something that could happen from time to time depending on what you're doing. Did you encounter it? Case:

> When the last connection to a particular database is closing, that connection will acquire an exclusive lock for a short time while it cleans up the WAL and shared-memory files. If a second database tries to open and query the database while the first connection is still in the middle of its cleanup process, the second connection might get an SQLITE_BUSY error.

Both the WAL docs and the article mention blocking checkpointing / a need for reader gaps to ensure the WAL flushes, or a possibility that WAL files will grow indefinitely. I had some speculation that this was an implementation limitation, and it turns out another comment mentions WAL2 may relax this requirement by using two WAL files split between "hot" and "cold". Curious how the performance might compare with this: https://sqlite.org/cgi/src/doc/wal2/doc/wal2.md
avinassh · about 2 years ago
I am experimenting with SQLite, where I try inserting 1B rows in under a minute. The current best is inserting 100M rows in 23s. I cut many corners to get performance, but the tweaks might suit your workload.

I have explained my rationale and approach here - https://avi.im/blag/2021/fast-sqlite-inserts/

the repo link - https://github.com/avinassh/fast-sqlite3-inserts
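The "cut many corners" style of bulk load can be sketched like this: disable the journal and fsyncs entirely and push rows through `executemany` in one explicit transaction. The exact pragma list here is an assumption for illustration (see the linked post for the author's actual settings), and a crash mid-load can corrupt the file, so this only suits throwaway rebuilds.

```python
import random
import sqlite3

# isolation_level=None -> autocommit; we manage the transaction explicitly.
conn = sqlite3.connect("bulk.db", isolation_level=None)
# Durability-sacrificing settings (assumed for illustration): a crash can
# corrupt the database, so only use this for data you can regenerate.
conn.executescript("""
    PRAGMA journal_mode = OFF;
    PRAGMA synchronous = 0;
    PRAGMA locking_mode = EXCLUSIVE;
    PRAGMA temp_store = MEMORY;
""")
conn.execute("CREATE TABLE IF NOT EXISTS users (area TEXT, age INTEGER, active INTEGER)")

N = 100_000  # illustrative; the post targets hundreds of millions
conn.execute("BEGIN")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    ((str(random.randint(100, 999)), random.choice((5, 15)), random.choice((0, 1)))
     for _ in range(N)),
)
conn.execute("COMMIT")
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Feeding `executemany` a generator keeps the loop in C and avoids materializing all rows in Python first.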
datadeft · about 2 years ago
We are in the process of moving from Postgres to SQLite so these suggestions are very useful.
d1l · about 2 years ago
The fact that he doesn't understand why his wal file is growing without bounds should be a warning to take his suggestions with a grain of salt. This is yet another lazy benchmark with lazy suggestions by someone who is uselessly retreading the path laid down in the SQLite documentation.
yencabulator · about 2 years ago
> Normal is still completely corruption safe in WAL mode, and means only WAL checkpoints have to wait for FSYNC.

I would consider data loss on crash to be "corruption", for sure. And synchronous=normal + journal_mode=WAL can lose data:

https://www.sqlite.org/pragma.html#pragma_synchronous

> A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash.
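The trade-off is a per-connection setting, so an application that needs durable commits can query and raise it at runtime. A small sketch (`NORMAL` reads back as 1, `FULL` as 2):

```python
import sqlite3

conn = sqlite3.connect("durable.db")
conn.execute("PRAGMA journal_mode = WAL")

# NORMAL (1): commits do not fsync the WAL, so a power loss can roll back
# recently committed transactions - no file corruption, but possible data loss.
conn.execute("PRAGMA synchronous = NORMAL")
mode_normal = conn.execute("PRAGMA synchronous").fetchone()[0]

# FULL (2): fsync on every commit; durable at the cost of write throughput.
conn.execute("PRAGMA synchronous = FULL")
mode_full = conn.execute("PRAGMA synchronous").fetchone()[0]

print(mode_normal, mode_full)
```

Since the pragma is not persisted, it must be set on every new connection, which is easy to forget when benchmarking one mode and deploying another.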
Scarbutt · about 2 years ago
'Multiple GB' is ambiguous (is it 20GB or 200GB?) and the article doesn't specify the size.
Dave3of5 · about 2 years ago
Doesn't this put the entire dataset into memory? No wonder it's so fast.
withinboredom · about 2 years ago
I'd be curious for a similar tuning with Dqlite: https://github.com/canonical/dqlite