A MUST when building big data sets is to do your inserts BEFORE you create your indexes. It's much more efficient to do the indexing work once, at the end, than to update every index for each row as it's inserted.

If you're using a language with threads, you can often get an additional speed-up by writing to multiple shard databases concurrently, then merging the tables at the end of your insert process by ATTACHing each shard to your main DB and doing INSERT INTO main.… SELECT * FROM shardN.…

The reason this helps is that it parallelizes the serialization work: turning an object/struct in $LANG into bytes in SQLite's record format. If your shards fit in memory this is almost always worth it; if they have to be written to disk it depends and you should benchmark it: you'll have a bad time if you shard too much and saturate your disk I/O budget.

If you're using a language that gives you real pointers to your data, like Go/Rust/C, make sure you're using the non-copying option when binding string or blob arguments (SQLITE_STATIC rather than SQLITE_TRANSIENT, which forces a copy). SQLite only needs to borrow the bytes until the end of the sqlite3_step(). Avoiding that extra memcpy is FREE performance. Unfortunately, many SQLite bindings only expose the copying form; getting a path that avoids the copy often means forking the bindings, at least I had to with the Go bindings.

Combining these techniques, my Go CSV -> SQLite loader is about 6x faster than the upstream sqlite3 C CLI.
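Roughly, the shard-then-merge flow looks like the sketch below (this is not my actual loader, just an illustration using database/sql with the mattn/go-sqlite3 driver; the table/column names, file paths, []any row channels, and helpers like writeShard are all made up). Note that the indexes are only created at the very end, after the merge:

```go
package shardload

import (
	"database/sql"
	"fmt"
	"sync"

	_ "github.com/mattn/go-sqlite3"
)

// writeShard serializes one slice of the input into its own database file.
// The shard has no indexes, so every INSERT is a cheap append.
func writeShard(path string, rows <-chan []any) error {
	db, err := sql.Open("sqlite3", path)
	if err != nil {
		return err
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE t (a TEXT, b TEXT)`); err != nil {
		return err
	}
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	stmt, err := tx.Prepare(`INSERT INTO t VALUES (?, ?)`)
	if err != nil {
		return err
	}
	defer stmt.Close()
	for r := range rows {
		if _, err := stmt.Exec(r...); err != nil {
			return err
		}
	}
	return tx.Commit()
}

// merge ATTACHes each shard to the main database, copies its rows across,
// and only then builds the indexes in a single pass over the finished table.
func merge(mainPath string, shardPaths []string) error {
	db, err := sql.Open("sqlite3", mainPath)
	if err != nil {
		return err
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE t (a TEXT, b TEXT)`); err != nil {
		return err
	}
	for i, p := range shardPaths {
		alias := fmt.Sprintf("shard%d", i)
		if _, err := db.Exec(`ATTACH DATABASE ? AS `+alias, p); err != nil {
			return err
		}
		if _, err := db.Exec(`INSERT INTO main.t SELECT * FROM ` + alias + `.t`); err != nil {
			return err
		}
		if _, err := db.Exec(`DETACH DATABASE ` + alias); err != nil {
			return err
		}
	}
	// All rows are in place: do the indexing work once, at the end.
	_, err = db.Exec(`CREATE INDEX t_a ON t(a)`)
	return err
}

// buildAll writes the shards concurrently, one goroutine per shard, then merges.
func buildAll(inputs []<-chan []any) error {
	var wg sync.WaitGroup
	paths := make([]string, len(inputs))
	errs := make([]error, len(inputs))
	for i, in := range inputs {
		paths[i] = fmt.Sprintf("shard%d.db", i)
		wg.Add(1)
		go func(i int, in <-chan []any) {
			defer wg.Done()
			errs[i] = writeShard(paths[i], in)
		}(i, in)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return merge("main.db", paths)
}
```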
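And for the binding point, a minimal cgo sketch of what "non-copying" means at the C API level. This is not the forked binding I mentioned, and the helper names are mine; it just shows the destructor argument to sqlite3_bind_text() that decides whether SQLite copies your bytes:

```go
package nocopybind

/*
#cgo LDFLAGS: -lsqlite3
#include <sqlite3.h>

// SQLITE_STATIC / SQLITE_TRANSIENT are cast-style macros, so cgo can't
// reference them directly; this tiny wrapper picks the destructor for us.
// SQLITE_TRANSIENT tells SQLite to memcpy the bytes immediately;
// SQLITE_STATIC tells it to borrow the pointer, with the caller keeping the
// buffer valid until the value is re-bound or the statement is finalized.
static int bind_text_nocopy(sqlite3_stmt *stmt, int idx, const char *p, int n) {
	return sqlite3_bind_text(stmt, idx, p, n, SQLITE_STATIC);
}
*/
import "C"
import "unsafe"

// bindNoCopy hands SQLite a borrowed view of buf instead of a copy. buf must
// be non-empty and must stay alive and unmodified until sqlite3_step() (or the
// next bind/reset) has consumed it. Strictly speaking, letting C code hold a
// Go pointer past the call bends cgo's pointer-passing rules, which is one
// reason the stock bindings copy by default.
func bindNoCopy(stmt *C.sqlite3_stmt, idx int, buf []byte) C.int {
	p := (*C.char)(unsafe.Pointer(&buf[0]))
	return C.bind_text_nocopy(stmt, C.int(idx), p, C.int(len(buf)))
}
```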