The pdf: <a href="https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf" rel="nofollow">https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf</a>
I shared some notes on this on my blog, because I'm guessing a lot of people aren't quite invested enough to read through the whole paper: <a href="https://simonwillison.net/2022/Sep/1/sqlite-duckdb-paper/" rel="nofollow">https://simonwillison.net/2022/Sep/1/sqlite-duckdb-paper/</a>
Regarding hash joins, the SQLite documentation mentions the absence of real hash tables [0]<p><pre><code> SQLite constructs a transient index instead of a hash table in this instance
because it already has a robust and high performance B-Tree implementation at
hand, whereas a hash-table would need to be added. Adding a separate hash table
implementation to handle this one case would increase the size of the library
(which is designed for use on low-memory embedded devices) for minimal
performance gain.
</code></pre>
It's already linked in the paper, but here's the link to the code used in the paper [1]<p>The paper mentions implementing Bloom filters for analytical queries an explains how they're used. I wonder if this is related to the query planner enhancements that landed on SQLite 3.38.0 [2]<p><pre><code> Use a Bloom filter to speed up large analytic queries.
</code></pre>
[0]: <a href="https://www.sqlite.org/optoverview.html#hash_joins" rel="nofollow">https://www.sqlite.org/optoverview.html#hash_joins</a><p>[1]: <a href="https://github.com/UWHustle/sqlite-past-present-future" rel="nofollow">https://github.com/UWHustle/sqlite-past-present-future</a><p>[2]: <a href="https://www.sqlite.org/releaselog/3_38_0.html" rel="nofollow">https://www.sqlite.org/releaselog/3_38_0.html</a>
TFA appears to be about adapting SQLite for OLAP workloads. I do not understand the rationale. Why try to adapt a row-based storage system for OLAP? Why not just use a column store?
I've been learning SQL recently with PostgreSQL and MySQL in an online bootcamp here in Spain. So far very comprehensive. We've touched indexing and partitioning with EXPLAIN ANALYZE for optimizing performance, and I've implemented this strategies successfully onto an active forum I own.<p>The SQL course has almost no love by the students but so far it has been the most useful and interesting to me.<p>I was able to create some complex views (couldn't understand how to make materialized views in MySQL), but they were still very slow.<p>I decided to copy most of this forum DB to DuckDB (with Knime now, until I know better), and optimization with DuckDB seems pointless. It's very, very fast. Less energy usage for my brain, and less time waiting. That's a win for me.<p>My current dataset is about 40GB, so It's not HUGE, and sure people here in HN would laugh at my "complex" views, but so far I've reduced all my concerns from optimizing to how to download the data I need without causing problems to the server.
I'm confused that why in Figure3, seems in Raspberry Pi, latency is slower than same queries' latency in cloud server. Did I missed something?
After seeing one diagram: Why the hack are we still talking to databases with SQL strings and not directly specifying the Query-AST? Admins, sure (a fancy UI could help there as well) but why in our code?
my ipad won’t let me search through the PDF, but i couldn’t find where “SSB” was defined, if anywhere. i did not see it defined in the first paragraph, which is where it is first used.<p>everyone: not all of your readers are domain experts. omissions like this are infuriating.
"While it continues to be the most widely used database engine in the world"<p>It realy depends what do you mean by that, yes it's shipping in every phones and browser, but I don't consider that as a database. Is the windows registry a database?<p>Oracle, MySQL, PG, MSSQL are the most widly used DB in the world, the web runs on those not SQLite.
>SQLite is primarily designed for fast online transaction processing (OLTP), employing row-oriented execution and a B-tree storage format.<p>I found that claim to be fairly surprising, SQLite is pretty bad when it comes to transactions per second. SQLite even owns up to it in the FAQ:<p>>it will only do a few dozen transactions per second.