> Once again, SQLite happily stored my UUIDs on this column and even indexed them, but to get these across to PostgreSQL I needed to fix the schema and properly declare these keys as strings.<p>In PostgreSQL you're leaving performance on the table if you store UUIDs as strings instead of as the built-in `uuid` type (a 128-bit value under the hood).
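A quick sketch of the size difference, using Python's stdlib `uuid` module purely for illustration: the native type stores the raw 16 bytes, while the string form is 36 characters before any per-row overhead.

```python
import uuid

u = uuid.uuid4()

# Native uuid column: the raw 128-bit value, 16 bytes.
print(len(u.bytes))  # 16

# String column: the canonical hex-and-dashes form, 36 characters.
print(len(str(u)))   # 36
```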
> The solution was to grow the slug column to 512 characters and retry.<p>No, the solution is to always use `text` as your column type, and use a check constraint if you need to enforce a limit. It's much easier and safer to alter a check constraint than it is to change a column type live in production. The `text` type and `varchar(x)` are identical under the hood; `text` takes up no more space on disk.
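A minimal sketch of the pattern, shown here with Python's stdlib `sqlite3` since that's what the post migrates from (PostgreSQL accepts the same idea with its `text` type); the `posts`/`slug` names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unbounded text column; the limit lives in a CHECK constraint,
# which is cheap to change later compared to altering a column type.
conn.execute("""
    CREATE TABLE posts (
        slug TEXT NOT NULL CHECK (length(slug) <= 512)
    )
""")

conn.execute("INSERT INTO posts (slug) VALUES (?)", ("a-reasonable-slug",))
try:
    # Over the limit: the constraint rejects the row.
    conn.execute("INSERT INTO posts (slug) VALUES (?)", ("x" * 513,))
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```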
I think it's even more complicated.<p>The reduction in latency brought on by in-process databases, combined with modern NVMe storage, means that SQLite is a substantially faster approach than any other solution which requires a trip through the network stack.<p>I've got services in production for several financial institutions right now that are resolving <i>most</i> SQL queries within 100-1000 <i>micro</i> seconds by simply using SQLite on reasonable hardware.<p>How many more users could you support if the amount of time you had to await IO for each was reduced by ~2 orders of magnitude?<p>Obvious caveats here being the resilience angle. We addressed this with application-level protocols and additional instances.
It’s difficult to accept the results without looking at the query plans to see how the queries are running and if they are optimal. Seems like it’s just a straight dump of data into PostgreSQL and letting SQLAlchemy handle some queries and indexes but no analysis of the indexes at all.<p>Could be able to squeeze more perf out of both SQLite and PostgreSQL.
The fact that the performance is so close between a database that requires NO SERVER and one that requires a dedicated server + ops management is a signal to me that all projects should start on SQLite unless there is a compelling reason not to.
I have a question that wasn't answered on the blog, and might be silly for me to ask. Is SQLite running in WAL mode or not? I've usually found that WAL mode allows better performance even in read-heavy workloads, as it doesn't need to exclusively lock the database each time.<p>Otherwise, the post is a nice reminder for SQLite users who foresee a database switch to use the newly-introduced strict mode.
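For reference, turning it on is a one-liner; a sketch with Python's `sqlite3` (WAL only applies to file-backed databases, so an in-memory connection won't do):

```python
import sqlite3, os, tempfile

# WAL mode is a property of the database file, so use a real file.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# The pragma returns the journal mode actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal
```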
I had a look at SQLite around the time sqlite3 appeared, and my notes say I decided I didn't like it because of the lax and sometimes inconsistent way it can treat its input.<p>I thought I'd have another go at it today (sqlite 3.34) and I'm mildly surprised that the things I didn't like are still there:<p><pre><code> sqlite> select '1 apple' + '2 oranges';
3
sqlite> select count(*) where 0.9;
1
sqlite> select count(*) where 1.1;
1
sqlite> select count(*) where not 0.9;
1
sqlite> select count(*) where not 1.1;
0</code></pre>
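(The first of these is easy to reproduce from the Python `sqlite3` bindings too, for anyone who wants to try it without the shell:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite coerces each string to its leading numeric prefix,
# so '1 apple' + '2 oranges' quietly becomes 1 + 2.
result = conn.execute("SELECT '1 apple' + '2 oranges'").fetchone()[0]
print(result)  # 3
```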
This is totally uncomplicated. The choice of database is a function of your application and its architecture, not a matter of all-things-equal performance comparison.<p>"I want an extensible, versatile database platform for use as a component in an infrastructure that is part of a larger system" - postgres.<p>"I want an embeddable lightweight SQL storage backend for a narrow-scope, single-purpose, probably single-user, probably single-system application" - sqlite<p>Postgres should perform well in any role in the hands of an expert - it's a question of whether or not that is overkill. Sqlite can also perform well in a subset of those roles.
Some context: I've worked with massive PostgreSQL db's, containing millions of daily records spanning multiple years (allowing for potentially massive numbers of returned rows). I also exclusively MVP simple db architectures in SQLite (think: no massive joins or need to partition data).<p>Best takeaways here:<p>- SQLite kills it. It appears untuned PostgreSQL is only 2 seconds (???) faster than untuned SQLite; always try to kick off a project with SQLite<p>- PostgreSQL has a performance cliff when query responses leave memory. This is extra true when an index no longer fits in memory. work_mem is critical for pushing back this performance cliff<p>Side note: Tuning Postgres can be really painful, but it's also a really dope tried-and-true db.
It seems a solid first step would be to use EXPLAIN to understand query planning/scanning issues.<p>Assuming 6.5M rows/day*1.5yr=~3.5B rows, how large is the dataset in both SQLite and pg? With indexes, is it fitting in RAM?<p>Linode 2GB plan is shared vcpu, so results will vary depending on who is sharing it with your service.
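As a sketch of what that first step looks like on the SQLite side (the table and index names here are made up; PostgreSQL's EXPLAIN / EXPLAIN ANALYZE serves the same purpose):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (day TEXT, path TEXT, n INTEGER)")
conn.execute("CREATE INDEX hits_day ON hits (day)")

# EXPLAIN QUERY PLAN shows whether a query searches an index
# or falls back to a full table scan.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT sum(n) FROM hits WHERE day = '2022-01-01'"
):
    print(row[3])  # e.g. "SEARCH hits USING INDEX hits_day (day=?)"
```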
>If you are going to take one thing away from this article, I hope it is that the only benchmarks that are valuable are those that run on your own platform, with your own stack, with your own data, and with your own software. And even then, you may need to add custom optimizations to get the best performance.<p>Good advice, contextually.
Is it just me or was testing done with only four concurrent clients? In a mostly read-only environment?<p>Wouldn't moving to a static site generator have been a better direction in that case?<p>If you're maxing out at four people reading your blog, I can't imagine more infrastructure is better.
You know, apart from some of the details in regards to what data types to better use etc., I think that both PostgreSQL and SQLite are excellent choices for most use cases!<p>Of course, SQLite might be the best choice for embedded or local usage, they even have this lovely page about using it as an application file format: <a href="https://sqlite.org/appfileformat.html" rel="nofollow">https://sqlite.org/appfileformat.html</a><p>And PostgreSQL might scale better for multiple writers and situations where you need it on a separate VM/server/container that might be shared between different apps through a network connection of some sort.<p>Though it's also curious, because you never hear a lot about MySQL or MariaDB (comparatively, in my subjective opinion) on HN or many other places for that matter. And yet, it's still a decent option, has good tooling (MySQL Workbench is great), is widely supported and backs a significant amount of production systems out there.<p>Do people just view it as old/legacy, or is there the belief that for many use cases PostgreSQL might simply be the better option, e.g. also because of projects like PostGIS and support for more complex setups, like custom data types, more advanced scripting language for in-database processing and whatnot?
I really wish WordPress supported using SQLite as its database instead of MySQL, as this could make backups as simple as rsync'ing from the prod server to backups.
Great blog post. A few observations (considering the blog post skips numerous details):<p>SQLite really shines when compared to running a database in a separate server from the app. This is when the network latency starts being noticeable. Latency between you and the app is usually constant. The latency between the app and the database is what matters. If in this case, Postgres can run in the same host as the main app, I think it's a good thing to do. But very often that's not the case.<p>Even though not surprising, it's great to see how SQLite handles concurrency. From my experience, SQLite would be able to handle much more read concurrency. It would have been great to see more benchmarks (like 1000 concurrent users, even if the app would never see such traffic).<p>I was surprised that SQLite didn't beat Postgres on simpler queries. Without seeing the queries, if they contain multiple joins, that could be another reason why SQLite is slower since it can only do "nested loop" JOINs. The workloads also seem mostly analytical, we'll see where DuckDB goes in this space (looking promising for now).<p>One thing not mentioned in the blog post is "easy to maintain and update". If the app is deployed with Docker, adding Postgres means you go from worrying about:<p><pre><code> 1 docker container + 1 docker volume
</code></pre>
to:<p><pre><code> 2 docker containers (docker-compose?) + 1 docker volume + internal docker networking
</code></pre>
The fact that SQLite is so simple to operate also has some advantages. You can usually go back to an SQLite-based app 1 year later and understand everything. With other databases, you may need to review users, roles, permissions, connection strings, etc.<p>Another cool thing about SQLite is that it scales mostly linearly as you add more CPU cores. It can scale as long as you can also scale the number of processes in your app. In other databases, scaling means adding connection pools and other tools in the middle.<p>Lastly, and related to the latency topic: the low latency of SQLite means that you can sometimes convert a big query into 5 separate, small, reusable queries without any performance hit. It's really remarkable how much cleaner your code can become with this.<p>In any case, I love working both with Postgres and SQLite, and we need more blog posts like this one showing real-world cases.
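To put a rough number on the latency point, here's a toy measurement with an in-memory database (absolute figures will vary wildly with hardware and a file-backed db will be slower; the point is only that per-query overhead is tiny when there's no network hop):

```python
import sqlite3, time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(i,) for i in range(10_000)])

# Time 1,000 tiny point lookups issued one at a time.
start = time.perf_counter()
for i in range(1, 1001):
    conn.execute("SELECT v FROM t WHERE id = ?", (i,)).fetchone()
elapsed = time.perf_counter() - start
print(f"~{elapsed / 1000 * 1e6:.0f} µs per query")
```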
Here is a comparison that I'd like to see next: how much time was spent on the following...<p>- porting the application, making all these benchmarks and writing a blog post<p><pre><code> vs
</code></pre>
- simply getting a VM with more RAM and more CPUs.<p>In the conclusion the author goes on to talk about the options "if I get the budget" vs "without throwing more money at the problem". I'd guess that the cost of the <i>day</i> of a Principal Engineer would far surpass the cost of <i>years</i> running a 4CPU server.
I understand and partially agree with all the criticism of the article (and could add my own, like: why tweak work_mem out of all the hundreds of settings on Postgres?), but it's a blog post, not a scientific article - I still think it's a useful post. I might be in a similar situation at some point, and it's nice to know that you haven't encountered any major gotchas and made it even 2x faster. Thanks for sharing your experience.
If you use SQLite, there is always the risk of losing that data file on disk, right? How is that managed (<i>is this a dumb question?</i>), and why doesn't PostgreSQL have the same problem?
According to <a href="https://www.linode.com/pricing/" rel="nofollow">https://www.linode.com/pricing/</a> a shared instance with 2GB of RAM and 1 CPU is $10/month. I don't know how much a principal engineer at Twilio makes, but I'm pretty sure this wasn't a cost-effective optimization...
It's also a bit weird not to talk about indexing? Are sqlite and Postgresql similar enough that the same indexing strategy should be used?
I mean, SQLite lives on a single node; what exactly do you do when you have a failure, when more than one process needs to access the DB, when you need RBAC, etc.?
Interesting article. It should be noted that the testing done is with 100% read/query workload (not surprising since the application is a website of blogs).
Honestly, the blogging dude wrote a web "application" in Python, known for its not-quite-good performance, and wonders whether he can make the application faster by... changing the database?
Am I the only one wondering why they developed an in-house system for this kind of tracking in the first place? This problem (analysis of traffic) could easily be solved with Segment (a Twilio product) and an analytics tool like Mixpanel or Amplitude.
Can SQLite be used outside of single threaded contexts? Last I checked having multiple concurrent readers and writers is very difficult. Multiple concurrent readers only is probably ok if that's all you need.
The major drawback I encountered when building upon SQLite is concurrency of writes.<p>I have a lot of worker processes writing new data.<p>With SQLite, I was getting a lot of "Database locked" errors and ended up having to use pg.
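For what it's worth, the usual first mitigations before giving up are WAL mode plus a busy timeout, so writers wait briefly instead of failing immediately; a sketch (whether it's enough depends entirely on how write-heavy the workers are):

```python
import sqlite3, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "jobs.db")

# timeout= makes the Python driver retry on a locked database for up to 5s.
conn = sqlite3.connect(path, timeout=5.0)

# WAL lets readers proceed while one writer commits; writers still
# serialize, but brief contention becomes a short wait, not an error.
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA busy_timeout=5000")

conn.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO jobs (payload) VALUES (?)", ("work",))
conn.commit()
print(conn.execute("SELECT count(*) FROM jobs").fetchone()[0])  # 1
```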
Why is this limited to four clients? Maybe I missed something but it would be good to see what happens with higher numbers of connections (something both of these dbs are rumoured to be bad at).
It seems the queries are aggregate based and they’d be served better by DuckDB instead of SQLite.<p>DuckDB performance is astounding for doing group by, sum/count/avg with filters and sorts.
From my experience SQLite shines (using it in Go) when in-process, local, accessed through the SQLite API, and running from an NVMe drive. It's probably not the right choice over networks or with ORMs.