> Once again, SQLite happily stored my UUIDs on this column and even indexed them, but to get these across to PostgreSQL I needed to fix the schema and properly declare these keys as strings.<p>In PostgreSQL you're leaving performance on the table if you store UUIDs as strings instead of as the built-in `uuid` type (a 128-bit value under the hood).
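A quick sketch of the size difference, using Python's stdlib `uuid` module purely for illustration: the native type stores the raw 16 bytes, while the string form is 36 characters before any per-row overhead.

```python
import uuid

u = uuid.uuid4()

# Native uuid column: the raw 128-bit value, 16 bytes.
print(len(u.bytes))  # 16

# String column: the canonical hex-and-dashes form, 36 characters.
print(len(str(u)))   # 36
```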
> The solution was to grow the slug column to 512 characters and retry.<p>No, the solution is to always use `text` as your column type, and use a check constraint if you need to enforce a limit. It's much easier and safer to alter a check constraint than it is to change a column type live in production. The `text` type and `varchar(x)` are identical under the hood; `text` takes up no more space on disk.
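A minimal sketch of the pattern, shown here with Python's stdlib `sqlite3` since that's what the post migrates from (PostgreSQL accepts the same idea with its `text` type); the `posts`/`slug` names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unbounded text column; the limit lives in a CHECK constraint,
# which is cheap to change later compared to altering a column type.
conn.execute("""
    CREATE TABLE posts (
        slug TEXT NOT NULL CHECK (length(slug) <= 512)
    )
""")

conn.execute("INSERT INTO posts (slug) VALUES (?)", ("a-reasonable-slug",))
try:
    # Over the limit: the constraint rejects the row.
    conn.execute("INSERT INTO posts (slug) VALUES (?)", ("x" * 513,))
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```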
I think it's even more complicated.<p>The reduction in latency brought on by in-process databases, combined with modern NVMe storage, means that SQLite is a substantially faster approach than any other solution which requires a trip through the network stack.<p>I've got services in production for several financial institutions right now that are resolving <i>most</i> SQL queries within 100-1000 <i>micro</i> seconds by simply using SQLite on reasonable hardware.<p>How many more users could you support if the amount of time you had to await IO for each was reduced by ~2 orders of magnitude?<p>Obvious caveats here being the resilience angle. We addressed this with application-level protocols and additional instances.
It’s difficult to accept the results without looking at the query plans to see how the queries are running and if they are optimal. Seems like it’s just a straight dump of data into PostgreSQL and letting SQLAlchemy handle some queries and indexes but no analysis of the indexes at all.<p>Could be able to squeeze more perf out of both SQLite and PostgreSQL.
The fact that the performance is so close between a database that requires NO SERVER and one that requires a dedicated server + ops management is a signal to me that all projects should start on SQLite unless there is a compelling reason not to.
I have a question that wasn't answered on the blog, and might be silly for me to ask. Is SQLite running in WAL mode or not? I've usually found that WAL mode allows better performance even in read-heavy workloads, as it doesn't need to exclusively lock the database each time.<p>Otherwise, the post is a nice reminder for SQLite users who foresee a database switch to use the newly-introduced strict mode.
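For reference, turning it on is a one-liner; a sketch with Python's `sqlite3` (WAL only applies to file-backed databases, so an in-memory connection won't do):

```python
import sqlite3, os, tempfile

# WAL mode is a property of the database file, so use a real file.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# The pragma returns the journal mode actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal
```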
I had a look at SQLite around the time sqlite3 appeared, and my notes say I decided I didn't like it because of the lax and sometimes inconsistent way it can treat its input.<p>I thought I'd have another go at it today (sqlite 3.34) and I'm mildly surprised that the things I didn't like are still there:<p><pre><code> sqlite> select '1 apple' + '2 oranges';
3
sqlite> select count(*) where 0.9;
1
sqlite> select count(*) where 1.1;
1
sqlite> select count(*) where not 0.9;
1
sqlite> select count(*) where not 1.1;
0</code></pre>
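(The first of these is easy to reproduce from the Python `sqlite3` bindings too, for anyone who wants to try it without the shell:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite coerces each string to its leading numeric prefix,
# so '1 apple' + '2 oranges' quietly becomes 1 + 2.
result = conn.execute("SELECT '1 apple' + '2 oranges'").fetchone()[0]
print(result)  # 3
```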
This is totally uncomplicated. The choice of database is a function of your application and its architecture, not a matter of all-things-equal performance comparison.<p>"I want an extensible, versatile database platform for use as a component in an infrastructure that is part of a larger system" - postgres.<p>"I want an embeddable lightweight SQL storage backend for a narrow-scope, single-purpose, probably single-user, probably single-system application" - sqlite<p>Postgres should perform well in any role in the hands of an expert - it's a question of whether or not that is overkill. Sqlite can also perform well in a subset of those roles.
Some context: I've worked with massive PostgreSQL db's, containing millions of daily records spanning multiple years (allowing for potentially massive numbers of returned rows). I also exclusively MVP simple db architectures in SQLite (think: no massive joins or need to partition data).<p>Best takeaways here:<p>- SQLite kills it. It appears untuned PostgreSQL is only 2 seconds (???) faster than untuned SQLite; always try to kick off a project with SQLite<p>- PostgreSQL has a performance cliff when query responses leave memory. This is extra true when an index no longer fits in memory. work_mem is critical for pushing back this performance cliff<p>Side note: Tuning Postgres can be really painful, but it's also a really dope tried-and-true db.
It seems a solid first step would be to use EXPLAIN to understand query planning/scanning issues.<p>Assuming 6.5M rows/day*1.5yr=~3.5B rows, how large is the dataset in both SQLite and pg? With indexes, is it fitting in RAM?<p>Linode 2GB plan is shared vcpu, so results will vary depending on who is sharing it with your service.
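As a sketch of what that first step looks like on the SQLite side (the table and index names here are made up; PostgreSQL's EXPLAIN / EXPLAIN ANALYZE serves the same purpose):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (day TEXT, path TEXT, n INTEGER)")
conn.execute("CREATE INDEX hits_day ON hits (day)")

# EXPLAIN QUERY PLAN shows whether a query searches an index
# or falls back to a full table scan.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT sum(n) FROM hits WHERE day = '2022-01-01'"
):
    print(row[3])  # e.g. "SEARCH hits USING INDEX hits_day (day=?)"
```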
>If you are going to take one thing away from this article, I hope it is that the only benchmarks that are valuable are those that run on your own platform, with your own stack, with your own data, and with your own software. And even then, you may need to add custom optimizations to get the best performance.<p>Good advice, contextually.
Is it just me or was testing done with only four concurrent clients? In a mostly read-only environment?<p>Wouldn't moving to a static site generator have been a better direction in that case?<p>If you're maxing out at four people reading your blog, I can't imagine more infrastructure is better.
You know, apart from some of the details in regards to what data types to better use etc., I think that both PostgreSQL and SQLite are excellent choices for most use cases!<p>Of course, SQLite might be the best choice for embedded or local usage, they even have this lovely page about using it as an application file format: <a href="https://sqlite.org/appfileformat.html" rel="nofollow">https://sqlite.org/appfileformat.html</a><p>And PostgreSQL might scale better for multiple writers and situations where you need it on a separate VM/server/container that might be shared between different apps through a network connection of some sort.<p>Though it's also curious, because you never hear a lot about MySQL or MariaDB (comparatively, in my subjective opinion) on HN or many other places for that matter. And yet, it's still a decent option, has good tooling (MySQL Workbench is great), is widely supported and backs a significant amount of production systems out there.<p>Do people just view it as old/legacy, or is there the belief that for many use cases PostgreSQL might simply be the better option, e.g. also because of projects like PostGIS and support for more complex setups, like custom data types, more advanced scripting language for in-database processing and whatnot?
I really wish WordPress supported using SQLite as its database instead of MySQL, as this could make backups as simple as rsync'ing from the prod server to backups.
Great blog post. A few observations (considering the blog post skips numerous details):<p>SQLite really shines when compared to running a database in a separate server from the app. This is when the network latency starts being noticeable. Latency between you and the app is usually constant. The latency between the app and the database is what matters. If in this case, Postgres can run in the same host as the main app, I think it's a good thing to do. But very often that's not the case.<p>Even though not surprising, it's great to see how SQLite handles concurrency. From my experience, SQLite would be able to handle much more read concurrency. It would have been great to see more benchmarks (like 1000 concurrent users, even if the app would never see such traffic).<p>I was surprised that SQLite didn't beat Postgres on simpler queries. Without seeing the queries, if they contain multiple joins, that could be another reason why SQLite is slower since it can only do "nested loop" JOINs. The workloads also seem mostly analytical, we'll see where DuckDB goes in this space (looking promising for now).<p>One thing not mentioned in the blog post is "easy to maintain and update". If the app is deployed with Docker, adding Postgres means you go from worrying about:<p><pre><code> 1 docker container + 1 docker volume
</code></pre>
to:<p><pre><code> 2 docker containers (docker-compose?) + 1 docker volume + internal docker networking
</code></pre>
The fact that SQLite is so simple to operate also has some advantages. You can usually go back to an SQLite-based app 1 year later and understand everything. With other databases, you may need to review users, roles, permissions, connection strings, etc.<p>Another cool thing about SQLite is that it scales mostly linearly as you add more CPU cores. It can scale as long as you can also scale the number of processes in your app. In other databases, scaling means adding connection pools and other tools in the middle.<p>Lastly, and related to the latency topic: the low latency of SQLite means that you can sometimes convert a big query into 5 separate, small, reusable queries without any performance hit. It's really remarkable how much cleaner your code can become with this.<p>In any case, I love working both with Postgres and SQLite, and we need more blog posts like this one showing real-world cases.
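To put a rough number on the latency point, here's a toy measurement with an in-memory database (absolute figures will vary wildly with hardware and a file-backed db will be slower; the point is only that per-query overhead is tiny when there's no network hop):

```python
import sqlite3, time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(i,) for i in range(10_000)])

# Time 1,000 tiny point lookups issued one at a time.
start = time.perf_counter()
for i in range(1, 1001):
    conn.execute("SELECT v FROM t WHERE id = ?", (i,)).fetchone()
elapsed = time.perf_counter() - start
print(f"~{elapsed / 1000 * 1e6:.0f} µs per query")
```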
Here is a comparison that I'd like to see next: how much time was spent on the following...<p>- porting the application, making all these benchmarks and writing a blog post<p><pre><code> vs
</code></pre>
- simply getting a VM with more RAM and more CPUs.<p>In the conclusion the author goes on to talk about the options "if I get the budget" vs "without throwing more money at the problem". I'd guess that the cost of the <i>day</i> of a Principal Engineer would far surpass the cost of <i>years</i> running a 4CPU server.
I understand and partially agree with all the criticism of the article (and could add my own, like: why tweak work_mem out of all the hundreds of settings on Postgres?), but it's a blog post, not a scientific article - I still think it's a useful post. I might be in a similar situation at some point, and it's nice to know that you haven't encountered any major gotchas and made it even 2x faster. Thanks for sharing your experience.
If you use SQLite, there is always the risk of losing that data file on disk, right? How is that managed (<i>is this a dumb question?</i>), and why doesn't PostgreSQL have the same problem?
According to <a href="https://www.linode.com/pricing/" rel="nofollow">https://www.linode.com/pricing/</a> a shared instance with 2GB of RAM and 1 CPU is $10/month. I don't know how much a principal engineer at Twilio makes, but I'm pretty sure this wasn't a cost-effective optimization...
It's also a bit weird not to talk about indexing? Are sqlite and Postgresql similar enough that the same indexing strategy should be used?
I mean, SQLite lives on a single node; what exactly do you do when you have a failure, when more than one process needs to access the DB, when you need RBAC, etc.?
Interesting article. It should be noted that the testing done is with 100% read/query workload (not surprising since the application is a website of blogs).
Honestly, the blogging dude wrote a web "application" in Python, known for its not-quite-good performance, and wonders whether he can make the application faster by... changing the database?
Am I the only one wondering why they developed an in-house system for this kind of tracking in the first place? This problem (analysis of traffic) could easily be solved with Segment (a Twilio product) and an analytics tool like Mixpanel or Amplitude.
Can SQLite be used outside of single threaded contexts? Last I checked having multiple concurrent readers and writers is very difficult. Multiple concurrent readers only is probably ok if that's all you need.
The major drawback I encountered when building upon SQLite is concurrency of writes.<p>I have a lot of worker processes writing new data.<p>With SQLite, I was getting a lot of "Database locked" errors and ended up having to use pg.
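For what it's worth, the usual first mitigations before giving up are WAL mode plus a busy timeout, so writers wait briefly instead of failing immediately; a sketch (whether it's enough depends entirely on how write-heavy the workers are):

```python
import sqlite3, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "jobs.db")

# timeout= makes the Python driver retry on a locked database for up to 5s.
conn = sqlite3.connect(path, timeout=5.0)

# WAL lets readers proceed while one writer commits; writers still
# serialize, but brief contention becomes a short wait, not an error.
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA busy_timeout=5000")

conn.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO jobs (payload) VALUES (?)", ("work",))
conn.commit()
print(conn.execute("SELECT count(*) FROM jobs").fetchone()[0])  # 1
```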
Why is this limited to four clients? Maybe I missed something but it would be good to see what happens with higher numbers of connections (something both of these dbs are rumoured to be bad at).
It seems the queries are aggregate based and they’d be served better by DuckDB instead of SQLite.<p>DuckDB performance is astounding for doing group by, sum/count/avg with filters and sorts.
From my experience SQLite shines (using it in Go) when in-process, local, accessed through the SQLite API, and running from an NVMe drive. It's probably not the right choice over networks or with ORMs.