DuckDB is really having a moment<p>The ecosystem is very active, and they have recently opened up "community extensions" so you can bring your own functions, data types and connections. A barrier at the moment is that extensions have to be written in C++, though this limitation should be removed soon.<p>I've been building a lot on top of DuckDB; two of the projects I'm working on are linked in the article:<p>- Evidence (<a href="https://evidence.dev">https://evidence.dev</a>): Build data apps with SQL + Markdown<p>- DuckDB GSheets (<a href="https://duckdb-gsheets.com" rel="nofollow">https://duckdb-gsheets.com</a>): Read/Write Google Sheets via DuckDB
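For anyone curious what the community-extension workflow looks like in practice, here is a minimal sketch of installing the gsheets extension from the community repository and reading a sheet. The `read_gsheet` call and the URL are assumptions, not something from this post, so check duckdb-gsheets.com for the exact interface.

```sql
-- Community extensions install straight from DuckDB's community repository.
INSTALL gsheets FROM community;
LOAD gsheets;

-- Assumed reader function and placeholder sheet URL; verify against the extension docs.
SELECT *
FROM read_gsheet('https://docs.google.com/spreadsheets/d/<sheet-id>');
```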
<a href="https://pragprog.com/titles/pwrdata/seven-databases-in-seven-weeks-second-edition/" rel="nofollow">https://pragprog.com/titles/pwrdata/seven-databases-in-seven...</a> - A book by the same name. Instead of giving you a brief blurb on each database, the authors attempt to give you more context and exercises with them. Last updated in 2018 it covers PostgreSQL, HBase, MongoDB, CouchDB, Neo4J, DynamoDB, and Redis. The first edition covered Riak instead of DynamDB.
ClickHouse is awesome, but there's a newer OLAP database in town: Apache Pinot, and it is significantly better: <a href="https://pinot.apache.org/" rel="nofollow">https://pinot.apache.org/</a><p>Here's why it is better:<p>1. User-facing analytics vs. business analytics. Pinot was designed for user-facing analytics, meaning the result is consumed by the end user (for example, "what is the expected delivery time for this restaurant?"). The demands are much higher: latency, freshness, concurrency and uptime.<p>2. Better architecture. To scale out, ClickHouse uses sharding, which means that if you want to add a node you have to bring down the database, re-partition it, reload the data, then bring it back up. Expect downtime of 1 or 2 days at least. Pinot, on the other hand, uses segments: smaller (but self-contained) pieces of data, with lots of segments on each node. When you add a node, Pinot just moves segments around; no downtime needed. Furthermore, for high availability ClickHouse uses replicas, with each shard needing 1 or 2 replicas for HA. Pinot does not distinguish between shard nodes and replica nodes; instead, each segment is replicated to 2 or 3 nodes, which is better for hardware utilization.<p>3. Pre-aggregation. OLAP cubes became popular in the 1990s. They pre-aggregate data to make queries significantly faster, but the downside is high storage cost. ClickHouse doesn't have an equivalent of OLAP cubes at all. Pinot has something better than OLAP cubes: star-tree indexes. Like cubes, star trees pre-aggregate data along multiple dimensions, but they don't need as much storage (see the sketch below).
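To make point 3 concrete, here is a sketch of the kind of aggregation query a star-tree index can answer from pre-aggregated nodes instead of raw rows. The table, columns and dimensions are made up for illustration, and assume a star tree was built over (city, cuisine) with SUM and COUNT function pairs.

```sql
-- Hypothetical Pinot table; a star tree on (city, cuisine) with SUM(delivery_minutes)
-- and COUNT(*) lets this group-by be served from pre-aggregated tree nodes.
SELECT city,
       cuisine,
       SUM(delivery_minutes) AS total_minutes,
       COUNT(*)              AS orders
FROM restaurant_orders
WHERE city = 'Amsterdam'
GROUP BY city, cuisine;
```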
Author here.<p>Thanks for sharing! My choices are pretty coloured by personal experience, and I didn't want to re-tread anything from the book (Redis/Valkey, Neo4j, etc.) other than Postgres - mostly due to Postgres changing _a lot_ over the years.<p>I had considered an OSS Dynamo-like (Cassandra, ScyllaDB, kinda) or a Calvin-like (FaunaDB), but went with FoundationDB instead because, to me, it was much more interesting.<p>After a decade of running DBaaS at massive scale, I'm also pretty biased towards easy-to-run.
> If I had to only pick two databases to deal with, I’d be quite happy with just Postgres and ClickHouse - the former for OLTP, the latter for OLAP.<p>I completely agree with the author here. In fact, many companies, Cloudflare among them, are built on exactly this approach, and it has scaled well for them without the need for any third database.<p>> Another reason I suggest checking out ClickHouse is that it is a joy to operate - deployment, scaling, backups and so on are well documented - even down to setting the right CPU governor is covered.<p>Another point from the author worth highlighting is the ease of deployment. Most distributed databases aren't easy to run at scale; ClickHouse is much, much easier, and it has become easier still with efficient storage-compute separation.
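As a rough sketch of what the ClickHouse half of that split looks like (table and column names are illustrative, not from the article): OLTP rows stay in Postgres, while an append-only MergeTree table in ClickHouse absorbs the aggregation queries.

```sql
-- Illustrative ClickHouse events table on the MergeTree engine.
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    event_type LowCardinality(String),
    payload    String
)
ENGINE = MergeTree
ORDER BY (event_type, event_time);

-- The kind of OLAP query you would rather not run against your OLTP Postgres.
SELECT event_type, count() AS events
FROM events
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY event_type
ORDER BY events DESC;
```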
The article mentions the TigerBeetle Style Guide: <a href="https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TIGER_STYLE.md">https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TI...</a><p>I agree so much with the paragraphs about "Dependencies" and "Tooling".
I didn't realize this [1] was a thing. I've been informally referring to our Postgres/Elixir stack as "boring, but in the best way possible, it just works with no drama whatsoever" for years.<p>1: <a href="https://boringtechnology.club" rel="nofollow">https://boringtechnology.club</a>
DuckDB really does seem to be having its moment; projects like Evidence and DuckDB GSheets are super cool examples of its potential. And Postgres’s longevity is insane: it just keeps adapting.<p>On the AI front, vector search options like Pinecone and the pgvector Postgres extension are exciting, but I’d love to see something even more integrated with AI workflows. The possibilities are huge. Curious to hear what others think!
Ever since CockroachDB changed their license, I've been searching for alternatives. PostgreSQL is an obvious choice, but is there a good HA solution? What do people usually do for HA with PostgreSQL, or do they just not care about it? I tested Patroni, which is the most popular option as far as I know, but found some HA issues that make me hesitant to use it: <a href="https://www.binwang.me/2024-12-02-PostgreSQL-High-Availability-Solutions-Part-1.html" rel="nofollow">https://www.binwang.me/2024-12-02-PostgreSQL-High-Availabili...</a>
For those not familiar with DuckDB: it's an amazing database, but it is not a replacement for SQLite if you are looking for a lightweight server-side DB. I'm in love with the DuckDB client and use it to query SQLite databases, but because it only supports one concurrent write connection, it is not suitable as a server-side DB.
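For reference, here is roughly what querying an existing SQLite file from the DuckDB client looks like via the sqlite extension; the file and table names are placeholders, so treat this as a sketch rather than a recipe.

```sql
-- Query an existing SQLite file from DuckDB via the sqlite extension.
INSTALL sqlite;
LOAD sqlite;

ATTACH 'app.db' AS app (TYPE sqlite);  -- placeholder file name
SELECT count(*) FROM app.users;        -- placeholder table
```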
I'm just gonna say it: unless I had a specific reason to use it, I would cross CockroachDB off my list purely based on the name. I don't want to be thinking of cockroaches every time I use my database. Names do have meaning, and I have to wonder why they went with that one.