FWIW, Marginalia Search has had nearly a billion rows in a table. It runs on a single PC with 128 GB of RAM.

It's survived several HN death hugs.

Though I would say there are things you can do with a 20M-row table that you can't with a 1B-row table. If a query doesn't hit an index, it will take hours. Even SELECT COUNT(*) takes something like 10 minutes.
In my direct experience, with small rows and a carefully optimized workload and schema, 1 to 2 *billion* rows was totally fine in a single MySQL 5.1 table (InnoDB) on SSDs *twelve years ago*. Granted, that was on bare metal. But hardware has also improved a fair bit since then...
You can safely go to a billion rows. Use indexes. Don't do silly things like COUNT(*) or SELECT *; select only the small data sets you need, using your indexes. If you need a full table count for some reason, use a primary key that auto-increments and select the max, or use information_schema for an estimate. Nobody will sweat the slight inaccuracy on a table that size.
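For the concrete version (not from the article, just a sketch against a hypothetical `events` table with an auto-incrementing `id` primary key):

    -- Rough count from the auto-increment PK; ignores gaps left by deletes/rollbacks
    SELECT MAX(id) AS approx_rows FROM events;

    -- Or take the optimizer's estimate from table statistics
    SELECT TABLE_ROWS
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'events';

Both come back instantly regardless of table size; the trade-off is that neither is an exact count.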
Just off the top of my head I can think of a few cases where we easily have 100-1000x that on a single table, on a shared node on Kubernetes with an EBS volume (point being: bog-standard, if a tad large, commodity hardware). If you properly index things and your workload isn't so crazy that it needs access to all of this at once (e.g. point lookups with some recency bias), classic RDBMSs have this use case covered - and it's achievable by anyone!
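To make "point lookups with some recency bias" concrete, here's a minimal sketch (table and column names are invented for the example) of the kind of composite index that keeps such queries cheap no matter how big the table gets:

    -- Queries only ever touch one user's recent activity, so a
    -- (user_id, created_at) index keeps each read small and local.
    CREATE INDEX idx_user_recent ON events (user_id, created_at);

    SELECT id, event_type, created_at
    FROM events
    WHERE user_id = 12345
      AND created_at >= NOW() - INTERVAL 7 DAY
    ORDER BY created_at DESC
    LIMIT 100;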
Thanks for reposting the blog! Some crazy guys have already told me they have a MySQL table with >1.4B rows (~180 GiB). I underestimated MySQL's power lol
Very cool repo from the author, full of all kinds of tests and scripts: https://github.com/gongyisheng/playground/blob/main/mysql/row_test/select_test.py

Might need to fork/run this on psql to compare performance.
I love real-meat articles like this where software engineers actually explain why particular limits exist, or why technical things are the way they are. Good read. Thanks for sharing.
Love the article, but one criticism: I've found gp3 to be better than gp2 in all aspects, especially consistency—and it's cheaper. For a long time you could only use RDS with gp2 but I know of no reason to choose it in 2023.
We've got MySQL 8 running on high-end-ish consumer hardware, with 150 tables ranging from thousands to billions of rows. Largest table is a bit over 5 billion rows and 520GiB. Parallel network backup takes about 3 hours.
...yes, performance will drop when your set of active data exceeds available memory and your system needs to reference slower storage. I assume this would apply to anything.
I see no problem with the benchmark, apart from the fact that you'd be avoiding table scans in prod for tables of those sizes, so that kind of benchmark is kind of pointless. Well, it makes a point, but it's not something you'd want to rely on anyway.

Once data is cached, indexed lookups are fast: ~0.5 ms.
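If you want to check which case you're in before trusting any timing, EXPLAIN makes the difference obvious (hypothetical table, MySQL syntax):

    -- Indexed point lookup: type=const, rows=1 (the sub-millisecond case)
    EXPLAIN SELECT id, status FROM events WHERE id = 123456789;

    -- No usable index: type=ALL, rows close to the whole table (the "takes hours" case)
    EXPLAIN SELECT id, status FROM events WHERE payload LIKE '%needle%';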
We get sub-ms response times on a table with over 250 million rows. We have a covering index for all high-volume queries, running on RDS (Aurora) with 16 GB of RAM.
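For anyone wondering what that looks like in practice: a covering index holds every column the query needs, so the lookup never has to touch the base table at all. A rough sketch with made-up table/column names:

    -- The index contains all filtered and selected columns, so EXPLAIN
    -- reports "Using index" and the base table is never read.
    CREATE INDEX idx_orders_covering
      ON orders (customer_id, status, total_cents);

    SELECT status, total_cents
    FROM orders
    WHERE customer_id = 42;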
In general, NewSQL (distributed databases) may still be a good solution for the scalability limits you eventually hit with a single-server/node database like MySQL/PG. There are currently quite a few options in this space: Spanner, PG-compatible (CockroachDB, YugaByte), MySQL-compatible (TiDB, MariaDB xPand), plus various middleware products that handle sharding.
Hardware has limits as well as software. You would need a completely different architecture to accommodate a 20M+ row database. Regardless of schema, the engine's overhead is typically what determines how many rows it can handle, and because your client talks to the MySQL engine, you're bound by the limits and performance of that engine. So yeeeee-no. Time to use a different database.