Migrating to Postgres

240 points | by shenli3514 | 4 days ago

24 comments

luhn | 4 days ago

> By Jan 2024, our largest table had roughly 100 million rows.

I did a double take at this. At the onset of the article, the fact they're using a distributed database and the mention of a "mid 6 figure" DB bill made me assume they have some obscenely large database that's far beyond what a single node could do. They don't detail the Postgres setup that replaced it, so I assume it's a pretty standard single primary, and a 100-million-row table is well within the abilities of that: I have a 150-million-row table happily plugging along on a 2 vCPU + 16 GB instance. Apples and oranges, perhaps, but people shouldn't underestimate what a single modern server can do.
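luhn's point about single-node headroom is easy to check empirically. A sketch of the kind of query that shows whether a table is anywhere near outgrowing one machine (the table name is illustrative):

```sql
-- Rough size and row-count check for a single table (name is illustrative).
SELECT
    pg_size_pretty(pg_total_relation_size('events')) AS total_size, -- heap + indexes + TOAST
    pg_size_pretty(pg_relation_size('events'))       AS heap_size,
    (SELECT reltuples::bigint
       FROM pg_class
      WHERE relname = 'events')                      AS approx_rows; -- planner's estimate
```

A 100-million-row table will often come back in the tens of gigabytes here, comfortably within reach of a single modest instance.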
esafak | 4 days ago

I read it as: Why You Shouldn't Use Prisma and How Cockroach Hung Us Out To Dry.

I already knew about Prisma from the infamous https://github.com/prisma/prisma/discussions/19748
sreekanth850 | 4 days ago

It's wild, and a little hilarious, how often startups and companies go for distributed databases like CockroachDB/TiDB/Yugabyte before they actually need distribution; this trend sucks. 100 million rows is nothing that a well-tuned Postgres or MySQL instance (or even a read-replicated setup) can't handle comfortably. Scale when you hit the wall.
eftpotrm | 3 days ago

I can't help thinking more startups need greybeards around. (Of which, realistically, I'm now one.)

Largest table 100 million rows, and they were paying six figures annually for database services? I have one now that sits happily enough on an 8-year-old laptop. I've worked on systems with similar-scale tables chugging along on very average MSSQL 2000 boxes twenty years ago. There just isn't a need for cloud-scale systems and cloud-scale bills at that data volume.

The problems they're describing should never have got that far without an experienced hand pointing out they didn't make sense, and if they'd hired that greybeard they'd have spotted it long before.
gilbetron | 3 days ago

The answer to the question "what database should I use?" is "Postgres". If you are in a situation where Postgres actually won't work, then you already know exactly why it won't.

In other words: [Postgres -> exotic solution] is the path everyone should take (and 99% will just stay on Postgres), not [exotic solution -> Postgres].
etler | 4 days ago

I've lost count of how many "Migrating from X to Postgres" articles I've seen.

I don't think I've once seen a "migrating away from Postgres" article.
robertlagrant | 3 days ago

For all the Prisma-haters: I salute you. But I want to reply to numerous comments with the following.

ORMs come in two main types that I'm aware of: Active Record (named after the original Ruby one, I think) and Data Mapper (think Hibernate or SQLAlchemy).

Active Record ORMs are slightly more ergonomic, at the cost of doing loads of work in application memory. Data Mapper looks slightly more like SQL in your code, but is a much more direct wrapper over things you can do in SQL.

Data Mapper also lets you keep various niceties, such as generating migration code, that stem from having your table definitions as objects.

Use a Data Mapper ORM if you want to use an ORM.
optician_owl | 3 days ago

It's an awkward article. To answer why a query is slow, you need a bit more detail than just the query. I also reread the part about timeouts and didn't get it: what was the database, was it a database issue, and how was it related to the migration?

The only information I could extract was that the company made bad architectural decisions, believes in ORMs (looking at the queries, there are real doubts that the data layouts in the DB are adequate), and cannot clearly explain situations. But this is only interesting to their candidates or investors.

It may sound rude, so I apologise.
hobs | 4 days ago

It still makes me sad when half the queries I see are json_*. I know it's far too late, but a big sad trombone in query performance is constantly LEFT JOINing to json_* functions that the planner is going to give a 100-row estimate forever.
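For context, a minimal sketch of the estimate problem being described, with an invented schema: several of Postgres's json_* set-returning functions carry a fixed declared row estimate of 100, so joins against them can be mis-planned regardless of the real array sizes.

```sql
-- Invented schema: orders with line items stored in a JSONB array.
EXPLAIN
SELECT o.id, item->>'sku' AS sku
FROM orders o
LEFT JOIN LATERAL jsonb_array_elements(o.items) AS item ON true;
-- The plan will typically show "rows=100" for the function scan, the
-- declared default for the SRF, no matter how many elements rows hold.
```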
compton93 | 4 days ago

I'm curious about Motion's experience with "Unused Indices". They suggest Cockroach's dashboard listed used indexes in the "Unused Indices" list.

I think the indexes they suspect were used really were unused, but Motion didn't realize CockroachDB was doing zigzag joins on other indexes to accomplish the same thing, leaving the indexes that would obviously be used as genuinely not used.

It's a great feature, but CRDB's optimizer would prefer a zigzag join over a covering index; getting around this required writing indexes in a way that persuades the optimizer not to plan a zigzag join.
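For readers unfamiliar with the workaround compton93 alludes to, CockroachDB lets you force a specific index with its table@index hint syntax. A sketch with invented names:

```sql
-- Persuade the optimizer to use a specific covering index instead of
-- planning a zigzag join (table and index names are invented):
SELECT user_id, status
FROM events@events_user_status_idx
WHERE user_id = 42 AND status = 'active';
```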
from-nibly | 4 days ago

Feels like Postgres is always the answer. I mean, there's gotta be some edge case somewhere where Postgres just can't begin to compete with a more specialized database, but I'd think that going from Postgres to something else is much easier than the other way around.
moonikakiss | 4 days ago

Great blog. It seems like you might benefit from columnar storage in Postgres for that slow query that took ~20 seconds.

It's interesting that people typically think of columnstores for strict BI / analytics, but there are so many app / user-facing workloads that actually need them.

ps: we're working on pg_mooncake v0.2: create a columnstore in Postgres that's always consistent with your OLTP tables. It might help for this workload.
sgarland | 4 days ago

It is forever enraging to me that ORMs turn SELECT * into each individual column, mostly because people then post the whole thing and it's obnoxiously large.

Similarly maddening is the appalling lack of normalization that is simply taken for granted. "It's faster, bro." No, it is not. Especially not at the hundreds-of-millions or billions of rows scale. If you store something low-cardinality like a status column, with an average length of perhaps 7 characters, that's 8 bytes (1 byte of overhead assumed, but it could be 2). Multiply that by 2 billion rows and you're wasting 16 GB. Disk is cheap, but (a) memory isn't, and (b) don't be lazy. There's a right way to use an RDBMS and a wrong way. If you want a KV store, use a damn KV store.

Finally, I'd be remiss if I failed to point out that Prisma is an unbelievably immature organization that launched without the ability to do JOINs [0]. They are forever dead to me for that. This isn't "move fast and break things", it's "move fast despite having zero clue what we're doing, but convince JS devs that we do."

[0]: https://github.com/prisma/prisma/discussions/19748
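sgarland's arithmetic can be sketched as DDL. Assuming the numbers from the comment, the normalized layout below swaps an ~8-byte text status for a 2-byte smallint key (all names are invented):

```sql
-- Denormalized: a 7-character text status costs ~8 bytes per row
-- (1-byte varlena header + 7 bytes), i.e. ~16 GB across 2 billion rows.
-- Normalized: a 2-byte smallint referencing a tiny lookup table.
CREATE TABLE statuses (
    id   smallint PRIMARY KEY,
    name text NOT NULL UNIQUE
);

CREATE TABLE events (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status_id smallint NOT NULL REFERENCES statuses (id)
);
```

Column alignment and padding can eat into the savings in practice, but the lookup table is still the right shape for low-cardinality values.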
frollogaston | 4 days ago

Did I miss something, or does the article not mention anything about sharding in Postgres? Was that just not needed?

Also, query-planner maturity is a big deal. It's hard to get Spanner to use the indexes you want.
crowcroft | 3 days ago

Why does Postgres get so much love while MySQL/MariaDB get nothing?

I'm assuming it's largely because Postgres has more momentum and is much more extensible, but if you're just trying to do 'boring DB stuff' I find MySQL is faster for most use cases. Development has slowed, but it would be hard to argue that it's not battle-tested and robust.
Inviz | 4 days ago

WHERE CONDITION AND 1=1 results in scanning the whole table? I don't think so...
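Inviz's skepticism is warranted: Postgres constant-folds trivially true predicates during planning, so a tacked-on AND 1=1 should leave the plan unchanged. This is easy to verify with EXPLAIN (the table is illustrative):

```sql
-- Both statements should produce identical plans; the 1=1 is folded
-- away before the plan is costed.
EXPLAIN SELECT * FROM users WHERE id = 42;
EXPLAIN SELECT * FROM users WHERE id = 42 AND 1 = 1;
```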
CoolCold | 3 days ago

> First, the support portal is a totally different website that doesn't share auth with the main portal. Second, you have to re-input a lot of data they already know about you (cluster ID, etc). And by the time they respond it's typically been a week.

I was about to ask what the main constraint for CockroachDB was, say iostat and atop info for CPU and disk drives, but realized that is probably something offloaded to some SaaS. So, still curious.
mmiao | 4 days ago

A 100-million-row table is fairly small, and you just don't need a distributed database. But you will need one if you hit 10 billion rows.
coolcase | 4 days ago

Why not optimise the bad queries first?

Aside: the job section says "not 9-5". What does that mean? Long hours? Or not a 9-to-5 attitude?
vivzkestrel | 4 days ago

Did you try using the native pg library, or the postgres or pg-promise libraries, and scrapping the ORM completely to see what effect that has? If you're looking explicitly for migrations, you can simply use node-pg-migrate (https://www.npmjs.com/package/node-pg-migrate) and scrap the rest of all the FLUFF that ORMs come with. ORMs in general are horribly bloated, and their performance for anything more than SELECT FROM table WHERE name = $1 is very questionable.
niwtsol | 4 days ago

That was an interesting read; it seemed like an overwhelming amount of data for why they should move off Cockroach. All of my DB work has been read-heavy, and I've never had a need for super-fast multi-region writes. Is a multi-region write architecture possible in Postgres? I'm trying to understand whether GDPR was the requirement that led to Cockroach, or whether lackluster multi-region writes were the bigger driver.
JackSlateur | 3 days ago

> there were concerns about the eventuality of a multi-region setup (mandated by GDPR)

Can anyone share more details about this? The GDPR is mandating a multi-region setup? This sounds very wild.
badmonster | 3 days ago

Indeed interesting.
liampulles | 3 days ago

I wonder increasingly, with tools like ChatGPT, whether ORMs make sense anymore. The argument I've always heard for ORMs is that they make it quick and easy to write the initial repository method, and that they make migrating to a new DB easier (hypothetically).

It seems to me LLMs now help fill both those roles, but with the benefit that you can tune the SQL as needed. I also always prefer actually knowing exactly what queries are going to my DB, but YMMV.