Migrating Uber's ledger data from DynamoDB to LedgerStore

328 points by gronky_ 12 months ago

34 comments

I wonder if 1.7 petabytes of data (1T indexed records) could fit on a single (very) beefy bare-metal server for under a couple thousand dollars a month, served by SQLite. Like this: https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a-single-server

SkyMarshal 12 months ago

It seems LedgerStore is not open source [1], and finding any info on it requires following a trail of backlinked Uber blog posts. Here's the one with the most info on LedgerStore that I can find, from 2021: https://www.uber.com/en-US/blog/dynamodb-to-docstore-migration/

[1]: https://github.com/uber

yazaddaruvala 12 months ago

Reading the article, it’s clear pretty quickly that Uber was using DynamoDB poorly. It seems they need strong consistency for certain CUJs (critical user journeys) and then a lot of data warehousing for historical transactions. It’s strange to me that they didn’t first convert their two-table DynamoDB architecture into a DynamoDB-plus-Redshift architecture or similar. This is a pretty common pattern.

debarshri 12 months ago

There was an era around 2015 when all the cool tech companies like Netflix, Spotify, SoundCloud, Uber and others were building a lot of infrastructure and database tools. Nowadays, engineers often talk in AWS/cloud terminology. It is a breath of fresh air to see that orgs are still building tools like that.

alexey-salmin 12 months ago

I don't know about the economics of this particular project, but damn, DynamoDB is expensive. At some point I was thinking that everyone else was just using it wrong, doing scans and queries instead of point-wise lookups into pre-computed tables. It turns out, however, that even when you use it as a distributed hash table you still pay a huge premium.

theanirudh 12 months ago

I wonder if they considered https://tigerbeetle.com

sha_r_roh 12 months ago

Congrats to anyone who worked on it! However, I'm guessing the cost of just running this team is quite large and not significantly different from the savings ($6M), and on top of that comes the overhead of maintenance. Payments is not likely to be a long-term bet either, so it's kind of interesting why teams take up such projects. Is it some kind of sunk cost with the engineering teams you already have?

xiwenc 12 months ago

Is this another outlier where, once you reach a certain scale, it’s more beneficial to roll your own? Pretty amazing what Uber has to deal with. Also, it’s not very clear from the original articles what the new total “cost of ownership” of this refactored service is. For example, they now need to manage their own databases and the storage backing them. Or did I miss it?

PeterZaitsev 12 months ago

I think this is a fantastic illustration of how expensive proprietary cloud-based data stores can be... and that it is feasible to migrate from them to something else.

citizenpaul 12 months ago

I think there is some reckoning of cloud service providers coming (assuming logical actors...). I was doing some contract work for a small place that had a GCP Bigtable instance costing $11k+ per month for some reports, based on data loaded from a 375MB (!!!) MySQL DB into Bigtable for the reports to run. They hired some out-of-school data scientist to do reports, and they were doing crazy inefficient things with the tiny dataset. They wanted me to fix it for pennies by tomorrow, and I declined.

qwertyuiop_ 12 months ago

Assuming there are a minimum of two teams, a total of 20 engineers, maintaining this in-house software, and taking $250k as the cost per engineer (salary plus health and other benefit costs to the company), that's $5 million right there. I am estimating at the lowest range. That's why Amazon calls these efforts undifferentiated heavy lifting. Is there a slight premium to pay compared with rolling your own and maintaining it? Yes. It's worth it, given all the trouble and the security and management overhead that go into rolling your own.
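
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the headcount and per-engineer cost are the commenter's assumptions, not figures from Uber, and the $6M/year savings is the number quoted elsewhere in this thread.

```python
# Back-of-the-envelope staffing cost, using the commenter's assumed inputs.
engineers = 20                    # assumed: two teams, 20 people in total
cost_per_engineer_usd = 250_000   # assumed: fully loaded annual cost per engineer
reported_savings_usd = 6_000_000  # annual savings figure quoted in the thread

staffing_cost_usd = engineers * cost_per_engineer_usd
net_usd = reported_savings_usd - staffing_cost_usd

print(f"staffing cost: ${staffing_cost_usd:,}")
print(f"savings minus staffing: ${net_usd:,}")
```

Under these assumptions the staffing cost alone absorbs most of the quoted savings, which is the point the comment is making.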

ForHackernews 12 months ago

Does no one ever delete data? It's hard to believe there's much business value in keeping every individual payment record dating back to 2017.

influx 12 months ago

I would gladly pay $6 million/year to not be on call and to never have to worry about things like BIOS and SSD firmware again.

rmccue 12 months ago

Original story looks to be https://www.uber.com/en-AU/blog/migrating-from-dynamodb-to-ledgerstore/

washywashy 12 months ago

I pretty much never see engineering salaries factored into these types of savings projects. I assume that's because engineers are already viewed as a sunk cost, or maybe it’s just because it’s way less tangible. I have seen many designs describe how X saves Y dollars but ignore the engineering effort to build and maintain it. Half the time I suspect it’s just so people have something to work on, rather than it being some critical fix.

drexlspivey 12 months ago

> Uber migrated all its payment transaction data from DynamoDB and blob storage into a new long-term solution

No way they have 1 trillion transactions, right?

otterley 12 months ago

Does anyone know whether Uber considered Amazon QLDB for the implementation? Seems like it might have been a good fit, at first blush.

geodel 12 months ago

More power to them. At this point even technically decent teams/companies have given up on developing large, complex systems in favor of SaaS. After carefully evaluating our strategic course of action, the answer always is AWS. It's only the teams that propose an alternative who have to rigorously justify how they came to a different conclusion.

ramesh31 12 months ago

Another victim of the "Great Normalization": the entire generation of garbage tech debt from the 2010s, built on NoSQL stores that never should have been used, is now coming due. You could probably make an entire consulting business out of migrating these things to MySQL.

deadbabe 12 months ago

So did the engineers who proposed this get some kind of bonus considering how much money they saved the company?

jcims 12 months ago

Every time I've ever used DynamoDB it cost way more than I would have ever expected.

rguillebert 12 months ago

So they saved $0.000006 per record, it's really about the little things...
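
The per-record figure follows from dividing the savings by the record count. A minimal sketch, assuming the roughly $6M savings and the 1T indexed records mentioned elsewhere in this thread:

```python
# Savings per record, using figures quoted in the thread (both approximate).
savings_usd = 6_000_000             # quoted savings
indexed_records = 1_000_000_000_000  # "1T indexed records"

savings_per_record_usd = savings_usd / indexed_records

# Print with six decimal places so the tiny value is readable.
print(f"${savings_per_record_usd:.6f} per record")
```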

awinter-py 12 months ago

they do 2 billion rides per quarter

what are the 'trillions' here?

^ also that translates to ~1000 transactions per second with some assumptions; have never understood why they care so much about infra scaling

1000 tps is like 1 box
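
One way to arrive at a number near 1000 tps, as a rough sketch. The comment does not state its assumptions, so the 90-day quarter and the four ledger transactions per ride below are hypothetical values chosen for illustration:

```python
# Rough average throughput estimate from the rides-per-quarter figure.
rides_per_quarter = 2_000_000_000  # figure from the comment
days_per_quarter = 90              # assumed quarter length
transactions_per_ride = 4          # hypothetical: ledger entries written per ride

seconds_per_quarter = days_per_quarter * 24 * 60 * 60
rides_per_second = rides_per_quarter / seconds_per_quarter
transactions_per_second = rides_per_second * transactions_per_ride

print(f"rides/s: {rides_per_second:.0f}")
print(f"transactions/s: {transactions_per_second:.0f}")
```

This is an average rate only. It says nothing about peak load, and it does not by itself explain the trillion-record total, which also depends on how many index entries each transaction produces and how many years of data are retained.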

foota 12 months ago

The article states that they already had an in-house solution for cold data, so one of the benefits they claim is simplification: moving to one system for both hot and cold data.

ledgerdev 12 months ago

Say you wanted to build an app on a database like LedgerStore but at much smaller scale, what are the best open source options out there right now?

augunrik 12 months ago

Is there some information on why they need to store this much data for immediate retrieval? And why is it so much?

benterix 12 months ago

I read the article so I roughly know what LedgerStore is - but I have no idea where it is hosted.

benced 12 months ago

$6M... isn't that much?

boringg 12 months ago

How much did the migration effort cost?

drpotato 12 months ago

The original[1][2] articles are a better read IMO. The link is just a summary of the two with added spelling and grammatical errors that materially impact the meaning.

1. https://www.uber.com/blog/how-ledgerstore-supports-trillions-of-indexes

2. https://www.uber.com/blog/migrating-from-dynamodb-to-ledgerstore/

xyst 12 months ago

Uber must have picked up some Google rejects. This type of homegrown project was seen at Google all the time.

Usually to aim for a significant promotion.

“Designed and built homegrown system to save $Xm! Give me promo, bro?”

Just so happened to ignore that it took X+Y additional to build. Also it will probably be going to the G graveyard in a few years.

bjornsing 12 months ago

I’m working on a specialized data store[1] that would be perfect for this kind of use case (large “cold” storage with indexing). But I’m having trouble finding potential customers. I’ve tried Google search ads but got 99% spam, 1% potential investors, and 0% potential customers. If anybody has any ideas, I’m all ears.

1. https://www.haystackdb.dev/

Antony90807 12 months ago

Wow, a crazy amount of work went into this. Well done.

pojzon 12 months ago

You look at stuff like that and think about “how much talent is wasted on pointless things that help no one in the world while getting paid heaps for nothing”. We could accomplish everything if people stopped wasting time on pointless tasks.