I wonder if 1.7 petabytes of data (1T indexed records) could fit on a single (very) beefy baremetal server for under a couple thousand dollars a month, served by SQLite.<p>Like this: <a href="https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a-single-server" rel="nofollow">https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a...</a>
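For a rough sense of what those two figures imply per record, here is a minimal sketch. It assumes decimal petabytes, and the 20 TB drive size is a hypothetical number chosen only for illustration; it ignores redundancy, indexes overhead, and filesystem overhead.

```python
# Back-of-envelope sizing for the figures quoted in the comment above.
TOTAL_BYTES = 1_700 * 10**12   # 1.7 PB, assuming decimal petabytes
TOTAL_RECORDS = 10**12         # 1T indexed records
DRIVE_BYTES = 20 * 10**12      # hypothetical 20 TB drive (not from the thread)

# Average payload per record if the data were spread evenly.
bytes_per_record = TOTAL_BYTES // TOTAL_RECORDS

# Raw drive count with no redundancy, using ceiling division.
drives_needed = -(-TOTAL_BYTES // DRIVE_BYTES)

print(f"{bytes_per_record} bytes per record on average")
print(f"{drives_needed} drives of {DRIVE_BYTES // 10**12} TB, before any redundancy")
```

Whether that many drives count as "a single server" depends on the chassis and on how much redundancy you want, which this sketch does not model.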
It seems LedgerStore is not open source [1], and finding any info on it requires following a trail of backlinked Uber blog posts. Here's one with the most info on LedgerStore that I can find, from 2021:<p><a href="https://www.uber.com/en-US/blog/dynamodb-to-docstore-migration/" rel="nofollow">https://www.uber.com/en-US/blog/dynamodb-to-docstore-migrati...</a><p>[1]:<a href="https://github.com/uber">https://github.com/uber</a>
Reading the article, it’s clear pretty quickly that Uber was using DynamoDB poorly.<p>It seems they need strong consistency for certain CUJs and then a lot of data warehousing for historical transactions.<p>It’s strange to me that they didn’t first convert their two-table DynamoDB architecture into a DynamoDB and Redshift architecture or similar. This is a pretty common pattern.
There was an era around 2015 when all the cool tech companies like Netflix, Spotify, SoundCloud, Uber and others were building a lot of infrastructure and database tools. Nowadays, engineers often talk in AWS/cloud terminology.<p>It is a breath of fresh air to see that orgs are still building tools like that.
I don't know about the economics of this particular project, but damn, DynamoDB is expensive. At some point I was thinking that everyone else was just using it wrong, doing scans and queries instead of point-wise lookups into pre-computed tables.<p>It turns out, however, that even when you use it as a distributed hashtable you still pay a huge premium.
Congrats to anyone who worked on it! However, I'm guessing the cost of just running this team would be quite large and not significantly different from the savings (6M), with the overhead of maintenance on top of that. Payments would likely not be a long-term bet either, so it's kind of interesting why teams take up such projects. Is it some kind of sunk cost with the engineering teams you already have?
Is this another outlier where, once you reach a certain scale, it’s more beneficial to roll your own? Pretty amazing what Uber has to deal with.<p>Also, it’s not very clear from the original articles what the new total “cost of ownership” of this refactored service is. Now they need to manage their own databases and the storage backing them. Or did I miss it?
I think this is a fantastic illustration of how expensive proprietary cloud-based data stores can be... and that it is feasible to migrate from them to something else.
I think there is some reckoning coming for cloud service providers (assuming logical actors...). I was doing some contract work for a small place that had a GCP Bigtable instance costing $11k+ per month for some reports; the data came from a 375MB (!!!) MySQL db and was loaded into Bigtable for the reports to run.<p>They hired some out-of-school data scientist to do reports, and they were doing crazy inefficient things with the tiny dataset. They wanted me to fix it for pennies, by tomorrow, and I declined.
Assuming a minimum of two teams with a total of 20 people maintaining this in-house software, and taking 250k as the cost per engineer (salary plus health and other benefit costs to the company), that's $5 million right there, and I am estimating at the lowest end of the range. That's why Amazon calls these efforts undifferentiated heavy lifting. Is there a slight premium to pay compared with rolling your own and maintaining it? Yes. It's worth it given all the trouble, security, and management overhead that go into rolling your own.
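The arithmetic in the comment above can be sketched as follows. The headcount and per-engineer cost are the commenter's own guesses, the 6M savings figure is the one quoted elsewhere in this thread, and treating both as annual amounts is an assumption.

```python
def annual_team_cost(engineers: int, cost_per_engineer: int) -> int:
    """Fully loaded yearly cost of a team, in dollars."""
    return engineers * cost_per_engineer

# Commenter's guesses: 20 engineers at $250k each (salary plus benefits).
team_cost = annual_team_cost(20, 250_000)

# Savings figure quoted elsewhere in the thread; assumed here to be per year.
claimed_savings = 6_000_000

net_savings = claimed_savings - team_cost

print(f"team cost:   ${team_cost:,}")
print(f"net savings: ${net_savings:,}")
```

Under these guesses the team cost comes to $5,000,000 and the net to $1,000,000, so the conclusion swings entirely on the headcount estimate, which nobody in the thread can confirm.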
Does no one ever delete data? It's hard to believe there's much business value in keeping every individual payment record dating back to 2017.
Original story looks to be <a href="https://www.uber.com/en-AU/blog/migrating-from-dynamodb-to-ledgerstore/" rel="nofollow">https://www.uber.com/en-AU/blog/migrating-from-dynamodb-to-l...</a>
I pretty much never see engineering salaries factored into these types of savings projects. I assume that's because engineers are already viewed as a sunk cost, or maybe it’s just because it’s way less tangible. I have seen many designs describe how X saves Y dollars but ignore the engineering effort to build and maintain it. Half the time I suspect it’s just so people have something to work on, rather than it being some critical fix.
> Uber migrated all its payment transaction data from DynamoDB and blob storage into a new long-term solution<p>No way they have 1 trillion transactions right?
More power to them. At this point even technically decent teams/companies have given up on developing large, complex systems in favor of SaaS. After <i>carefully evaluating our strategic course of action</i>, the answer always is AWS.<p>It's only the teams that propose an alternative that have to rigorously justify why their conclusion differs.
Another victim of the "Great Normalization", i.e. that entire generation of garbage tech debt generated during the 2010s that was built on NoSQL stores that never should have been, is now coming due. You could probably make an entire consulting business out of migrating these things to MySQL.
they do 2 billion rides per quarter<p>what are the 'trillions' here?<p>^ also that translates to ~1000 transactions per second with some assumptions; have never understood why they care so much about infra scaling<p>1000 tps is like 1 box
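The "~1000 transactions per second with some assumptions" estimate above can be reproduced with a quick sketch. The 91-day quarter and the 4 ledger transactions per ride are assumptions picked to match the comment's figure, not numbers from Uber, and this is an average rate that says nothing about peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_QUARTER = 91  # assumption: roughly a quarter of a year

def average_per_second(rides_per_quarter: int, events_per_ride: int) -> float:
    """Average event rate, assuming events are spread evenly over the quarter."""
    return rides_per_quarter * events_per_ride / (DAYS_PER_QUARTER * SECONDS_PER_DAY)

rides_per_second = average_per_second(2_000_000_000, 1)

# Hypothetical: 4 ledger transactions per ride (fare, payout, fees, and so on).
ledger_tps = average_per_second(2_000_000_000, 4)

print(f"{rides_per_second:.0f} rides/s, {ledger_tps:.0f} transactions/s on average")
```

That works out to roughly 254 rides per second and about 1,000 transactions per second on average under these assumptions.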
The article states that they already had an in house solution for cold data, so one of the benefits they claim is simplifying by moving to one system for both hot and cold data.
The original[1][2] articles are a better read IMO. The link is just a summary of the two with added spelling and grammatical errors that materially impact the meaning.<p>1. <a href="https://www.uber.com/blog/how-ledgerstore-supports-trillions-of-indexes" rel="nofollow">https://www.uber.com/blog/how-ledgerstore-supports-trillions...</a><p>2. <a href="https://www.uber.com/blog/migrating-from-dynamodb-to-ledgerstore/" rel="nofollow">https://www.uber.com/blog/migrating-from-dynamodb-to-ledgers...</a>
Uber must have picked up some Google rejects. This type of homegrown project was seen at Google all the time.<p>Usually to aim for a significant promotion.<p>“Designed and built homegrown system to save $Xm! Give me promo, bro?”<p>Just so happened to ignore that it took X+Y additional to build. Also it will probably be going to the G graveyard in a few years.
I’m working on a specialized data store[1] that would be perfect for this kind of use case (large “cold” storage with indexing). But I’m having trouble finding potential customers. I’ve tried Google search ads but got 99% spam and 1% potential investors, but 0% potential customers. If anybody has any ideas I’m all ears.<p>1. <a href="https://www.haystackdb.dev/" rel="nofollow">https://www.haystackdb.dev/</a>
You look at stuff like that and think about<p>„how much talent is wasted on pointless things that help no one in the world while getting paid heaps for nothing”<p>We could accomplish everything if ppl stopped wasting time on pointless tasks.