The DynamoDB Paper

256 points by krnaveen14, almost 3 years ago

15 comments

mabbo, almost 3 years ago

Rick Houlihan did a talk a few years ago about designing the data layer for an application using DynamoDB. The most common reaction I get from people I show it to, most of them Amazon SDEs who operate services that use DynamoDB, is "Holy shit what is this wizardry?!"

https://youtu.be/HaEPXoXVf2k

One of the biggest mistakes people make with Dynamo is thinking that it's just a relational database with no relations. It's not.

It's an incredible system, but it requires a lot of deep knowledge to get the full benefits, and it often requires you to design your data layer very well up front. I actually don't recommend using it for a system that hasn't mostly stabilized in design.

But when used right, it's an incredibly performant beast of a data store.

bistablesulphur, almost 3 years ago

I've been working with DynamoDB daily for a few years now, and whilst I like working with it and the specific scenario it solves for us, I'd still urge anyone thinking about using it to carefully reconsider whether their problem is truly unique enough that a traditional RDBMS couldn't handle it with some tuning. They can be unbelievably performant and give so much stuff for free.

Designing an application specifically for DynamoDB will take _a lot_ of time and effort. I think we could have saved almost a third of our entire development time had we used more of the boring stuff.

ignoramous, almost 3 years ago

> From the paper [0]: DynamoDB consists of tens of microservices.

Ha! For folks who think two-pizza teams mean 100s of microservices... this is probably the second most scaled-out storage service at AWS (behind S3?), and it runs tens of microservices (pretty sure these aren't micro the way most folks would presume 'em to be).

> What's exciting for me about this paper is that it covers DynamoDB's journey...

Assuming these comments are true [1][2], in classic Amazon fashion [3], the paper fails to acknowledge a FOSS database (once?) underneath it: MySQL/InnoDB (and references it as B-Tree instead).

[0] https://web.archive.org/web/20220712155558/https://www.usenix.org/system/files/atc22-vig.pdf

[1] https://news.ycombinator.com/item?id=13173927

[2] https://news.ycombinator.com/item?id=18871854

[3] https://archive.is/T1ZNJ

ctvo, almost 3 years ago

I've found DDB to be exceptional for use cases where eventual consistency is OK and you have a few well-defined query patterns. This is a large number of use cases, so it's not too limiting. As the number of query patterns grows, indices grow and costs grow (and pray for your soul if you attempt to use DDB transactions to write multiple keys to support differing query patterns). If you need strong consistency, your cost and latency also increase.

Oh, and I'd avoid DAX. Write your own cache layer. The query cache vs. item cache separation [1] in DAX is a giant footgun. It's also very under-supported. There still isn't a DAX client for AWS SDK v2 in Go, for example [2].

[1] https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.consistency.html#DAX.consistency.query-cache

[2] https://github.com/aws/aws-dax-go/issues/2
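
To make the query cache vs. item cache footgun concrete, here is a minimal in-memory sketch. It assumes the behavior described in the linked DAX consistency page, where writes through the cache update the item cache but do not invalidate cached query results. `ToyCachedTable` and its methods are hypothetical names for illustration only, not the DAX client API, and TTLs and eviction are ignored.

```python
class ToyCachedTable:
    """Hypothetical write-through cache with separate item and query caches."""

    def __init__(self):
        self.table = {}        # backing store: (pk, sk) -> item
        self.item_cache = {}   # keyed by primary key
        self.query_cache = {}  # keyed by query parameters (here: just the pk)

    def put_item(self, pk, sk, item):
        # Write-through: the table and the item cache are updated together,
        # but cached query result sets are left untouched.
        self.table[(pk, sk)] = item
        self.item_cache[(pk, sk)] = item

    def get_item(self, pk, sk):
        key = (pk, sk)
        if key not in self.item_cache:
            self.item_cache[key] = self.table[key]
        return self.item_cache[key]

    def query(self, pk):
        # The first call caches the result set; later calls reuse it as-is.
        if pk not in self.query_cache:
            self.query_cache[pk] = sorted(
                (sk, item) for (p, sk), item in self.table.items() if p == pk
            )
        return self.query_cache[pk]


cache = ToyCachedTable()
cache.put_item("user#1", "order#1", "pending")
before = cache.query("user#1")            # result set is now cached
cache.put_item("user#1", "order#1", "shipped")
fresh_item = cache.get_item("user#1", "order#1")
stale_query = cache.query("user#1")       # still served from the query cache
```

In this sketch a point read sees the new value while the query keeps returning the old one until the cached result set is dropped, which is the kind of inconsistency between two read paths that the comment warns about.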

sudhirj, almost 3 years ago

The way that I learnt the ins and outs of DynamoDB (and there is a lot to learn if you want to use it effectively) is by implementing all the Redis data structures and commands on it. That helped me understand both systems in one shot.

The key concept in Dynamo is that you use a partition key on all your bits of data (my mental model is that you get one server per partition) and you can then arrange data using a sort key in that partition. You can then range/inequality query over the sort keys. That's the gist of it.

The power and scalability come from the fact that each partition can be individually allocated and scaled, so as long as you spread over partitions you have practically no limits.

And you can do quite a bit with that sort key range/inequality thing. I was pleasantly surprised by how much of Redis I could implement: https://github.com/dbProjectRED/redimo.go
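
As a rough illustration of the partition key plus sort key model described above, the following toy in-memory table keeps each partition's sort keys ordered and answers range queries over them. `ToyTable`, the key layout, and the zero-padded score encoding are assumptions made for this sketch; this is not how redimo or DynamoDB is implemented.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self):
        self._sort_keys = defaultdict(list)  # pk -> sorted list of sort keys
        self._values = defaultdict(dict)     # pk -> {sk: value}

    def put(self, pk, sk, value):
        if sk not in self._values[pk]:
            bisect.insort(self._sort_keys[pk], sk)
        self._values[pk][sk] = value

    def query(self, pk, sk_from=None, sk_to=None):
        """Return (sk, value) pairs from one partition with sk_from <= sk <= sk_to."""
        keys = self._sort_keys.get(pk, [])
        lo = 0 if sk_from is None else bisect.bisect_left(keys, sk_from)
        hi = len(keys) if sk_to is None else bisect.bisect_right(keys, sk_to)
        return [(sk, self._values[pk][sk]) for sk in keys[lo:hi]]


# A Redis-style sorted set: the set name is the partition key and the
# zero-padded score (plus member name) is the sort key, so string order
# matches numeric order.
table = ToyTable()
for score, member in [(30, "carol"), (10, "alice"), (20, "bob")]:
    table.put("leaderboard", f"{score:010d}#{member}", member)

# Roughly "members with a score between 15 and 30", as a sort key range.
in_range = [member for _, member in
            table.query("leaderboard", "0000000015", "0000000030#~")]
```

Everything a query touches lives in a single partition, which is why spreading data across many partition keys is what lets the real system scale out.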

nathas, almost 3 years ago

Nice write-up from Marc. This definitely hits on the most common problems distributed systems face. I haven't read the paper yet, but it is pretty cool that they published this and talk about changes over time.

1. Managing 'heat' in the system (or assuming that you'll have a uniform distribution of requests)

2. Recovering a distributed system from a cold state and what that implies for your caches.

3. The obvious one that people who do this type of thing spend a lot of time thinking about: CAP theorem shenanigans and using Paxos.

Reminds me of the Grugbrained developer on microservices: https://grugbrain.dev/#grug-on-microservices

Good luck getting every piece working on the first major recovery. My 100% unscientific hunch is that most folks aren't testing their cold state recovery from a big failure, much like how folks don't test their database restoration solutions (or historically haven't).

Patrol8394, almost 3 years ago

These days I'd probably take a closer look at Spanner. It is a consistent and scalable DB. It makes life much easier for developers.

Like Cassandra, DynamoDB requires the data model to be designed very carefully to get the most out of it.

More often than not, that simply adds more complexity; people often underestimate how much a sharded MySQL/Postgres can scale.

My default choice for the longest time: Postgres for the data I care about, ES as secondary index and S3 as blob storage.

revicon, almost 3 years ago

One big benefit of DynamoDB over RDS on AWS is that the access layer is API based so you don’t have issues with held open connections when accessing via AWS Lambda.

ledauphin, almost 3 years ago

An underrated part of DynamoDB is its streams. You can subscribe to changes and reliably process those in a distributed way. If you're comfortable with the terms "at-least-once delivery" and "eventual consistency", you can build some truly amazing systems by letting events propagate reactively through your system, never touching a data store or messaging broker other than DynamoDB itself.

It's not for everyone, but when you get a team up and running with it, it can be shockingly powerful.
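
Being comfortable with at-least-once delivery mostly means making consumers idempotent. The sketch below assumes a consumer may be handed the same change event more than once (for example after a retry) and de-duplicates by event ID. The event shape (`id`, `key`, `delta`) and the `apply_events` helper are hypothetical and are not the DynamoDB Streams record format.

```python
def apply_events(events, state, seen):
    """Apply change events to `state`, skipping any event ID already processed."""
    for event in events:
        if event["id"] in seen:
            continue  # duplicate delivery: applying it again would double count
        state[event["key"]] = state.get(event["key"], 0) + event["delta"]
        seen.add(event["id"])
    return state


batch = [
    {"id": "e1", "key": "stock", "delta": 5},
    {"id": "e2", "key": "stock", "delta": -2},
]

state, seen = {}, set()
apply_events(batch, state, seen)
apply_events(batch, state, seen)  # the same batch is redelivered
```

In a real consumer the `seen` set would have to live somewhere durable, and it would ideally be updated atomically with the state change. This sketch leaves both out.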

0xthrowaway, almost 3 years ago

DynamoDB is (edit: can be) extremely expensive compared to alternatives (e.g. self-hosted SQL).

Make sure the benefits (performance, managed, scale) outweigh the costs!

kumarvvr, almost 3 years ago

A word of caution. The default limit on the number of DynamoDB tables per AWS account is 2500.

Tables are a scarce resource, and you want to use a single-table design for each app.

The design of tables with DDB is fascinating. Once you understand the PK / SK / GSI dance, design becomes so intuitive.
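
For readers who have not seen the PK / SK / GSI dance, here is a small sketch of a single-table layout, modeled with plain Python dictionaries. The attribute names (`PK`, `SK`, `GSI1PK`, `GSI1SK`), the key prefixes, and the `query` helper are common conventions assumed for illustration; nothing here calls a real DynamoDB API.

```python
# Customers and orders share one table. The secondary index attributes
# appear only on the items that should show up in that index.
items = [
    {"PK": "CUSTOMER#1", "SK": "PROFILE", "name": "Ada"},
    {"PK": "CUSTOMER#1", "SK": "ORDER#2022-07-02",
     "GSI1PK": "STATUS#open", "GSI1SK": "2022-07-02"},
    {"PK": "CUSTOMER#1", "SK": "ORDER#2022-07-01",
     "GSI1PK": "STATUS#shipped", "GSI1SK": "2022-07-01"},
    {"PK": "CUSTOMER#2", "SK": "ORDER#2022-07-03",
     "GSI1PK": "STATUS#open", "GSI1SK": "2022-07-03"},
]


def query(items, pk_attr, pk, sk_attr, sk_prefix=""):
    """Items matching one partition key value, filtered by sort key prefix, in sort key order."""
    rows = [
        item for item in items
        if item.get(pk_attr) == pk
        and sk_attr in item
        and item[sk_attr].startswith(sk_prefix)
    ]
    return sorted(rows, key=lambda item: item[sk_attr])


# Access pattern 1 (base table): all orders for one customer, oldest first.
customer_orders = query(items, "PK", "CUSTOMER#1", "SK", "ORDER#")

# Access pattern 2 (index): all open orders across customers, oldest first.
open_orders = query(items, "GSI1PK", "STATUS#open", "GSI1SK")
```

Each access pattern maps to one key lookup plus a sort key condition, which is why the query patterns need to be known before the keys are designed.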

ruoranwang, almost 3 years ago

I wonder how Cassandra is doing. I heard companies are migrating away from it.

dboreham, almost 3 years ago

I'd like to learn more about their MemDS. Afaik nothing has been made public.

no_wizard, almost 3 years ago

How well does DynamoDB scale when paired with AppSync and GraphQL? The selling point here is that you can use GQL as your schema for the DB too and get automatic APIs for free.

jerryjerryjerry, almost 3 years ago

Good job! But I'm wondering when Amazon will start to contribute to the open source world...
