There are very few suitable use cases for DynamoDB

232 points, by vgt, nearly 8 years ago

37 comments

brandur, nearly 8 years ago
I've seen two large usages of DynamoDB at two different companies, and for what it's worth, in both cases we've had similar trouble as the author. In one case we ended up ripping it out and moving to a sharded Postgres scheme, and in the other we've left it in place for now because a migration would be such a monumental effort, but it's pretty much universally maligned.

Fundamentally, the problem seems to be that choosing a partitioning key that's appropriate for DynamoDB's operational properties is ... unlikely. In their own docs on choosing a partition key [1] they use "user ID" as an example of one with good uniformity, but in reality if you choose something like that, you're probably about to be in for a world of pain: in many systems big users can be 7+ orders of magnitude bigger than small users, so what initially looked like a respectable partitioning key turns out to be very lopsided.

As mentioned in the article, you can then try to increase throughput, but you won't have enough control over the newly provisioned capacity to really address the problem. You can massively overprovision, but then you're paying for a lot of capacity that's sitting idle, and even then sometimes it's not enough.

Your best bet is probably to choose a partition key that's perfectly uniformly distributed (like a random ID), but at that point you're designing your product around DynamoDB rather than vice versa, and you should probably wonder why you're not looking at alternatives.

---

[1] http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.UniformWorkload
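A minimal sketch of what that last point looks like in practice (an editor's illustration, not brandur's code; the table name, attribute names, and boto3 usage are all assumptions): write items under a random per-item partition key for perfectly uniform distribution, accepting the trade-off that per-user reads are no longer a cheap Query.

```python
import uuid
import boto3

# Hypothetical table; assumes a string partition key named "pk".
table = boto3.resource("dynamodb").Table("events")

def put_event(user_id: str, payload: dict) -> None:
    """Store one event under a perfectly uniform partition key."""
    item = {
        "pk": str(uuid.uuid4()),  # random ID: uniform distribution, no hot key
        "user_id": user_id,       # now only reachable via a GSI or a Scan,
                                  # not a cheap Query (the trade-off above)
        **payload,
    }
    table.put_item(Item=item)
```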
abalone, nearly 8 years ago
So, Amazon actually kind of agrees! They talked about this very issue of hot keys and overprovisioning at their most recent conference. The thing is, it was buried in a session on Aurora and they didn't mention DynamoDB by name -- they just called it nosql -- but they noted that a customer cut their costs 40% by moving to Aurora. Because it does automatic heat management and just bills you for what IO you use.

This is somewhat at odds with their top-level messaging, which still pushes DynamoDB as the most scalable solution. And perhaps it is... there are some scalability limits to Aurora. Writes are bottlenecked by one instance. 64TB max. I think performance drops when you exceed the in-memory cache. But those limits are still quite large.

Basically I sense some tension between the DynamoDB and Aurora teams and I wonder where this is all going to shake out in the long run.

Here's the full quote (I transcribed it so it may contain errors):

"The one thing that surprised me is that there are some customers who are moving their nosql workload to Aurora. There are two reasons for that. One, it's a lot easier to use Aurora because it's mysql compatible compared to nosql because the interfaces and transaction characteristics are so different. What is also interesting is people also saved money because the IO cost is much lower. In nosql if you have a large table it gets partitioned, then the IO gets partitioned across all the table partitions that you have. And if you have one partition that is hot then you have to provision based on the IO requirement of the hot partition. In the case of Aurora we do automatic heat management so we don't have this hot partition issue. Second, we don't charge based on provisioned IO. It's only the IO that you use. And that actually saves a lot of money. In this particular case this is a big social company, interaction company, I cannot tell the name, and they reduced their operational costs by 40% by moving from nosql to Aurora" [1]

[1] https://youtu.be/60QumD2QsF0?t=17m01s
windlep, nearly 8 years ago
I don't know what the article author is storing, but it's noted that 10GB of data is stored per node. That's quite a bit of data for a single table, and the 10GB is per shard of a table, not per 'database' (DynamoDB only has a notion of tables).

Amazon has deep-dive talks on DynamoDB on YouTube [1] that go into lots of these details and how to avoid problems from them. It's not that different from understanding how to structure data for Cassandra, Mongo, etc. All the NoSQL systems require an understanding of how they work, both to structure your data and to ensure optimal performance.

For example, maybe one's consistency constraints are better met by DynamoDB instead of BigTable's varying consistency on different query types [2] (the author of this article didn't address consistency at all). With DynamoDB, you can get strongly consistent reads and queries all the time [3].

Overall, it seems like a kind of weak reason to make the strong statement about "probably shouldn't be using DynamoDB". Maybe a better title would be "Understanding drawbacks of large datasets in DynamoDB". I do hope the author understands the consistency changes they may experience in BigTable, as it could easily require large changes to the code-base if strong consistency was assumed on all queries.

[1] https://www.youtube.com/watch?v=bCW3lhsJKfw

[2] https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/

[3] http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html

Edit: Fixed inconsistency in what the 10GB limit was referring to: not per table, but per node (shard) of the table.
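As an aside on point [3], a tiny illustration of the consistency knob being referred to; the table and key names below are made up, but the ConsistentRead flag is the standard DynamoDB GetItem parameter.

```python
import boto3

table = boto3.resource("dynamodb").Table("users")  # hypothetical table/key names

# Default read: eventually consistent, half the read-capacity cost.
eventual = table.get_item(Key={"user_id": "u123"})

# Strongly consistent read: reflects all prior successful writes,
# but consumes twice the read capacity units.
strong = table.get_item(Key={"user_id": "u123"}, ConsistentRead=True)
```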
cherioo, nearly 8 years ago
"Your business has millions of customers and no single customer can do so many actions so quickly that the individual could create a hot key. Under this key you are storing around 2KB of data. ... Potentially getting 1–5 requests per second for a given second but certainly not a sustained load of that. ... This will not work at scale in DynamoDb."

What? Why? Suppose that's 5 million customers: you will only have a 10GB table, which fits in a single DynamoDB shard, with no sharding. With the restriction of 1-5 operations per customer per second, this sounds like the ideal use case for DynamoDB.

What am I missing?
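For reference, the back-of-the-envelope arithmetic behind the 10GB figure, taking the quoted scenario's numbers at face value (the 5 million customer count is the comment's own assumption):

```python
customers = 5_000_000     # the comment's assumed customer count
item_size_kb = 2          # ~2KB stored per customer, per the quoted scenario
total_gb = customers * item_size_kb / 1024 / 1024
print(round(total_gb, 1)) # ~9.5 GB, i.e. within a single ~10GB partition
```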
awfullyjohn, nearly 8 years ago
He complains that DynamoDB doesn't work for him, then says you should instead use Google BigTable. But he doesn't offer evidence for why you should use BigTable, and just says that it works for him.

I don't buy it. I've used BigTable in the past and found it to be infuriating. Now, because it works for him, I'm supposed to believe that BigTable is right for me?
luhn, nearly 8 years ago
Related is The Million Dollar Engineering Problem [1] from Segment, which showed up on HN a few months ago. They shaved $300K from their annual AWS spend by identifying and dealing with hot shards on DynamoDB.

[1] https://segment.com/blog/the-million-dollar-eng-problem/
karmakaze, nearly 8 years ago
TL;DR - Don't use something as coarse as customer_id as a partition key. Alternatively, move to GCP/BigTable.

Any DynamoDB tuning advice will say how important it is to have well-distributed hash keys. As for the second part, why not use Cloud Spanner? I wish AWS had something like it.
qaq, nearly 8 years ago
I can see the future:

an article that outlines the gotchas of Bigtable

an article saying you probably shouldn't use Bigtable

an article on the amount of money we could've saved using PG and not rewriting things 3 times

Anything up to 10-15TB, there are very few reasons not to use something like PG.
TheAceOfHearts, nearly 8 years ago
Many of the comments here are saying that the author's use-case wasn't a good one for DynamoDB. Can anyone share some simple, approachable resources that talk about the kinds of use-cases where these tools make sense?

Whenever I read about NoSQL systems, I'm always left unsure about their use-cases. I've only worked on systems where a traditional RDBMS made the most sense. How do you identify when it's appropriate to reach for one of the many NoSQL tools?

I've had the suspicion that many applications that leverage NoSQL tools usually use them in conjunction with a relational database, and not in isolation. Based on my limited understanding, I can at least wrap my head around a few ways in which this could probably help. Am I off the mark here? One of the points I struggle with is that once you're storing data across multiple data stores, maintaining data integrity becomes much harder.
chairmanwow, nearly 8 years ago
Quick aside:

It seems that a lot of the qualms with various databases stem from a misunderstanding of their use cases. A lot of the features of major SQL databases, namely ACID, are misappropriated to be features of databases in general. This key misunderstanding seems to drive a lot of SWEs insane as they later realize that NoSQL DBs are not always Available and Strongly Consistent.

Responding to the author:

I wasn't totally convinced by the author's argument against DynamoDB. This article [1] offers a good solution to pretty much all of OP's problems. Most significantly, hashing user data using date.

While DynamoDB is certainly different from most other databases, that doesn't mean that there aren't sensible usage methodologies.

Links:

[1] https://medium.com/building-timehop/one-year-of-dynamodb-at-timehop-f761d9fe5fa1
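A rough sketch of the date-hashing idea from the linked Timehop post as described above (the schema, table name, and attribute names below are illustrative assumptions, not Timehop's actual code): folding the day into the partition key spreads one user's traffic and data over many partitions instead of concentrating it under a single key.

```python
import boto3
from datetime import date

table = boto3.resource("dynamodb").Table("timeline")  # hypothetical

def put_entry(user_id: str, entry: dict) -> None:
    # e.g. "u123#2017-07-10": each day becomes its own partition key,
    # so no single user key accumulates unbounded traffic or data.
    pk = f"{user_id}#{date.today().isoformat()}"
    table.put_item(Item={"pk": pk, **entry})
```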
sheeshkebab, nearly 8 years ago
DynamoDB has a nice JS interface and is a good helper store for various small AWS-specific automation projects.

It's not a good choice for various read-heavy enterprise apps (and frankly not for write-heavy ones either). It also doesn't scale: it doesn't even have cross-region active/active support (highly surprising for a key-value store when Cassandra has supported it for ages).

Don't use it for anything serious.
xendo, nearly 8 years ago
I don't see how the scenario with a single customer ID is credible. Why not just put a cache in front of the DB? Now there is even a fully managed solution for that: https://aws.amazon.com/dynamodb/dax/
ezulich, nearly 8 years ago
Full disclosure: I am a member of the DynamoDB team at AWS, but my opinions here are mine only.

On the topic of DynamoDB use cases, here are some DynamoDB users describing their use cases:

- DataXu: "How DataXu scaled its System to handle billions of events with DynamoDB" -- https://youtu.be/lDiI0JMf_yQ?t=765 (from AWS re:Invent 2016)

- Lyft: "Lyft Easily Scales Up its Ride Location Tracking System with Amazon DynamoDB" -- https://www.youtube.com/watch?v=WlTbaPXj-jc. The story "AWS Helps Fuel Lyft's Expansion" in InformationWeek mentions Lyft's use of DynamoDB as well: http://www.informationweek.com/cloud/infrastructure-as-a-service/aws-helps-fuel-lyfts-expansion-/d/d-id/1326660

- Under Armour implemented cross-region replication using DynamoDB: https://youtu.be/NtaTC2Fq7Wo?t=699

- Amazon.com: Amazon Marketplace shares their story of migration to DynamoDB and why they did it: https://www.youtube.com/watch?v=gllNauRR8GM

Finally, DynamoDB was created because Amazon needed a highly reliable and scalable key/value database. As Werner Vogels said in his blog post announcing DynamoDB back in 2012 (http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html):

"This non-relational, or NoSQL, database was targeted at use cases that were core to the Amazon ecommerce operation, such as the shopping cart and session service."
natekupp, nearly 8 years ago
We use DynamoDB quite a bit at Thumbtack. Our biggest issue is backups - just wrote a short note about our experiences with DynamoDB here: https://medium.com/@natekupp/dynamodb-and-backups-16dba0dbcded
jsemrau, nearly 8 years ago
When I started building my first app in 2011, MongoDB was all the rage. So I built the back-end using the futuristic 'NoSQL' technology. It turned out to be slow (~1 min query time), inconsistent, and missing an RDBMS layer. Moved the thing to PHP/MySQL and the problems were gone. I still have not found a use case outside of web (comments/discussion) sites where the high integration with Javascript actually makes sense.
avitzurel, nearly 8 years ago
The title is very... what's the two words I'm looking for?

Dynamo does not scale for this specific use case, but I have used it successfully in production (at scale) with ZERO issues.

Dynamo is a key-value, sharded, zero-operations* database that most applications and companies will benefit from, IMHO.

Are the hot key and evenly sending queries to nodes the only issues from which you concluded we should not use DynamoDB?
nikanj, nearly 8 years ago
The gist of this seems to be that DynamoDB becomes a problem if you have millions of customers.

Don't worry. You don't. And there will be many good reasons to refactor the architecture before you do.
dserban, nearly 8 years ago
My background is in Cassandra, and one company I worked for last year insisted that we use DynamoDB for a project.

Here are a few things that ended up being show-stoppers.

1. Both the partition key and the sort key are capped at 1 field. In an attempt to "think Cassandra data model", the ugly workaround was to stringify and concatenate things at the application layer, then parse / split on the other side. This made the code unreadable.

2. DynamoDB-Spark integration is a second-class citizen. (Cassandra-Spark integration is first-class and well-maintained.)

3. The other thing that made code unreadable was the accidental complexity introduced by the exception handling / exponential backoff we needed to implement to protect against accidental read-capacity underprovisioning.

Although I made repeated pleas to switch to Cassandra, the (non-technical) CEO insisted that we keep using DynamoDB. I'm no longer at that company, but I hear they have meanwhile switched to Redshift.
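For context on point 3, this is roughly the kind of retry loop being described; a hand-rolled sketch with made-up table and parameter values, not the commenter's actual code (the boto3 SDK also has its own built-in retries, which evidently were not sufficient here):

```python
import random
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("events")  # hypothetical

def get_with_backoff(key: dict, max_retries: int = 5) -> dict:
    for attempt in range(max_retries):
        try:
            return table.get_item(Key=key)
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            # exponential backoff with jitter before retrying the read
            time.sleep((2 ** attempt) * 0.1 + random.random() * 0.1)
    raise RuntimeError("read capacity still exceeded after retries")
```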
kevan, nearly 8 years ago
> This will not work at scale in DynamoDb.

I don't think we're getting the whole story from the author. I'm not the biggest fan of Dynamo either, for reasons I won't get into here, but this type of workload is exactly what Dynamo was built for: serving a website with millions of customers.

Disclaimer: I work at Amazon; my views are my own.
dyeje, nearly 8 years ago
Our team has been using it alongside Postgres as a scalable metric store, and things have been pretty good. We had some growing pains tweaking our storage keys and switching to a daily table to avoid partitioning issues, but it's been quite stable for a while now.
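A guess at what "switching to a daily table" looks like in practice (the table naming convention and key schema below are invented for illustration, not this team's setup): each day's metrics land in their own table, so historical data never competes with today's hot writes.

```python
import boto3
from datetime import date, timedelta

def daily_table(day: date):
    # Hypothetical naming convention, e.g. "metrics_2017-07-10".
    return boto3.resource("dynamodb").Table(f"metrics_{day.isoformat()}")

# Writes always go to today's table...
daily_table(date.today()).put_item(Item={"metric": "signups", "value": 42})

# ...while reads of historical data hit a table that takes no write traffic.
yesterday = daily_table(date.today() - timedelta(days=1)).scan()
```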
hox, nearly 8 years ago
The biggest issue I've faced with DynamoDB has been the problem with any hosted service - that is, operational stability is in the hands of others entrusted with operating a massive multi-tenant system, and any outage cannot follow your own operational recovery mechanisms unless you plan for failover yourself. And the moment you plan for failover, you need to evaluate why you don't just handle the primary system as well.
cyberferret, nearly 8 years ago
Hmm... We dabbled with DynamoDB on a couple of very small projects, but found some real oddities in how it stores JSON data and manages keys. Querying the dataset was also a mammoth, frustrating task.

We ended up switching to RethinkDB and haven't looked back - far better query syntax/language (ReQL), and we can organise the JSON content just how we want it.
jjirsa, nearly 8 years ago
The best argument against DynamoDB is the AWS "Well-Architected" guidelines: how are you designing for resiliency with your single-region, active-passive database with clunky bolt-on replication using Kinesis?

Cheaper active-active-active options exist that don't require manual DR failover drills and manual failback when regions inevitably crash.
macinjosh, nearly 8 years ago
My company is in the online form builder space, and usage of our MySQL RDS instances is nearing the limit of what is offered. I looked into moving certain high-volume/high-traffic data models to DynamoDB and realized we could not, because each record is limited to a measly (for our needs) 400KB.
jey, nearly 8 years ago
So it's fine as long as I precompute the full transaction to apply to the key, then do it all at once? I expect large overall transaction volumes, but am happy to only issue one DB change per user interaction.
pishpash, nearly 8 years ago
NoSQL should stay a quick-and-dirty solution for storing key-values. When you start to schematize it and use it as a "DB", you are going down the wrong path. That is because NoSQL is a glorified cache, not a DB. It is essentially using the memory on a large number of nodes to buffer bursty throughput, and using background processes to collate the data onto disk later. There is almost no case where an explicit distributed caching or queuing solution backed by a traditional DB isn't strictly better.
didibus, nearly 8 years ago
The article forgets a very important detail:

> A single partition can hold approximately 10 GB of data, and can support a maximum of 3,000 read capacity units or 1,000 write capacity units.

DynamoDB will also split your data if you provision more than 3,000 reads or 1,000 writes. And the caveat is that it will not join the shards back together if you later reduce the throughput. Instead, each shard will just get even less throughput than you might expect.
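A back-of-the-envelope version of the partition rule of thumb from the DynamoDB docs of that era, which this comment is based on (the exact formula is approximate and may have changed since):

```python
import math

def estimated_partitions(read_capacity: int, write_capacity: int, size_gb: float) -> int:
    by_throughput = math.ceil(read_capacity / 3000 + write_capacity / 1000)
    by_size = math.ceil(size_gb / 10)
    return max(by_throughput, by_size)

# Provisioning 6,000 RCU / 1,000 WCU on a small table forces ~3 partitions,
# and each partition then serves only ~1/3 of the provisioned throughput --
# a split that, per the comment above, is not undone when you scale back down.
print(estimated_partitions(read_capacity=6000, write_capacity=1000, size_gb=5))  # -> 3
```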
banku_brougham, nearly 8 years ago
The first part of the article sounds convincing, but I'm pretty sure Amazon is using DynamoDB extensively and successfully at massive scale.
imauld, nearly 8 years ago
We use DynamoDB to store user alerts and it works fairly well for our use case. However, we still run into issues where consumed throughput is well under what's provisioned, but we still get throughput exceptions.

We don't use auto-incremented user IDs either. We create a hash of the user ID and some of the other data contained in the alert, and it would appear we still have hot partitions.
luhn, nearly 8 years ago
I took a look at BigTable, as recommended by the article, because I was evaluating DynamoDB myself just yesterday. It looks like the minimum price for that is ~$1500 a month. Granted, you're getting what you pay for (3 nodes that support 10,000 QPS each), but the pricing is out of reach for smaller projects.
datashovel, nearly 8 years ago
Generally speaking, given that DynamoDB is a NoSQL database service, I'm not certain that moving larger clients to their own dedicated AWS resources should cause too many negative side effects. Especially ones that are so large they're causing scalability issues.
rushi_agrawal, nearly 8 years ago
If the DynamoDB guys can come up with a mechanism similar to their CPU credits concept, that'd be a really nice feature to have. Of course, it can't be as straightforward as CPU credits.
znep, nearly 8 years ago
Does anyone have any real-world experience with how the Amazon DynamoDB Accelerator that AWS released as a preview a few months ago can help out?
iampims, nearly 8 years ago
Rather poor example of "how DynamoDB doesn't scale".

DynamoDB is definitely not the silver bullet people hope it is, but it does work exceptionally well for what it was designed for.
ParadisoShlee, nearly 8 years ago
Use case. Use case. Use case.
ZGF4, nearly 8 years ago
There are a few misguided views in this article and in some of these comments.

1. Every shardable database (Cassandra, Dynamo, BigTable) has to worry about hot spots. Picking a UUID as a partition key is only step one. What happens if one user is a huge majority of your traffic? All of their reads/writes are going to a single partition, and of course you are going to suffer performance issues from that hot spot. It becomes important to further break down your partition into synthetic shards or break up your data by time (only keep a day of data per shard). BigTable does not innately solve this; it may deal better with a large partition, but it will inevitably become a problem.

2. Some people are criticizing the choice of NoSQL, citing the data size. Note that you can have a small data size but huge write traffic. An unsharded RDBMS will not scale well to this, since you cannot distribute the writes across multiple nodes. Don't assume that just because someone has a small data set they don't need to use NoSQL to deal with their volume.
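A minimal sketch of the synthetic-shard technique from point 1 (the shard count, table, and attribute names are all made up for illustration): append a small random suffix to a hot key on write, and fan the read out across every suffix.

```python
import random
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("events")  # hypothetical
SHARDS = 10  # illustrative; tune to the observed hot-key traffic

def write(user_id: str, item: dict) -> None:
    # Writes for one user are spread across SHARDS partition keys.
    table.put_item(Item={"pk": f"{user_id}#{random.randrange(SHARDS)}", **item})

def read_all(user_id: str) -> list:
    # Reads must scatter-gather across every synthetic shard.
    items = []
    for shard in range(SHARDS):
        resp = table.query(KeyConditionExpression=Key("pk").eq(f"{user_id}#{shard}"))
        items.extend(resp["Items"])
    return items
```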
amygdyl, nearly 8 years ago
I think the problem is that there are very few suitable-length articles available to you at no cost, and the debate has been suffering from a lack of understanding of the problem in the widest sense, so little can be found on any website trying to make a commercial subsistence that is in any way better than scan-reading SO randomly the day before you go to present to management for their architectural planning meeting.

/s is, I hope, self-evident.

Only seriously, for my purposes, the use I get from the typical article on databases that I find linked from HN is as a reverse index to the better discussions, long after the discussion is off the front page here.

I'm merely a little confused that widespread consternation about the quality of geek journalism for programmers amounts to not much more than an occasional mention or moan.

Genuinely: is a good dose of cynicism in force that is invisible to me?

I understand that when I see a story about science intended for a general audience, HN is likely to become home to much greater detail and depth of discussion.

But with a subject line arguing a generalised conclusion from a subject matter of database architecture?

Sometimes I think I'm confused about whether I'm supposed to be confused about the point of TFA.