I haven't read the whole piece yet, but a few things already stood out as strange.

> Please keep reading to see how diligent we were in creating a fair test case

I hope this is true, but my cursory reading found several unfair spots that seem at odds with this statement.

> 3-node cluster on single DC | RF=3

DynamoDB works across AZs and will remain available even when a data center goes down. That cross-AZ operation has benefits, as well as latency costs, that are not present in the ScyllaDB setup.

> We hit errors on ~50% of the YCSB threads causing them to die when using ≥50% of write provisioned capacity

That is surprising. It is quite common to use your full provisioned capacity without problems. My guess is that something is not ideal about the YCSB DynamoDB library or its configuration. I'm not familiar with YCSB: does it give you stack traces that indicate why the threads failed?

> Sadly for DynamoDB, each item weighted 1.1kb – YCSB default schema, thus each write originated in two accesses

This is what made me come write this comment. You knew this was a pessimistic case that could easily be addressed to let DynamoDB operate at a lower cost. Is arbitrarily settling for the default 1.1 kB item size on an artificial benchmark fair? Good engineering teams use their tools in the way that gives them the most benefit. Calling out that you may have to work to ensure your use case doesn't have pessimistic characteristics would clearly be fair, but I'm not convinced that just picking arbitrary benchmark settings is.
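To make the arithmetic concrete: DynamoDB bills each write in 1 KB increments, so a 1.1 kB item consumes two write capacity units while an item at or under 1 kB consumes one. A minimal sketch of that rounding rule (the helper name is mine; the 1 KB-per-WCU rounding is DynamoDB's documented write-pricing model):

    import math

    def write_capacity_units(item_size_kb: float) -> int:
        # DynamoDB consumes one WCU per 1 KB of item size, rounded up.
        return math.ceil(item_size_kb / 1.0)

    print(write_capacity_units(1.1))  # 2 WCUs -- YCSB's ~1.1 kB default item
    print(write_capacity_units(1.0))  # 1 WCU  -- same workload with items trimmed to 1 kB

Trimming the items to fit in 1 kB would halve the provisioned-write cost, which is exactly the kind of tuning a team running DynamoDB in production would do before comparing prices.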
This all looks great, but one of the primary advantages of DynamoDB is that you don't have to spend resources on maintaining or troubleshooting EC2 instances. When you have a large deployment, not having to worry about high availability (HA) on your data store is hard to beat.
A theme of the comments so far has been that the benchmark feels biased.

QUICK STRAW POLL: What is the right way for this kind of information to be gathered and presented?

a. There is no bias. Companies can and should present research that may favor them, as long as they cite good sources.

b. An industry analyst, paid to research multiple companies and options and present that information to their respective (most likely paying) customers.

c. A journalist, presenting research in public, funded by advertising for unrelated interests.

d. Reporting by peers/actual users of the system on how the products compare, and possibly how they align with their technical and business goals.

e. Trust no one. Conduct your own research, and publish it if you see fit to do so, or keep it to yourself.

I see a potential "disinformation vulnerability" in each approach. How ought we get the best information on how to align our organizations with technologies?