This is quite valuable advice: "The original version of Discord was built in just under two months in early 2015. Arguably, one of the best databases for iterating quickly is MongoDB. Everything on Discord was stored in a single MongoDB replica set and this was intentional, but we also planned everything for easy migration to a new database"<p>The article also links to a Twitter blog post from 2010 that makes a similar point: "We [Twitter] currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters" [1]<p>[1] <a href="https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake" rel="nofollow">https://blog.twitter.com/engineering/en_us/a/2010/announcing...</a>
> Nothing was surprising, we got exactly what we expected.<p>Such a satisfying feeling in the engineering world.<p>> We noticed Cassandra was running 10 second “stop-the-world” GC constantly but we had no idea why.<p>This makes me very thankful for the work that the Go team has put into the Go GC.<p>> In the scenario that a user edits a message at the same time as another user deletes the same message, we ended up with a row that was missing all the data except the primary key and the text since all Cassandra writes are upserts.<p>Does Cassandra not offer a mechanism to do a conditional update? I'd expect to be able to submit an upsert that fails if the row isn't present, or has a `deleted = true` field, or something to that effect.
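On the conditional-update question: Cassandra does have lightweight transactions (`UPDATE ... IF EXISTS`, or `IF <column> = <value>`), but they go through a Paxos round and cost noticeably more than plain writes, which is presumably why you wouldn't want them on a hot message path. Here is a minimal Python sketch of the race from the quoted passage. It is a toy in-memory model, not a real Cassandra client: `ToyTable`, `upsert`, `update_if_exists` and the column names are all made up for illustration.

```python
from typing import Dict, Tuple

Row = Dict[str, object]


class ToyTable:
    """In-memory stand-in for a table with upsert semantics (hypothetical, not a driver API)."""

    def __init__(self) -> None:
        self.rows: Dict[int, Row] = {}

    def upsert(self, message_id: int, **columns: object) -> None:
        # A plain write creates the row if it is missing and only touches
        # the columns it was given.
        self.rows.setdefault(message_id, {}).update(columns)

    def delete(self, message_id: int) -> None:
        self.rows.pop(message_id, None)

    def update_if_exists(self, message_id: int, **columns: object) -> bool:
        # Conditional write, loosely modelled on `UPDATE ... IF EXISTS`:
        # applied only when the row is already present.
        row = self.rows.get(message_id)
        if row is None:
            return False
        row.update(columns)
        return True


def demo() -> Tuple[Row, bool, bool]:
    table = ToyTable()
    table.upsert(1, author_id=42, channel_id=7, content="hello")

    table.delete(1)                    # user B deletes the message
    table.upsert(1, content="hello!")  # user A's edit lands afterwards
    partial = dict(table.rows[1])      # row is back, holding only the edited text

    table.delete(1)
    applied = table.update_if_exists(1, content="hello!")  # edit is rejected
    return partial, applied, 1 in table.rows


if __name__ == "__main__":
    print(demo())
```

With the plain upsert the deleted message comes back as a partial row. With the conditional write the late edit is simply dropped.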
(2017)<p>As revealed in this blog post[0], they now see 4 billion messages per day.<p>0: <a href="https://news.ycombinator.com/item?id=32474093" rel="nofollow">https://news.ycombinator.com/item?id=32474093</a>
I haven't used Cassandra for about 3 years, and this is awakening memories. At a previous company I inherited a very badly assembled cluster that was used for a time series database. The guys who built it said (and no, I'm not kidding...) "we don't need to put a TTL on metrics because they're tiny and anyway we can just add more nodes and scale the cluster horizontally forever!". Well, forever was about 2 years, at which point the physical data center ran out of rack space and teams had abused the metrics system with TBs of bullshit data. That was when they handed the whole metrics system to yours truly. And I discovered two things:<p>1. You can't just bulk delete a year of old stale data without breaking it<p>2. "Woops, did we really set replication factor to 1?"<p>Fun.
I don't know databases and there are quite a number of them on the market, so posts like these are great.<p>Sidetrack though: does anyone have a list of pros and cons for each db, with a preference towards low latency? Also, how does Cassandra compare with, say, Postgres?
Unfortunately, they're not doing a good job of deleting them. If you press the "Delete Account" button, all it does is anonymize your profile and leave all of your messages intact. One of the reasons I avoid using Discord whenever possible.
As a point of comparison, Slack uses MySQL (Vitess) – <a href="https://slack.engineering/scaling-datastores-at-slack-with-vitess/" rel="nofollow">https://slack.engineering/scaling-datastores-at-slack-with-v...</a>
Discord doesn't respect privacy: you cannot just get rid of a whole conversation. Users are the product, and they make it so difficult to delete entire convos that it's obvious the data is valuable to them.
How many are bots? AFAIK bot traffic is higher than human traffic there. Plus they added so many bot APIs that bots are now hacking and spamming users' accounts.