How discord stores billions of messages (2017)

204 points by greymalik over 2 years ago

12 comments

jpalomaki over 2 years ago
This is quite valuable advice: "The original version of Discord was built in just under two months in early 2015. Arguably, one of the best databases for iterating quickly is MongoDB. Everything on Discord was stored in a single MongoDB replica set and this was intentional, but we also planned everything for easy migration to a new database"

Also, the article links to a Twitter blog post from 2010 that makes a similar point: "We [Twitter] currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters" [1]

[1] https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake
chrsig over 2 years ago
> Nothing was surprising, we got exactly what we expected.

Such a satisfying feeling in the engineering world.

> We noticed Cassandra was running 10 second "stop-the-world" GC constantly but we had no idea why.

This makes me very thankful for the work that the Go team has put into the Go GC.

> In the scenario that a user edits a message at the same time as another user deletes the same message, we ended up with a row that was missing all the data except the primary key and the text since all Cassandra writes are upserts.

Does Cassandra not offer a mechanism to do a conditional update? I'd expect to be able to submit an upsert that fails if the row isn't present, or has a `deleted = true` field, or something to that effect.
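On the conditional-update question: CQL does have lightweight transactions (`UPDATE ... IF EXISTS`, or `IF <column> = <value>`), at the cost of extra Paxos coordination on each such write. The toy model below is only a sketch of the semantics, not Cassandra itself and not how Discord handled it. The `Table` dict and the `upsert`, `delete`, `update_if_exists`, and `race` helpers are hypothetical names, and the race is simplified to an edit that arrives after the delete.

```python
from typing import Dict

# Toy in-memory table: message id -> {column: value}.
# Assumption: this only mimics upsert semantics; it is not Cassandra.
Table = Dict[int, Dict[str, str]]


def upsert(table: Table, message_id: int, **columns: str) -> None:
    """Plain write: creates the row if it is missing, then sets the columns."""
    table.setdefault(message_id, {}).update(columns)


def delete(table: Table, message_id: int) -> None:
    """Remove the whole row if it exists."""
    table.pop(message_id, None)


def update_if_exists(table: Table, message_id: int, **columns: str) -> bool:
    """Conditional write: applied only when the row is already present."""
    if message_id not in table:
        return False
    table[message_id].update(columns)
    return True


def race(conditional: bool) -> Table:
    """Replay the edit-vs-delete scenario with or without the condition."""
    table: Table = {}
    upsert(table, 1, author="alice", channel="general", text="helo")
    delete(table, 1)  # another user deletes the message first
    write = update_if_exists if conditional else upsert
    write(table, 1, text="hello")  # the edit arrives afterwards
    return table


print(race(conditional=False))  # {1: {'text': 'hello'}}
print(race(conditional=True))   # {}
```

The unconditional path leaves the half-empty row the article describes (primary key plus text); the conditional path drops the late edit instead.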
judge2020 over 2 years ago
(2017)

As revealed in this blog post [0], they now see 4 billion messages per day.

[0] https://news.ycombinator.com/item?id=32474093
raffraffraff over 2 years ago
I haven't used Cassandra for about 3 years, and this is awakening memories. At a previous company I inherited a very badly assembled cluster that was used for a time series database. The guys who built it said (and no, I'm not kidding...) "we don't need to put a TTL on metrics because they're tiny and anyway we can just add more nodes and scale the cluster horizontally forever!". Well, forever was about 2 years, by which point the physical data center had run out of rack space and teams had abused the metrics system with TBs of bullshit data. That was when they handed the whole metrics system to yours truly. And I discovered two things:

1. You can't just bulk delete a year of old stale data without breaking it

2. "Whoops, did we really set replication factor to 1?"

Fun.
oreally over 2 years ago
I don't know databases, and there are quite a number of them on the market, so posts like these are great.

Sidetrack though: does anyone have a list of pros and cons for each DB, with a preference towards low latency? Also, how does each compare with, say, Postgres?
coldblues over 2 years ago
Unfortunately, they're not doing a good job at deleting them. If you press the "Delete Account" button, all it does is anonymize your profile, and leaves all of your messages intact. One of the reasons I avoid using Discord whenever possible.
paxys over 2 years ago
As a point of comparison, Slack uses MySQL (Vitess): https://slack.engineering/scaling-datastores-at-slack-with-vitess/
Jamie9912 over 2 years ago
I believe they use ScyllaDB exclusively now for storing messages.
mannyv over 2 years ago
"build quickly to prove out a product feature, but always with a path to a more robust solution"

Yes
unlog over 2 years ago
Discord doesn't respect privacy: you cannot just get rid of a whole conversation. Users are the product, and they make it so difficult to delete entire convos that it's obvious the data is simply valuable to them.
hestefisk over 2 years ago
Wonder if Postgres would scale to such volume.
seydor over 2 years ago
How many are bots? AFAIK bot traffic is higher than human traffic there. Plus they added so many bot APIs that bots are now hacking and spamming users' accounts.