Post mortem on Mastodon outage with 30k users

84 points by kris-nova, over 2 years ago

10 comments

CommanderData, over 2 years ago
Quite beefy hardware for on-prem. Perhaps someone could explain to me why 30k users, even assuming they are all concurrent, would be an issue for hardware that size?
Is the app stack naturally resource-heavy, or is this setup particularly different from how an instance should be run?
rubiquity, over 2 years ago
Not sure why Ruby on Rails is taking a beating in the comments section here. The problem is clearly the 1Gbps network that is functioning at only 200Mbps and the worn-out/defective SSDs. Waiting around on IO all day will bring any stack to a crawl.
yoz, over 2 years ago
This is a superb write-up of an intense, exhausting situation. Great mixture of low-level detail and tactics, and high-level thinking about systems and people. Congratulations on managing that migration, and thank you for sharing this with us!
dboreham, over 2 years ago
Confused as to why they didn't just replace the bad SSDs with good ones?
FWIW, this sounds to me like what happens when you use "retail" SSDs (drives marketed for use in user laptops) underneath a high-write-traffic application such as a relational database. Often such drives will either wear out, turn out to have pathological performance characteristics (they eventually do something akin to GC), or just have firmware bugs. Use enterprise-rated drives for an application like this.
lima, over 2 years ago
Hetzner is great, but it may not be the best choice for a social network that hosts user content and may attract controversy.
As a mass-market hosting provider, Hetzner is subject to constant fraud, abuse and hacked customer servers, and in consequence, their abuse department is very trigger-happy and will usually shoot first and ask questions later. They can and will kick out customers that cause too much of a headache, regardless of their ToS.
Their outbound DDoS detection systems are very sensitive and prone to false positives, such as when you get attacked yourself and the TCP backscatter is considered a portscan. If the system is sufficiently confident that you are doing something bad, it automatically firewalls off your servers until you explain yourself.
Likewise, inbound abuse reports sometimes lead to automated or manual blocks before you can respond to them.
They have also rate-limited or blocked entire port ranges in the past to get rid of Chia miners and similar miscreants, with no regard for collateral damage to other services and without informing their other customers.
Their pricing is good and service is otherwise excellent, and if you do get locked out, you can talk to actual humans to sort it out, but only after the damage is already done. If you use them, have a backup plan.
asim, over 2 years ago
As someone who scaled Ruby on Rails in its prime era, 2007-2009, I'll tell you the problems have not changed. It's very straightforward horizontal scaling followed by load balancing across multiple nodes. Load comes down to having enough cores, fast enough disks and enough egress bandwidth. Everything else is purely caching in front of a poorly performing Ruby web server and minimising disk or database reads.
The write-up is cool. Reminiscent of things we used to do back in that early Rails 2-3 era. Just funny we're back where we started.
TLDR: if you want to run Ruby on Rails on bare metal, be ready to run something with 8+ cores, 10k rpm disks minimum and more bandwidth than you can support out of your basement.
neonsunset, over 2 years ago
Weak technology stack and a deeply flawed concept of federation that enables local centralization of control, discord-mods-meme style, with all the corresponding issues.
Mastodon should have been based on a DHT, with each "terminal" aka "profile" having much higher autonomy.
Otherwise, it just gives more tools to people who left Twitter to continue doing the same societal damage.
P.S.: it is time to stop writing back-ends in Ruby when every other popular alternative (sans Python-based ones) is more powerful and scalable.
cyberphobe, over 2 years ago
These comments make me want to log off for a bit.
Post: we hit scaling issues caused by our failing disks and running image hosting and databases over NFS
HN: It's obviously Ruby on Rails' fault
musk_micropenis, over 2 years ago
30,000 users seems like a ludicrously small number at which to hit scaling problems. It sounds like Mastodon has not been designed for scale from the ground up, which is surprising for a project that hopes to be a popular social network.
valeg, over 2 years ago
We have come full circle. I remember seeing lots of Twitter's fail whale when it was run on RoR.