Quite beefy hardware for on-prem. Could someone explain to me why 30k users, even assuming they're all concurrent, would be an issue for hardware that size?<p>Is the app stack naturally resource-heavy, or is this setup particularly different from how an instance should be run?
Not sure why Ruby on Rails is taking a beating in the comments section here. The problem is clearly the 1Gbps network that is functioning at only 200Mbps and the worn-out/defective SSDs. Waiting around on IO all day will bring any stack to a crawl.
This is a superb write-up of an intense, exhausting situation. Great mixture of low-level detail and tactics, and high-level thinking about systems and people. Congratulations on managing that migration, and thank you for sharing this with us!
Confused as to why they didn't just replace the bad SSDs with good ones?<p>FWIW, this sounds to me like what happens when you use "retail" SSDs (drives marketed for use in consumer laptops) underneath a high-write-traffic application such as a relational database. Such drives will often either wear out, turn out to have pathological performance characteristics (they eventually do something akin to GC), or just have firmware bugs. Use enterprise-rated drives for an application like this.
Hetzner is great, but it may not be the best choice for a social network that hosts user content and may attract controversy.<p>As a mass-market hosting provider, Hetzner is subject to constant fraud, abuse and hacked customer servers, and in consequence, their abuse department is very trigger-happy and will usually shoot first and ask questions later. They can and will kick out customers that cause too much of a headache, regardless of their ToS.<p>Their outbound DDoS detection systems are very sensitive and prone to false positives, such as when you get attacked yourself and the TCP backscatter is considered a portscan. If the system is sufficiently confident that you are doing something bad, it automatically firewalls off your servers until you explain yourself.<p>Likewise, inbound abuse reports sometimes lead to automated or manual blocks before you can respond to them.<p>They have also rate-limited or blocked entire port ranges in the past to get rid of Chia miners and similar miscreants, with no regard for collateral damage to other services and without informing their other customers.<p>Their pricing is good and their service is otherwise excellent, and if you do get locked out, you can talk to actual humans to sort it out. But only after the damage is already done. If you use them, have a backup plan.
As someone who scaled Ruby on Rails in its prime era (2007-2009), I'll tell you the problems have not changed. It's very straightforward: horizontal scaling, then load balancing across multiple nodes. Load comes down to having enough cores, fast enough disks and enough egress bandwidth. Everything else is purely caching in front of a poorly performing Ruby web server and minimising disk or database reads.<p>The write-up is cool. Reminiscent of things we used to do back in that early Rails 2-3 era. Just funny we're back where we started.<p>TLDR: if you want to run Ruby on Rails on bare metal, be ready to run something with 8+ cores, 10k RPM disks minimum, and more bandwidth than you can support out of your basement.
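The "caching in front of a slow Ruby server to minimise database reads" pattern described above can be sketched in plain Ruby. This is a minimal read-through cache, with hypothetical class and key names standing in for whatever layer you'd actually use (Rails.cache, memcached, a reverse proxy):

```ruby
# Minimal read-through cache: answer repeat reads from memory instead of
# hitting the slow database/disk on every request. Hypothetical sketch,
# not any particular library's API.
class ReadThroughCache
  def initialize(ttl_seconds: 60)
    @ttl = ttl_seconds
    @store = {} # key => [value, expires_at]
  end

  # Returns the cached value if fresh; otherwise runs the block
  # (the slow path) and stores its result.
  def fetch(key)
    value, expires_at = @store[key]
    return value if value && Time.now < expires_at
    value = yield
    @store[key] = [value, Time.now + @ttl]
    value
  end
end

db_reads = 0
cache = ReadThroughCache.new(ttl_seconds: 300)
2.times { cache.fetch("timeline:home") { db_reads += 1; "rendered timeline" } }
# db_reads is now 1: the second request never touched the "database"
```

The same idea extends down the stack: page/fragment caches in front of the app server, and query caches in front of the database, all trading memory for avoided disk reads.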
Weak technology stack and a deeply flawed concept of federation that enables local centralization of control, discord-mods-meme style, with all the corresponding issues.<p>Mastodon should have been based on a DHT, with each "terminal" aka "profile" having much higher autonomy.<p>Otherwise, it just gives more tools to the people who left Twitter to continue doing the same societal damage.<p>p.s.: it is time to stop writing back-ends in Ruby when every other popular alternative (sans the Python-based ones) is more powerful and scalable.
These comments make me want to log off for a bit.<p>Post: we hit scaling issues caused by our failing disks and by running image hosting and databases over NFS<p>HN: It's obviously Ruby on Rails' fault
30,000 users seems like a ludicrously small number at which to hit scaling problems. It sounds like Mastodon has not been designed for scale from the ground up, which is surprising for a project that hopes to be a popular social network.