Recent and related:<p><i>Post mortem on Mastodon outage with 30k users</i> - <a href="https://news.ycombinator.com/item?id=33855250" rel="nofollow">https://news.ycombinator.com/item?id=33855250</a> - Dec 2022 (101 comments)<p>(Offtopic meta note: Alert users will note that that thread was posted later than this one. This is because the second-chance process (<a href="https://news.ycombinator.com/item?id=26998308" rel="nofollow">https://news.ycombinator.com/item?id=26998308</a>) has a race condition: the events "story makes front page" and "moderator puts story in second-chance pool" are not synchronized and can happen in either order.)
Ah yes.<p>I've also seen NFS/ZFS on Linux have very... bizarre... issues with locking, latency, and poor handling of errors bubbled up from the block layer, taking down clients or even the host.<p>All of these went away when we redeployed everything onto a Solaris-based distro (still exporting ZFS shares to Linux clients via NFS). It does seem to be something specific to the interaction of these two components under load on a Linux kernel.<p>Unfortunately, it also only happens under real-world production load, and it was impossible to create a reliable test case with simulated stress tests or benchmarking :(
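For the curious, a minimal sketch of this kind of setup (not our actual config; the pool, dataset, and client subnet are placeholders), assuming an illumos-style distro where ZFS manages the NFS export natively:<p><pre><code># On the Solaris/illumos side: create a dataset and let ZFS
# manage the NFS share itself (names and subnet are hypothetical).
zfs create tank/exports/home
zfs set sharenfs="rw=@192.168.10.0/24" tank/exports/home

# Confirm the share is live
zfs get sharenfs tank/exports/home
showmount -e localhost

# On the Linux client side it mounts as plain NFS:
#   mount -t nfs zfshost:/tank/exports/home /mnt/home</code></pre>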
As someone running a commercial provider for Mastodon (and Matrix, and XMPP...), I am somewhat envious of these posts. "Wow, 30,000 users! If I had that many users on my service paying the $0.50/month I am charging, that would be $15,000/month - enough to pay myself a full salary!"<p>But then I realize that they are only getting this many people because they are not driven by commercial interests: even with donations, I can bet they are not collecting enough to keep things afloat, and they only keep going because they don't mind spending all this time, money, and resources of their own on the project. They can treat it as a (relatively expensive) hobby, and they can keep it running as long as it satisfies them.<p>The problem is that I think this is harmful in the long run. Yes, people are now finally seeing the issue with ad-funded social media. But if we want a healthy alternative, we need to understand TANSTAAFL (there ain't no such thing as a free lunch): we need to accept giving real money to the people working on this and keeping the servers available 24/7 to store and distribute the hot takes and stupid memes that we so bizarrely crave every day.<p>I worry that if we don't change this mindset quickly, the whole Twitter drama will have been a wasted opportunity and Mastodon (and the Fediverse in general) will go back to the status quo, where surveillance capitalism is the norm and truly open systems are just a geeky curiosity.<p>I wish I could fund a tech equivalent of the "buy local and organic" campaign. I wish more people thought "OK, I will pay $5/month to this guy and I will bring 10 people to this instance" because it is the <i>ethical thing to do</i>.
On the Oxide and Friends podcast in mid-November, Kris described hachyderm's infrastructure running in her basement. Kudos to her for being able to keep it going there so long!
> "We can then leverage Mastodon’s S3 feature to write the “hot” data directly back to Digital Ocean using a reverse Nginx proxy."<p>How does that work?
"In other words, every ugly system is also a successful system. Every beautiful system, has never seen spontaneous adoption." - not only is this logically fallacious, its pretty offensive about the general notion of software quality
I would like to understand why Mastodon requires such a huge amount of hardware for modest traffic volumes. Not just the lazy "it's Rails" answer - I know Rails is a resource hog, but that doesn't go far enough to explain the extreme requirements here.<p>As a point of reference, look at what Stack Overflow runs on. As a caveat, SO is probably more read-heavy than Mastodon, but it also serves several orders of magnitude more volume: on a normal day in 2016 they served 209,420,973 HTTP requests[0], which works out to roughly 2,400 requests per second on average. They did this on 4 DB servers and 11 web servers. In fact, it can (and has) served this volume of traffic on only a single server.<p>With this setup SO was not even close to maxing out their hardware (servers ran at roughly 10% load). SO also listed their server hardware[1] in 2016. I don't know enough about server hardware to assess the difference, but to my eye the web tiers look similar, with similar amounts of memory, similar disks, etc.<p>I'm not saying Hachyderm is doing anything wrong, but it makes me wonder if there's a fundamental problem with the design of Mastodon. To be clear, I understand that this particular outage was caused by a disk failure, but that they even needed this much hardware to run Hachyderm is surprising to me.<p>[0] <a href="https://nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/" rel="nofollow">https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...</a><p>[1] <a href="https://nickcraver.com/blog/2016/03/29/stack-overflow-the-hardware-2016-edition/" rel="nofollow">https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...</a>
That basement hardware didn't last long. If you don't know how big your user base is going to be, it's better to avoid committing money to specific hardware.