According to @dang (<a href="https://news.ycombinator.com/item?id=28479595" rel="nofollow">https://news.ycombinator.com/item?id=28479595</a>) via @sctb (<a href="https://news.ycombinator.com/item?id=16076041" rel="nofollow">https://news.ycombinator.com/item?id=16076041</a>)<p><pre><code> We’re currently running two machines (master and standby) at M5 Hosting. All of HN runs on a single box, nothing exotic:
CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (3500.07-MHz K8-class CPU)
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads
Mirrored SSDs for data, mirrored magnetic for logs (UFS)</code></pre>
A single bare metal server is more reliable than most people think it is. Complexity adds a lot of overhead, and each extra layer is one more thing that can fail.
We managed to run a successful bootcamp school's LMS on a single cheapest-tier 1 GB RAM Hetzner VPS: Rails app, Redis cache, Postgres, backups, a staging env, and near-zero production issues.<p>Not high-load, of course, but still hundreds of active users interacting with the app every single day.<p>Recently we had to upgrade to the next tier because of growth.<p>Modern servers are super fast and reliable as long as you know what you’re doing and don’t waste capacity on unnecessary overhead like k8s etc.
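A single-box setup like that usually boils down to a handful of systemd units. A hypothetical unit for the Rails app (the names, paths, and Puma config are illustrative assumptions, not the commenter's actual setup; Postgres and Redis would run as their own distro-packaged services on the same box):

```ini
# /etc/systemd/system/lms.service — hypothetical unit for the Rails app
[Unit]
Description=LMS Rails application
After=network.target postgresql.service redis-server.service

[Service]
User=deploy
WorkingDirectory=/srv/lms
Environment=RAILS_ENV=production
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With `Restart=on-failure`, the init system itself handles the "what if the app crashes" question that orchestrators are often brought in to answer.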
I guess because of the sanity and simplicity of its architecture?<p>I once wrote a piece of software in Rust, a simple API: one binary on one DigitalOcean instance, started by systemd, and nothing else. The thing has been working nonstop for years, making it the most stable piece of long-running software I’ve ever written, and I think that all comes from it being simple, without any extra, unnecessary complexity.<p>I’m not bragging, btw. I actually had to contact the user years after I wrote it because I couldn’t believe the thing was still working; I hadn’t heard from them in years!
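A service of that shape can be genuinely tiny. A sketch using only the Rust standard library (the commenter's actual API, port, and payload are unknown; this only illustrates the one-binary pattern that systemd supervises):

```rust
use std::io::{Read, Write};
use std::net::TcpListener;

// Build a minimal HTTP/1.1 response carrying a fixed JSON body.
fn http_response(body: &str) -> String {
    format!(
        "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() -> std::io::Result<()> {
    // Bind once and loop forever; if the process ever dies,
    // systemd's Restart= policy brings it back up.
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        let mut buf = [0u8; 4096];
        let _ = stream.read(&mut buf); // ignore request details in this sketch
        stream.write_all(http_response("{\"status\":\"ok\"}").as_bytes())?;
    }
    Ok(())
}
```

There is nothing to orchestrate: one binary, one port, one supervisor.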
It IS down occasionally: <a href="https://twitter.com/hnstatus" rel="nofollow">https://twitter.com/hnstatus</a><p>Not all outages are reflected there. The last one I remember, performance was really bad, or the site was unusable when logged in (I don't recall which), but opening it in incognito you would get cached results fast.
HN is a mature product! The most common software failures come from deploying changes, and I don't think HN is deployed to every day.<p>We're all used to effectively-beta software that's constantly updated and never final.
I would love to know the tech stack and architecture of HN, how it evolved (if it did), and what resources (money, man-hours, etc.) they spend to maintain it.
Last time I remember it being down for anything other than a few minutes was back in 2014.
<a href="https://twitter.com/Coding2Learn/status/420298797462593536" rel="nofollow">https://twitter.com/Coding2Learn/status/420298797462593536</a>
I have seen an error message once, instead of content. I don't remember the exact text, but the wording was amazingly perfect.
And there is one minor issue where the "reply" button for a comment sometimes hasn't loaded by the time the comment has rendered.
<i>> always online?</i><p>It depends on what you mean by <i>"online"</i> and the service level:<p>- "online" meaning an HN server responds with something. In this more literal sense, HN always seems to be up.<p>- "online" meaning <i>normal</i> page-load response times. In this sense, HN sometimes times out with <i>"sorry we can't serve your request right now"</i>. That seems to happen somewhere between once a week and once a month. Another example is a super-popular thread (e.g. "Trump wins election") that hammers the server, and threads take a minute or more to load. This prompts dang to write a comment in the thread asking people to "log out" if they're not posting. That reduces load on the server, because pages rendered for signed-out users don't need to query individual user stats, vote counts, hidden comments, etc. It's a form of ad hoc community cooperation to reduce the workload on the server rather than spinning up extra instances on AWS.<p>The occasional disruptions in the 2nd sense of "online" are OK, since this is a discussion forum and nobody's revenue depends on it. Therefore, it doesn't "need" more uptime than it already has.
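The reason "log out" helps is that anonymous page loads are identical for everyone, so a single rendered copy can be served from a short-lived cache, while logged-in pages must be rendered per user. A minimal sketch of that idea (the type names and TTL are illustrative assumptions, not HN's actual implementation, which is written in Arc):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// A tiny TTL cache: anonymous visitors share one rendered copy of each page,
// so a burst of logged-out traffic costs at most one render per TTL window.
struct PageCache {
    ttl: Duration,
    pages: HashMap<String, (Instant, String)>,
}

impl PageCache {
    fn new(ttl: Duration) -> Self {
        PageCache { ttl, pages: HashMap::new() }
    }

    // Serve a cached copy if it is still fresh; otherwise call `render`
    // once and cache the result for subsequent anonymous requests.
    fn get_or_render<F: FnOnce() -> String>(&mut self, key: &str, render: F) -> String {
        if let Some((at, html)) = self.pages.get(key) {
            if at.elapsed() < self.ttl {
                return html.clone();
            }
        }
        let html = render();
        self.pages.insert(key.to_string(), (Instant::now(), html.clone()));
        html
    }
}
```

Logged-in requests would bypass this cache entirely, since vote arrows, hidden comments, and karma differ per user; that asymmetry is exactly why shedding logged-in traffic lightens the load so much.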
Guess they don’t use Cloudflare:<p><a href="https://news.ycombinator.com/item?id=31820635" rel="nofollow">https://news.ycombinator.com/item?id=31820635</a>