Lichess: Post-Mortem of Our Longest Downtime

176 points by jpablo 8 months ago

5 comments

carlsborg 8 months ago
The main lichess engine (lila, open source) is a single monolithic program that's deployed on a single server. It serves ~5 million games per day, but there are several other pieces too. They discuss the architecture here: https://www.youtube.com/watch?v=crKNBSpO2_I

BTW, consider donating if you use lichess.
theideaofcoffee 8 months ago
I guess some of my questions are addressed in the latter half of the post, but I'm still puzzled why a prominent service didn't have a plan for what looked like a run-of-the-mill hardware outage. It's hard to know exactly what happened, as I'm having trouble parsing some of the post (what is a 'network connector'? Is it a cable? A NIC?). What were some of the 'increasingly outlandish' workarounds? Are they actually standing up production hosts manually, and was that the cause of a delay or unwillingness to get new hardware going? I think it would be important to have all of that set down in either documentation or code, seeing as most of their technical staff are either volunteers, who may come and go, or part-timers. Maybe they did; it's not clear.

It's also weird that they are still waiting on their provider to tell them exactly what was done to the hardware to get it going again. That's usually one of the first things a tech mentions: "ok, we replaced the optics in port 1" or "I replaced that cable after seeing increased error rates", something like that.
holsta 8 months ago
This response and post-mortem are superior to those of most commercial services I have seen in recent years.
ctippett 8 months ago
Once the private link was reestablished, could they not have tunneled out to the internet via another server acting as a sort of gateway?

Disclaimer: I'm not a network engineer, so I may be misunderstanding the practicality and complexity of such a workaround.
lazyant 8 months ago
summary for the lazy: OVH