Lichess is one of those things you just have to sit and appreciate like a fine wine. It's absolutely wonderful for people in the chess community. I use it every day and am inspired by the functionality and performance, especially knowing it's a 1-2 person shop with a limited budget.
> here are the empirical distribution functions (ECDFs) with 30ms added to each response time<p>> The added constant seems artificial, but it's just viewing the results from the point of view of a client with 30ms ping time. Otherwise the log scaled x-axis would overemphasize the importance of a few milliseconds at the low end.<p>I thought this was interesting - maybe it's a standard practice I was just unaware of but it seems like a smart trick.
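To make the quoted trick concrete, here is a minimal sketch of why the offset matters on a log-scaled axis. The sample response times are made up for illustration and are not from the article; only the 30ms constant comes from the quote.

```python
import math

def ecdf(samples):
    """Return (sorted values, cumulative fractions) for an empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def log_axis_span(lo, hi):
    """Width, in decades, that the interval [lo, hi] occupies on a log10 axis."""
    return math.log10(hi) - math.log10(lo)

# Hypothetical server-side response times in milliseconds (assumed values).
response_ms = [1.0, 2.0, 4.0, 10.0, 100.0]
PING_MS = 30.0  # the constant mentioned in the quote

xs_raw, fractions = ecdf(response_ms)
# Shifting only moves the x values; the cumulative fractions are unchanged.
xs_shifted = [x + PING_MS for x in xs_raw]

# The same 1 ms difference, seen by the server and by a client with 30 ms ping.
raw_span = log_axis_span(1.0, 2.0)
shifted_span = log_axis_span(31.0, 32.0)

print(f"1ms -> 2ms on a raw log axis:     {raw_span:.3f} decades")
print(f"31ms -> 32ms on a shifted log axis: {shifted_span:.3f} decades")
```

Without the shift, going from 1ms to 2ms takes up about as much horizontal space as going from 50ms to 100ms, even though a remote client would barely notice the former. With the shift, the low end is compressed to roughly what it is worth from the client's side.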
Did they have to reduce cost or is there any other reason to not stick 20TB of SSDs in a box and call it a day? 4TB SSDs only cost ~$300, even HP or Dell SFF drives aren't much more expensive.<p>I guess they were interested in doing the testing and optimization for fun. From a product standpoint I probably would have invested my limited time in other projects.
Some questionable choices are made in this optimization.<p>The reason for the optimization is that there is so much IO activity that the RAID checks can't complete.<p>It is unclear from the article whether the RAID checks were ever completed on the 17TiB of data. Instead, they chose to disable the periodic RAID checks and switch to error checking as each page of data is read in. The two are not equivalent, and both should be used for important data.<p>Finding corrupt data only when you try to read it can lead to long-running data corruption, possibly to the point that your backups do not go back far enough to restore the uncorrupted data. Underpinning this is also a change to RAID 0... While it is the fastest option, they are putting a lot of faith in that NVMe config handling that kind of workload.<p>Hope they have good backups...<p>EDIT: A good way to solve this is to spin up a temporary server, restore your backups to it, and run the full data checks there. When they succeed, you have also checked your backup and restore process along with the integrity of the data.
You still want enough headroom to complete the RAID checks on the primary server, and you shouldn't use RAID 0 just for performance.
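As a rough illustration of the restore-and-verify idea from the comment above, here is a minimal file-level sketch. It is not what Lichess does, and a real database would normally be checked with its own validation tooling. The function names (`build_manifest`, `verify_restore`) and the manifest layout are hypothetical, chosen only for this example.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def verify_restore(manifest, restored_root):
    """Return the sorted relative paths that are missing or whose digest differs."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(restored_root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

The manifest would be recorded when the backup is taken, and `verify_restore` would be run against the temporary server's copy. An empty result means every file read back completely and matched, which exercises the restore path and the data at the same time without loading the primary server.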
There is also lishogi, but it is small enough not to require such optimizations yet.<p>Shogi is the most entertaining of the chess variants. Xiangqi not as much.
I know it's not a fair comparison, but I'm truly impressed by the quality of engineering shown by the Lichess team. Their main competitor, for example, was boasting about a migration to GCP and yet suffered repeated outages from fairly organic growth in popularity, while I believe they employ 100x more people.<p>Lichess' mobile app was a weak spot; however, the v2 rewrite in Flutter is already pretty good while still in beta.<p>And keep in mind Thibault pays himself less than 60k/year.