This is a really valuable contribution, if only as a benchmark for comparing whatever you are doing against what's possible, much as the large datacenter operators have shown everyone that a PUE of 1.1 is achievable. This data shows that you can achieve > 60% utilization of both compressible and incompressible resources and overcommit both kinds of resources by 200%, all while scheduling task arrivals at over 100 per second. It really is an extremely valuable glimpse into how the bigs operate.
I was involved in this research project for a few months in early 2019. Feel free to post questions that don't touch on sensitive internal details, and I can answer them here.<p>Edit: I'm not one of the coauthors, since I left the team in April 2019.
The resource usage section at the end was really interesting, and surprising to me. 1% of jobs use 99% of resources! It would be interesting to try to understand how this pattern came about, and whether there are particular engineering decisions that tend to lead to this situation, where you have a handful of incredibly resource-intensive jobs and loads of very lightweight jobs.
There's also a name clash with Borg the backup utility (which was the #5 result in a Google search for "borg" when I just tested what would come back):<p><a href="https://borgbackup.readthedocs.io/en/stable/" rel="nofollow">https://borgbackup.readthedocs.io/en/stable/</a>