Hey all,<p>We are happy to release a new benchmark for computer use. We didn’t set out to build a benchmark, but we found OSWorld in its current state very challenging to work with, and numerous of its tests were faulty.<p>OSUniverse aims to be dead simple to use: it only requires Docker and runs with a single command. It offers test levels of increasing complexity that are easy to extend.<p>We have benchmarked all the top agents. As new GUI agents are released, we will continue to benchmark them and update the results.<p>Enjoy!