For reference, this beat GPT-3.5, which scored 47%, but not GPT-4, which scored a massive 67%.<p>Beating out GPT-3.5 at <i>any</i> task with such a small model is very cool to me.<p>How much longer until these dumb virtual assistants (Siri, Google, Alexa) get replaced with on-device LLMs? We’ve gotta be getting close. These small, optimized models are catching up quickly in so many domains.