Mixtral 8x7B Above Gemini Pro – Chatbot Arena Leaderboard Updated

2 points by jafitcover 1 year ago

1 comment

jafitcover 1 year ago
This is based on users choosing the better of two models at a time, and calculating an Elo rating from who-beats-who.

It's BYOT (bring your own tests) style.

This gives a better picture of real-world performance and is more robust against contamination.

They collected over 6000 and 1500 votes for Mixtral-8x7B and Gemini Pro, respectively.

Elo ratings are widely used to rank chess players and sports teams, but here's a disclaimer from the makers of the leaderboard:

---

> Please note Arena is a "live eval" and pretty much a sampling process to estimate models capability.

> That's why we show the confidence intervals through bootstrapping. Statistically, these models (e.g., GPT-3.5, Mixtral, Gemini Pro) are very close and only looking at their ranking can be misleading.

https://twitter.com/lmsysorg/status/1735729398672716114

https://twitter.com/lmsysorg/status/1735751052287226059
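
For anyone curious how a rating falls out of who-beats-who votes, and why bootstrapping gives a confidence interval, here is a rough Python sketch. It is not the leaderboard's actual code: the K-factor, the starting rating of 1000, the function names, and the toy votes for the hypothetical `model_x` and `model_y` are all my own assumptions for illustration.

```python
import random


def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(battles, k=4.0, init=1000.0):
    """Sequential Elo updates over (model_a, model_b, winner) votes.

    k and init are illustrative assumptions, not the leaderboard's settings.
    Ties are not modelled in this sketch.
    """
    ratings = {}
    for a, b, winner in battles:
        r_a = ratings.setdefault(a, init)
        r_b = ratings.setdefault(b, init)
        e_a = expected_score(r_a, r_b)
        s_a = 1.0 if winner == a else 0.0
        # The winner gains what the loser drops, so the total rating is conserved.
        ratings[a] = r_a + k * (s_a - e_a)
        ratings[b] = r_b - k * (s_a - e_a)
    return ratings


def bootstrap_interval(battles, model, rounds=200, seed=0):
    """Approximate 95% interval for one model's rating by resampling votes with replacement."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        resampled = [rng.choice(battles) for _ in battles]
        samples.append(elo_ratings(resampled).get(model, 1000.0))
    samples.sort()
    return samples[int(0.025 * rounds)], samples[int(0.975 * rounds) - 1]


# Made-up votes for two hypothetical models (not real Arena data).
toy_battles = [("model_x", "model_y", "model_x")] * 55 + [("model_x", "model_y", "model_y")] * 45
toy_ratings = elo_ratings(toy_battles)
toy_low, toy_high = bootstrap_interval(toy_battles, "model_x")
```

With a near-even split like this, the resampled ratings of the two models tend to sit close together and their intervals can overlap, which is the point of the disclaimer: rank alone can mislead when the intervals overlap.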