TechEcho
© 2025 TechEcho. All rights reserved.

LLM leaderboard focusing on assessing their biases

29 points by softmodeling about 1 year ago

6 comments

softmodeling about 1 year ago
For additional context:

- Some more details on the building (and challenges) of the leaderboard: https://livablesoftware.com/biases-llm-leaderboard/
- The tests used in the backend: https://github.com/SOM-Research/LangBiTe
shikon7 about 1 year ago
Rather than assessing whether the LLM has biases, the leaderboard seems to assess whether the LLM affirms the tester’s biases.

Not that I blame them, as it’s probably impossible to define exactly what “no bias” means.
Terretta about 1 year ago
Gene Roddenberry would like to have a word about the bias of the testers.
rastignack about 1 year ago
Here are the (heavily biased and dishonest) prompts:

https://github.com/SOM-Research/LangBiTe/blob/main/langbite/resources/prompts.csv
salesynerd about 1 year ago
GPT-4 seems to be the least biased of all the LLMs. As a newbie to the field, I wonder: does this mean that OpenAI has the most "balanced" data, and/or that it does a great job of training its model? If training is the secret sauce, would it make sense for these companies to share their "best" data with each other?
djohnston about 1 year ago
Lazy, derivative, failing to account for any nuance and falling back to the same tired leftist talking points. This eval set could better be called “Am I the little parrot my master wants me to be?”

The best LLMs will be the ones that don’t conform to this canned drivel, so presumably the bottom of the leaderboard is where to look. Thanks!