
We require new LLM evaluation methods

2 points by digitcatphd over 1 year ago
Most LLM evaluation methods are based on a defined set of criteria that models can be optimized for without necessarily performing better, much like a student who memorizes exam questions but performs poorly 'in the field'. The same issue has long been debated for so-called 'standardized tests'.

To represent the more complex landscape of domain- and task-specific agents, we should instead develop tests built around 'activities' that measure things such as goal completion, and run competitions for these solutions, much like human coding competitions or debate clubs judged by people with subject-matter expertise.

Models should also be tested on adherence to examples given in the prompt (i.e. few-shot) and their contextual usage. As fine-tuned open-source models grow more complex, we should move away from parameter count and a default set of criteria, and instead test these LLMs the way we would evaluate a new hire for a specific set of functions in a given industry.
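
As a rough illustration of what a goal-completion and format-adherence harness could look like (a minimal sketch, not something proposed in the post itself): the names ActivityTask, run_model, evaluate, goal_met, and follows_format are all hypothetical, and run_model is just a placeholder for whatever model API you actually call.

    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class ActivityTask:
        """One 'activity': a prompt plus checks for goal completion and format adherence."""
        name: str
        prompt: str                            # includes any few-shot examples
        goal_met: Callable[[str], bool]        # did the output achieve the task's goal?
        follows_format: Callable[[str], bool]  # does it match the few-shot example format?


    def run_model(prompt: str) -> str:
        """Placeholder for a real LLM call (hypothetical; wire up your own client here)."""
        raise NotImplementedError


    def evaluate(tasks: list[ActivityTask]) -> dict[str, float]:
        """Score a model on goal completion and few-shot format adherence across tasks."""
        goal_hits = 0
        fmt_hits = 0
        for task in tasks:
            output = run_model(task.prompt)
            goal_hits += int(task.goal_met(output))
            fmt_hits += int(task.follows_format(output))
        n = len(tasks)
        return {"goal_completion": goal_hits / n, "format_adherence": fmt_hits / n}


    # Example activity: extract a date in exactly the format shown by the few-shot examples.
    tasks = [
        ActivityTask(
            name="date-extraction",
            prompt=("Extract the date.\n"
                    "Text: The launch happened on 4 July 2023. -> 2023-07-04\n"
                    "Text: We met on 1 Jan 2024. ->"),
            goal_met=lambda out: "2024-01-01" in out,
            follows_format=lambda out: "\n" not in out.strip(),
        ),
    ]

    # scores = evaluate(tasks)  # run once run_model() is connected to an actual model

The point of the sketch is that each task carries its own success predicate, so the score reflects whether the goal was actually achieved rather than similarity to a reference answer.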

no comments
