TechEcho

Ask HN: How do you personally evaluate LLMs?

2 points by cloudking 4 months ago
I’ve seen the standard evals and benchmarks for new LLMs, but they don’t really capture how I actually use them. My own test is pretty specific: whenever a new LLM drops, I ask it to “Write an advanced three.js music visualizer.” Then I compare it to older models by checking:

1. Does it use a recent version of three.js?

2. Does the generated code run out of the box?

3. How complex/innovative is the visualizer?

I’m really curious to hear about other people’s “real-world” benchmarks. What’s your personal test prompt or scenario that reveals whether a new LLM is actually useful for you? How do you decide if it’s truly better than the last version?
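For anyone who wants to make check 1 repeatable across models, here is a minimal sketch in Python. It rests on an assumption that is not part of the test as described: that the generated code loads three.js from a CDN URL pinning an npm-style version such as `three@0.160.0`. The function names and the minimum-version threshold are hypothetical, and output that bundles or imports three.js some other way simply comes back as "unknown".

```python
import re

# Assumption: the generated code pins three.js as "three@<major>.<minor>.<patch>",
# the npm-style form that appears in unpkg and jsDelivr URLs.
# Other ways of loading the library are not detected by this sketch.
VERSION_RE = re.compile(r"three@(\d+)\.(\d+)\.(\d+)")


def threejs_version(generated_code):
    """Return the first pinned three.js version as a tuple of ints, or None."""
    match = VERSION_RE.search(generated_code)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def uses_recent_threejs(generated_code, minimum):
    """Check 1: is the pinned version at least `minimum` (a hypothetical cutoff)?"""
    version = threejs_version(generated_code)
    return version is not None and version >= minimum
```

Running each model's output through `uses_recent_threejs(output, (0, 150, 0))` gives a quick yes/no for the first check; whether the code runs and how inventive the visualizer is still need a browser and a pair of eyes.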

no comments