TechEcho

Ask HN: What are you using for LLM response testing and benchmarking?

4 points by tin7in almost 2 years ago
What are you using to test your LLM responses, benchmark them, and maybe compare different versions?

I've seen a few YC startups focusing on this, but I haven't decided yet whether we should build this internally or use an external tool.

2 comments

tikkun almost 2 years ago
What features are critical for you in your response testing and benchmarking? I think it'll be easier to answer with specifics.
muzani almost 2 years ago
This list has been nice: https://lmsys.org/blog/2023-05-25-leaderboard/

And you can participate in the arena, which pits them against each other. I'm surprised that I actually voted for GPT-3.5 over GPT-4 for a lot of my use cases.
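The arena collects pairwise human votes, and its leaderboard turns those votes into Elo-style ratings. As a rough illustration of how pairwise votes can become a ranking, here is a minimal sketch of a generic online Elo update. It assumes each vote is a `(model_a, model_b, winner)` tuple with `winner` in `{"a", "b", "tie"}`; the function names, the K-factor of 32 and the starting rating of 1000 are illustrative assumptions, not LMSYS's actual code or parameters.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_ratings(battles, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Compute Elo ratings from pairwise votes, processed in order.

    Each battle is (model_a, model_b, winner) where winner is "a", "b" or "tie".
    k and initial are illustrative defaults (assumptions, not LMSYS's settings).
    """
    ratings = defaultdict(lambda: initial)
    score_for_a = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in battles:
        r_a, r_b = ratings[model_a], ratings[model_b]
        e_a = expected_score(r_a, r_b)   # A's expected score before this vote
        s_a = score_for_a[winner]        # A's actual score from this vote
        delta = k * (s_a - e_a)
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] = r_a + delta
        ratings[model_b] = r_b - delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes between two hypothetical model versions.
    votes = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-y", "a"),
        ("model-y", "model-x", "tie"),
    ]
    for name, rating in sorted(elo_ratings(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because online Elo depends on vote order, a real leaderboard would typically shuffle or bootstrap the votes to get more stable rankings; the same idea works for comparing two internal versions of a prompt or model with your own pairwise judgments.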