What are you using to test your LLM responses, benchmark them, and maybe compare different versions?

I've seen a few YC startups focusing on this, but I haven't decided yet whether we should build this internally or use an external tool.
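For the build-it-internally route, here's a minimal sketch of what I'm picturing: run a fixed prompt set through two model versions and diff the answers. It assumes the openai Python client, and the prompts, model names, and exact-match check are just placeholders:

    # Minimal harness: run the same prompts through two model versions
    # and report where their answers differ.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPTS = [
        "Summarize: The quick brown fox jumps over the lazy dog.",
        "What is 17 * 23?",
    ]
    MODELS = ["gpt-3.5-turbo", "gpt-4"]  # the two versions to compare

    def ask(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # keep runs as repeatable as possible
        )
        return resp.choices[0].message.content.strip()

    for prompt in PROMPTS:
        answers = {m: ask(m, prompt) for m in MODELS}
        match = answers[MODELS[0]] == answers[MODELS[1]]
        print(f"{'SAME' if match else 'DIFF'}: {prompt!r}")
        if not match:
            for m, a in answers.items():
                print(f"  {m}: {a[:120]}")

Exact-match comparison only makes sense for short factual prompts; anything open-ended needs human review or some kind of LLM-as-judge scoring, which is where I suspect the external tools earn their keep.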
This list has been nice: https://lmsys.org/blog/2023-05-25-leaderboard/

You can also participate in the arena, which pits the models against each other. I'm surprised that I actually voted for GPT-3.5 over GPT-4 for a lot of my use cases.