TechEcho
(Discussion) What method are yall using to evaluate LLM outputs?

1 point by jipster almost 2 years ago
My company's having some trouble quantifying how well our LLM is performing. How are yall solving this problem?

2 comments

NoZebra120vClip almost 2 years ago
I try to gauge how clever it has been, or how creative. If the poetry rhymes and the meter works, that is a good start. Or if a screenplay incorporates interesting trivia about the characters I named.

I don't know how some of these creators do it, like with the scripts for videos about Greta Thunberg's logging operations, or Steve Jobs' apple orchards. They make crazy puns and very subtle references to things, it is hard to believe that the LLM generates that without elaborate prompting or manual editing.

One of the first LLM conversations I had was where I prompted it to write a comedy about Constantine the Great meeting his mother, Saint Helena. I believe I was using Bing Chat at the time, and the LLM actually generated a quite explicit story about incest! It did not renege at the end but it left me with the full output, sort of staring and disbelieving it had just said things like that.

So sometimes, it is fairly easy to quantify when your LLM has gone off the rails. Just ask Nick Cave.
retrocryptid almost 2 years ago
Doesn't "Y'all" have an apostrophe?