TechEcho

My company's having some trouble quantifying how well our LLM is performing. How are yall solving this problem?

I try to gauge how clever it has been, or how creative. If the poetry rhymes and the meter works, that is a good start. Or if a screenplay incorporates interesting trivia about the characters I named.

I don't know how some of these creators do it, like with the scripts for videos about Greta Thunberg's logging operations, or Steve Jobs' apple orchards. They make crazy puns and very subtle references to things; it is hard to believe that the LLM generates that without elaborate prompting or manual editing.

One of the first LLM conversations I had was where I prompted it to write a comedy about Constantine the Great meeting his mother, Saint Helena. I believe I was using Bing Chat at the time, and the LLM actually generated a quite explicit story about incest! It did not renege at the end; it left me with the full output, sort of staring and disbelieving it had just said things like that.

So sometimes, it is fairly easy to quantify when your LLM has gone off the rails. Just ask Nick Cave.

Doesn't "Y'all" have an apostrophe?

(Discussion) What method are yall using to evaluate LLM outputs?

2 comments
