TechEcho

How does one detect hallucinations?

5 points by sourabh03agr over 1 year ago
Hallucinations are an interesting artifact of LLMs, where the model makes up facts or generates outputs that are not factually correct.

There are two broad approaches for detecting hallucinations:

1. Verify the correctness of the response against world knowledge (via Google/Bing search)

2. Verify the groundedness of the response against the information present in the retrieved context

The second approach is more interesting and useful, as the majority of LLM applications have a RAG component, and we ideally want the LLM to use only the retrieved knowledge to generate the response.

While researching state-of-the-art techniques for verifying that the response is grounded with respect to the context, two papers stood out to us:

1. FactScore (https://arxiv.org/pdf/2305.14251.pdf): Developed by researchers at UW, UMass Amherst, Allen AI and Meta, it first breaks down the response into a series of independent facts and then verifies each of them independently.

2. Automatic Evaluation of Attribution by LLMs (https://arxiv.org/pdf/2305.06311.pdf): Developed by researchers at Ohio State University, it prompts the LLM judge to determine whether the response is attributable (can be verified), extrapolatory (unclear) or contradictory (contradicted by the reference).

While both papers are awesome reads, they tackle complementary problems and, hence, can be combined for superior performance:

1. The responses in production systems typically consist of multiple assertions; hence, breaking them into facts, evaluating them individually, and taking the average is a more practical approach.

2. Many responses in production systems fall in the grey area, i.e. the context may not explicitly support (or disprove) them, but one can make a reasonable argument to infer them from the context. Hence, having three options - Yes, No, Unclear - is a more practical approach.

This is exactly what we do at UpTrain to evaluate factual accuracy. Learn more about it: https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy
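
To make the combined idea concrete, here is a minimal Python sketch, not UpTrain's actual implementation: split the response into facts, get a Yes/No/Unclear verdict for each against the context, and average. Everything named here is an assumption for illustration: `split_into_facts`, `toy_judge`, `groundedness_score`, and the 1.0 / 0.5 / 0.0 weights are hypothetical. The sentence splitter and the word-overlap judge are crude stand-ins for the LLM calls a real system would make.

```python
import re
from typing import Callable, List

# Assumed weights for the three verdicts (illustrative, not from the post).
VERDICT_SCORES = {"yes": 1.0, "unclear": 0.5, "no": 0.0}


def split_into_facts(response: str) -> List[str]:
    """Naive stand-in for fact decomposition: one 'fact' per sentence.

    A real system would prompt an LLM to extract independent facts;
    this splitter also breaks on decimals and abbreviations.
    """
    parts = re.split(r"[.!?]+\s*", response)
    return [p.strip() for p in parts if p.strip()]


def toy_judge(fact: str, context: str) -> str:
    """Placeholder judge using word overlap instead of an LLM call."""
    fact_words = set(re.findall(r"\w+", fact.lower()))
    context_words = set(re.findall(r"\w+", context.lower()))
    if not fact_words:
        return "unclear"
    overlap = len(fact_words & context_words) / len(fact_words)
    if overlap == 1.0:
        return "yes"      # every word of the fact appears in the context
    if overlap >= 0.5:
        return "unclear"  # partially covered: the grey area
    return "no"


def groundedness_score(
    response: str,
    context: str,
    judge: Callable[[str, str], str] = toy_judge,
) -> float:
    """Average the per-fact verdict scores; 0.0 if there are no facts."""
    facts = split_into_facts(response)
    if not facts:
        return 0.0
    return sum(VERDICT_SCORES[judge(f, context)] for f in facts) / len(facts)


if __name__ == "__main__":
    context = "The Eiffel Tower is in Paris. It opened in 1889."
    response = "The Eiffel Tower is in Paris. It opened in 1925."
    print(groundedness_score(response, context))  # prints 0.75
```

The example shows why the placeholder judge is not enough: the wrong opening year is rated "unclear" rather than "no" because most of its words overlap with the context. Swapping in an LLM-backed function through the `judge` parameter is the intended extension point.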

1 comment

navjack27 over 1 year ago
A fun uphill battle while LLMs aren't trained on entirely factual data from the beginning, and I mean the very start. They are not fact-regurgitating programs. People will make a lot of money saying they have this solved, but all they have are band-aids. I personally don't want a factual LLM; I want one that helps me spark my own creativity. They do that right now. Hallucinations are a feature, not a bug.