How does one detect hallucinations?

5 points by sourabh03agr about 1 year ago
Hallucinations are an interesting artifact of LLMs: the model tends to make up facts or generate outputs that are not factually correct.

There are two broad approaches for detecting hallucinations:

1. Verify the correctness of the response against world knowledge (via Google/Bing search).

2. Verify the groundedness of the response against the information present in the retrieved context.

The second approach is more interesting and useful, as the majority of LLM applications have a RAG component, and we ideally want the LLM to use only the retrieved knowledge to generate the response.

While researching state-of-the-art techniques for verifying that a response is grounded with respect to its context, two papers stood out to us:

1. FactScore (https://arxiv.org/pdf/2305.14251.pdf): Developed by researchers at UW, UMass Amherst, Allen AI and Meta, it first breaks the response down into a series of independent facts and then verifies each of them independently.

2. Automatic Evaluation of Attribution by LLMs (https://arxiv.org/pdf/2305.06311.pdf): Developed by researchers at Ohio State University, it prompts an LLM judge to determine whether the response is attributable (can be verified), extrapolatory (unclear) or contradictory (can't be verified).

Both papers are great reads, and they tackle complementary problems, so they can be combined for better performance:

1. Responses in production systems typically consist of multiple assertions; breaking them into facts, evaluating each one individually, and taking the average is the more practical approach.

2. Many responses in production systems fall into a grey area: the context may not explicitly support (or disprove) them, but one can make a reasonable argument to infer them from the context. Hence, having three options (Yes, No, Unclear) is the more practical approach.

This is exactly what we do at UpTrain to evaluate factual accuracy. Learn more about it: https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy
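The combined recipe (split the response into atomic facts, then give the judge three labels per fact) is easy to prototype. Below is a minimal sketch assuming the OpenAI Python client as the judge; the prompts, the model name, and the 1.0/0.5/0.0 scoring weights are illustrative assumptions, not UpTrain's actual implementation.

# Minimal sketch: FactScore-style fact splitting + a three-way LLM judge.
from openai import OpenAI

client = OpenAI()          # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"      # placeholder judge model

def split_into_facts(response: str) -> list[str]:
    """Break the response into independent, atomic factual claims."""
    prompt = ("Split the following answer into a numbered list of "
              f"independent, atomic factual claims:\n\n{response}")
    out = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content
    facts = []
    for line in out.splitlines():
        line = line.strip()
        if line and line[0].isdigit() and "." in line:
            facts.append(line.split(".", 1)[1].strip())
    return facts

def judge_fact(fact: str, context: str) -> str:
    """Label one fact: 'yes' (supported), 'no' (contradicted), 'unclear'."""
    prompt = (f"Context:\n{context}\n\nClaim:\n{fact}\n\n"
              "Answer with exactly one word: 'yes' if the context supports "
              "the claim, 'no' if it contradicts it, 'unclear' otherwise.")
    verdict = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content.strip().lower()
    return verdict if verdict in {"yes", "no", "unclear"} else "unclear"

def factual_accuracy(response: str, context: str) -> float:
    """Average per-fact score: yes = 1.0, unclear = 0.5, no = 0.0."""
    weights = {"yes": 1.0, "unclear": 0.5, "no": 0.0}
    facts = split_into_facts(response)
    if not facts:
        return 0.0
    return sum(weights[judge_fact(f, context)] for f in facts) / len(facts)

Scoring the grey-area verdicts at 0.5 is one arbitrary choice; the point is that per-fact, three-way judgments aggregate into a single groundedness score for the whole response.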

1 comment

navjack27 about 1 year ago
A fun uphill battle while LLMs aren't trained on entirely factual data from the beginning, and I mean the very start. They are not fact regurgitating programs. People will make a lot of money saying they have this solved but all they have are bandaids. I personally don't want a factual LLM I want one that helps me with sparking my own creativity. They do that right now. Hallucinations are a feature not a bug.