Hallucinations are an interesting artifact of LLMs where the model makes up facts or generates outputs that are not factually correct.<p>There are two broad approaches for detecting hallucinations:<p>1. Verify the correctness of the response against world knowledge (via Google/Bing search)<p>2. Verify the groundedness of the response against the information present in the retrieved context<p>The 2nd approach is more interesting and useful, since the majority of LLM applications have a RAG component and we ideally want the LLM to use only the retrieved knowledge to generate the response.<p>While researching state-of-the-art techniques for verifying that a response is grounded with respect to the context, two papers stood out to us:<p>1. FactScore (https://arxiv.org/pdf/2305.14251.pdf): Developed by researchers at UW, UMass Amherst, Allen AI and Meta, it first breaks down the response into a series of independent facts and then verifies each of them independently.<p>2. Automatic Evaluation of Attribution by LLMs (https://arxiv.org/pdf/2305.06311.pdf): Developed by researchers at Ohio State University, it prompts the LLM judge to determine whether the response is attributable (supported by the reference), extrapolatory (the reference does not contain enough information to decide) or contradictory (contradicted by the reference).<p>While both papers are awesome reads, they tackle complementary problems and, hence, can be combined for superior performance:<p>1. The responses in production systems typically consist of multiple assertions; hence, breaking them into facts, evaluating each one individually, and taking the average is a more practical approach.<p>2. Many responses in production systems fall in a grey area, i.e. the context may not explicitly support (or disprove) them, but one can make a reasonable argument to infer them from the context. Hence, having three options (Yes, No, Unclear) is a more practical approach.<p>This is exactly what we do at UpTrain to evaluate factual accuracy.
Learn more about it: https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy
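<p>To make the combination concrete, here is a minimal sketch of the scoring step: each extracted fact gets a Yes/No/Unclear verdict against the context, and the verdicts are averaged. Everything named here is an assumption for illustration, not UpTrain's actual code: the `factual_accuracy` function, the `judge` callable (which would wrap an LLM prompt in practice), the `toy_judge` stand-in, and in particular the 0.5 weight given to "Unclear". Splitting the response into facts is assumed to have happened already.

```python
from typing import Callable, Iterable

# Assumed mapping from verdict to score. The 0.5 weight for "unclear"
# is an illustrative choice, not something stated in the post.
VERDICT_SCORES = {"yes": 1.0, "unclear": 0.5, "no": 0.0}


def factual_accuracy(
    facts: Iterable[str],
    context: str,
    judge: Callable[[str, str], str],
) -> float:
    """Average the per-fact verdicts into a single score in [0, 1].

    `judge(fact, context)` is a hypothetical callable that returns
    "Yes", "No" or "Unclear" (case-insensitive) for one fact.
    """
    facts = list(facts)
    if not facts:
        raise ValueError("at least one fact is required")

    total = 0.0
    for fact in facts:
        verdict = judge(fact, context).strip().lower()
        if verdict not in VERDICT_SCORES:
            raise ValueError(f"unexpected verdict: {verdict!r}")
        total += VERDICT_SCORES[verdict]
    return total / len(facts)


def toy_judge(fact: str, context: str) -> str:
    """Stand-in for an LLM judge: a plain substring check (hypothetical)."""
    return "Yes" if fact.lower() in context.lower() else "Unclear"


if __name__ == "__main__":
    context = "Paris is the capital of France."
    facts = [
        "Paris is the capital of France",
        "Paris has two million residents",
    ]
    print(factual_accuracy(facts, context, toy_judge))
```

Keeping the judge as a plain callable means the same averaging logic works whether the verdict comes from an LLM prompt, a rule, or a human label; only the weight for "Unclear" needs to be decided per application.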