The article is a week old and was already submitted a few days ago; the problem remains: finding some paper to shed more light on the practice.<p>A blog article* came out yesterday - but it is not immediately clear whether the author wrote up what he understood, or whether he knows more.<p>But much perplexity remains: «summarization, which is what LLMs generally excel at» (original article); «The LLM ... reads the patient’s records ... and produces a summary or list of facts» (blog). This is possibly just the beginning, and some of us will already be scared - the summarization capabilities we have experienced from LLMs were neither intelligent nor reliable. (...Or have new studies come out determining that LLMs have finally become reliable, if not cognitively proficient, at summarization?)<p>* <a href="https://usmanshaheen.wordpress.com/2025/03/14/reverse-rag-reduce-hallucinations-and-errors-in-medical-genai-part-1/" rel="nofollow">https://usmanshaheen.wordpress.com/2025/03/14/reverse-rag-re...</a>
Sounds similar to <a href="https://cloud.google.com/generative-ai-app-builder/docs/check-grounding" rel="nofollow">https://cloud.google.com/generative-ai-app-builder/docs/chec...</a>.<p>"The check grounding API returns an overall support score of 0 to 1, which indicates how much the answer candidate agrees with the given facts. The response also includes citations to the facts supporting each claim in the answer candidate.<p>Perfect grounding requires that every claim in the answer candidate must be supported by one or more of the given facts. In other words, the claim is wholly entailed by the facts. If the claim is only partially entailed, it is not considered grounded."<p>There's an example input, with grounded output scores, showing how the model splits the answer into claims, decides whether each claim needs grounding, and assigns an entailment score to each claim: <a href="https://cloud.google.com/generative-ai-app-builder/docs/check-grounding#claim-level-score-response-examples" rel="nofollow">https://cloud.google.com/generative-ai-app-builder/docs/chec...</a>
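For intuition, here's a minimal sketch of the claim-level grounding idea - not Google's actual API, just the underlying pattern of scoring each claim for entailment against the given facts with an off-the-shelf NLI model. The model name and label ids are assumptions taken from that model's card, and the facts/claims are made-up examples:

```python
# Minimal sketch of claim-level grounding (NOT Google's check grounding API):
# score each claim in the answer for entailment against the given facts,
# using an off-the-shelf NLI model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # assumed; any NLI model with an ENTAILMENT label works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def entailment_score(facts: str, claim: str) -> float:
    """Return P(facts entail claim) in [0, 1]."""
    inputs = tokenizer(facts, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    # Label-to-id mapping assumed from the model card.
    return probs[model.config.label2id["ENTAILMENT"]].item()

facts = "The patient was started on 5 mg of amlodipine daily in March 2024."
claims = [
    "The patient takes amlodipine.",      # should score high (grounded)
    "The dose was increased to 10 mg.",   # should score low (ungrounded)
]
for claim in claims:
    print(f"{entailment_score(facts, claim):.2f}  {claim}")
```

A production check-grounding service presumably also handles the claim splitting, citation attribution and aggregation into the overall support score; the sketch only covers the per-claim entailment part.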
Can someone more versed in the field comment on whether this is just an ad or actually something unique or novel?<p>What they're describing as "reverse RAG" sounds a lot to me like "RAG with citations", which is a common technique. Am I misunderstanding?
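For reference, the "RAG with citations" pattern I mean is roughly the sketch below - a generic version, not the article's method; the model name, prompt wording and retrieval step are placeholder assumptions:

```python
# Sketch of plain "RAG with citations", the common technique being compared
# against (not the article's "reverse RAG").
from openai import OpenAI

client = OpenAI()

def answer_with_citations(question: str, retrieved_chunks: list[str]) -> str:
    # Number each retrieved chunk so the model can cite it by ID.
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks))
    prompt = (
        "Answer the question using ONLY the numbered sources below. "
        "Cite the source ID in brackets after every claim, e.g. [2]. "
        "If no source supports a claim, say so instead of guessing.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

If the novelty is anything beyond running a second verification pass over the extracted facts afterwards (which is what the article seems to describe), I'd like to know what it is.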
Can someone link to a real source for this? Like, a paper or something? This seems very interesting and important and I'd prefer to look at something less sketchy than venturebeat.com
If LLMs were good at summarization, this wouldn't be necessary. It turns out a stochastic model of language doesn't produce summaries in the way humans think of summaries. Hence all this extra faff.
Curious if anyone has attempted this in an open source context? Would be incredibly interested to see an example in the wild that can point back to pages of a PDF etc!
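Not aware of a polished open-source example, but the page-level part is mostly a matter of keeping the page number attached to each chunk at ingestion time. A rough sketch, using pypdf for extraction, with a toy keyword scorer standing in for a real retriever and a placeholder file name:

```python
# Keep the page number alongside each chunk at ingestion, so any retrieved or
# verified fact can be traced back to a page of the source PDF.
from pypdf import PdfReader

def load_pages(path: str) -> list[tuple[int, str]]:
    reader = PdfReader(path)
    return [(i + 1, page.extract_text() or "") for i, page in enumerate(reader.pages)]

def pages_supporting(query: str, pages: list[tuple[int, str]], k: int = 3) -> list[int]:
    # Toy keyword-overlap scoring; a real system would use embeddings.
    terms = set(query.lower().split())
    scored = [(sum(term in text.lower() for term in terms), num) for num, text in pages]
    return [num for score, num in sorted(scored, reverse=True)[:k] if score > 0]

pages = load_pages("patient_record.pdf")  # placeholder path
print(pages_supporting("amlodipine 5 mg March 2024", pages))
```

Swap the keyword scorer for embeddings and pass the returned page numbers through whatever verification step sits on top, and you get facts that point back to pages of a PDF.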
If only we could understand the actual mechanism involved in "reverse RAG"... was anyone able to find anything on this beyond the fuzzy details in tfa?
Hmm: 90 minutes of bureaucracy for practitioners every day, data that can be extremely difficult to find and parse out - how can I augment the abilities and simplify the work of the physician?<p>Let's slap a hallucinating LLM on it and try to figure out how to make it hallucinate less... Genius
> A second LLM then scored how well the facts aligned with those sources, specifically if there was a causal relationship between the two.<p>What is 'causal' about it? Maybe I'm reading one word too closely, but an accurate citation or summary isn't a matter of cause and effect?
That's like trying to stop a hemorrhage with a band-aid.<p>Daily reminder that traditional AI expert systems from the 60s have zero problems with hallucinations, by virtue of their architecture alone.<p>Why we aren't building LLMs on top of ProbLog is a complete mystery to me (jk; it's because 90% of the people who work in AI right now have never heard of it, because they got into the field through statistics instead of logic, and all they know is how to mash matrices together).<p>Clearly language by itself doesn't cut it: you need some way to enforce logical rigor, and capabilities such as backtracking, if you care about getting an <i>explainable</i> answer out of the black box. Like we were doing 60 years ago, before we suddenly forgot in favor of throwing teraflops at matrices.<p>If Prolog is Qt or, hell, even ncurses, then LLMs are basically Electron. They get the job done, but they're horribly inefficient and they're clearly not the best tool for the task. But inexperienced developers think that LLMs are this amazing oracle that solves every problem in the world, and so they throw LLMs at anything that vaguely looks like a problem.
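For anyone who hasn't seen ProbLog: here's the classic alarm toy model run through its Python API, as a minimal sketch (probabilities made up; the PrologString/get_evaluatable interface is as I recall it from the problog package's docs). Every answer is a probability derived from explicitly stated rules and evidence, so the system can show its work and cannot assert a fact no rule supports:

```python
# Toy ProbLog model (the classic burglary/earthquake/alarm example) run
# through the problog Python package. Probabilities are made up.
from problog import get_evaluatable
from problog.program import PrologString

model = PrologString(r"""
0.1::burglary.
0.2::earthquake.
0.9::alarm :- burglary, earthquake.
0.8::alarm :- burglary, \+earthquake.
0.1::alarm :- \+burglary, earthquake.

evidence(alarm, true).
query(burglary).
query(earthquake).
""")

# Returns a dict mapping each query term to its posterior probability,
# conditioned on the stated evidence.
print(get_evaluatable().create_from(model).evaluate())
```

That's the kind of explainable, backtrackable answer I mean.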