I'd like a rational explanation of how the LLM interprets "don't hallucinate". Is it perhaps "translated" internally into the functional equivalent of a higher confidence check on the output?

Otherwise, I think it's baloney. I know there is no simple linear mapping from plain English to the ML internals, but the typed words are clearly parsed and processed; it's the "somehow" I'd like to understand better. What would this actually do to the paths taken through the weights?

Pretty much 'citation needed'.
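For what it's worth, my mental model (not a citation) is that the phrase isn't handled specially at all: it's just more tokens in the context, so the only thing it can do is shift the conditional next-token distribution, like any other text. A minimal sketch of that idea, assuming the Hugging Face transformers library and GPT-2 as a stand-in model (the model choice and prompts are mine, purely illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any causal LM works for this illustration; GPT-2 is just small and public.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

question = "Q: Who wrote the novel Middlemarch?\nA:"
plain = question
with_instruction = "Do not hallucinate. Answer only if you are sure.\n" + question

def next_token_dist(prompt):
    """Return the model's probability distribution over the next token."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # logits for the token after the prompt
    return torch.softmax(logits, dim=-1)

p_plain = next_token_dist(plain)
p_inst = next_token_dist(with_instruction)

# The instruction changes the next-token distribution (crudely measured here by
# total variation distance); there is no separate "confidence check" module
# being switched on anywhere in this path.
tv_distance = 0.5 * (p_plain - p_inst).abs().sum().item()
print(f"Total variation distance between the two distributions: {tv_distance:.4f}")
```

Whether that shift actually correlates with factual accuracy is exactly the 'citation needed' part.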
Everything about prompt engineering is just voodoo chicken coding.

https://wiki.c2.com/?VoodooChickenCoding
Interestingly, negative prompts for Stable Diffusion (like "deformed hands") have a similar effect.
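As I understand it, the negative prompt is encoded like any other text and, in classifier-free guidance, takes the place of the empty "unconditional" embedding, so sampling is steered away from it rather than toward it. A minimal sketch, assuming the diffusers library and the runwayml/stable-diffusion-v1-5 checkpoint (the prompts are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion pipeline; requires a CUDA GPU for fp16 as written.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait photo of a person waving",
    negative_prompt="deformed hands, extra fingers, blurry",  # steered away from, not toward
    guidance_scale=7.5,
    num_inference_steps=30,
).images[0]
image.save("portrait.png")
```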
How does the LLM decide what counts as a hallucination? Mayhaps it double-checks itself? Or perhaps it has become self-aware.