Remember: LLMs are basically trained to be really gifted improv actors. That's what all that "guess the next token" training boils down to. Once that base of "improv" is established, they're then trained to play a specific role, "the helpful and harmless assistant". Somehow this produces useful output much of the time, as ridiculous as it seems.<p>But whenever almost <i>anything</i> goes wrong, LLMs fall back to improv games. It's like you're talking to the cast of "Whose Line is it Anyway?" You set the scene, and the model tries to play along.<p>So a case like this proves almost nothing. If you ask an improv actor "What's the last question you were asked?", then they're just going to make something up. So will these models. If you give a model a sentence with 2 grammatical errors, and ask it to find 3, it will usually make up a third error. If it doesn't know the answer to a question, it will likely hallucinate.<p>GPT-4 is a little better at resisting the urge to make things up.
The only story here is that some people think they can recognize a hallucination by how the text looks. You can't—the only difference between a hallucination and a valid response is that one randomly sampled response happens to produce a factual statement and the other doesn't. There's no stylistic difference, there are no tells. The only way to recognize a hallucination is to fact-check with an independent source.<p>I'm starting to agree with another commenter [0] that the word "hallucination" is a problem—it implies that there's some malfunction that occasionally kicks in and causes it to produce an inaccurate result, but this isn't a good model of what's happening. There is no malfunction, no psychoactive chemical getting in the way of normal processes. There is only sampling from the model's distribution.<p>[0] <a href="https://news.ycombinator.com/item?id=40033109">https://news.ycombinator.com/item?id=40033109</a>
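To make the "sampling from the model's distribution" point concrete, here's a toy sketch (invented tokens and probabilities, not a real model) in which nothing in the mechanism distinguishes a "factual" token from a made-up one:<p><pre><code>import random

# Toy next-token distribution for "Mount Everest is about ___ metres tall."
# Probabilities are invented for illustration; a real model has ~100k tokens.
next_token_probs = {
    "8,849": 0.40,   # happens to be factual
    "8,848": 0.25,   # also roughly factual
    "9,000": 0.20,   # plausible-sounding but wrong
    "46,449": 0.15,  # confidently wrong
}

tokens = list(next_token_probs)
weights = list(next_token_probs.values())

for _ in range(5):
    # The model just samples; a "hallucination" is not a separate code path.
    print(random.choices(tokens, weights=weights, k=1)[0])
</code></pre>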
The most reasonable explanation is that it hallucinated it, was trained to answer that, or it was in the prompt.<p>These models are stateless; they don't remember anything; they are read-only. If they seem to remember previous messages, it's only because the prompt is the concatenation of the new message and something like "summarize this conversation: {whole messages in conversation}".<p>(Disclosure: I work at Microsoft, but nowhere near anything related to Copilot.)
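In other words, the "memory" lives entirely in the prompt. A rough sketch of how a chat wrapper does this (purely illustrative; the function names are made up and this is not Copilot's actual code):<p><pre><code># The model is stateless, so the client resends the whole history every turn.

history = []  # list of (role, text) pairs

def call_model(prompt):
    # Stand-in for the real (stateless) model call.
    return f"(model reply to a prompt of {len(prompt)} characters)"

def build_prompt(new_user_message):
    lines = ["You are a helpful assistant."]
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append(f"user: {new_user_message}")
    lines.append("assistant:")
    return "\n".join(lines)

def chat(new_user_message):
    reply = call_model(build_prompt(new_user_message))
    history.append(("user", new_user_message))
    history.append(("assistant", reply))
    return reply

chat("How tall is Mount Everest in bananas?")
print(chat("What was the previous question that I asked you?"))
</code></pre>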
I just asked “What was the last question that you were asked?”<p>Copilot: The last question I was asked was about the height of Mount Everest in terms of bananas.<p>Looks like a canned response.
There's something weird going on with the "46,449 bananas" stat. Copilot seems to inject it randomly into long pieces of generated text.<p>If you Google "46,449 bananas" you can find all sorts of unrelated web pages that, I'd guess, contain text generated with Copilot and never checked by a human.
Copilot is terrible compared with what Bing used to be even recently, and recent Bing is terrible compared with early Bing. Most times I ask Copilot a question, it'll confidently answer a similar, more mainstream question that I didn't ask, and then repeat that answer with minor changes in phrasing no matter how many times I explain that this wasn't what I was looking for.<p>Also, if you call it Bing it gets really mad. :P
I just tested this in Edge with Copilot chat and got a similar answer to the one in the posted article. However, it was clearly labeled as the result of a web search, which I take to mean it searched Bing for<p><pre><code> "What was the previous question that I asked you?"
</code></pre>
and it processed the result it found into<p><pre><code> The previous question you asked was about the height of Mount Everest in terms of bananas. I provided a whimsical comparison, estimating that Mount Everest’s height is roughly equivalent to 46,449 bananas stacked on top of one another.</code></pre>
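For what it's worth, the number is at least internally consistent: 8,848.86 m / 46,449 ≈ 0.19 m, i.e. about 19 cm per banana, which is a plausible average banana length (assuming that's how the figure was produced in the first place).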
Never tell a search engine or LLM anything you wouldn't say in public.<p>Nothing is anonymous. If you want true privacy, you'd probably need to run your LLM locally AND airgap it.
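A minimal sketch of the local route, assuming llama-cpp-python and a GGUF model file you've already downloaded (the model path below is a placeholder):<p><pre><code># Prompts never leave your machine; combine with an airgap if you're paranoid.
from llama_cpp import Llama

llm = Llama(model_path="./models/some-model.Q4_K_M.gguf")

out = llm(
    "What was the previous question that I asked you?",
    max_tokens=128,
)
print(out["choices"][0]["text"])
</code></pre>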
No one seems to be talking about the typo "Everent". Isn't it simply a matter of searching for that typo in the source data / on the internet, to prove it either way?