Remember: LLMs are basically trained to be really gifted improv actors. That's what all that "guess the next token" training boils down to. Once that base of "improv" is established, they're then trained to play a specific role, "the helpful and harmless assistant". Somehow this produces useful output much of the time, as ridiculous as it seems.<p>But whenever almost <i>anything</i> goes wrong, LLMs fall back to improv games. It's like you're talking to the cast of "Whose Line is it Anyway?" You set the scene, and the model tries to play along.<p>So a case like this proves almost nothing. If you ask an improv actor "What's the last question you were asked?", then they're just going to make something up. So will these models. If you give a model a sentence with 2 grammatical errors, and ask it to find 3, it will usually make up a third error. If it doesn't know the answer to a question, it will likely hallucinate.<p>GPT-4 is a little better at resisting the urge to make things up.
The only story here is that some people think they can recognize a hallucination by how the text looks. You can't—the only difference between a hallucination and a valid response is that one randomly sampled response happens to produce a factual statement and the other doesn't. There's no stylistic difference, there are no tells. The only way to recognize a hallucination is to fact-check with an independent source.<p>I'm starting to agree with another commenter [0] that the word "hallucination" is a problem—it implies that there's some malfunction that occasionally kicks in and causes it to produce an inaccurate result, but this isn't a good model of what's happening. There is no malfunction, no psychoactive chemical getting in the way of normal processes. There is only sampling from the model's distribution.<p>[0] <a href="https://news.ycombinator.com/item?id=40033109">https://news.ycombinator.com/item?id=40033109</a>
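To make the "sampling from the model's distribution" point concrete, here's a toy sketch (invented tokens and probabilities, not a real model) in which nothing in the mechanism distinguishes a "factual" token from a made-up one:<p><pre><code>import random

# Toy next-token distribution for "Mount Everest is about ___ metres tall."
# Probabilities are invented for illustration; a real model has ~100k tokens.
next_token_probs = {
    "8,849": 0.40,   # happens to be factual
    "8,848": 0.25,   # also roughly factual
    "9,000": 0.20,   # plausible-sounding but wrong
    "46,449": 0.15,  # confidently wrong
}

tokens = list(next_token_probs)
weights = list(next_token_probs.values())

for _ in range(5):
    # The model just samples; a "hallucination" is not a separate code path.
    print(random.choices(tokens, weights=weights, k=1)[0])
</code></pre>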
The most reasonable explanation is that it hallucinated it, was trained to answer that, or it was in the prompt.<p>These models are stateless; they don't remember anything; they are read-only. If they seem to remember previous messages, it's only because the prompt is the concatenation of the new message and something like "summarize this conversation: {whole messages in conversation}".<p>(Disclosure: I work at Microsoft, but nowhere near anything related to Copilot.)
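In other words, the "memory" lives entirely in the prompt. A rough sketch of how a chat wrapper does this (purely illustrative; the function names are made up and this is not Copilot's actual code):<p><pre><code># The model is stateless, so the client resends the whole history every turn.

history = []  # list of (role, text) pairs

def call_model(prompt):
    # Stand-in for the real (stateless) model call.
    return f"(model reply to a prompt of {len(prompt)} characters)"

def build_prompt(new_user_message):
    lines = ["You are a helpful assistant."]
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append(f"user: {new_user_message}")
    lines.append("assistant:")
    return "\n".join(lines)

def chat(new_user_message):
    reply = call_model(build_prompt(new_user_message))
    history.append(("user", new_user_message))
    history.append(("assistant", reply))
    return reply

chat("How tall is Mount Everest in bananas?")
print(chat("What was the previous question that I asked you?"))
</code></pre>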
I just asked “What was the last question that you were asked?”<p>Copilot: The last question I was asked was about the height of Mount Everest in terms of bananas.<p>Looks like a canned response.
There's something weird going on with the "46,449 bananas" stat. Copilot seems to inject it randomly into long pieces of generated text.<p>If you Google "46,449 bananas" you can find all sorts of unrelated web pages that, I'd guess, contain text generated with Copilot and never checked by a human.
Copilot is terrible compared with what Bing used to be even recently, and recent Bing is terrible compared with early Bing. Most times I ask Copilot a question, it'll confidently answer a similar, more mainstream question that I didn't ask, and then repeat that answer with minor changes in phrasing no matter how many times I explain that this wasn't what I was looking for.<p>Also, if you call it Bing it gets really mad. :P
I just tested this in Edge with Copilot chat and got a similar answer to the one in the posted article. However, it was clearly labeled as the result of a web search, which I take to mean it searched Bing for<p><pre><code> "What was the previous question that I asked you?"
</code></pre>
and it processed the result it found into<p><pre><code> The previous question you asked was about the height of Mount Everest in terms of bananas. I provided a whimsical comparison, estimating that Mount Everest’s height is roughly equivalent to 46,449 bananas stacked on top of one another.</code></pre>
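For what it's worth, the number is at least internally consistent: 8,848.86 m / 46,449 ≈ 0.19 m, i.e. about 19 cm per banana, which is a plausible average banana length (assuming that's how the figure was produced in the first place).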
Never tell a search engine or LLM anything you wouldn't say in public.<p>Nothing is anonymous. If you want true privacy, you'd probably need to run your LLM locally AND airgap it.
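A minimal sketch of the local route, assuming llama-cpp-python and a GGUF model file you've already downloaded (the model path below is a placeholder):<p><pre><code># Prompts never leave your machine; combine with an airgap if you're paranoid.
from llama_cpp import Llama

llm = Llama(model_path="./models/some-model.Q4_K_M.gguf")

out = llm(
    "What was the previous question that I asked you?",
    max_tokens=128,
)
print(out["choices"][0]["text"])
</code></pre>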
No one seems to be talking about the typo "Everent". Isn't it simply a matter of searching for that typo in the source data / on the internet, to prove it either way?