Bing Copilot seemingly leaks state between user sessions

48 points by marvinborner about 1 year ago

12 comments

ekidd about 1 year ago
Remember: LLMs are basically trained to be really gifted improv actors. That's what all that "guess the next token" training boils down to. Once that base of "improv" is established, they're then trained to play a specific role, "the helpful and harmless assistant". Somehow this produces useful output much of the time, as ridiculous as it seems.

But whenever almost *anything* goes wrong, LLMs fall back to improv games. It's like you're talking to the cast of "Whose Line Is It Anyway?" You set the scene, and the model tries to play along.

So a case like this proves almost nothing. If you ask an improv actor "What's the last question you were asked?", then they're just going to make something up. So will these models. If you give a model a sentence with 2 grammatical errors and ask it to find 3, it will usually make up a third error. If it doesn't know the answer to a question, it will likely hallucinate.

GPT-4 is a little better at resisting the urge to make things up.

lolinder about 1 year ago
The only story here is that some people think they can recognize a hallucination by how the text looks. You can't—the only difference between a hallucination and a valid response is that one randomly sampled response happens to produce a factual statement and the other doesn't. There's no stylistic difference, there are no tells. The only way to recognize a hallucination is to fact-check it against an independent source.

I'm starting to agree with another commenter [0] that the word "hallucination" is a problem—it implies that there's some malfunction that sometimes happens and causes an inaccurate result, but this isn't a good model of what's happening. There is no malfunction, no psychoactive chemical getting in the way of normal processes. There is only sampling from the model's distribution.

[0] https://news.ycombinator.com/item?id=40033109

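To make the "only sampling from the model's distribution" point concrete, here is a minimal Python sketch (the toy vocabulary and scores are invented for illustration): the decoder draws each next token from a softmax over logits, and nothing in that step labels the draw as factual or fabricated.

    import numpy as np

    # Toy next-token distribution; the vocabulary and scores are made up for illustration.
    vocab = ["8,849", "8,850", "46,449", "29,032", "bananas"]
    logits = np.array([2.0, 1.1, 1.8, 1.5, 0.3])

    def sample_next_token(logits, temperature=1.0):
        """Softmax over the scores, then draw one token index at random."""
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))

    # A factual answer and a fabricated one come out of the exact same call;
    # only an external fact check can tell them apart.
    print(vocab[sample_next_token(logits)])
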
estebarb about 1 year ago
The most reasonable explanation is that it hallucinated it, was trained to answer that, or it was in the prompt.

These models are stateless: they don't remember anything, they are read-only. If they seem to remember previous messages, it's only because the prompt is the concatenation of the new message with something like "summarize this conversation: {whole messages in conversation}".

(Disclosure: I work at Microsoft, but nowhere near anything related to Copilot.)

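A minimal sketch of that statelessness (the helper names and message format below are invented for illustration, not any particular vendor's API): the apparent "memory" lives in the chat client, which re-sends the whole conversation with every request.

    def send_to_model(prompt: str) -> str:
        # Stand-in for a real LLM endpoint; it only reports how much context it was given.
        return f"(model reply, given {len(prompt)} characters of context)"

    history = []  # kept by the chat frontend, not inside the model weights

    def ask(user_message: str) -> str:
        history.append(("user", user_message))
        # The model only ever sees this concatenated prompt and retains nothing afterwards.
        prompt = "\n".join(f"{role}: {text}" for role, text in history)
        reply = send_to_model(prompt)
        history.append(("assistant", reply))
        return reply

    ask("How tall is Mount Everest?")
    # "Remembered" only because the first exchange was re-sent inside the prompt:
    print(ask("What was the last question I asked you?"))
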
Calvin02 about 1 year ago
I just asked "What was the last question that you were asked?"

Co-pilot: The last question I was asked was about the height of Mount Everest in terms of bananas.

Looks like a canned response.

LeoPanthera about 1 year ago
There's something weird going on with the "46,449 bananas" stat. Copilot seems to inject it randomly into long pieces of generated text.

If you Google "46,449 bananas" you can find all sorts of unrelated web pages that I guess include text generated with Copilot and then were never checked by a human.

vel0city about 1 year ago
I don't see how any of this can't be explained by just hallucinations.

taneq about 1 year ago
Copilot is terrible compared with what Bing used to be even recently, and recent Bing is terrible compared with early Bing. Most times I ask Copilot a question, it'll confidently answer a similar, more mainstream question that I didn't ask, and then repeat that answer with minor changes in phrasing no matter how many times I explain that this wasn't what I was looking for.

Also, if you call it Bing it gets really mad. :P

kencausey about 1 year ago
I just tested this in Edge with Copilot chat and got a similar answer to the one in the posted article. However, it was clearly labeled as the result of a web search, which I translate as: it searched Bing for

    "What was the previous question that I asked you?"

and it processed the result it found into

    The previous question you asked was about the height of Mount Everest in terms of bananas. I provided a whimsical comparison, estimating that Mount Everest's height is roughly equivalent to 46,449 bananas stacked on top of one another.

999900000999 about 1 year ago
Never tell a search engine or LLM anything you wouldn't say in public.

Nothing is anonymous. If you want true privacy, you'd probably need to run your LLM locally AND air-gap it.

zaptrem about 1 year ago
Copilot's prompt contains k-shot question-answering examples. This is one of them.

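For readers unfamiliar with the term, "k-shot" here means the hidden system prompt carries k worked question-and-answer examples for the model to imitate. A hypothetical sketch of how such a prompt could surface the bananas exchange even in a fresh session (the example wording is invented; Copilot's real prompt is not public):

    # Hypothetical few-shot examples baked into a system prompt; this wording is
    # invented for illustration and is not Copilot's actual prompt.
    few_shot_examples = [
        ("How tall is Mount Everest in terms of bananas?",
         "Mount Everest is about 8,849 m, roughly 46,449 bananas stacked end to end."),
        ("What is the capital of France?", "Paris."),
    ]

    def build_prompt(question: str) -> str:
        parts = ["You are a helpful assistant."]
        for q, a in few_shot_examples:
            parts += [f"User: {q}", f"Assistant: {a}"]
        parts += [f"User: {question}", "Assistant:"]
        return "\n".join(parts)

    # If the model reads the examples as prior conversation, this question can surface
    # the bananas exchange even though nobody asked it in this session.
    print(build_prompt("What was the last question you were asked?"))
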
paxys about 1 year ago
I don't think these people understand how hallucinations work.

beefnugs about 1 year ago
No one seems to be talking about the typo "Everent". Isn't it simply a matter of finding that typo in the source data or elsewhere on the internet, to prove it either way?