
GPT-4o is consistently hallucinating

26 points | by freesam | about 1 year ago

6 comments

pietz | about 1 year ago
I haven't noticed that GPT-4o hallucinates a lot more than the previous version, but I noticed two other things, of which especially the latter seems relevant here.

1) It's insanely chatty, to the point where it ignores instructions about not doing certain things. I think this behavior is heavily favoured by benchmarks, but as someone who expects concise answers, this model annoys me. Custom instructions don't fully fix this for me.

2) It likes repetitive answers a lot more than the previous version, meaning it will try its hardest to generate the follow-up answer in the same format as the first one. I think this is also the problem in your example.

To my understanding, this is a measure against laziness, where the model would exclude information from the first answer that hadn't changed in the follow-up. I always liked this behavior, but maybe you remember the time from a few months ago when many people complained about the laziness of (I believe) 0125.

Btw, while I type this, I notice that this is probably the highest level of first-world problems I've ever complained about. There is this amazing, almost free tool that answers all my questions and does most of my coding, and I dislike it because it provides me with thorough context.
freesam | about 1 year ago
We found that gpt-4o is consistently hallucinating. Here is an example: https://platform.openai.com/playground/chat?models=gpt-4o&preset=yEwza9Ibnw3RnQxGNnTwzOqd

In this case, we never mentioned any store location in Ringsted. However, gpt-4o still fakes a store in the address:

Adresse: Ringstedet, Klosterparks Allé 10, 4100 Ringsted
Åbningstider: Mandag-fredag: 9.30-19, Lørdag: 10-17, Søndag: 11-16
Tlf: 50603398
Email: ringsted4100@gmail.com

This hallucination is quite consistent. The gpt-4-turbo or even gpt-3.5 model doesn't have the same issue and correctly acknowledged that there is no store in this location.
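For anyone who wants to reproduce this kind of comparison outside the playground, here is a minimal sketch against OpenAI's chat completions API. The system prompt, store list, and helper names are illustrative assumptions, not the poster's actual setup; only the model names "gpt-4o" and "gpt-4-turbo" come from the post.

```python
SYSTEM_PROMPT = (
    "You are a customer-service assistant for a Danish retail chain. "
    "Known store locations: Copenhagen, Aarhus, Odense. "
    "If a city is not on this list, say there is no store there; "
    "never invent an address."
)

def build_request(model: str, question: str) -> dict:
    """Assemble a chat-completions request body for one model."""
    return {
        "model": model,
        "temperature": 0,  # reduce sampling variance for the comparison
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    }

def compare_models(question: str) -> None:
    """Send the same question to both models (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    for model in ("gpt-4o", "gpt-4-turbo"):
        resp = client.chat.completions.create(**build_request(model, question))
        print(f"{model}: {resp.choices[0].message.content}")
```

Calling `compare_models("Do you have a store in Ringsted?")` a few times should show whether one model consistently fabricates an address while the other declines.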
bhaney | about 1 year ago
> Please log in to access this page

No thanks.

Not like an LLM hallucinating is particularly surprising or newsworthy anyway.
andrei512 | about 1 year ago
This isn&#x27;t front-page material...
lionkor | about 1 year ago
I found that, compared to GPT-3.5, it refuses to shut up when told to shut up.

In the middle of a conversation, try going "SHUT UP, STOP TALKING ALREADY". For me, it just keeps repeating the last output. Very cool.
threeseed | about 1 year ago
Not sure what you are expecting.

The models are non-deterministic and there is no way to predict hallucinations.
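On the non-determinism point: the chat completions API does accept `temperature` and a best-effort `seed` parameter, which make sampling more repeatable across calls, though they do not prevent hallucination. A hedged sketch of such a request body (values are illustrative):

```python
def deterministic_params(model: str, prompt: str) -> dict:
    """Request body tuned for (best-effort) reproducible sampling."""
    return {
        "model": model,
        "temperature": 0,  # near-greedy decoding
        "seed": 42,        # best-effort reproducibility, not a guarantee
        "messages": [{"role": "user", "content": prompt}],
    }
```

Even with these settings, a model that has memorized a plausible-looking address format can still emit one deterministically; reproducibility only makes the failure easier to study.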
评论 #40543591 未加载