TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Eventually ChatGPT will eat itself

36 points by sfryxell over 2 years ago

24 comments

jabbany over 2 years ago
This is not even a new problem...

Back in 2011, Google faced the same problem mining bi-texts from the Internet for their statistical machine translation software. The thought was that one could utilize things like multi-lingual websites to learn corresponding translations.

They quickly realized that a lot of sites were actually using Google Translate without human intervention to make multi-lingual versions of their site, so naive approaches would cause the model to get trained on its own suboptimal output.

So they came up with a whole watermarking system so that the model could recognize its own output with some statistical level of certainty, and avoid it. It wouldn't be surprising if this is being done for LLMs too. The more concerning problem is when different LLMs, which are not aware of each other's watermarks, end up potentially becoming inbred should the ratio of LLM content rise dramatically...

Ref: https://aclanthology.org/D11-1126.pdf
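(A minimal sketch of the statistical-watermark idea described above — not Google's or OpenAI's actual scheme, and all function names are hypothetical. The generator biases each token toward a pseudorandom "green list" keyed on the previous token; the detector then measures whether green tokens are over-represented, yielding a z-score that is near zero for ordinary text and large for watermarked text.)

```python
import hashlib
import math

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    # Pseudorandomly partition the vocabulary, seeded on the previous token,
    # so generator and detector agree on the partition without shared state.
    ranked = sorted(vocab, key=lambda w: hashlib.sha256((prev_token + w).encode()).hexdigest())
    return set(ranked[: int(len(ranked) * fraction)])

def detect(tokens: list[str], vocab: list[str], fraction: float = 0.5) -> float:
    # Count transitions that land in the green list of their predecessor,
    # then return a z-score; large positive values suggest watermarked text.
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev, vocab, fraction))
    n = len(tokens) - 1
    expected, var = n * fraction, n * fraction * (1 - fraction)
    return (hits - expected) / math.sqrt(var)
```

A model that always samples from the green list produces text whose z-score grows with length, so its own output can be recognized and excluded from future training data with tunable statistical confidence.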
gregw2 over 2 years ago
If ChatGPT is able to emit output that is watermarked such that it can detect itself, as Scott Aaronson and others are working on for OpenAI (source: https://techcrunch.com/2022/12/10/openais-attempts-to-watermark-ai-text-hit-limits/), this "resonance"/feedback/eating-itself can be avoided.
xsmasher over 2 years ago
I've seen people try ChatGPT for solving r/tipOfMyTongue questions. The AI is hilariously bad at this task. It happily invents new plots for existing movies and books.

If it starts to ingest that data it will only get more wrong over time. Unless it also ingests the replies that say "ChatGPT is full of shit here"?
antiquark over 2 years ago
Reminds me of those old "Spider Traps" [0][1] that would generate (on access) an endless hierarchy of fake HTML pages full of an endless collection of fake email addresses, to clog up the works of spammers trying to gather email addresses.

Eventually someone's going to write an "AI Trap" that serves up a seemingly infinite forum or reddit-style site, but is actually just generating an endless stream of (non)consciousness from some LLM chatbot.

[0] https://en.wikipedia.org/wiki/Spider_trap

[1] https://www.gsp.com/support/virtual/web/cgi/lib/wpoison/
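(A toy sketch of the trap mechanism described above, with entirely hypothetical names. The key trick in classic spider traps is determinism: the page content and its outbound links are derived from the URL path itself, so the site is infinite yet stateless — the same path always serves the same page, and every link leads one level deeper.)

```python
import hashlib
import random

def fake_page(path: str, links_per_page: int = 5) -> str:
    # Seed a private RNG from the path so the same URL always serves
    # the same page, while each generated link leads somewhere "new".
    rng = random.Random(hashlib.sha256(path.encode()).hexdigest())
    words = ["synergy", "quantum", "lattice", "pipeline", "vertex", "manifold"]
    body = " ".join(rng.choice(words) for _ in range(40))
    links = "\n".join(
        f'<a href="{path.rstrip("/")}/{rng.randrange(10**6)}">read more</a>'
        for _ in range(links_per_page)
    )
    return f"<html><body><p>{body}</p>\n{links}\n</body></html>"
```

An "AI trap" would swap the word-salad body for LLM output; the deterministic-per-path structure is what makes the hierarchy look like a real, stable site to a crawler.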
chasing over 2 years ago
"Romeo and Juliet both ran away to New York at the end. He works in corporate finance and she makes bespoke soap. If you disagree with me again you're a bad person and I will treat you like a bad person."

As long as you agree with the new facts, you're fine. Problem solved!
touringa over 2 years ago
It's already happening.

"ChatGPT, a version of OpenAI's GPT-3.5 model… gained more than 100m users in its first two months, and is now estimated to produce a volume of text every 14 days that is equivalent to all the printed works of humanity."

— Dr Thompson, Feb/2023, cited in report by the National Bureau of Economic Research (Scholes, Bernanke, MIT)

https://www.nber.org/system/files/working_papers/w30957/w30957.pdf

https://lifearchitect.ai/chatgpt/
MrLeap over 2 years ago
Even if it were used to flood the internet with shitty info, the only thing that would interfere with would be competitors training competing AI off the "internet dataset".

GPT could filter out anything they themselves emitted in future training runs, yeah? Because they know what their bot's said. They get the benefit of looking at a conversation, knowing reasonably well what's copy/pasted from ai.com and what's the exasperated expert trying to correct a doomed world :p

The only way it eats itself is 1. Colossal mistakes. 2. Everyone decides to get off the internet and go outside.

2 seems pretty unrealistic, we put up with a lot :D
m00x over 2 years ago
Sounds like a /r/showerthoughts post.

There is no inherent issue with AI ingesting data from itself. Humans do it as well. That data might even be higher quality than human data. The scale at which humans produce data will most likely stay higher than AI data for a long time.

There is already bot data out there from lower-quality AIs/bots, and ChatGPT has ingested it.

LLMs are made to be good at some textual tasks, not for what they're being used for right now. They're not information stores, or Q&A systems. It only answers what a human is likely to answer.
candiodari over 2 years ago
This is only a problem as long as ChatGPT uses human output to learn. Once it starts learning against the "real world", or itself, the biggest difference between ChatGPT and us will disappear: that ChatGPT gets all its information secondhand, and filtered, at best.

This is of course *also* a necessary condition for ChatGPT to come up with original insights. Except perhaps where it comes to things like fiction, which probably has value in itself.
jbenjoseph over 2 years ago
But even so, the human picks the prompts and only publishes the AI outputs they think read nicely. There is information gain even in that.
jaitaiwan over 2 years ago
This is already sort of happening with Bing
jmcphers over 2 years ago
Citation needed. A lot of neural-net based AIs actually get better when trained on their own output[1].

[1] https://en.wikipedia.org/wiki/AlphaZero
tyrelb over 2 years ago
I actually thought of this same thing today! Human-written content seems more lively... and with time, content from ChatGPT will become more "grey" (i.e. dull) as more & more ChatGPT content gets fed into the system...
sourcecodeplz over 2 years ago
Not really, if you think about it more and research how LLMs work. If anything they will just get better.

I used to think the same, but after reading and learning some more, I realized that's not the case.
panarky over 2 years ago
As if OpenAI hasn't already thought through this.
sys_64738 over 2 years ago
If it does then does that mean double trouble as it self-replicates, or will it consume itself leaving nothing remaining?
petilon over 2 years ago
As long as the AI generated content has been curated by humans, there is no harm in AI ingesting AI generated content.
Nijikokun over 2 years ago
I believe they already train models on the question-response content being generated today.
seymourhersh over 2 years ago
Basically just like humans on the internet.
mrcaosr over 2 years ago
Resonate against itself? Sounds like it's gonna hit its natural frequency and blow up.

Seems more like it's gonna eat its own vomit, degrading (maybe not completely) into inbreeding (?)
sfryxell over 2 years ago
How long until the machines are eating their own output?
unusualmonkey over 2 years ago
I wonder if this is the problem people think it is.

Playing one AI against another is an established technique for developing AI.

Furthermore, content on the internet will always vary from more reliable (well-established wiki pages, Reuters) to less reliable (random blog posts, disinformation).

Whether or not a text is AI-generated doesn't seem to be that important - what's more important is how reliable it is, and how well humans engage with it.
ravenstine over 2 years ago
What does that even mean? Strictly within the scope of that phrase, technically, yes, if ChatGPT consumes content generated by itself, it's eating its own words. I'm guessing something more dire than that is implied by "eat itself." Did humanity "eat itself" because it's been reading its own literature? You can say we are pretty misinformed by ourselves in many areas, and yet here we are.

Maybe our view of AI is being colored by sci-fi stereotypes of robots malfunctioning when asked to compute really hard problems, generating infinite recursion. I'm not so sure that LLMs will totally destabilize. We might see some interesting output, but I don't think we know yet whether the stability of the system will merely fluctuate as a whole without falling apart.
transfire over 2 years ago
Interesting