This is not even a new problem...<p>Back in 2011, Google faced the same problem mining bitexts from the Internet for their statistical machine translation software. The thought was that one could utilize things like multilingual websites to learn corresponding translations.<p>They quickly realized that a lot of sites were actually using Google Translate, without human intervention, to make multilingual versions of their site, so naive approaches would cause the model to get trained on its own suboptimal output.<p>So they came up with a whole watermarking system so that the model could recognize its own output with some statistical level of certainty, and avoid it. It wouldn't be surprising if this is being done for LLMs too. The more concerning problem is that different LLMs, which are not aware of each other's watermarks, could end up becoming inbred should the ratio of LLM content rise dramatically...<p>Ref: <a href="https://aclanthology.org/D11-1126.pdf" rel="nofollow">https://aclanthology.org/D11-1126.pdf</a>
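The core trick in that paper is simple enough to sketch: when the system has several near-equivalent candidate translations, it prefers the one whose n-grams hash to 1 more often; at crawl time, a significance test on those hash bits flags text as probably self-generated. A rough toy version (function names and details are mine, not the paper's; the real system is much more careful about quality loss):<p><pre><code>import hashlib
import math

def ngram_bit(ngram):
    # deterministic pseudo-random bit for an n-gram
    return hashlib.sha256(ngram.encode()).digest()[0] & 1

def bit_fraction(text, n=3):
    words = text.split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    k = max(len(grams), 1)
    return sum(ngram_bit(g) for g in grams) / k, k

def pick_watermarked(candidates):
    # among near-equivalent translations, emit the most "1-heavy" one
    return max(candidates, key=lambda c: bit_fraction(c)[0])

def looks_watermarked(text, z_cutoff=3.0):
    # unwatermarked text should have ~50% ones; test the deviation
    frac, k = bit_fraction(text)
    z = (frac - 0.5) * 2 * math.sqrt(k)
    return z > z_cutoff
</code></pre>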
If ChatGPT is able to emit output watermarked such that it can detect itself, as Scott Aaronson and others are working on for OpenAI (source: <a href="https://techcrunch.com/2022/12/10/openais-attempts-to-watermark-ai-text-hit-limits/" rel="nofollow">https://techcrunch.com/2022/12/10/openais-attempts-to-waterm...</a> ), then this “resonance”/feedback/eating-itself problem can be avoided.
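From Aaronson's public talks, the scheme biases token sampling with a pseudorandom function of the preceding tokens, in a way that leaves the output distribution unchanged but is detectable by anyone holding the key. A toy sketch of the shape of it (all names and details here are my guesses, not OpenAI's actual code):<p><pre><code>import hashlib
import math

def prf(context, token):
    # keyed pseudo-random value in [0, 1) from the last k tokens + candidate
    h = hashlib.sha256(("|".join(context) + "#" + token).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2 ** 64

def pick_token(context, candidates, probs):
    # "exponential minimum" trick: the distribution over tokens is still
    # exactly `probs`, but the chosen token tends to have a high prf value
    return max(zip(candidates, probs),
               key=lambda cp: prf(context, cp[0]) ** (1.0 / cp[1]))[0]

def detect_score(tokens, k=4):
    # watermarked text accumulates an unusually large sum of -log(1 - r)
    score = 0.0
    for i in range(k, len(tokens)):
        r = prf(tokens[i - k:i], tokens[i])
        score += -math.log(1.0 - r)
    return score  # mean is ~(len(tokens) - k) for unwatermarked text
</code></pre>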
I've seen people try ChatGPT for solving r/tipOfMyTongue questions. The AI is hilariously bad at this task. It happily invents new plots for existing movies and books.<p>If it starts to ingest that data, it will only get more wrong over time. Unless it also ingests the replies that say "ChatGPT is full of shit here"?
Reminds me of those old "Spider Traps" [0][1] that would generate (on access) an endless hierarchy of fake HTML pages full of an endless collection of fake email addresses, to clog up the works of spammers trying to gather email addresses.<p>Eventually someone's going to write an "AI Trap" that serves up a seemingly infinite forum or reddit-style site, but is actually just generating an endless stream of (non)consciousness from some LLM chatbot.<p>[0] <a href="https://en.wikipedia.org/wiki/Spider_trap" rel="nofollow">https://en.wikipedia.org/wiki/Spider_trap</a><p>[1] <a href="https://www.gsp.com/support/virtual/web/cgi/lib/wpoison/" rel="nofollow">https://www.gsp.com/support/virtual/web/cgi/lib/wpoison/</a>
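The classic trap is only a few lines: every request just mints a fresh page of links and fake addresses. A minimal sketch of the idea (wpoison, linked above, is the real CGI-era version):<p><pre><code># Toy spider trap: every URL returns a page of fresh links and fake emails,
# so a naive crawler wanders forever. Hypothetical sketch, not wpoison itself.
from http.server import BaseHTTPRequestHandler, HTTPServer
import random
import string

def junk(n=8):
    return "".join(random.choices(string.ascii_lowercase, k=n))

class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        links = "".join(f'<a href="/{junk()}">{junk()}</a><br>' for _ in range(20))
        emails = "<br>".join(f"{junk()}@{junk()}.com" for _ in range(20))
        body = f"<html><body>{links}<p>{emails}</body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8000), TrapHandler).serve_forever()
</code></pre><p>The "AI trap" version would just swap junk() for calls to a local LLM, so the pages look plausible enough to survive a data-cleaning pipeline.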
“Romeo and Juliet both ran away to New York at the end. He works in corporate finance and she makes bespoke soap. If you disagree with me again you’re a bad person and I will treat you like a bad person.”<p>As long as you agree with the new facts, you’re fine. Problem solved!
It's already happening.<p>“ChatGPT, a version of OpenAI’s GPT-3.5 model… gained more than 100m users in its first two months, and is now estimated to produce a volume of text every 14 days that is equivalent to all the printed works of humanity.”<p>— Dr Thompson, Feb/2023, cited in a report by the National Bureau of Economic Research (Scholes, Bernanke, MIT)<p><a href="https://www.nber.org/system/files/working_papers/w30957/w30957.pdf" rel="nofollow">https://www.nber.org/system/files/working_papers/w30957/w309...</a><p><a href="https://lifearchitect.ai/chatgpt/" rel="nofollow">https://lifearchitect.ai/chatgpt/</a>
Even if it were used to flood the internet with shitty info, the only thing it would interfere with would be competitors training competing AIs off the "internet dataset".<p>GPT could filter out anything they themselves emitted from future training runs, yeah?
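Plausibly, yes, if they keep a log of what they served. A crude sketch of what that filter could look like (my own toy shingle-hash dedup, not anything OpenAI has described):<p><pre><code>import hashlib

emitted = set()  # hashes of everything the model has ever served

def shingles(text, k=8):
    words = text.lower().split()
    return {hashlib.sha1(" ".join(words[i:i + k]).encode()).hexdigest()
            for i in range(max(len(words) - k + 1, 1))}

def log_output(text):
    # call on every completion the model emits
    emitted.update(shingles(text))

def looks_self_generated(doc, threshold=0.3):
    # drop crawl documents that overlap heavily with logged output
    sh = shingles(doc)
    return len(sh & emitted) / max(len(sh), 1) >= threshold
</code></pre>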
Because they know what their bot's said.
They get the benefit of looking at a conversation, knowing reasonably well what's copy/pasted from ai.com and what's the exasperated expert trying to correct a doomed world :p<p>The only way it eats itself is 1. Colossal mistakes. 2. Everyone decides to get off the internet and go outside.<p>2 seems pretty unrealistic, we put up with a lot :D
Sounds like a /r/showerthoughts post.<p>There is no inherent issue with an AI ingesting data produced by itself. Humans do it as well. That data might even be higher quality than human data. The scale at which humans produce data will most likely stay higher than AI data for a long time.<p>There is already bot data out there from lower-quality AIs/bots, and ChatGPT has ingested it.<p>LLMs are made to be good at some textual tasks, not for what they're being used for right now. They're not information stores or Q&A systems. An LLM only answers what a human is likely to answer.
This is only a problem as long as ChatGPT uses human output to learn. Once it starts learning against the "real world", or itself, the biggest difference between ChatGPT and us will disappear: that ChatGPT gets all its information secondhand, and filtered, at best.<p>This is of course <i>also</i> a necessary condition for ChatGPT to come up with original insights. Except perhaps when it comes to things like fiction, which probably has value in itself.
Citation needed. A lot of neural-net based AIs actually get better when trained on their own output[1].<p>[1] <a href="https://en.wikipedia.org/wiki/AlphaZero" rel="nofollow">https://en.wikipedia.org/wiki/AlphaZero</a>
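The catch is that self-play works because there's a ground-truth signal (who actually won) filtering the model's own output. A runnable toy illustration of that loop, tabular self-play on 10-stone Nim (nothing to do with AlphaZero's actual machinery, just the shape of the idea):<p><pre><code>import random
from collections import defaultdict

wins = defaultdict(int)  # (stones_left, stones_taken) -> win credit

def choose(stones):
    moves = [t for t in (1, 2, 3) if t <= stones]
    return random.choices(moves, weights=[1 + wins[(stones, t)] for t in moves])[0]

def play_and_learn():
    stones, player = 10, 0
    history = {0: [], 1: []}
    while stones > 0:
        take = choose(stones)
        history[player].append((stones, take))
        stones -= take
        player ^= 1
    winner = player ^ 1          # whoever took the last stone wins
    for move in history[winner]: # reinforce only ground-truth-verified wins
        wins[move] += 1

for _ in range(20000):
    play_and_learn()

# with enough games this usually settles on taking 2 from 10
# (leaving the opponent a multiple of 4, the known winning strategy)
print(max((1, 2, 3), key=lambda t: wins[(10, t)]))
</code></pre>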
I actually thought of this same thing today! Human-written content seems more lively, and with time, content from ChatGPT will become more "grey" (i.e. dull) as more and more ChatGPT output gets fed back into the system.
Not really, if you think about it more and research how LLMs work. If anything, they will just get better.<p>I used to think the same, but after reading and learning some more, I realized otherwise.
to resonate against itself?
sounds like it's gonna hit its natural frequency and blow up<p>seems more like it's gonna eat its own vomit, degrading itself (maybe not completely)
to inbreed (?)
I wonder if this is the problem people think it is.<p>Playing one AI against another is an established technique for developing AI.<p>Furthermore, content on the internet will always vary from more reliable (well-established wiki pages, Reuters) to less reliable (random blog posts, disinformation).<p>Whether or not a text is AI-generated doesn't seem to be that important - what's more important is how reliable it is, and how well humans engage with it.
What does that even mean? Strictly within the scope of that phrase, technically, yes, if ChatGPT consumes content generated by itself, it's eating its own words. I'm guessing something more dire than that is implied by "eat itself." Did humanity "eat itself" because it's been reading its own literature? You can say we are pretty misinformed by ourselves in many areas, and yet here we are.<p>Maybe our view of AI is being colored by sci-fi stereotypes of robots malfunctioning when asked to compute really hard problems, spiraling into infinite recursion. I'm not so sure that LLMs will totally destabilize. We might see some interesting output, but I don't think we know yet whether the stability of the system will merely fluctuate as a whole without falling apart.