Do yourself a favor and follow the Twitter link (and the link inside it) to this excellent post by Yoav Goldberg [1] on the actual reason that training new models on ChatGPT output via supervised learning (as opposed to reinforcement learning) will not produce a model as good as ChatGPT:

> For this type of interaction, we must use RL training, as supervised training teaches the model to lie. The core issue is that we want to encourage the model to answer based on its internal knowledge, but we don't know what this internal knowledge contains. In supervised training, we present the model with a question and its correct answer, and train the model to replicate the provided answer.

The author says he's summarizing a talk by John Schulman of OpenAI [2], but I haven't personally watched the video. In any case, this is an interesting insight.

Say we set up a supervised learning scenario where we ask the model to use its internal knowledge to answer a question and compare its answer to one written by a human. If the two answers essentially say the same thing, but in different words, the model is penalized in the supervised case and rewarded in the RL case. That's the difference (a toy sketch of it follows below).

1. https://gist.github.com/yoavg/6bff0fecd65950898eba1bb321cfbd81

2. https://www.youtube.com/watch?v=hhiLw5Q_UFg
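A minimal toy sketch of that contrast, not anyone's actual training code: the token sequences and the `judge` reward function are hypothetical stand-ins, but they show how a token-level supervised loss punishes a correct answer phrased differently from the reference, while a sequence-level reward does not.

```python
def supervised_loss(model_answer, reference):
    # Token-level comparison against the reference: any deviation from the
    # reference wording is penalized, even if the meaning is identical.
    diffs = sum(1 for a, b in zip(model_answer, reference) if a != b)
    return diffs / len(reference)

def rl_reward(model_answer, judge):
    # Sequence-level signal: the whole answer gets one scalar reward,
    # so a correct answer in different words can still score highly.
    return judge(model_answer)

reference    = ["the", "boiling", "point", "of", "water", "is", "100", "C"]
model_answer = ["water", "boils", "at", "100", "C"]

# Hypothetical judge: rewards any answer that contains the key fact.
judge = lambda answer: 1.0 if "100" in answer else 0.0

print(supervised_loss(model_answer, reference))  # high loss: wording differs
print(rl_reward(model_answer, judge))            # full reward: fact is correct
```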
> So I can easily imagine a near future where the web will be flooded by LLM output or at least by content heavily inspired or edited by LLMs.

To be fair, we're already there, and we've been there for at least 10 years. I'd wager >75% of the internet is garbage: auto-generated blog posts, programmatically permuted ads, YouTube videos that mainly regurgitate other sources. Email is mostly garbage, and the only reason it's usable is that spam filters have gotten pretty good. Even non-trivial amounts of the heavily curated social media platforms (Twitter/FB/IG) are pure spam.
As others have mentioned, it's frustrating to use a non-OpenAI model and be told "I'm sorry, as an AI...", as it represents a reimplementation of someone else's censorship.

There are approaches such as Dolly to develop a non-OpenAI RLHF feedback set, but it's hard to compete against ShareGPT and co.
This is not new. We've been dealing with US standards of morality, down to nipples, since the beginning of the internet. People will get bored of ChatGPT outputs everywhere, however. We are very good at detecting repeated patterns and tend to find them banal.

There are now uncensored open-source models. Vicuna-like models are great, and even work for translation. It's eerie what a 10GB file can do.
The article points out that training data generated using ChatGPT is necessarily biased or tainted with the consequences of the policy optimizations and RLHF alignment processes conducted by OpenAI. This results in models that reflect the alignment preferences of OpenAI instead of the preferences of the model developers.
It seems more and more plausible that OpenAI's choice of 2021-09 as a cut-off date was intentional, because GPT-3-generated output was released into the wild after that.
It won't work, because you will need as much training data as ChatGPT had to reach its general knowledge level.

A subset will give you a subset of the knowledge; there's no free lunch.
I am also worried about LLM "inbreeding."

When I finetuned successive generations of ESRGAN on its own output (as I essentially wanted to use it for img2img), it would amplify *tiny* oddities and artifacts that, I would later find out, were in the training data. *Tiny* noise splotches, "swirls," and distorted line edges blew up. And I was careful... I pixel-peeped the dataset as best I could before starting training.

Human language is obviously different, but I still fear oddities or biases will start popping up when the base models train on large fractions of their own output. And by the time we find out, it will be nearly impossible to filter them out.

But continuing the analogy, maybe a diverse base-model population is a good way to avoid that issue?
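A toy sketch of the feedback loop described above, with purely illustrative numbers (not measurements from ESRGAN): if each generation is trained on the previous generation's outputs and slightly over-reproduces the artifacts it sees, even a tiny initial artifact rate compounds quickly.

```python
def next_generation_artifact_rate(rate, amplification=1.5):
    # Hypothetical assumption: the model over-reproduces artifacts present
    # in its training data by a constant factor each generation.
    return min(1.0, rate * amplification)

rate = 0.001  # barely visible in the original training set
for generation in range(1, 11):
    rate = next_generation_artifact_rate(rate)
    print(f"generation {generation}: artifact rate ~ {rate:.3f}")
```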
The author claims to be “flabbergasted” that people would want to stop work on world-changing AI projects.<p>The gulf between otherwise smart people on this very important issue should depress us all. Personally I feel as if a mutant species has been released into the wild, yet as in Rick and Morty, some people think the wisest course is to release a lot more mutants.<p>People are fools. Hackers more than most— though we are productive and useful fools much of the time— but it hasn’t been a threat to humanity until recently.
I think the article misses the point. Many people are using ChatGPT to create relatively small but high-quality datasets, because it is very easy. Stanford created an amazing dataset for their Alpaca model for just about $500.
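A minimal sketch of that kind of recipe, assuming the pre-1.0 `openai` Python client (with an API key in the `OPENAI_API_KEY` environment variable) and a gpt-3.5-turbo-style model; the actual Alpaca pipeline used text-davinci-003 and a more elaborate self-instruct prompt that also generates new instructions, not just answers.

```python
import json
import openai

# Hypothetical seed instructions; Alpaca started from 175 human-written seed tasks.
seed_tasks = [
    "Explain the difference between a list and a tuple in Python.",
    "Summarize the plot of 'Hamlet' in two sentences.",
]

dataset = []
for instruction in seed_tasks:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": instruction}],
    )
    dataset.append({
        "instruction": instruction,
        "output": resp["choices"][0]["message"]["content"],
    })

# Save in the simple instruction/output format commonly used for fine-tuning.
with open("synthetic_dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)
```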
If you are building a competitive base model (such as Meta's LLaMA), then of course you don't use ChatGPT-generated data, because you have the money to download the whole internet.
Seems to be a Multiplicity [1]-type problem.

[1] https://en.wikipedia.org/wiki/Multiplicity_(film)
How does one do this? Train an open-source LLM on ChatGPT output? People have been talking about it, so I'm intrigued.

Is there a how-to anywhere? I'm not even sure which open-source model to use, etc.