It writes uncomfortably coherent and witty fatalist My Little Pony fanfiction:

> “I’m Princess Celestia, of Equestria.” the alicorn said, “It’s nice to meet you. And I’m very sorry for your untimely demise.”
> “Heh. You’re telling me.” the Professor laughed, shaking his head. “I’m dead. I’m… dead. Do you know what that means?”
> “I have an inkling.” Princess Celestia said, wincing.
> “It means I don’t have to worry about even numbers anymore.” he said, nodding. “I don’t have to worry about probabilities anymore! I don’t have to worry about the upcoming war anymore! I don’t have to worry about anypony anymore!” he spread his hooves out at his sides, “I don’t have to do anything anymore! It’s beautiful!”
> “Yes, yes, I’m sure it is.” Princess Celestia said, nodding along.
> “Nothing matters now that I’m dead!” he cheered, “It’s great! I’m free! This is what it feels like to be liberated!”
> “I can see that.” she said.
> “I’m free from death!” he continued, “I’m free from taxes! I’m free from dictatorships! I’m free from flirting!” he laughed, “I’m free from everything! Isn’t it great?”
> “Sure, I guess.” Princess Celestia shrugged.
> “What do you think about all of this, Princess?” he asked.
> “About what?” she replied.
> “Well, I just thought you might have something to say about it all.” he told her.
> “What would you like me to say?” she responded.
> “Do you think death is awesome?” he questioned.
> “Well… no.” she shook her head, “But I think the passage to it is.”
> “Oh.” he said, “I see what you mean.”
Fine-tuning GPT-3 is one of the biggest challenges, because it's behind an API. The weights aren't available to researchers, so we can't make it do anything it doesn't already do.

But that's fair. They're OpenAI's weights; they can keep them locked up if they want to. What caught my attention, though, is that OpenAI is supposedly working on a way to support fine-tuning.

If you think about the logistics of that, it's a very interesting challenge. The situation is this: 240GB of weights, as a webservice. Each fine-tuning session results in another copy of 240GB. So at first glance it doesn't scale -- 1TB per 4 users isn't exactly efficient.

Except, not quite. You can solve this by adding additional layers, which are what you then fine-tune. So the base model stays at 240GB or whatever, and the extra layers morph the output to do what you want. Think of it as a GPT-3 with a GPT-2 1.5B stuck on the end of it.

It's a neat idea, because you'd effectively get two models out of it: you can "break off" the end of the fine-tuned model, and you end up with the original model. So it would be very modular.

Are there other models that you can "break apart" to get different sub-models? Sort of like adding slots that give a model different capabilities.
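To make the idea concrete, here's a rough PyTorch sketch of what a bolt-on trainable head might look like. This is entirely my own guess at the architecture (toy sizes, made-up class names), not anything OpenAI has described:

    import torch
    import torch.nn as nn

    HIDDEN, VOCAB = 256, 1000  # toy sizes; GPT-3's are 12288 and ~50k

    class AdapterHead(nn.Module):
        """Small trainable network stacked on top of a frozen base model."""
        def __init__(self):
            super().__init__()
            self.block = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=8,
                                                    batch_first=True)
            self.lm_head = nn.Linear(HIDDEN, VOCAB)

        def forward(self, hidden_states):
            return self.lm_head(self.block(hidden_states))

    base_model = nn.Embedding(VOCAB, HIDDEN)  # stand-in for the frozen giant
    for p in base_model.parameters():
        p.requires_grad = False  # the shared weights never change

    head = AdapterHead()
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)  # only the head trains

    tokens = torch.randint(0, VOCAB, (1, 16))
    logits = head(base_model(tokens))  # shape: (1, 16, VOCAB)

Storage-wise, each user's fine-tune is then just the head's weights -- megabytes instead of another 240GB copy -- and "breaking off" the head gives you back the untouched base model.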
Carmack posted (yesterday) an interesting thought on models like GPT-3:

*"Big AI models like GPT-3 train on massive internet text dumps, but the data is assumed to be independent and identically distributed. Incorporating time information for a decade of data might allow them to start writing tomorrow's reddit or twitter trends."*

https://twitter.com/ID_AA_Carmack/status/1278840413919551488
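The simplest version of "incorporating time information" I can imagine is just prefixing each training document with its date, so the model learns to condition on it. A toy sketch (the bracket format is mine, not anything Carmack proposed):

    from datetime import date

    def add_time_prefix(text: str, posted: date) -> str:
        """Prefix a training document with its timestamp so the model
        can learn to condition generation on a date."""
        return f"[{posted.isoformat()}] {text}"

    # Training example: "[2020-07-03] Ask HN: ..."
    # At inference time, prompt with a *future* date and let it extrapolate:
    prompt = add_time_prefix("", date(2020, 7, 20))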
Straight up plagiarizes The Beatles here: https://www.gwern.net/GPT-3#dr.-seuss-oh-the-places-youll-go

"
There’s nothing you can know that isn’t known.
Nothing you can see that isn’t shown.
Nowhere you can be that isn’t where you’re meant to be.
"
After spending a lot of time working with GPT-3/the OpenAI API (https://github.com/minimaxir/gpt-3-experiments), one notable aspect is the high signal-to-noise ratio in its generated output.

When fine-tuning GPT-2, only about 5-10% of the generated output is usable/coherent. But with GPT-3, easily *30-40%* of the generated text is usable/coherent, which is a big boost in quality.
GPT-3's take on the navy seal copypasta, in the style of a KGB spy:

"I have over 300 confirmed red scares."

Haha, that is genuinely one of the funniest versions of that I've ever seen, human-generated or otherwise. That level of inference is really amazing.
I'm gonna put forward the very view that gwern repeatedly argues against: "but... it's not *understanding*."

So far I see no evidence that this thing, or anything else like it, has any actual understanding, any model of the world. Indeed it can't, as it possesses no sensory apparatus. It's not embodied. It doesn't experience anything.

I'm not sure the OpenAI folks would argue with me, but gwern seems to assert that this sort of thing indicates general AI, or even sentient AI, is on the doorstep. I don't think it does, and I still maintain, as I always have, that CS people systematically underestimate and trivialize biology.
First off, gwern, lovely blog. The table of contents is incredibly helpful, especially with those little pop-up previews.

I would've loved to have GPT-3 available to me two weeks ago. I was building a personal escape room for my wife as a gift, and used huggingface's GPT-2 website to help write some of the world-building content. I'm not a particularly good writer, let alone a creative one, but I wanted a few journal pages/notes to build the atmosphere and story of the escape room. I was able to write the rough skeleton of those notes and then use GPT-2 to help fill them out. It ended up working okay, definitely better than nothing, but GPT-2 is temperamental and lacks the "prompting" that GPT-3 has.

For example, I needed to come up with the name of the journal's author. So I fed the journal text to GPT-2 and put "Sincerely," at the bottom, to try and prompt it to complete a name. That didn't work. Ultimately what worked was putting "My name is" at the end. I still had to grind through 20 or so completions before I got a name I liked.

(Yes, I could have just picked a name at random. Did I mention I'm bad at creativity? My thinking was that the AI could more intelligently pick a name that fit the story and writing style of the journal. And honestly, the name it came up with, Mabel, fit the character well: a librarian dabbling in magic.)

I feel like GPT-3 would have done a lot better. Not to mention the ability to describe my world to it and then just straight up ask it for ideas.
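For anyone who wants to try the same trick locally instead of through the website, it's only a few lines with the transformers library. A rough sketch -- the sampling settings are just defaults I'd try, and you'd paste your own text in place of the placeholder:

    from transformers import pipeline

    # GPT-2 text generation via huggingface's transformers library.
    generator = pipeline("text-generation", model="gpt2")

    journal_text = "..."  # the journal entry goes here
    prompt = journal_text + " My name is"

    # Ask for several completions and skim them for a usable name.
    for out in generator(prompt, max_length=200, num_return_sequences=5,
                         do_sample=True, temperature=0.8):
        print(out["generated_text"][len(prompt):])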
Great article. Well worth the read!

I enjoyed the part about sampling, which is a big unsolved problem. To me, techniques like nucleus sampling and temperature sampling feel like hacks to make up for the fact that maximizing likelihood maybe isn't the right goal!? Maybe repetitive gibberish has a higher likelihood than prose written by humans? That best-of sampling decreased text quality indicates that it does. Researchers have assumed that the problem would go away with ever-growing models. But maybe it won't?

I don't agree that generating (symbolic) music would be less sensitive to sampling issues. On the contrary, in my opinion. In text you can often get away with grammatical errors or missing punctuation. But if the pitch or timing of one chord is wrong, it's over. The audience instantly hears that it is garbage. Thus, you have to lower the temperature (or probability threshold, or what have you) to make the sampling more conservative, exacerbating the problem with repeated sequences.

Of course, in music you *want* repetitions. But not too much. The magic number (in Western music) is 4. Fewer repeats make it feel as if the music jumps around. More repeats make it feel as if the music is stuck or "looping."
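For anyone who hasn't seen them spelled out: both tricks are just post-hoc filters on the model's next-token distribution. A minimal numpy sketch (variable names are mine):

    import numpy as np

    def sample_next_token(logits, temperature=0.8, top_p=0.9):
        """Temperature + nucleus (top-p) sampling over raw next-token logits."""
        # Temperature: <1 sharpens the distribution, >1 flattens it.
        scaled = (logits - logits.max()) / temperature
        probs = np.exp(scaled)
        probs /= probs.sum()

        # Nucleus: keep the smallest set of tokens whose cumulative
        # probability exceeds top_p, then renormalize over that set.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        keep = order[:cutoff]
        return np.random.choice(keep, p=probs[keep] / probs[keep].sum())

    token = sample_next_token(np.random.randn(1000))  # fake 1000-word vocabulary

Lowering temperature or top_p makes the sampling more conservative, which, as above, is exactly what makes loops more likely.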
Could someone with GPT-3 beta access try whether it can better solve 3-digit addition when it is allowed/encouraged/forced to make intermediate results explicit? E.g., instead of

21 + 110 = 131

150 + 12 =

condition it on

21 + 110 = 100 + 10 + 20 + 1 = 100 + 30 + 1 = 131

150 + 12 =

or similar. Given that humans make these intermediate steps in their heads, GPT may perform better when it is encouraged to do them as well.

This may in fact apply to all sorts of reasoning, but in many cases it may be difficult to make these steps explicit in text form. Humans seem to mainly use some prediction layer or scratchpad which contains not only the inner monologue but also motor primitives, smells, images, everything. Humans can decide to think a bit longer before producing an output, which appears to require an RNN.
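If anyone with access wants to run it, the comparison is just two prompts against the completion endpoint. A sketch using the Python client for the beta API (the engine name and token counts are my guesses; in practice you'd want more few-shot examples per prompt):

    import openai  # requires beta access; set openai.api_key first

    plain_prompt = "21 + 110 = 131\n150 + 12 ="
    stepwise_prompt = ("21 + 110 = 100 + 10 + 20 + 1 = 100 + 30 + 1 = 131\n"
                       "150 + 12 =")

    for prompt in (plain_prompt, stepwise_prompt):
        # temperature=0 for (nearly) deterministic arithmetic answers
        response = openai.Completion.create(engine="davinci", prompt=prompt,
                                            max_tokens=24, temperature=0)
        print(repr(response.choices[0].text))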
This article is fantastic in both shape and content, and I got lost in all the examples because there is so much to wonder at.

What hits me most profoundly is that there are so many witty and interesting prompts, yet the purely logical statements fall apart (with the black ravens, or the male sister).

This is something that probably does not jump to one's mind as significant, because "technicalities", but to me this is where logic allows us to take a step back from our human projection for a second; it's my own anthropomorphism that becomes more obvious to me. I find that ironic, considering a lot of human beings also DO fail such tests, but something hits me about how a supposedly completely logical entity fails at logic more than at poetry. It kind of shakes my own (supposed) humanity.
Holy crapoly.

The quality (in both senses) of the output, given an appropriately constructed prompt, is incredible.

I wonder if it's possible to get it to do the opposite of summarizing, i.e. give it a plot summary and have it expand it into a fleshed-out story that conforms to the summary...
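If I had access I'd try it as a few-shot pattern, the way gwern constructs his other prompts. Pure speculation on my part, and the example summary/story pair is mine:

    # Hypothetical expansion prompt: show one summary -> story pair,
    # then leave the next story blank for GPT-3 to fill in.
    prompt = """Summary: A lighthouse keeper discovers the lamp attracts ghosts.
    Story: For thirty years Edmund had tended the lamp at Carrick Point, and
    for thirty years the dead had gathered at his window like moths...

    Summary: {summary}
    Story:"""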
This is fascinating to read through. It's so hard to avoid a variant of the Forer effect, though, where we unconsciously discount the errors, and selectively focus on subsets of the output and impute meaning to them.

Designing objective quality tests must be an active area of research; I wonder what the best approaches are?
I still find it odd that we call this "artificial intelligence" when it's advanced mimicry at best. There's no "intelligence" in the strict sense of the word; it's just elaborate pattern matching.

But I get it: it's exciting, and it's an easy way to get VC money. Perhaps one day we'll get something useful beyond the various pattern-matching applications (image recognition, speech-to-text, etc.). I'm skeptical but willing to be surprised.