
GPT-3 vs. 3M free-text trivia responses

103 points by cowllin · about 4 years ago

12 comments

fpgaminer · about 4 years ago
Somewhat related, but I can't stop thinking about what's going to happen when GPT-4 reads articles like this.

Articles like these that discuss GPT-3's output, or comments and articles directly generated by GPT-3, are going to be fed into GPT-4 as part of a theoretical WebText3 dataset. Will that help or hurt GPT-4's learning? What effect will it have?

Theoretically you can feed GPT-3's responses back into itself and ask it to introspect about them: whether it was right or wrong, and to comment on why it gave the response it did. But I doubt GPT-3 is particularly good at self-introspection. GPT-3 was trained before GPT-3 existed, obviously. So it was never trained on articles analyzing the output of an AI of its caliber.

But GPT-4 is going to be trained on a corpus filled with people analyzing GPT-3's outputs, like this article. We would expect GPT-4 to be able to write an article like this. So it should be theoretically possible to give GPT-4 its own output, then ask it to provide introspection, and for that introspection to be insightful.

EDIT: Follow-up thought. It's almost as if the internet is being filled with a training corpus on GPT-3's failings. Every fact that GPT-3 failed to learn from WebText2 is now going to be repeated, alongside the correct answer, in WebText3. Humans are globally working, unknowingly, to build a curated dataset by which GPT-4 can learn from GPT-3's mistakes.
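As an editor's aside, the introspection loop sketched in the comment above could look roughly like the following. This is a hypothetical sketch, not anything from the article: `generate` stands in for any text-completion function (e.g. a thin wrapper around a language-model API), and the prompt wording is invented for illustration.

```python
def self_critique(generate, question, answer):
    """Hypothetical introspection pass: feed a model its own earlier
    answer and ask it to judge correctness and explain itself.
    `generate(prompt)` is assumed to be any text-completion callable."""
    prompt = (
        f"Question: {question}\n"
        f"Your earlier answer: {answer}\n"
        "Was your earlier answer correct? If not, give the correct "
        "answer and explain why you may have answered as you did:"
    )
    return generate(prompt)
```

Whether the critique is insightful depends entirely on the model; the point of the comment is that a model trained on analyses of its predecessor's mistakes might do this better.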
hakuseki · about 4 years ago
> The more important takeaway: dozens of technologists at IBM spent more than three years and untold millions of dollars building the program specifically trained for Jeopardy! prowess. Less than 10 years later, a general-purpose open-sourced technology without the massive mainframe or cooling fans can compete on the same level.

This statement from the article confused me. GPT-3 is general-purpose, but it is not open source, nor does it run without massive hardware, nor was it developed without spending millions of dollars.
darepublic · about 4 years ago
How is GPT3 considered publicly available? I applied months ago and have received no response.
gwern · about 4 years ago
> The robot was best at Fine Arts and Current Events, worst at Word Play and Social Studies. ... This one's not so surprising. We have a type of question called a "Two'fer Goofer" which asks for a pair of rhyming words that satisfy a given clue. It's similar to the Rhyme Time category in Jeopardy or the old newspaper puzzle Wordy Gurdy. We had three of these questions in the showdown and GPT-3 missed all three of them. For Word Play questions that were more like vocabulary quizzes, GPT-3 performed admirably:

More evidence, if it was needed, that the use of BPEs sabotages GPT-3 in a lot of subtle ways (https://www.gwern.net/GPT-3#bpes). GPT-3 can understand things like vocab definitions, which do not depend on the internal spelling of a word or phonetics (which are erased by the BPE encoding of the data it was trained on), but as soon as you have to do things like puns... the BPEs are deadly. Ah well. Eventually OA or someone will train a proper character-level model, and then I'll be able to generate rhyming poetry without hacks like rhyming dictionaries.
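As an editor's note on why BPEs erase spelling: a toy byte-pair-encoding learner makes the point concrete. This is a minimal sketch of the textbook BPE merge algorithm, not GPT-3's actual tokenizer; the tiny corpus and merge count are invented for illustration. After a few merges a word like "lowest" reaches the model as two opaque multi-character tokens, so its letters, and therefore its rhymes, are invisible.

```python
from collections import Counter

def learn_bpe_merges(corpus, num_merges):
    """Learn BPE merges greedily: repeatedly fuse the most frequent
    adjacent symbol pair, starting from character-level symbols."""
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Tokenize a word by applying the learned merges in order."""
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols

corpus = ["low", "lower", "lowest", "newer", "newest"] * 10
merges = learn_bpe_merges(corpus, 6)
print(segment("lowest", merges))  # → ['lowe', 'st']
```

The model only ever sees the token IDs for `lowe` and `st`; nothing in its input says that "lowest" ends in the same sound as "slowest", which is exactly the handicap for rhyme and pun questions.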
johbjo · about 4 years ago
> Less than 10 years later, a general-purpose open-sourced technology without the massive mainframe or cooling fans can compete on the same level.

First, are GPT-3 models generally available?

Second, GPT-3 is many orders of magnitude larger than any model existing 10 years ago. There are certainly cooling fans involved.
agravier · about 4 years ago
> 2. Clues confuse GPT-3.

They should probably have been removed. This gives me the overall impression that the testers treat GPT-3 a bit too much as something like an artificial human, and not enough like an algorithm (which will work better with sanitized input). This is not a major criticism; the experiment is still interesting.

Could it be that the marketing from OpenAI is to blame? From the OpenAI front page:

> Discovering and enacting the path to safe artificial general intelligence.

> Our first-of-its-kind API can be applied to any language task, and currently serves millions of production requests each day.

Does that seem misleading?
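As an editor's aside, the sanitization the comment above suggests might look like the following. This is a hypothetical sketch: the article does not specify how its clues are formatted, so the parenthetical-hint and "Hint:" patterns here are invented for illustration.

```python
import re

def strip_clues(question):
    """Hypothetical sanitizer: remove hint text written for human
    players before handing a trivia question to a language model."""
    # Drop parenthetical asides such as "(Hint: ...)" or "(rhymes with ...)".
    question = re.sub(r"\([^)]*\)", "", question)
    # Drop a trailing "Hint: ..." sentence, case-insensitively.
    question = re.sub(r"\s*Hint:.*$", "", question, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s{2,}", " ", question).strip()

print(strip_clues("What rhyming pair means an obese feline? (Hint: think cats)"))
# → What rhyming pair means an obese feline?
```

The point is less the regexes than the mindset: treat the model as an algorithm and normalize its input, rather than assuming it will shrug off human-oriented noise.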
blueblisters · about 4 years ago
Could GPT-3 lower barriers to entry for search engines? Assuming it could update its "index" (weights) at regular intervals, it's not too far-fetched to see it competing with Google with high-quality, relevant answers to queries.
kashyapc · about 4 years ago
There was recently a thread[1] about an open-source alternative to GPT-3 here, called "GPT-Neo".

I hope it takes off. But the language on their website undermines some confidence: "GPT-Neo is the code name for a series of transformer-based language models [...] we plan to train and open source."

PS: The original URL that was submitted to HN is now a 404; it moved to here[2].

[1] https://news.ycombinator.com/item?id=25819803

[2] https://www.eleuther.ai/projects/gpt-neo/
vessenes · about 4 years ago
It would be helpful to see the prompt structure used for GPT-3: state-of-the-art prompting methods do significantly better than naive ones, and randomness and other settings can make a big difference in quality as well.

There are also cyclical prompt methods which can help derandomize GPT-3. For example, you could generate three to five answers per question, then feed the set back to GPT-3 and ask it which is the most correct; the first round would use high randomness settings, the last very low.

My own experience is that you sort of have to work with GPT-3 to get on the same page sometimes, and when you do, the results can be remarkable.

Anyway, fun idea!
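As an editor's aside, the cyclical method described above can be sketched in a few lines. This is a hypothetical illustration, not the article's setup: `generate(prompt, temperature)` stands in for any completion function (e.g. a wrapper around an LLM API), and the prompt wording, candidate count, and fallback are invented.

```python
from collections import Counter

def best_of_n(generate, question, n=5):
    """Sample n candidate answers at high temperature, then ask the
    model (at low temperature) to judge which candidate is most
    correct, as in the cyclical prompting idea above."""
    candidates = [
        generate(f"Q: {question}\nA:", temperature=0.9) for _ in range(n)
    ]
    listing = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    verdict = generate(
        f"Question: {question}\n"
        f"Candidate answers:\n{listing}\n"
        "Reply with only the number of the most correct answer:",
        temperature=0.0,
    )
    # If the judging pass doesn't return a usable index, fall back to
    # a simple majority vote over the sampled candidates.
    try:
        return candidates[int(verdict.strip()) - 1]
    except (ValueError, IndexError):
        return Counter(candidates).most_common(1)[0][0]
```

The second pass trades extra API calls for lower variance: high-temperature sampling explores phrasings, and the near-deterministic judging pass picks among them.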
dumbsqr · about 4 years ago
I think that human communication is better than any AI model, because a human goes from the general to the concrete, adding more information to turn general context into insightful ideas. We use language to communicate what is important to us. AI is about navigating an ocean while trying not to be captured by concrete things, because it knows nothing about being real.

As a mathematical metaphor, each word is a window in the tangent space of an n-dimensional sphere, and the normal points to the center of the sphere. As time goes on, communication is about shrinking the radius of the sphere so that the center is like the kernel of what we wish to communicate. The sphere in an AI system has a very big radius and is not able to shrink, because it lacks a kernel.

You could simulate a human conversation with an AI, for example to sell something, following steps like description of the product, price advantages, and so on. But in a real context those are ad hoc methods, and there is no general rule for shrinking, that is, for penetrating deep into the meaning of what you want to communicate.
wyldfire · about 4 years ago
"Two'fer Goofer", "Tough Training" - why would GPT-3 give the question back as its response in these cases?
minimaxir · about 4 years ago
> Credit where due: friend of WCT Dennis wrote the script to feed Water Cooler Trivia questions to GPT-3 and access the OpenAI API.

It should be noted it's against OpenAI's rules to share access to GPT-3, although they've been inconsistent about enforcing it.