
GPT-3 can run code

271 points by maytc about 3 years ago

31 comments

thrtythreeforty about 3 years ago

> GPT-3 struggles with large numbers, decimal numbers, and negative numbers. When used it returns answers that are close but often incorrect.

Regarding GPT-3's "guesstimates," intuitively it feels like the network *has* to guess, because it hasn't been given a way to do exact computation--a neural network is built out of nonlinear functions--even if it "understands" the prompt (for whatever value you want to give to "understand").

Are there any techniques that involve giving the model access to an oracle and allowing it to control it? To continue the analogy, this would be the equivalent of giving GPT-3 a desk calculator.

If this is a thing, I have other questions. How do you train against it? Would the oracle have to be differentiable? (There are multiple ways to operate a desk calculator to evaluate the same expression.) Also, what control interface would the model need so that it can learn to use the oracle? (Would GPT-3 emit a sequence of one-hot vectors that represent functions to apply, and would the calculator have "registers" that can be fed directly from the input text? Some way of indirectly referring to operands, so the model doesn't have to handle them lossily.)
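One way to wire up such an oracle, as a minimal sketch: prompt the model to emit a CALC(...) marker whenever it needs exact arithmetic, evaluate the marked expression outside the network, and splice the result back into the context. Both the CALC convention and the `complete` callback here are hypothetical stand-ins for any text-completion API, not an existing interface:

    import re

    def calculator(expression):
        # The "desk calculator" oracle: exact arithmetic, which the
        # network's nonlinear functions can only approximate.
        if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
            raise ValueError("unsupported expression")
        return str(eval(expression))

    def run_with_oracle(complete, prompt, max_rounds=8):
        # `complete` is any prompt -> continuation function (assumption).
        # Whenever the continuation contains CALC(<expr>), evaluate it,
        # append the exact result to the context, and let the model go on.
        text = prompt
        for _ in range(max_rounds):
            continuation = complete(text)
            match = re.search(r"CALC\(([^)]*)\)", continuation)
            if match is None:
                return text + continuation  # no oracle call; done
            text += continuation[:match.end()] + " = " + calculator(match.group(1))
        return text

Note that this sidesteps the differentiability question entirely: if the oracle is only invoked at inference time, it never needs a gradient.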
Veedrac about 3 years ago

> GPT-3 seems to have issues with large numbers. Moyix's gist covers this in detail. GPT-3 tends to guesstimate an algebraic function instead of evaluating the numbers, so the answer is only correct to a certain approximation.

There are two issues here. One is the lack of working memory, which means there is very little scratch space for calculating things with a meaningful sequential depth. GPT-3 is very unlike traditional evaluation methods in this regard: it is easier for it to interpret the meaning of a program you give it and intuit the result from context than it is to mechanically execute the program's steps.

The other issue is the text encoding, which makes it much harder for GPT-3 to do digit-by-digit operations. Many arbitrary numbers are just their own token. A fixed-length number looks to us like a fixed number of characters, but to GPT-3 it can be an almost arbitrary number of tokens divided into almost arbitrary chunks. Using thousands separators is very helpful for it.

If you account for these issues and design a prompt that mitigates them, you can get much stronger results. Here is an example: https://news.ycombinator.com/item?id=30299360#30309302. I managed an accuracy of 42% for 3-by-3 digit multiplication.
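The chunking is easy to see with the GPT-2 BPE tokenizer (the same family GPT-3 uses); a quick sketch with the Hugging Face transformers library. The splits named in the comments are illustrative guesses, since the exact chunks depend on the learned merge table:

    from transformers import GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")

    # Bare digit strings merge into irregular multi-digit chunks,
    # e.g. something like ['123', '45', '67'].
    print(tok.tokenize("1234567"))

    # Thousands separators break the number at predictable places,
    # giving the model regular chunks of at most three digits.
    print(tok.tokenize("1,234,567"))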
mbowcut2 about 3 years ago

So, for people unfamiliar with deep language models like GPT: it's essentially a program that takes in a prompt and predicts the next set of words based on a training corpus -- which in GPT-3's case is a large portion of the internet. In these examples GPT is not executing any Python code; it has just been trained on enough Python code and output to successfully predict what kinds of outputs these functions would produce.
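The prediction loop itself is simple enough to sketch. Here is minimal greedy decoding with the small public GPT-2 standing in for GPT-3 (whose weights are not public):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    ids = tok(">>> print(1 + 1)\n", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(8):
            logits = model(ids).logits        # a score for every vocab entry
            next_id = logits[0, -1].argmax()  # greedy: pick the likeliest token
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

    # Whatever comes out is prediction, not execution: there is no
    # interpreter here, only statistics over code seen during training.
    print(tok.decode(ids[0]))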
kevincox about 3 years ago

I find it quite interesting that in the JSON-to-YAML example it reordered the list. If this were an access control list, that could be a serious security issue, and one easily missed in review (especially if dozens of files like this were changed at once). Of course a malicious user could have done this as well and likely gotten it past code review, but the fact that it was accidental is scarier in a way.
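A toy illustration of why list order can be load-bearing (a hypothetical first-match-wins ACL, not the article's example):

    rules = [
        {"action": "deny", "who": "guests"},   # original JSON order
        {"action": "allow", "who": "*"},
    ]

    def check(who, rules):
        # First-match-wins: evaluation order decides the outcome.
        for rule in rules:
            if rule["who"] in (who, "*"):
                return rule["action"]
        return "deny"

    print(check("guests", rules))        # deny
    print(check("guests", rules[::-1]))  # allow -- the silent reorder wins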
kcorbitt about 3 years ago

For folks wanting to play around with the GPT-3 code-editing capabilities referenced in the article within your own codebase: I wrote a simple open-source VS Code plugin that lets you run commands against your currently open file and get GPT-3's suggested edits back as a diff: https://marketplace.visualstudio.com/items?itemName=clippy-ai.clippy-ai
timdellinger about 3 years ago

I assume that GPT-3 is just exhibiting rote memory. For small numbers, it has accurate answers memorized from the training set, but for larger numbers, it just "remembers" whatever is close... hence the ability to estimate.

My take is not that GPT-3 can run code, but rather that GPT-3 has memorized what code looks like and what its output looks like.
ogogmad about 3 years ago

Theorem: There are infinitely many prime numbers.

Proof:

*[GPT-3's "proof"]*

Let p be a prime number. Then by definition, p is a positive integer that has no positive divisors other than 1 and p.

Now consider the integer q = p + 1. We will show that q is not a prime number, which will prove that there are infinitely many prime numbers.

Since p is a positive divisor of q, q cannot be prime.

*[...another attempt with the same prompt]*

We know that there are infinitely many natural numbers. Let's suppose that there are only finitely many prime numbers. Then we can list them all:

p_1, p_2, p_3, ..., p_n

Now, consider the number

N = p_1 p_2 p_3 ... p_n + 1

This number is greater than any of the prime numbers in our list, so it cannot be divisible by any of them. But it is also a natural number, so it must be prime. This contradicts our assumption that there are only finitely many prime numbers, so we must conclude that there are infinitely many prime numbers.

~~*Personal remark*: I had to give it the same prompt about 5 times to get a correct-ish proof. The first attempts came out like the first example.~~ The proof is still wrong.
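For reference, the flaw in the second attempt is the claim that N "must be prime"; the standard Euclid argument only needs N to have a prime factor missing from the list. A corrected final step:

    Let $p_1, \dots, p_n$ be all the primes and set $N = p_1 p_2 \cdots p_n + 1$.
    Since $N > 1$, it has some prime factor $q$. Dividing $N$ by any $p_i$
    leaves remainder $1$, so $q \neq p_i$ for every $i$. Thus $q$ is a prime
    outside the list, contradicting the assumption that the list was complete.
    ($N$ itself need not be prime:
    $2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 + 1 = 30031 = 59 \cdot 509$.)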
lopatin about 3 years ago

Can someone explain for a dummy how this is possible? How does it know that range() is zero-indexed? Was it specifically trained on Python input/function/output data? Or did it just "learn" it? Do the researchers know how it learned it?

Does it actually "run" the code? Like, if it were looping over 1 billion iterations, would it take 1B times longer than a single iteration? I have so many questions.
graiz about 3 years ago

GPT-3 is a really impressive autocomplete. It takes inputs and predicts what text should come out. It's super impressive and it looks smart, but it is not running code and it's not Turing complete; if you understand how it works, it's very easy to cause it to produce significant errors.
kaetemi about 3 years ago

It has a ton of programming books in its training data. It only "runs" anything that's close enough to samples it has seen that included output. Anything complex and it fails, because it does not reason about it logically. It's bad at the same things humans are bad at.
berryg about 3 years ago

I struggle to understand how GPT-3 executes code. Is it simply running a Python (or any other language) interpreter? Or is GPT-3 itself interpreting and executing the Python code? If the latter, that would be amazing.
bitwize about 3 years ago

GPT-3 is starting to remind me of SCP-914. Give it an input, and its millions of tiny wheels churn and it produces something like what you want, but otherwise quite unexpected.

Let's hope it doesn't turn into something like SCP-079...
csmeder about 3 years ago

What year will GPT be able to take an app written in Swift/SwiftUI and output a spectacular Android translation? 3 years? 5 years? 10 years?

This is an interesting benchmark because it is a very difficult problem. However, GPT has everything it needs to do this without a fundamental improvement to its core (the process is more science than art), and using automated UI testing GPT can check whether its solution worked.

Thus this challenge is within the realm of what GPT already is; once it can do this, it will have massive implications for how software is built.
daenz about 3 years ago

Nit, but YAML is a superset of JSON, so no conversion is required :)
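Easy to check. Strictly, the superset claim holds for YAML 1.2; PyYAML implements YAML 1.1, which accepts nearly all JSON documents as-is:

    import yaml  # PyYAML

    json_text = '{"name": "example", "tags": ["a", "b"], "count": 3}'

    # A YAML parser reads the JSON document directly -- no conversion step.
    print(yaml.safe_load(json_text))
    # {'name': 'example', 'tags': ['a', 'b'], 'count': 3}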
spupe about 3 years ago

This is fascinating. I feel that we are still in the infancy of the field, however. These observations are analogous to naturalists of the past describing an animal's behavior; we need to get to the point where more accurate estimates are made (i.e., how often does it do each thing, how accurate is it after 100+ tries, etc.). Every day we see a new observation of what GPTs can do; we also need a good way to make these observations systematic.
PaulHoule about 3 years ago

It would be remarkable if it got the right answers.

But it can't, because it doesn't have the right structure (e.g. GPT-3 finishes in finite time; a program in a real programming language doesn't necessarily!)

GPT-3's greatest accomplishment is that it has "neurotypical privilege": if it gets an answer that is 25% or 95% correct, people give it credit for the whole thing. People see a spark of intelligence in it the way they see faces in leaf axils or in Martian rock formations, or how G.W. Bush looked in Vladimir Putin's eyes and said he got a sense of Putin's soul. (That was about the only thing in his presidency that he later said he regretted!)

As an awkward person I am envious, because sometimes it seems I get an answer 98% or 99.8% correct and get no credit at all.
a-dub about 3 years ago

Is there a search engine for the training data, so that one can verify it is actually performing novel operations and not just quoting back stuff from its incredibly large training set?
ivegotnoaccount about 3 years ago

> For example, it seems to understand how to find a sum, mean, median, and mode.
> Input: 1, 4, 5, 6, 2, 1, 1
> Output: 2.28571428571

Well, even with those small numbers, it's wrong. The first "2" after the dot should not be there. The result it gives is 16/7, not 20/7.
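Quick verification with Python's standard library (the correct mean is 20/7 ≈ 2.857142857):

    from statistics import mean, median, mode

    data = [1, 4, 5, 6, 2, 1, 1]
    print(sum(data))     # 20
    print(mean(data))    # 2.857142857142857 (20/7; GPT-3's 2.2857... is 16/7)
    print(median(data))  # 2
    print(mode(data))    # 1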
aplanas about 3 years ago

It seems that it can convert from Python to Perl: https://beta.openai.com/playground/p/o4qZWSXVz8JMmVaI9j9NMIKM?model=text-davinci-edit-001
learndeeply about 3 years ago

Anyone have any ideas on how they're doing text insertion using an autoregressive model?
zora_goron about 3 years ago

A quick question for anyone familiar with the architecture of these Transformer-based models: I've heard that one reason they don't work well with numbers is how the inputs are tokenized (i.e. as "chunks" rather than individual words/digits). Is there anything architecturally preventing an exception to this tokenization in the data preprocessing step, so that numbers are passed into the model as one digit == one token? It seems like such a change could result in a better semantic "understanding" of digits by the model.
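Nothing architectural prevents it; tokenization is a preprocessing choice, though the model would need to be trained (or fine-tuned) on text preprocessed the same way. A toy sketch of such a digit-splitting step, as a hypothetical preprocessor rather than anything OpenAI is known to do:

    import re

    def split_digits(text):
        # Force one digit per token under an ordinary BPE vocabulary by
        # inserting spaces between adjacent digits: "12345" -> "1 2 3 4 5".
        return re.sub(r"(?<=\d)(?=\d)", " ", text)

    print(split_digits("What is 12345 + 678?"))
    # What is 1 2 3 4 5 + 6 7 8?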
imranq about 3 years ago

An interesting research direction would be to see how much GPT-3 deviates as we get more precise on various computational tasks. This might give some measure of the concepts the model has learned.
charcircuit about 3 years ago

> Is GPT-3 Turing complete? Maybe.

It's obviously not. To handle infinite loops it would need to solve the halting problem, which is not possible.
luxurytent about 3 years ago

Similar to how my four-year-old can read books: he's memorized the words I've read to him through repeated story times.
algon33 about 3 years ago

If I remember rightly, the AlphaCode paper includes a list of benchmarks, including the results of a GPT-3 fine-tuned for coding. I think they did that because Codex wasn't available to them when they were doing their tests, but I might be wrong there.
Avalaxy about 3 years ago

Just because you can, doesn't mean you should. For some things it's just better to use a rules-based engine that is always correct, rather than a heuristics-based algorithm that gives answers that are merely close.
mountainriver about 3 years ago
This is such an interesting field but I think there needs to be more focus on determinism and correctness. The stuff that’s happening with retrieval transformers is likely where this is heading
7373737373 about 3 years ago
Has anyone tried using it for SAT problems yet?
DC-3 about 3 years ago

Very far from an expert on ML, but isn't GPT-3 trivially not Turing complete, since it halts deterministically?
inopinatus about 3 years ago

Even a stopped clock tells the right time twice a day.
unixhero about 3 years ago

Great, so how do I run GPT-3 on my own hardware at home?