It never ceases to amaze me what you can do with these transformer models. They created millions of potential solutions for each problem, used the examples provided with the problems to filter out 99% of the incorrect solutions, and then applied some more heuristics plus the 10 available submissions to try to find a solution.<p>All these approaches just seem like brute force: let's just throw our transformer at this problem and see if we can get anything useful out of it.<p>Whatever it is, you can't deny that these unsupervised models learn some semantic representations, but we have no clue at all what those actually are or how these models learn them. But I'm also very sceptical that you can get anywhere close to human (expert) capability in any sufficiently complex domain by using this approach.
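To make the pipeline concrete, here is a minimal sketch (purely illustrative, not DeepMind's code; the candidate type and example problem are made up) of the filter-then-submit step described above:<p><pre><code>
    // A minimal sketch of "generate many candidates, keep only those that
    // reproduce the provided example outputs, submit at most 10".
    package main

    import (
        "fmt"
        "strings"
    )

    // Candidate is a hypothetical stand-in for one generated program,
    // modelled here as a function from problem input to output.
    type Candidate func(input string) string

    type Example struct{ In, Out string }

    // filter keeps only the candidates that pass every provided example.
    func filter(cands []Candidate, examples []Example) []Candidate {
        var kept []Candidate
        for _, c := range cands {
            ok := true
            for _, ex := range examples {
                if c(ex.In) != ex.Out {
                    ok = false
                    break
                }
            }
            if ok {
                kept = append(kept, c)
            }
        }
        return kept
    }

    func main() {
        examples := []Example{{In: "abc", Out: "ABC"}}
        cands := []Candidate{
            func(s string) string { return s },                  // fails the example
            func(s string) string { return strings.ToUpper(s) }, // passes
        }
        survivors := filter(cands, examples)
        if len(survivors) > 10 {
            survivors = survivors[:10] // the 10-submission budget
        }
        fmt.Println("candidates left to submit:", len(survivors))
    }
</code></pre>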
I sometimes read these and wonder if I need to retrain. At my age, I’ll struggle to get a job at a similar level in a new industry.<p>And then I remember that the thing I bring to the table is the ability to turn domain knowledge into code.<p>Being able to do competitive coding challenges is impressive, but a very large segment of software engineering is about eliciting what the squishy humans in management actually want, putting it into code, and discovering as quickly as possible that it’s not what they really wanted after all.<p>It’s going to take a sufficiently long time for AI to take over management that I don’t think oldies like me need to worry too much.
This is extremely impressive, but I do think it’s worth noting that these two things were provided:<p>- a very well defined problem. (One of the things I like about competitive programming and the like is just getting to implement a clearly articulated problem, not something I experience on most days.)
- existing test data.<p>This is definitely a great accomplishment, but I think those two features of competitive programming are notably different than my experience of daily programming. I don’t mean to suggest these will always be limitations of this kind of technology, though.
This seems to have a narrower scope than GitHub Copilot: it generates a more complete solution to a more holistic problem, whereas GitHub Copilot works as a "more advanced autocomplete" in code editors. Sure, Copilot can synthesize full functions and classes, but for me it's most useful when it suggests another test case's title or writes repetitive code like this.foo = foo; this.bar = bar, etc.<p>Having used Copilot, I can assure you that this technology won't replace you as a programmer, but it will make your job easier by doing the things that programmers don't enjoy as much, like writing tests and comments.
Relevant blog post on codeforces.com (the competitive programming site used): <a href="https://codeforces.com/blog/entry/99566" rel="nofollow">https://codeforces.com/blog/entry/99566</a><p>Apparently the bot would have a rating of 1300. Although Elo ratings between sites are not comparable, for some perspective, Mark Zuckerberg had a rating of ~1k when he was in college on TopCoder: <a href="https://www.topcoder.com/members/mzuckerberg" rel="nofollow">https://www.topcoder.com/members/mzuckerberg</a>
I find almost every new advance in deep learning is accompanied by contrasting comments: it's either "AI will soon automate programming/<insert task here>", or "let me know when AI can actually do <some-difficult-task>". There are many views on this spectrum, but these two are sure to be present in every comment section.<p>IIUC, AlphaCode was trained on GitHub code to solve competitive programming challenges on Codeforces, some of which are "difficult for a human to do". Suppose AlphaCode was trained on GitHub code that contains the entire set of solutions on Codeforces: is it actually doing anything "difficult"? I don't believe it would be difficult for a human to solve problems on Codeforces when given access to the entirety of GitHub (indexed and efficiently searchable).<p>The general question I have been trying to understand is this: is the ML model doing something that we can <i>quantify</i> as "difficult to do (given this particular training set)"? I would like to compute a number that measures how difficult it is for a model to do task X given a large training set Y. If X is part of the training set, the difficulty should be <i>zero</i>. If X can be obtained only by combining elements of the training set, maybe it is harder to do. My efforts to answer this question: <a href="https://arxiv.org/abs/2109.12075" rel="nofollow">https://arxiv.org/abs/2109.12075</a><p>In recent literature, the RETRO Transformer (<a href="https://arxiv.org/pdf/2112.04426.pdf" rel="nofollow">https://arxiv.org/pdf/2112.04426.pdf</a>) talks about "quantifying dataset leakage", which is related to what I mentioned in the above paragraph. If many training samples are also in the test set, what is the model actually learning?<p>Until deep learning methods provide a measurement of "difficulty", it will be difficult to gauge the prowess of any new model that appears on the scene.
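As a rough illustration of what "quantifying dataset leakage" could look like (a toy metric of my own, not the one used in the RETRO paper; the snippets are made up), one can measure, for each test sample, the fraction of its token n-grams that already occur verbatim in the training corpus:<p><pre><code>
    // Toy leakage metric: share of a test snippet's token n-grams
    // that already appear somewhere in the training data.
    package main

    import (
        "fmt"
        "strings"
    )

    // ngrams returns the set of all n-token substrings of a snippet.
    func ngrams(text string, n int) map[string]bool {
        toks := strings.Fields(text)
        out := make(map[string]bool)
        for i := 0; i+n <= len(toks); i++ {
            out[strings.Join(toks[i:i+n], " ")] = true
        }
        return out
    }

    // leakage returns, per test snippet, the fraction of its n-grams
    // that also occur in the training corpus.
    func leakage(train, test []string, n int) []float64 {
        trainSet := make(map[string]bool)
        for _, t := range train {
            for g := range ngrams(t, n) {
                trainSet[g] = true
            }
        }
        scores := make([]float64, len(test))
        for i, t := range test {
            gs := ngrams(t, n)
            if len(gs) == 0 {
                continue
            }
            hit := 0
            for g := range gs {
                if trainSet[g] {
                    hit++
                }
            }
            scores[i] = float64(hit) / float64(len(gs))
        }
        return scores
    }

    func main() {
        train := []string{"for i in range n print i"}
        test := []string{"for i in range n print i squared"}
        // 5 of the 6 test trigrams already appear in training: ~0.83, heavy overlap.
        fmt.Println(leakage(train, test, 3))
    }
</code></pre>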
The example problem (essentially, is T a subsequence of S with deletions of size N) is a classic problem with no doubt dozens of implementations in AlphaCode's training set.<p>And yet, what a garbage solution it produces.<p>To illustrate the difference between intelligence and regurgitation, someone tell me what CoPilot generates for this:<p><pre><code> // A Go function to swap the sixth bit and seventeenth bit of a 32-bit signed integer.
</code></pre>
Here is a human solution:<p><pre><code>
    // Swap the 6th bit (index 5) and the 17th bit (index 16) of x.
    // The two positions are 11 apart: XOR out the difference, then flip both bits.
    func swap(x int32) int32 {
        const mask = 1 << 5           // the 6th bit
        var (
            xor1 = (x>>11 ^ x) & mask // set at bit 5 iff bits 5 and 16 differ
            xor2 = xor1 << 11         // the same difference, moved up to bit 16
        )
        return x ^ xor1 ^ xor2        // flipping both differing bits swaps them
    }
</code></pre>
CoPilot cannot reason numerically like this (understand "seventeenth bit" and "sixth bit" and generate the right code for that combination). It needs to understand the size of the gap between the bits, i.e., 11, and that's too hard.
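For what it's worth, the trick generalizes once the gap is made explicit; a small sketch of my own (not CoPilot output), in the same style as the solution above:<p><pre><code>
    // swapBits swaps the bits at 0-based positions i and j of x (i < j).
    // The hard-coded 11 in the solution above is just j-i for i=5, j=16.
    func swapBits(x int32, i, j uint) int32 {
        gap := j - i
        diff := (x>>gap ^ x) & (1 << i) // set at bit i iff bits i and j differ
        return x ^ diff ^ diff<<gap     // flip both bits when they differ
    }
</code></pre>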
At the risk of sounding relentlessly skeptical - surely by training the model on GitHub data you're not actually creating an AI that solves problems, but an extremely obfuscated database of coding puzzle solutions?
Between this and OpenAI's GitHub Copilot, "programming" will probably start slowly dying. What I mean by that is: sure, you still have to learn how to program, but our time will be spent much more on the design part and on writing detailed documentation/specs, and then we just have one of these AIs generate the code.<p>It's the next step: binary code < assembly < C < Python < AlphaCode.<p>Historically it's always been about abstracting and writing less code to do more.
I've been wondering this for a while:<p>In the future, code-writing AI could be tasked with generating the most reliable and/or optimized code to pass your unit tests. Human programmers will decide what we want the software to do, make sure that we find all the edge cases and define as many unit tests as possible, and let the AI write significant portions of the product. Not only that, but you could include benchmarks that pit AI against itself to improve runtime or memory performance. Programmers can spend more time thinking about what they want the final product to do, rather than getting mired in mundane details, and be guaranteed that portions of software will perform extremely well.<p>Is this a naive fantasy on my part, or actually possible?
How surprising did you guys find this? I'd have said there was a 20% chance of this performing at the median-or-better level if I had been asked to predict beforehand.
This is kind of neat. I wonder if it will one day be possible for it to find programs that maintain invariant properties we state in proofs. This would allow us to feel confident that even though it's generating huge programs that do weird things a human might not think of... well, that it's still <i>correct</i> for the stated properties we care about, i.e., that it's not doing anything underhanded.
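Nothing close to a proof, but a sketch of the spirit, assuming the invariants can at least be checked mechanically: hammer the generated program with random inputs and verify the stated properties hold. Here the "generated" program is a stand-in sort, and the invariants are ordering and permutation (all names below are hypothetical):<p><pre><code>
    // Check stated invariants of an untrusted program on many random inputs.
    package main

    import (
        "fmt"
        "math/rand"
        "sort"
    )

    // generatedSort stands in for AI-produced code we don't fully trust.
    func generatedSort(xs []int) []int {
        out := append([]int(nil), xs...)
        sort.Ints(out)
        return out
    }

    // Invariant 1: the output is in non-decreasing order.
    func isOrdered(xs []int) bool {
        for i := 1; i < len(xs); i++ {
            if xs[i-1] > xs[i] {
                return false
            }
        }
        return true
    }

    // Invariant 2: the output is a permutation of the input.
    func sameMultiset(a, b []int) bool {
        if len(a) != len(b) {
            return false
        }
        count := map[int]int{}
        for _, x := range a {
            count[x]++
        }
        for _, x := range b {
            count[x]--
        }
        for _, c := range count {
            if c != 0 {
                return false
            }
        }
        return true
    }

    func main() {
        for trial := 0; trial < 1000; trial++ {
            in := make([]int, rand.Intn(20))
            for i := range in {
                in[i] = rand.Intn(100)
            }
            out := generatedSort(in)
            if !isOrdered(out) || !sameMultiset(in, out) {
                fmt.Println("invariant violated on", in)
                return
            }
        }
        fmt.Println("all invariants held on 1000 random inputs")
    }
</code></pre>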
Calling it now: if current language models can solve competitive programming at an average human level, we’re only a decade or less away from competitive programming being as solved as Go or Chess.<p>DeepMind or OpenAI will do it.
If not them, it will be a Chinese research group on par with them.<p>I’ll be considering a new career. It will still be in computer science but it won’t be writing a lot of code.
There’ll be several new career paths made possible by this technology as greater worker productivity makes possible greater specialization.
It reminds me that median reputation on StackOverflow is 1.
All AlphaSO would have to do is register to receive the median reputation on SO ;) (kidding aside, AlphaCode sounds like magic)<p>Inventing relational DBs hasn't replaced programmers; we just write custom DB engines less often. Inventing electronic spreadsheets hasn't deprecated programmers; it just means we don't need programmers for the corresponding tasks (where spreadsheets work well).<p>AI won't replace programmers until it grows to replace humanity as a whole.
> AlphaCode placed at about the level of the median competitor,<p>In many programming contests, a large number of people can't solve the problem at all, and drop out without submitting anything. Frequently that means the median scoring solution is a blank file.<p>Therefore, without further information, this statement shouldn't be taken to be as impressive as it sounds.
> Creating solutions to unforeseen problems is second nature in human intelligence<p>If this is true then a lot of the people I know lack human intelligence...
I am always surprised by the amount of skepticism towards deep learning on HN. When I joined the field around 10 years ago, image classification was considered a grand challenge problem (e.g. <a href="https://xkcd.com/1425/" rel="nofollow">https://xkcd.com/1425/</a>). 5 years ago, only singularity enthusiast types were envisioning things like GPT-3 and Copilot in the short term.<p>I think many people are uncomfortable with the idea that their own "intelligent" behavior is not that different from pattern recognition.<p>I do not enjoy running deep learning experiments. Doing resource-hungry empirical work is not why I got into CS. But I still believe it is very powerful.
Seems to me that this accelerates the trend towards a more declarative style of programming, where you tell the computer what you want to do, not how to do it.
Do I understand it correctly that it generated (in the end) ten solutions that were then examined by humans, with one picked? Still absolutely amazing though.
It would be interesting if a future 'AlphaZeroCode' with access to a compiler and debugger could learn to code, generating its own data through self-play. I haven't read the paper yet, but this seems like an impressive milestone.
What I always find missing from these Deep Learning showcase examples is an honest comparison to existing work. It isn’t like computers haven’t been able to generate code before.<p>Maybe the novelty here is working from the English-language specification, but I am dubious how useful that really is. Specifications are themselves hard to write well.<p>And what if the “specification” were some Lisp code testing a certain goal: is this any better than existing Genetic Programming?<p>Maybe it is better, but in my mind it is kind of suspicious that no comparison is made.<p>I love Deep Learning, but nobody does the field any favors by over-promising and exaggerating results.
To me, coding in imperative languages is one of the hardest things to produce an AI for with current approaches (CNNs, MCTS, and various forms of backpropagation). Something like Cyc would seem to be a lot more promising…<p>And yet, I am starting to see (with GitHub’s Copilot, and now this) a sort of “GPT-4 for code”. I do see many problems with this, including:<p>1. It doesn’t actually “invent” solutions on its own like AlphaZero; it just uses and remixes a huge body of work that humans put together.<p>2. It is never really sure whether it solved the problem unless it can run against a well-defined test suite, because if it generated both the test suite and the solution, both could have subtle problems.<p>This is a bit like readyplayer.me trying to find the closest combination of noses and lips to match a photo (do you know any open source alternatives to that site, btw?)<p>But this isn’t really “solving” anything in an imperative language.<p>Then again, perhaps human logic is just an approach that operates on low-dimensional vectors, able to capture simple “explainable” models, while AI classifiers and adversarial training produce far bigger vectors that help model the “messiness” of the real world and also find simpler patterns as a side effect.<p>In that case, maybe our goal shouldn’t be to get solutions in the form of imperative language or logic, but rather to unleash the computer on “fuzzy” inputs and outputs where things are “mostly correct 99.999% of the time”. The only area where this could fail is when some intelligent adversarial network exploits weaknesses in that 0.001% and makes it more common. But for natural phenomena it should be good enough!
If you want a video explanation: <a href="https://youtu.be/Qr_PCqxznB0" rel="nofollow">https://youtu.be/Qr_PCqxznB0</a>
And this is how we reach the technological singularity and how programmers become as equivalently out-of-demand as piano tuners: self-programming systems.<p>AI will eat any and all knowledge work because there's very little special a human can do that a machine won't be able to do eventually, and much faster and better. It won't be tomorrow, but the sands are inevitably shifting this way.
It is obvious to me that computer programming is an interesting AI goal, but at the same time I wonder if I'm biased, because I'm a programmer. The authors of AlphaCode might be biased in this same way.<p>I guess this makes sense though, from a practical point of view. Verifying correctness would be difficult in other intellectual disciplines like physics and higher mathematics.
I am wondering whether this result could create a type of loop that self-optimizes.<p>We have an AI that generates reasonable code from a text problem description.<p>Now, what if the problem description is to generate such a system in the first place?<p>Would it be possible to close the loop, so to speak, so that over many iterations:<p>- the text description is improved<p>- the output code is improved<p>Would it be possible to create something that converges to something better?
I agree with most of the comments I've read in this thread. Writing code to solve a well-defined, narrowly scoped problem isn't that hard or valuable. It's determining what the problem actually is and how software could be used to solve it that is challenging and valuable.<p>I would really like to see more effort in the AI/ML code generation space being put into things like code review and system observation. It seems significantly more useful to use these tools to augment human software engineers rather than trying to tackle the daunting and improbable task of completely replacing them.<p>*Note: as a human software engineer I am biased
I just hope this shows how useless competitive programming is, if it can be replaced by a transformer model.<p>Additionally, people should REALLY rethink their coding interviews if they can be solved by a program.
Hey, honest question: how does one get into competitive programming? I imagine it goes far beyond just leetcoding, but honestly I don't even know where to start.
Most people here are programmers (or otherwise involved in the production of software). We shouldn't look at RPA and other job-automation trends dispassionately. SaaS valuations aren't where they are (and accounting doesn't treat engineering salaries as cost of goods sold) because investors believe they will require armies of very well paid developers in perpetuity.
> In our preprint, we detail AlphaCode, which uses transformer-based language models to generate code at an unprecedented scale, and then smartly filters to a small set of promising programs<p>If you're using a large corpus of code chunks from working programs as symbols in your alphabet, I wonder how much entropy there actually is in the space of syntactically correct solution candidates.
This result is well worth a meme.<p><a href="https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/38800416672363094847602926489336820944788560867702800329357993734324516552705/" rel="nofollow">https://opensea.io/assets/0x495f947276749ce646f68ac8c2484200...</a>
I suspect these code-generating AIs will bring about the singularity at some point in the future. Even if we don’t manage to create an artificial general intelligence, they will. I imagine they will learn to code at superhuman levels through self-play, just like AlphaGo and AlphaZero did. This will be awesome.
Between developments like this (and Copilot; is there a generally accepted word for this class of things, e.g. "AI Coders"?) and the move toward fully remote work, I predict the mean software engineering salary in the United States will be lower in 10 years (in real dollars) than it is today.
Since they used the tests, this is not something you can do if you don't have a rich battery of tests.<p>Perhaps many problems are something like finite automata, and the program discovers the structure of the automaton and also an algorithm for better performance.
>> AlphaCode ranked within the top 54% in real-world programming competitions, an advancement that demonstrates the potential of deep learning models for tasks that require critical thinking.<p>Critical thinking? Oh, wow. That sounds amazing!<p>Let's read further on...<p>>> At evaluation time, we create a massive amount of C++ and Python programs for each problem, orders of magnitude larger than previous work. Then we filter, cluster, and rerank those solutions to a small set of 10 candidate programs that we submit for external assessment.<p>Ah. That doesn't sound like "critical thinking", or any thinking. It sounds like massive brute-force guessing.<p>A quick look at the arXiv preprint linked from the article reveals that the "massive" amount of programs generated is in the millions (see Section 4.4). These are "filtered" by testing them against the program input-output (I/O) examples given in the problem descriptions. This "filtering" still leaves a few thousand candidate programs that are further reduced by clustering to "only" 10 (which are finally submitted).<p>So it's a generate-and-test approach rather than anything to do with reasoning (as claimed elsewhere in the article), let alone "thinking". But why do such massive numbers of programs need to be generated? And why are there still thousands of candidate programs left after "filtering" on I/O examples?<p>The reason is that the generation step is constrained by the natural-language problem descriptions, but those are not enough to generate appropriate solutions, because the generating language model doesn't understand what the problem descriptions mean; so the system must generate millions of solutions hoping to "get lucky". Most of those don't pass the I/O tests, so they must be discarded. But there are only very few I/O tests for each problem, so there are many programs that can pass them and still not satisfy the problem spec. In the end, clustering is needed to reduce the overwhelming number of pretty much randomly generated programs to a small number. This is a method of generating programs that's not much more precise than drawing numbers at random from a hat.<p>Inevitably, the results don't seem to be particularly accurate, hence the evaluation against programs written by participants in coding competitions, which is not an objective measure of program correctness. Table 10 in the arXiv preprint lists results on a more formal benchmark, the APPS dataset, where it's clear that the results are extremely poor (the best-performing AlphaCode variant solves 20% of the "introductory" level problems, though it outperforms earlier approaches).<p>Overall, pretty underwhelming, and a bit surprising to see such lackluster results from DeepMind.
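To make the clustering step concrete, here is an illustrative sketch (my own, not DeepMind's code, and only a rough simplification of what the preprint describes): group the surviving programs by the outputs they produce on extra inputs and keep one representative per behavioural cluster, up to the submission budget:<p><pre><code>
    // Collapse test-passing candidates into behavioural clusters and
    // keep one representative from each, up to a submission budget.
    package main

    import (
        "fmt"
        "strings"
    )

    // Program is a hypothetical stand-in for one generated solution.
    type Program func(input string) string

    // signature fingerprints a program by its outputs on the extra inputs.
    func signature(p Program, inputs []string) string {
        outs := make([]string, len(inputs))
        for i, in := range inputs {
            outs[i] = p(in)
        }
        return strings.Join(outs, "\x00")
    }

    // clusterAndPick keeps at most budget programs, one per behavioural cluster.
    func clusterAndPick(progs []Program, inputs []string, budget int) []Program {
        seen := map[string]bool{}
        var picked []Program
        for _, p := range progs {
            sig := signature(p, inputs)
            if seen[sig] {
                continue
            }
            seen[sig] = true
            picked = append(picked, p)
            if len(picked) == budget {
                break
            }
        }
        return picked
    }

    func main() {
        progs := []Program{
            func(s string) string { return strings.ToUpper(s) },
            func(s string) string { return strings.ToUpper(strings.ToUpper(s)) }, // different code, same behaviour
            func(s string) string { return s },                                   // behaves differently
        }
        inputs := []string{"abc", "xyz"} // extra inputs, here chosen by hand
        fmt.Println("representatives kept:", len(clusterAndPick(progs, inputs, 10))) // 2
    }
</code></pre>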
The year is 2025: Google et al. are now conducting technical on-site interviews purely with AI tools and no human bias behind the camera (aside from GPT-3's quirky emotions). The interview starts with an LC hard and you're given 20 minutes -- good luck!
I think CoPilot etc. will be revolutionary tools AND I think human coders are needed. Specifically, I love CoPilot for the task of "well-specified algorithm to solve a problem with well-defined inputs and outputs" - the kind of problem you could describe as a coding challenge.<p>BUT, our jobs have a lot more complexity:<p>- Local constraints - we almost always work in a large, complex existing code base with specific constraints.<p>- Correctness is hard - writing lots of code is usually not the hard part; it's proving it correct against amorphous requirements, communicated in a variety of human social contexts, and bookmarked.<p>- Precision is extremely important - even if CoPilot can spit out a correct solution 99% of the time, the 1% of the time it doesn't creates a bevy of problems.<p>Are those insurmountable problems? We'll see, I suppose, but we begin to verge on general AI if we can gather and understand half a dozen modalities of social context to build a correct solution.<p>Not to mention much of the skill needed in our jobs has more to do with soft skills, and the bridge between the technical and the non-technical, and less to do with hardcore heads-down coding.<p>Exciting times!
I think it would be interesting to train a system end-to-end on assembly code instead of various programming languages. This would make it a much more generic compiler.
I am a little bitter that it is trained on stuff that I gave away for free and will be used by a billion dollar company to make more money. I contributed the majority of that code before it was even owned by Microsoft.