I had trouble accessing the relevant video snippet even after going through the conference registration, so here's a summary.<p>You can view the demo at <a href="https://twitter.com/i/broadcasts/1OyKAYWPRrWKb" rel="nofollow">https://twitter.com/i/broadcasts/1OyKAYWPRrWKb</a> starting around 29:00.<p>It's Sam Altman demoing a massive OpenAI model that was trained on GitHub OSS repos using a Microsoft supercomputer. It's not IntelliCode, but the host says they're working on compressing the models to a size that would be feasible in IntelliCode. The code model uses English-language comments, or simply function signatures, to generate entire functions. Pretty cool.
So that's basically program synthesis from natural language (ish)
specifications (i.e. the comments).<p>I can see this being a useful tool [1]. However, I don't expect it to be capable
of innovation. At best this is like having an exceptionally smart
autocomplete function that can look up code snippets on SO for you (provided
those code snippets are no longer than one line).<p>That's not to say that it can't write <i>new</i> code, that nobody has quite
written before in the same way. But in order for a tool like this to be useful
it must stick as close as possible to what is expected, or it will slow
development down rather than helping it. Which means it can only do what has
already been done before.<p>For instance, don't expect this to come up with a new sorting algorithm, out
of the blue, or to be able to write good code to solve a certain problem when
the majority of code solving that problem on github happens to be pretty bad.<p>In other words: everyone can relax. This will not take your job. Or mine.<p>____________<p>[1] I apologise to the people who know me and who will now be falling off
their chairs. OK down there?
I mean, it is cool.<p>But here is the thing: the natural-language description of a function is not always this unambiguous.<p>When you tell a function to 'compute XYZ', what you actually mean is often 'check whether X.a exists; if so, execute branch 1), else branch 2)'.<p>If the logic gets really complicated, describing it accurately in human language isn't necessarily faster than writing the code directly. Otherwise we wouldn't need to invent programming languages at all; we could just write compilers that interpret and execute human language.<p>I am also interested in whether the model itself is conditioned on the type constraints of the class. It is neat that they picked Python in this case. But if it were Java or another statically typed language, would this system condition its generation not only on the natural text, but also on the resulting type system? My bet, based on my understanding of the language-modeling approach they use, is that they are not doing this, because of the very high complexity and cost of training and domain adaptation.<p>Overall, this is again an interesting demo. But for code generation from human language to be useful, I think you really need something like 99% accuracy for it to be remotely practical.
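A made-up example of that gap: the comment sounds like a one-liner, but the intent it stands for is mostly branches (the field names and rules here are invented):<p><pre><code># "Compute the shipping cost for an order."  (sounds unambiguous enough)
def shipping_cost(order):
    # ...but what you actually mean is a pile of conditionals:
    if order.get("express"):
        base = 15.0
    elif order.get("total", 0) >= 50:
        base = 0.0           # free standard shipping over 50
    else:
        base = 5.0
    if order.get("country", "US") != "US":
        base += 10.0         # flat international surcharge
    return base

print(shipping_cost({"total": 62, "country": "US"}))  # 0.0
</code></pre>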
How does this do compared to other models? Is this a totally cutting edge result? On the surface, it seems quite impressive, but sans an environment to try it out with, I cannot be entirely sure. Still, this does make me question whether I chose a safe career, haha.<p>The thing is, I'd really need to see a live demo to see how good this is. Making mistakes is actually kind of a big issue; as most people know, debugging code is harder than writing it. And a lot of the language models which can write impressive-seeming text also generate masses of garbage. There's no way to know whether this was cherrypicked or not.<p>The mere fact that it can extract meaning from text like this is already really impressive though.
I have thought about this before, but logical errors will still be introduced and will have to be manually tested and reviewed anyway. So what if a more reliable approach could be achieved by training on test cases alongside the code that passes them?<p>That way developers just write unit tests or functional tests, and the AI generates code and retrains itself until the code passes all the tests (see the toy sketch after the feature file below). This could happen silently in the background as the developer defines the tests.<p>A number of natural-language test frameworks exist; Behat, for example, lets you define tests such as:<p>Feature: Multiple site support<p><pre><code>  Background:
    Given a global administrator named "Greg"
    And a blog named "Greg's anti-tax rants"
    And a customer named "Wilson"
    And a blog named "Expensive Therapy" owned by "Wilson"

  Scenario: Wilson posts to his own blog
    Given I am logged in as Wilson
    When I try to post to "Expensive Therapy"
    Then I should see "Your article was published."

  Scenario: Greg posts to a client's blog
    Given I am logged in as Greg
    When I try to post to "Expensive Therapy"
    Then I should see "Your article was published."
</code></pre>
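And the toy sketch of that generate-until-green loop, in Python. generate_candidate is a stand-in for whatever model produces the code, and the tests are callables that raise on failure; everything here is hypothetical:<p><pre><code>import traceback

def generate_until_green(spec, tests, generate_candidate, max_attempts=100):
    """Keep asking the model for code until every test passes."""
    feedback = None
    for attempt in range(max_attempts):
        source = generate_candidate(spec, feedback)
        namespace = {}
        try:
            exec(source, namespace)      # load the generated code
            for test in tests:
                test(namespace)          # each test raises on failure
            return source                # all green: done
        except Exception:
            feedback = traceback.format_exc()  # feed the failure back in
    raise RuntimeError("no passing candidate after {} attempts".format(max_attempts))
</code></pre>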
It could still fit the dream of describing to a computer what kind of program you want and having it figure out the plumbing.<p>Anyway, interesting work. Very interesting. I remember a few colleagues laughing at me no more than 5 years ago when I suggested that AI would eventually write code. And here it is, in an early version, flawed surely, but only set to improve.<p>Edit to add: this subject, while insanely interesting to me, is well out of my wheelhouse. I'm guessing there's possibly semantic structure to the above that the type of model being used in the demo can't deal with? Like, this one use case has to co-exist in an entire ecosystem of dependencies and related entities... Could the model cope with that, or is it just calculating the likelihood of the next character like other models I've seen, but with insane accuracy when it comes to code?
I'm a bit confused: is this built by OpenAI or Microsoft?
Microsoft released the paper IntelliCode Compose: Code Generation Using Transformer [1] 4 days ago, and there is no attribution to anyone from OpenAI in it.<p>Are these two entirely separate and yet strikingly similar initiatives?<p>[1]: <a href="https://arxiv.org/abs/2005.08025v1" rel="nofollow">https://arxiv.org/abs/2005.08025v1</a>
Wow, this has the potential to be a total gamechanger. You have to be really observant about the bugs, though; I would have totally missed the one with the price discount without executing the code.
I worked on a project very much like this last summer: a transformer language model applied to code completion.<p>You'd be surprised how easy it is to get a model that performs as well as what you see in the video. And it's even easier now that people have built great libraries for fine-tuning generative language models.<p>I encourage you to try it yourself! There are many interesting extensions for people to explore:<p>- Use bi-directional context (vanilla GPT-2 only sees backward context).<p>- Integrate with semantic analysis tools.<p>- Experiment with different context representations. You condition the model on an arbitrary sequence of N tokens, and it's not necessarily the case that you should spend that whole budget on the N tokens that came immediately before. What about including the imports at the top of the file? What about the docstrings for functions that were just used? What about the filepath of the current file?<p>Don't look at something like this as though watching your job be automated away. Look at it as a tool that you can master and use to move up the stack.
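If you want to poke at that backward-context baseline yourself, here is a minimal sketch using the Hugging Face transformers library (a recent version is assumed) with the stock GPT-2 weights. No code-specific fine-tuning and an arbitrary prompt, so expect rough output; fine-tuning on code is the part this leaves out:<p><pre><code>from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Backward context only: the model sees just the code before the cursor.
prompt = 'def is_palindrome(s):\n    """Return True if s reads the same backwards."""\n'

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,                    # a short completion
    do_sample=True,                       # sample rather than greedy decode
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
</code></pre>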
Amazing!<p>So the developer's role will shift to:<p>1) writing good enough descriptions of the code to be generated by the AI model<p>2) fixing any little issues in the generated code
This is really cool. However, I doubt it can write more than very simple functions. That may be enough to be useful, though. It would be nice if they created a demo page where we could try this out. This use case is a little different from the auto-complete one.
I wonder if this could be trained on just bug fix commits from GitHub in order to produce a model that could suggest bug fixes for an existing code base.
Can this freaky A.I. also generate the corresponding unit tests?<p>Or, for TDD, generate the unit tests <i>first</i>, based on the function name and description. Then, if the dev updates any of those tests, or adds more tests, use that information in auto-generating the appropriate code.
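As a purely hypothetical illustration (the function name, description, and expected behaviours are all invented), the tests-first output might look like this, staying red until the code-generation step fills in the stub:<p><pre><code>import pytest

def slugify(title):
    """Convert a post title into a URL-friendly slug."""
    raise NotImplementedError  # stub; the generated implementation goes here

def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("C# vs. Java!") == "c-vs-java"

def test_empty_title_is_rejected():
    with pytest.raises(ValueError):
        slugify("")
</code></pre>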
I don't see it replacing (or even much augmenting) professional programming any time soon... My predicted use case for this is mostly with non-programmers. They'll be instructed to write in English what they want done, and behind the scenes this will attempt to generate code, execute it, and give them the results. A fun demo would be writing "Download the recipe on this webpage (paste link) and order the ingredients from Safeway". If it could generate its own storage for billing and shipping info, remember it indefinitely after getting it from the user, and then generate the relevant web scraping / web driving or API code for various websites, that'd be pretty sweet.
Where this would be most useful is in automated testing suites: you could generate tests just by specifying what you are testing for. A product manager looking to test the portions of a system that absolutely need to work could write code comments and generate thousands of tests this way.<p>This is a gamechanger for ensuring the reliability of software. Many more people can be involved in the software development process and inject their domain knowledge into it.<p>Are there any plans to open source the model? I would love to play around with it.
Glad to see it learned to use spaces instead of tabs.<p>In all seriousness, the demo really looks amazing. I'm curious to see more elaborate, real world examples though.
Imagine all the accepted Stack Overflow answers funneled into your code just because those answers appeared so many times in the training data.
Very cool work.<p>However, I fear this moves software engineering closer to the role of something like plumbing.<p>I've despaired at the state of most software I've used since as far back as I can remember, except when it comes to tools that have the maturity of something like linux, git, emacs, vim and the unix tools.<p>For software to get good, it needs to be deeply understood by at least one person working on it. If you train an army of warrior drones who get full-line autocompletion, first they'll start forgetting what types a method takes as its parameters, then they'll be less likely to explore codebases, instead plugging in the first autocompletion that comes to their editor.<p>Their bosses will of course want this in the name of "Getting Shit Done". We already have this sort of divide between developers: those who lean heavily on their tools and those who use minimal editor help. Once you are forced to actually learn, because your tool isn't spoon-feeding you, you have a chance to reason better from first principles using the code you have available. I don't think it's a shock that a very high percentage of the very best developers use emacs or vim with minimal tooling.<p>I am aware that this whole comment has subtle tones of superiority and elitism, and I am genuinely sorry for that, but in my experience it's just true that people who lean really hard on their IDEs to do everything for them are less able to develop creative solutions, and you can tell from having conversations with them that they don't really understand what they are doing.
Is there an example of something like this, but trained on the actual abstract syntax tree manipulations that are going on behind the scenes?<p>That seems like it would be considerably more effective, because you're removing the noise/overhead of parsing the text and giving the AI a much clearer model of what's being manipulated.
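For a taste of what that clearer representation looks like, Python's built-in ast module already exposes it; a tiny example (the snippet being parsed is arbitrary):<p><pre><code>import ast

source = "total = price * (1 - discount)"
tree = ast.parse(source)

# The text becomes a tree of typed nodes: Assign, BinOp, Name, ...
print(ast.dump(tree, indent=2))  # the indent argument needs Python 3.9+

# Walking the tree yields the node sequence a model could be trained on
for node in ast.walk(tree):
    print(type(node).__name__)
</code></pre>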
I was very surprised how well it did mimicking the StackOverflow archives when I trained GPT-2 on them last year: <a href="https://stackroboflow.com" rel="nofollow">https://stackroboflow.com</a> (Only the 345M weights were released back then; now I'm curious how much better 1.5B would do.)
GPT-2 is known to be unable to track and bind variables; scaling purely associative models beyond trivial examples is going to be difficult, or more likely impossible.<p>This will end up being a better TabNine. Models like GPT-2 are still just approximating intelligence; they are not rationally cognizing.
I can't even imagine what it's like to have so much money that you can spend time working on things like this which are so incredibly unlikely to ever become useful. Congrats and I hope you guys discover a great product some day.
Can someone explain to me how this kind of software is shared? Would I need to train it again, or are the trained models usually provided?<p>Is this one in particular open source?
When is OpenAI planning to actually solve a hard problem? They have spent a huge amount of money and time creating useless demos so far.<p>Creating flashy AI demos is relatively easy. Creating important AI products that actually operate in the real world is the hard part.
I tried signing in with my Microsoft account as well; nope, they definitely want you to go ahead and fill out a registration form for the Build conference <a href="https://register.build.microsoft.com/" rel="nofollow">https://register.build.microsoft.com/</a>, not gonna happen. Hope they learn not to paywall conferences of this kind; their competition just puts them out on YouTube live.
I would much rather have an AI that is capable of interpreting what I say as code. So if I say:<p>Build me a class which computes the larger of two integers.<p>the AI would be smart enough to write it.
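For reference, one plausible reading of that request, i.e. roughly what you'd hope the AI writes back (the names are of course made up):<p><pre><code>class LargerOfTwo:
    """Computes the larger of two integers."""

    def __init__(self, a: int, b: int):
        self.a = a
        self.b = b

    def compute(self) -> int:
        # Return whichever of the two integers is larger
        return self.a if self.a >= self.b else self.b


print(LargerOfTwo(3, 7).compute())  # prints 7
</code></pre>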