Okay, but I can't help noticing the bug in the Copilot-generated code. The generated code is:

<pre><code>async function isPositive(text: string): Promise<boolean> {
  const response = await fetch('https://text-processing.com/api/sentiment', {
    method: "POST",
    body: `text=${text}`,
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
    },
  });
  const json = await response.json();
  return json.label === "pos";
}
</code></pre>
This code doesn't escape the text, so if the text contains the character '&' or other characters with special meaning in form URL encoding, it will break. Moreover, these kinds of errors can cause serious security issues. Probably not in this exact case (the worst an attacker could do here is change the sentiment analysis language), but this class of bug in general is rife with security implications.

This isn't the first time I've seen this kind of bug either -- and it always shows up in code from people trying to showcase how amazing Copilot is, so it seems like an inherent flaw. Is this really the future of programming? Is programming going to go from a creative endeavor of making the machine do what you want, to a job that mostly consists of reviewing and debugging auto-generated code?
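For reference, the fix is basically a one-liner: URLSearchParams does the form URL encoding for you. A minimal sketch of the corrected function (same API call as above, just with the body properly escaped):

<pre><code>// URLSearchParams escapes '&', '=', and anything else with special
// meaning in an application/x-www-form-urlencoded body.
async function isPositive(text: string): Promise<boolean> {
  const response = await fetch('https://text-processing.com/api/sentiment', {
    method: "POST",
    body: new URLSearchParams({ text }).toString(),
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
    },
  });
  const json = await response.json();
  return json.label === "pos";
}
</code></pre>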
Machine learning algorithms are only as good as the data they are trained on.

For tasks like self-driving or spotting cancer in X-rays, they are producing novel results because these kinds of tasks are amenable to reinforcement. The algorithm crashed the car, or it didn't. The patient had cancer, or they didn't.

For tasks like reproducing visual images or reproducing text, it _seems_ like these algorithms are starting to get "creative", but they are not. They are still just regurgitating versions of the data they've been fed. You will never see a truly new style or work of art from DALL-E, because DALL-E will never create something new -- only new flavors of something old, or new flavors of the old relationships between old things.

Assuming it is even possible to describe novel software engineering problems in a way a machine could understand (i.e. in some complex structured data format), software engineering is still mostly a creative field. So software engineering isn't going to be performed by machine learning, for the same reason that truly interesting novels or legitimately new musical styles won't be created by machine learning.

Creating something new relies on genuine creativity and new ideas, and these models can only make something "new" out of something old.
Simplifying the complexity of standard development scenarios needs to be the next step in framework development; there's been a real lack of progress there. I just don't understand how or why it needs to be polluted by the unnecessary, hidden complexity of ML.
You're just shifting your unknowns into a black box and its magic output.
My work is 90% business and 10% code. I do not see how we can have AI writing useful code on its own without also understanding lots of context about human society.

When I code something up, I first have to know which problem to solve, and why it's a problem. I then have to understand the problem in minute detail. This usually involves a slew of human factors, desires, and interfacing with idiosyncratic systems that evolved a certain way because of human factors and desires.
If we imagine that at some point in the future the majority of code is produced by ML code generation, then where will the training data for future ML models come from?

With the current approach to ML code generation, it seems there will always need to be a critical mass of coders who produce code on their own, to serve as new input for the ML models so they can learn about new use cases, new APIs, new programming languages, and so on.

ML code generation may serve as a multiplier, but it raises questions about the creation and flow of new knowledge and best practices.
The problem with all of these arguments is that we are just shooting in the dark.

The day before Copilot launched, if someone had told me code generation at the fairly decent quality Copilot achieves was already possible, I probably wouldn't have believed it. I could happily have rattled off a couple of arguments for why it might be better than autocomplete but could never write a full function. Then it did.

Who can say how far it's come since then? I think only the Copilot team knows. I wish we could hear from them, or from some ML experts who might know.
The problem with this is that, just like visual programming, it doesn't really do the hard parts.

I remember hearing "a million monkeys with a million typewriters writing for a million years could write Shakespeare". Sure, but how would they know it? How could they recognize when they have achieved their goal?

ML will lower the bar for the trivial, tedious things and let people believe they are more capable than they are. Even with ML, you will have to know what to ask and what the answer should be. That is always the hard part.
I've been using Copilot for a while. It's neat and saves me some typing sometimes, but it's not even remotely close to doing even a small portion of my work for me.
“Although ML code generation helps with getting the initial code written, it cannot do much beyond that - if that code is to be maintained and changed in the future (and if anyone uses the product, it is), the developer still needs to fully own and understand it.”

I think this is a misconception. It’s true for these first prototype code-generation tools, but there’s no reason to think that in the future these models won’t be adapted to modify and maintain code too.
The article presupposes that ML is only for code generation, not maintenance. Why couldn't this change in the future? Perhaps you'd have an English-language document that describes the functionality, and as you update it, the code gets re-generated or updated.
There are only two options: either you train on the full mediocrity of GitHub, or you over-train on highly rated projects, effectively just copy/pasting those projects and riding roughshod over any licensing the authors think they have. At least the mediocrity of the suggested code indicates they're trying to do the former.
I think the biggest misconception in a lot of the comments here is the assumption that the code generated by Copilot is final. Copilot, as the name suggests, is a human-in-the-loop system. It <i>requires</i> a developer to give it prompts and to massage the code it generates. It's not intended to, nor will it ever feasibly be able to, generate code entirely independently.

The point is that this massaging is a lot faster than writing everything from scratch in most cases. It doesn't have to be bug-free. It doesn't have to be perfect quality. It doesn't have to do "the hard parts" of coding. It just has to be good enough to be sufficiently faster for the developer than starting from scratch, and it is.
I think it’s hard to view ML-generated code as the future when you see how powerful type-driven development can be with the likes of Idris and others.

Perhaps the best approach could use ML within the constraints provided by a type system. Does research like this exist?
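To make that concrete, here's a rough sketch in TypeScript rather than Idris (the types and names are hypothetical, just to show the flavor): the more precise the type signature, the smaller the space of valid implementations an ML model would have to search, and the more nonsense the compiler rejects before a human ever reviews it.

<pre><code>// With a fully generic signature, parametricity leaves essentially one
// sensible implementation -- the type itself acts as the specification.
function map<A, B>(xs: A[], f: (a: A) => B): B[] {
  const out: B[] = [];
  for (const x of xs) {
    out.push(f(x)); // the only way to produce a B is to apply f
  }
  return out;
}

// A precise domain type (hypothetical) makes invalid states
// unrepresentable, so generated code that mislabels sentiment
// simply fails to type-check.
type Sentiment = { label: "pos" | "neg"; confidence: number };

function isPositive(s: Sentiment): boolean {
  return s.label === "pos"; // comparing against "neutral" would be a compile error
}
</code></pre>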
No, it's still a terrible idea even when used as autocomplete. As the article correctly states, the primary job of a software engineer is to maintain existing code, not to write new code. You're not going to be doing yourself any favors by committing code you didn't even read (unless you're one of those people who can get away with shoving all your bugs onto coworkers, in which case you're an asshole).
> For the purposes of this post, I will not delve into the questions of code quality, security, legal & privacy issues, pricing, and others of similar character that are often brought up in these early days of ML code generation. Let’s just assume all this is sorted out and see what happens next.

"Let's assume magic ponies exist, are commonplace, and would love to give us all magic rainbow rides"
AI coding is not constrained to JavaScript, whose dominance is a function of human knowledge inertia and hiring practices, all irrelevant to AI-assisted coding. Ideally syntax becomes a non-issue and we can start valuing correctness and maximizing business leverage per LOC. However, we don't know how long the transition window will be. Everything is accelerating; quickly, I hope!
I think it is highly likely that Copilot will be a million times better by the end of the decade: increasing compute, new techniques, etc.

The top comment on this post is complaining about a bug in Copilot-generated code. A Copilot that is a million times better won't do that.
> [...] thank you for your generous comments, ideas and suggestions! You made this post better and made sure I don't go overboard with memes :).

Yeah man, I'm not sure about the latter. Not sure...
> a huge win for the developer community

> allows us to write even less code and care about fewer implementation details

Remember Bjarne Stroustrup’s “I Did It For You All…”? More code, more complexity: more job security.