
GitHub Copilot Generated Insecure Code in 40% of Circumstances During Experiment

261 points · by elsombrero · over 3 years ago

37 comments

lmilcin · over 3 years ago

I thought this should have been expected.

Security starts with deep understanding.

Some standards and practices can help avoid *some* types of problems, and some are even rather effective (like airgapping your systems), but there isn't any way to *assure* security in general other than to truly understand what you are doing.

**

I feel Copilot is the wrong direction in which to optimize development. It is mostly going to help people who already have a poor understanding of what they are doing create even more crap.

For a good developer, those low-level, low-engagement activities are not a problem (except maybe at the learning stage, where you actually want people engaged rather than copy/pasting). What it does not help with are the important parts of development: defining the domain of your problem, designing good APIs and abstractions, understanding how everything works and fits together, understanding what your client needs, etc.

Also, I feel this is going to increase complexity by spreading more copies of the same structures throughout the codebase.

My working theory is that this is going to hinder new developers even more than Google and stack* already do. Every time you give new developers an easier way to copy-paste code without understanding it, you rob them of an opportunity to gain a deeper understanding of what they are doing, and in effect you prevent them from learning and growing.

It is a little like giving your kids the answers to their homework without giving them a chance to arrive at the answers or explaining anything about them.

**

Another way I feel this is going to hurt developers is competition over who can produce the most code.

I have already noticed a trend where developers (especially more junior ones aspiring to advance) try to outcompete others by producing more code, closing more tickets, etc. Right now that means skipping understanding of what is going on in favor of getting easy answers from the Internet.

These people can produce huge amounts of code with relatively little actual engagement.

To management (especially with the wrong incentives) this looks like a perfect worker, because management usually doesn't see the connection between a lack of engagement and planning at design/development time and their later problems (or doesn't feel it is them who will pay the price).

Copilot is probably going to make things even more difficult for people who want to do it the right way, because the difference in false productivity measurements will be even starker.
shireboy · over 3 years ago
…Compared to 60% of circumstances in the meat-based developer control group? :)
toastal · over 3 years ago

*You* are the free-labor copilot that trains Microsoft GitHub's Copilot tool. *You* are responsible for any of those insecure code errors and the diligence they require. *You* will be on the hook for the resulting problems. But Microsoft and their home-phoning, tracking-embedded editor will get real people to correct and train their machine for free, with a stated plan of later selling that machine back to us.

I wish there were a "robots.txt" file for Git to disallow certain bots from training on anything I have written.
gnrlst · over 3 years ago

I've experienced this first hand: the autosuggest is scarily accurate and insidious at the same time. On numerous occasions I've auto-filled a 10-15 line suggestion that *looked* like it was exactly what I wanted, but made a very critical mistake (e.g. in a for loop, referencing the wrong array despite calling it by the right name). Not really security-related stuff, but head-scratchers that make the code harder to debug, since I didn't actually write it.
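A toy sketch of the failure mode this comment describes (all names invented): a completion whose variable names read naturally but which copies from the wrong array.

```python
# Hypothetical illustration of a plausible-looking but wrong completion:
# the code reads correctly, yet it references the wrong array.

def merge_scores_suggested(old_scores, new_scores):
    """Looks like it overwrites old scores with new ones -- but doesn't."""
    merged = list(old_scores)
    for i in range(len(new_scores)):
        merged[i] = old_scores[i]  # bug: should read from new_scores
    return merged

def merge_scores_fixed(old_scores, new_scores):
    """The version a human reviewer has to catch and correct."""
    merged = list(old_scores)
    for i in range(len(new_scores)):
        merged[i] = new_scores[i]
    return merged
```

The two functions differ by one identifier, which is exactly why the mistake is hard to spot in code you did not write yourself.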
mbrevda1 · over 3 years ago
For comparison, what percentage of human-generated code is secure?
moretti · over 3 years ago

I use Copilot mostly as a replacement for IntelliSense and macros. It helps me automate repetitive tasks. I would never trust Copilot with an algorithm or a snippet; I treat the code just like anything taken from Stack Overflow or GitHub.
wcarss · over 3 years ago

I couldn't find a link to the actual study anywhere in the article: https://arxiv.org/abs/2108.09293
adamsvystun · over 3 years ago

It is important to remember that Copilot can improve. 40% is not a bad baseline, but one data point does not give us much information; we should wait and see the rate of improvement.
lampe3 · over 3 years ago

I have been using Copilot for some time now, and yeah, right now it's more a toy than real help.

The only time it really helped was when I needed to create a named list of char codes.

When it comes to more complex code, checking Copilot's output takes the same time as writing it myself. 90% of the time I needed to correct Copilot.

For me, tools like linters are far more helpful. If I could only use ESLint or Copilot, I would go with ESLint 100% of the time.
dexen · over 3 years ago

Half joking:

So far GitHub Copilot is more feasible as a tool for humans doing code coverage of its input code, "given enough eyeballs, all bugs are shallow" style. A developer goes, "huh, Copilot generated insecure code, better report it to the original project it learned it from." If only Copilot were able to link to the original project, it would all be great and useful.
0-_-0 · over 3 years ago

I fail to see how this is particularly useful information about Copilot. The comparison should be:

1. How often do people write insecure code when not using Copilot?

2. How often do people write insecure code when using Copilot?
rcarmo · over 3 years ago
As many people have pointed out indirectly, this is almost certainly caused by the training set. Without a bias or ranking for quality, it will just churn out the “best fit” or most popular snippets…
whazor · over 3 years ago

Happily having access to GitHub Copilot, I find it very often generates the code that I want. So it saves me typing and also often saves a trip to Stack Overflow. I think the libraries/packages you use also have a big influence on how easy it is for Copilot to create security flaws. Still, more training against security holes would be appreciated.
Animats · over 3 years ago

Well, of course. GPT-3 has no underlying model of meaning. It's just autocomplete with a bigger data set. Used on natural language, it produces text that looks reasonable for about three paragraphs. Then you realize it's just blithering and has nothing to communicate. (Like too many bloggers, but that's another issue.)
wccrawford · over 3 years ago

I'm actually impressed with that. There's *so* much insecure code out there that I'd have expected it to generate insecure code *most* of the time.

I'd still not use it. But it's an impressive trick.
bottled_poe · over 3 years ago
What’s the baseline? 60% may still be superior to the average implementation.
queuebert · over 3 years ago
This is exactly what an AGI would do if it wanted to pwn all our systems.
cannabis_sam · over 3 years ago

Of course it did. Why would GitHub Copilot "care about" security, unless the majority of code on GitHub cared about security?
arvindamirtaa · over 3 years ago

Unit tests with TONS of assertions, cleaning data from a form into an ORM object, stuff that looks like you're just going through a list and doing the same thing over and over. For these, Copilot is great. I wouldn't trust it to do anything else though.

Nothing more. Nothing less.
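As a concrete (invented) example of the assertion-heavy, repetitive test code this comment has in mind -- the kind of pattern a completion tool can extend after seeing a case or two:

```python
def slugify(title):
    """Toy function under test: trim, lowercase, spaces to hyphens."""
    return title.strip().lower().replace(" ", "-")

def test_slugify():
    # After the first assertion or two, the remaining cases follow an
    # obvious pattern -- exactly the repetition Copilot handles well.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Leading Space") == "leading-space"
    assert slugify("ALL CAPS") == "all-caps"
    assert slugify("already-slugged") == "already-slugged"
```

The point is that the marginal cost of each extra assertion is near zero for the tool, while the human still has to verify that each expected value is actually right.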
COMMENT___ · over 3 years ago

It's painful to see GitHub Copilot called "AI". For God's sake, it is not AI. It's just advanced auto-complete for coders. GPT-3 is close to AI; GitHub Copilot is not.

Jesus Christ, please make them stop. Stop using AI as a buzzword.
eurasiantiger · over 3 years ago
I wonder if the Copilot model could somehow be repurposed to analyze the quality of a developer’s code. Seeing how Microsoft owns both GitHub and LinkedIn, it’s a good bet this is something they’re actively researching.
amw-zero · over 3 years ago

If it's trained on code that we write, that sounds completely accurate.
Vaslo · over 3 years ago

It's learning from existing code, right? Doesn't this say something about developers in general, or is the thought that it uses combinations of code that are insecure?
softwaredoug · over 3 years ago

I will say I'm not looking forward to writing some mundane code today.

It's interacting with GCS to scan a bucket for an extension, load the data with pandas, and concat some dataframes. It's something dumb but mildly finicky that's going to eat up time I could be using for higher-value work.

Copilot would be very welcome as I do this, instead of annoyingly going off to Google three different Python libraries and getting it all to work nicely together.
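A rough sketch of the task described above, assuming the google-cloud-storage and pandas packages; the bucket name and extension are placeholders. The GCS import is deferred so the pandas half stands on its own.

```python
import io

import pandas as pd

def concat_csvs(csv_payloads):
    """Read each CSV byte payload into a DataFrame and concatenate them."""
    frames = [pd.read_csv(io.BytesIO(payload)) for payload in csv_payloads]
    if not frames:
        return pd.DataFrame()
    # ignore_index keeps row labels unique across the combined frame.
    return pd.concat(frames, ignore_index=True)

def load_bucket_csvs(bucket_name, extension=".csv"):
    # Deferred import so concat_csvs works without the GCS package installed.
    from google.cloud import storage

    client = storage.Client()
    payloads = [
        blob.download_as_bytes()
        for blob in client.list_blobs(bucket_name)
        if blob.name.endswith(extension)
    ]
    return concat_csvs(payloads)
```

For instance, `load_bucket_csvs("my-data-bucket")` would return one DataFrame covering every CSV in the (hypothetical) bucket. It is exactly the kind of glue code that is tedious to look up but easy to review once written.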
gfiorav · over 3 years ago

I think this should be pretty much expected. I'm unfamiliar with how this network is trained, but I'm pretty sure the data ranking is not perfect.

I'm guessing the ranking features are based on repo stats, contributor stats, etc. Even "good" contributors will make rookie mistakes in certain areas.

It is interesting to imagine how GitHub will try to solve this issue.
mzs · over 3 years ago

The actual paper: https://arxiv.org/abs/2108.09293

Previous discussion, including comments from the lead author: https://news.ycombinator.com/item?id=28279365
RhysU · over 3 years ago

See also "You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion": https://deepai.org/publication/you-autocomplete-me-poisoning-vulnerabilities-in-neural-code-completion
luke2m · over 3 years ago

So what? Copilot isn't intended to replace humans, just to help them out and maybe reduce typing.
monkeydust · over 3 years ago

I think OpenAI Codex (which is what Copilot uses) gets more interesting when they allow you to fine-tune the model on your own (trusted) code. That could help reduce the time it takes for new engineers to get up to speed, for example.
IlliOnato · over 3 years ago
My take on Copilot has not changed. I believe it will make programmers that produce junk code more productive, by being able to produce more junk code in less time.
cblconfederate · over 3 years ago

Assuming this is what it learned from its human counterparts, I'm surprised it's so low.
ransom1538 · over 3 years ago
Has anyone here got past the wait list? I and my team members have been waiting for months.
lvl100 · over 3 years ago
Why would anyone use this in production? Just use Sourcegraph if you need help that badly.
spyder · over 3 years ago

Well, it wasn't trained to output secure code, was it?
makach · over 3 years ago

Does that mean we will be 60% more secure than before?
evolveyourmind · over 3 years ago
Meaning 40% of the code on GitHub is insecure
mullikine · over 3 years ago

But this problem is solved by using GitHub's CodeQL to search and filter generated code. By combining Copilot with GitHub Semantic and GitHub CodeQL, you have a means of writing and generating the code you want in a secure way. This means you no longer need the original source code that was used to train Codex. Training Codex and selling it as a product in the form of Copilot steals the essence of the original source code used to train it, to build the future of programming, while paying nothing back to the original authors. Even Elon Musk was opposed to OpenAI exclusively licensing GPT to Microsoft.

https://edition.cnn.com/2020/09/27/tech/elon-musk-tesla-bill-gates-microsoft-open-ai/index.html

It's so transformative that people may allow it to circumvent licenses.