What amazes me is how predictable(?) all of the recent issues were.<p>Don't get me wrong, the folks behind Copilot are clearly, without any doubt, smart, creative, and capable. But then... none of these issues (reproducing licensed code verbatim, non-compiling code, getting semantics wrong, and now this) are 0.01% edge cases that take specialized knowledge to spot or trigger. I remember some of them being called out days ago in the initial HN thread by people who didn't even have beta access.<p>I really wonder what this announcement/rollout looked like on the management side of things. Because a) these shortcomings must have been known beforehand, and b) backlash from people who feel their jobs are threatened or their open source work "stolen" was (I guess) foreseeable? I've already read calls to abandon GitHub for competitors; this can hardly have been an acceptable outcome.<p>Nevertheless, Copilot is still one of the most innovative and interesting products I've seen in a while.
Unintentional copyright violations and “leaking” of secrets people accidentally committed to public repos aside, my main issue with Copilot is that I don’t think it actually makes coding easier.<p>Everyone knows it’s usually far easier to write code than to read code. Writing code is a nonlinear process: you don’t start from the first character and write everything out in one single pass. Instead, the logic of the code evolves nonlinearly—add a bit here, remove a bit there, restructure a bit over there. Good code is written such that it can be mostly understood in a single pass, but this is not always possible. For example, understanding function calls requires jumping around the code to where the function is defined (and often deeper down the stack). Understanding a conditional with multiple branches requires first reading all the conditional predicates before reading the code blocks they lead to.<p>Reading, on the other hand, is naturally a linear process. Understanding code requires reconstructing the nonlinear flow through it, and the nonlinear thought process used to write it in the first place. This is why constant communication between partners during pair programming is essential—if too much unexplained code gets dumped on a partner, figuring out how it works takes longer than just writing it themselves.<p>Copilot is like pair programming with a completely incommunicative partner who can’t walk you through the code they just wrote. You therefore still have to review most of it manually, which takes much longer than writing it yourself in the first place.
Can we please stop (mis)using the term "AI"? It just does not live up to most people's expectations.<p>Copilot is a glorified Markov-chain autocomplete sitting on a huge pile of data. It is not aware of constructs such as "licenses" or "secrets", which is what most people would expect from an "AI". To prevent it from spilling secrets everywhere, a developer <i>~~should teach the AI a concept of secrets and the meaning of licenses~~</i> has to implement a filter. A regexp-based one will do, I guess.
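The regexp-based filter alluded to above could look something like this minimal sketch. The patterns are illustrative assumptions (loosely modeled on common key formats), not GitHub's actual scanning rules:

```python
import re

# Illustrative patterns only -- real secret scanners ship many more rules,
# often keyed to provider-specific prefixes.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS-style access key ID
    re.compile(r"(?i)(secret|api)_?key\s*[:=]\s*['\"][A-Za-z0-9]{8,}['\"]"),
]

def redact_secrets(suggestion: str) -> str:
    """Replace anything in a code suggestion that looks like a credential."""
    for pattern in SECRET_PATTERNS:
        suggestion = pattern.sub("<REDACTED>", suggestion)
    return suggestion
```

Such a filter would run over each suggestion before it ever reaches the editor; the obvious weakness is that it only catches formats someone thought to write a pattern for.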
> SendGrid engineer reports API keys generated by the AI are not only valid but still functional.<p>> GitHub CEO acknowledges the issue... still waiting for them to pull the plug<p>I agree this is an issue for Copilot <i>as well</i> - but isn't it really on SendGrid to invalidate keys that are known to be leaked?<p>Yes, that's inconvenient for the affected customers - on the other hand, they won't get billed for other people's usage - or dinged for someone spamming using their keys...
It does not <i>generate</i> secrets. The Twitter conversation does not mention that word.
Most certainly, it regurgitates secrets it has seen on crawled repos. Can the title be adjusted, please?
It's really kind of comical at this point. The more this copilot bs continues to be a thing, the more it's making Github seem irresponsible/careless at best.
I'm kind of astonished that this project got greenlit, given Microsoft's previous experiences with embarrassing AI projects (thinking particularly of Tay and Zo).
It is one thing to accidentally put your API key in your public GitHub repository.<p>And it's another (bigger) issue for Copilot to pick up that API key and put it in someone else's project.
I see this as a problem with the developers who are committing code, not a problem with Copilot. If you make your secrets accessible, then they might be accessed. Rotating your keys regularly would also mitigate these issues. This is a problem of humans failing to follow known security best practices, not of malicious AI doing something insidious.
If Copilot was trained only on public repos like they claim, then shouldn't those API keys already be disabled due to existing secret scanning tools?<p>For example <a href="https://docs.github.com/en/code-security/secret-security/about-secret-scanning" rel="nofollow">https://docs.github.com/en/code-security/secret-security/abo...</a><p>The fact that Copilot recreates API keys that still work makes me wonder if they come from a semi-public place, because SendGrid is usually quite fast at blocking API keys that were accidentally made public.
People put valid secrets in their public repository all the time.<p>Just a quick search:<p><a href="https://grep.app/search?q=%28secret%7Capi%29_%3Fkey%5Cs%3A%3F%3D%20%5B%22%27%5D%5Ba-zA-Z0-9%5D%7B8%2C%7D%22&regexp=true" rel="nofollow">https://grep.app/search?q=%28secret%7Capi%29_%3Fkey%5Cs%3A%3...</a>
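A slightly simplified version of the pattern behind that grep.app search can be sketched in Python, to show the kind of hardcoded-key assignment it flags (the sample lines are hypothetical):

```python
import re

# Simplified form of the linked search: a "secret_key"/"api_key" assignment
# followed by a quoted token of 8+ alphanumeric characters.
pattern = re.compile(r"(secret|api)_?key\s*:?=\s*[\"'][a-zA-Z0-9]{8,}")

lines = [
    'api_key = "abcdef123456"',      # flagged: hardcoded key
    'secret_key := "s3cr3tV4lu3X"',  # flagged (Go-style assignment)
    'api_key = os.environ["KEY"]',   # not flagged: read from environment
]
hits = [line for line in lines if pattern.search(line)]
```

Of course a pattern this loose also matches dummy placeholders, which is part of why automated scanning alone can't tell a live key from a fake one.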
I wish he had tried to track down whether the keys were in a public repo before asking SendGrid about them. If they turned out to exist only in GitHub private repos, that would be new and interesting info.<p>Not that putting keys in a private but third-party-hosted repo is a terrific idea either.
<a href="https://web.archive.org/web/20210705123028/https://twitter.com/alexjc/status/1411966249437995010" rel="nofollow">https://web.archive.org/web/20210705123028/https://twitter.c...</a><p>> COPILOT SECURITY BREACH<p>> SendGrid engineer reports API keys generated by the AI are not only valid but still functional.<p>> GitHub CEO acknowledges the issue... still waiting for them to pull the plug or make a comment. :popcorn:<p>Quoting <a href="https://twitter.com/pkell7/status/1411058236321681414" rel="nofollow">https://twitter.com/pkell7/status/1411058236321681414</a>
I don't consider this a problem. Copilot was trained on public repos, so these secrets had to be checked into public repos. They were already totally public, and should have been invalidated/replaced and redacted. Copilot might result in previously undiscovered published secrets being found, but that's not much worse than anyone finding one under normal circumstances.
Grand source code theft. A permanent stain on GitHub?<p>They should scrap it, and Microsoft should be ordered to sell GitHub because they have a conflict of interest.<p>For example, Microsoft has access to your private repos and can do things like Copilot with your data.
Who knows maybe your code powers Windows 11 now.
The only time I would consider this a valid security issue is if those tokens were previously not public. But that should not be the case, right?
There truly is an XKCD for everything: <a href="https://xkcd.com/2169/" rel="nofollow">https://xkcd.com/2169/</a>
I do feel for the people behind Copilot, even though they'll have known it was coming. They produce something <i>absolutely friggin' amazing</i> that can change the world, and for the next few days all everyone does is pile on and pull it to pieces... Yes, of course these are valid issues, but can we please look at the big picture and appreciate what an achievement this is?
So GitHub Copilot has inherited all the bad practices of many Stack Overflow and GitHub side projects and generates them in front of you as 'assistance'.<p>All the API keys are still working and, who knows, someone might complain about a huge bill right here because they forgot to revoke a key. Only time will tell.<p>I am certainly going to avoid this contraption. No thanks, and most certainly no deal.<p>Downvoters: So are you saying GitHub Copilot DOES NOT do the following:<p><pre><code> Leak working API keys into the editor.
</code></pre>
Generate broken code AND give you the wrong implementation if you add a single typo.
Copy and regurgitate copyrighted code verbatim.
Guess right only 1 out of 10 tries.
Send parts of your code to GitHub as you type in the editor.
</code></pre>
Are you VERY sure?