This might be overreacting but is there a way to opt out of Copilot using your code in open source repos?<p>It feels morally wrong to me that I can spend thousands of hours working on projects of my own free will, but then a company can sell the code I wrote to others in the form of snippet completion as a service. In fact, they end up selling your code back to you if you plan to use the service.<p>If the answer is no, that moves the needle pretty far in the direction where I'd at least consider the idea of moving all of my repos to Gitlab. I don't care much about stars or popularity. I open source things that are interesting and useful to me, and if other folks want to use them they can, but I don't gain motivation from others using the projects I release. I like Github and its UI, and it's no doubt "the spot" for open source, but selling code written by others really rubs me the wrong way. It stinks because it also means no longer contributing to other code bases. It's moving us in the opposite direction of what open source is about.
I find this whole topic very annoying, this is like the 3rd variation to reach the front page today. But it has made me realize why I instinctively dislike Free Software as a movement.<p>Copyright and licensing are bad, actually. Stop getting worked up about the idea of using courts to punish theft. Stop getting into a frenzy of arousal about the police kicking down doors to drag Billy Gates to jail because 80 characters of fast square root is theft but 79 isn't.<p>Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.<p>Sure in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules lawyering over license text.<p>Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps, in fact I want a truly public domain license but copyright law is so hostage to corporate interests no such thing exists in many countries.<p>Free the code.
It is now proven that copilot returns code from codebases with non-permissive licenses [1].<p>I'm curious - what are the legal implications of this going forward? I've so many questions.<p>1. Will Microsoft ever face lawsuits for these license violations?<p>2. If so, who/how? Class-action?<p>3. Will copilot be forced to open-source in the future? Under which license? Some open source licenses are incompatible with others, but copilot uses code from probably every OSS license conceived.<p>4. If Microsoft faces no justice, will we start seeing more OSS license violations? Will Google start using AGPL-licensed code?<p>[1] <a href="https://news.ycombinator.com/item?id=27710287" rel="nofollow">https://news.ycombinator.com/item?id=27710287</a> | Copilot regurgitating Quake code
I mean, if it's autocompleting a fairly simple line, and can do that because it's analysed a lot of lines, I don't really see that as "stealing" anything.<p>If you are using it to write whole complex functions that are the same as other people's, I guess that is copying.<p>But if you do the second thing you are not a great dev, and would have probably ended up copy-pasting it anyway.<p>I think the first use case is far more common, and it's creating boilerplate that is so generic you could never really attribute it anyway.
><i>Hector Martin: If you use Copilot, you are basically playing Russian Roulette that the random mashup of existing, copyrighted, heterogeneously licensed code that you get out of it qualifies as an original work, mostly by chance. Or that nobody will ever sue you otherwise.</i><p>Well, that's already the case with Stack Overflow copypasta enterprise code. If anything, use of Copilot would be an improvement...
Copilot is fair use, get over it!<p>Copilot is not writing your code any more than Google search is writing your code. You are writing your code, and Copilot is just making suggestions.<p>The US constitution secures limited copyright "to promote the progress of science and useful arts". Copilot is just that, get over it!
Bit of a stretch to fashion AI-derived/AI-coauthored works as other people's work. Are DALL-E portraits done Picasso-style unrightfully selling Picasso's works? Is an individual selling portraits done Picasso-style unrightfully selling Picasso's works?<p>No, of course not. Joyce's literature was influenced by Ibsen, Mozart looked up to Haydn, Newton was humble enough that he openly professed he stood on the shoulders of his predecessors, Perelman refused the Millennium prize because it wasn't also offered to his colleague Hamilton.<p>All human innovation is iterative, and derivative. <a href="https://www.youtube.com/watch?v=jcvd5JZkUXY" rel="nofollow">https://www.youtube.com/watch?v=jcvd5JZkUXY</a><p>Our skill doesn't grow in a vacuum, without outside mentorship and guidance. There are areas where I am upset about the application of AI, but this is not one of them. Consider copilot a gentle guiding hand for those without access to a second pair of eyes nearby to remind you of what you may otherwise have on the tip of your tongue.<p>But just as Led Zeppelin's refusal to recognize how <i>heavily</i> their music was influenced by delta blues artists was unbecoming, I can accept the argument that it is perhaps douchey of Github to sit on Copilot as squarely their own creation.
I do feel these arguments are valid if a little overstated. Most devs have googled, found some code, and pasted it in without thinking about attribution. That doesn't make it right, but it is a question of how much code is being copied and how specific it is. For example, I peruse open repos to learn - I learned about the spread operator in JavaScript that way - that doesn't mean every time I use it I need to attribute whatever repo I saw it in. But, yeah, if I copied a larger chunk and the owner wants attribution, probably wrong.<p>I like the idea of having the bot automatically update an attribution file if it detects it's used licensed code. Seems like it would be fairly trivial. Also a robots.txt for repo owners to control automated use.<p>Also, they should totally pay back a portion of revenue to the community and support the repos used to train. That seems like it would be a good PR move if nothing else.
So, how often does it actually happen? Does it happen more often than for a human? Does anyone actually have numbers on this?<p>Of course, if you provide an already-copyrighted prefix, and it has seen that code, the chances are high that it will complete the copyrighted code (because that is what you would actually expect).<p>So, for real use cases in the wild, where you write your own genuinely novel code, how often would it suggest some copyrighted code? And how often would a human?<p>I have used Copilot for the last few months and I have never seen such a case (I can be pretty sure because all the identifier names are really unique, and the code was very custom).<p>However, I assume that I myself might have produced copyrighted code unknowingly, because if you write common patterns (e.g. some tree or graph search, or some sort function, implementing an LSTM or Transformer, whatever), the chances are not so low.
It's the same problem with those ML models. The other day someone generated a children's book using GPT-3; it turned out that there is a real children's book with the same name and very similar content: The Very Lonely Firefly by Eric Carle.
I'm a bit mixed on this. The code Copilot usually autocompletes for me is not particularly novel; it's just mundane stuff I would write anyway. Most of these snippets are not copyrightable in my opinion, because they were obvious in the first place. Like CSS nth-child odd/even logic, or one case where it filled in ~10 lines of JS logic for filtering rows by a category stored in a dataset, which I would have written anyway.<p>Then there are cases where it amazes me completely: it wrote 10 lines of C++ code for rendering monochrome glyphs bit by bit using the FreeType library. It did have an odd subtle bug though: the glyphs came out reversed, and it only worked with a certain font size, which it seemed to pick up from a different file altogether.
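A guess at what that reversed-glyph bug could be (purely my speculation, sketched in self-contained Python rather than the actual C++/FreeType code): 1-bit monochrome bitmaps, like FreeType's FT_PIXEL_MODE_MONO, pack the leftmost pixel into the most significant bit of each byte, so reading bits low-bit-first mirrors every byte:

```python
def unpack_row_msb(byte_row, width):
    """FreeType-style 1-bit rows: leftmost pixel is the HIGH bit of each byte."""
    return [(byte_row[i // 8] >> (7 - i % 8)) & 1 for i in range(width)]

def unpack_row_lsb(byte_row, width):
    """The buggy variant: reading the LOW bit first mirrors each byte's pixels."""
    return [(byte_row[i // 8] >> (i % 8)) & 1 for i in range(width)]

row = bytes([0b10110000])  # intended pixels, left to right: 1,0,1,1,0,0,0,0
print(unpack_row_msb(row, 8))
print(unpack_row_lsb(row, 8))
```

With an 8-pixel-wide row, the buggy variant is exactly the correct row reversed, which would produce mirrored glyphs like the ones described.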
Pretty soon the world is going to come to realize art/creation is just blending, incrementing and repurposing prior art.<p>No book, painting, codebase, sonnet, design is theft-less.<p>The art is the space reduction, otherwise we’d just bruteforce away.
If you assigned a task to a junior dev, and he/she used some code from open source projects and Stack Overflow to develop a custom program for the task, would you say that this person is selling you other people's code? Is it common or expected for this type of use to be acknowledged?
If we're all standing on the shoulders of giants (specifically code that other people wrote) then really what Copilot is selling is a ladder to get onto those shoulders faster. I think that's a legitimate aim, as such. However it should be careful about not including unlicensed code and should have a specific 'GPL' option for a model trained with GPL code included.<p>I suppose it should also generate appropriate copyright notices to satisfy many open licenses. I'd be surprised if copilot could actually link back to the original code like that, though.
Say, I want to write a getter method like below:<p><pre><code> String getName() {
return name;
}
</code></pre>
Let us also assume that this snippet, unsurprisingly, has been in several copyrighted repos that didn't grant Github the right to share this code.<p>So I start typing "getName" and copilot suggests the exact snippet above. If I use this snippet, is it plagiarism? Even though the above code is the most "obvious" way to write this getter and I would have written it this way even without copilot's suggestion? Or does the "uniqueness" or "non-trivial quantity" of the suggestions have any bearing in determining copyright violation? How/where do we draw the line?
We stand on the shoulders of giants. That had been the way for decades. A newer stack over the older one without much thought. And someone in the future will build even a newer stack over the current ones.
Is it smart enough to:<p>- respect attribution<p>- respect copyleft<p>- respect proprietary licences<p>- give the user appropriate hints about the above<p>Or does it just copy code without doing any of this?
My personal reasons for <i>not</i> using copilot are a bit simpler. I believe the act of researching which solutions to use for a given problem is not so much about time, or the code you end up with, but about developing a better understanding of what you are doing. You may end up just cutting, pasting and modifying a piece of code you found, but hopefully, you were exposed to a few different ways to accomplish the same thing, and it made you aware of other choices that could have been made.<p>You could think of the evolution of practical problem solving in software engineering like this:<p>1. I have to invent a solution (because nobody else in the world has a computer)
2. I have to know of a solution (education, word of mouth...)
3. I have to look up a solution in the books I have (commoditized knowledge)
4. I can look up solutions on the internet <-- (we are here)
5. The computer suggests something and I accept (some are here too)<p>From 1 to 4 the amount of cleverness required to solve small problems drops a bit, but your productivity and exposure to knowledge probably go up.<p>I'm not quite sure what happens from 4 to 5. Personally I'm actually more interested in the context solutions are presented in than in the solution itself. In fact, I rarely copy and paste code from the Internet, but I often look at multiple suggestions/solutions and then borrow or combine ideas from several sources.
I might start considering Copilot if Microsoft were to train it on their own internal codebases (Windows, Office, SQL Server). Until they do, it's clearly a "tool for thee but not for me" type of situation.
Sorry for the unproductive tone of this comment, but there's something about the attitude of this tweet that really grinds my gears.<p>Any time someone invents something new and incredible, there's always a crowd of negative nancies eager to discredit it and explain why the invention is nothing new and a detriment to society.<p>I don't understand why someone would willingly share their code on github, where it is publicly available, just to complain when others make use of that knowledge.<p>'co-pilot just sells code other people wrote' is such a ridiculous understatement of what co-pilot does. Instead of marvelling at the human ingenuity that went into creating it, they sneer at the audacity of OpenAI to do something without first asking their permission.
> Copilot just sells code other people wrote<p>So what? Selling code other people wrote is the foundation of the free software movement. It is the entire business model of countless companies, and it is a good thing. Among them are most major linux distro vendors, like Red Hat and Canonical.<p>The value added by Copilot is that it finds, out of billions of lines of "code other people wrote", the ones you want.<p>I still think it is derivative work, and that they should only process code under permissive licenses, or, if they want to include GPL code, make a GPL-only version, usable only for GPL projects. I thought that is what they did: there is so much code under permissive licenses that it should be enough to train their model. But apparently they don't care; as long as it is public, it is included. To me, they are shooting themselves in the foot: several companies have already banned Copilot due to the potential copyright issues.
I started self hosting when Microsoft bought github and with this mass theft of copyrighted material and then reselling it for money I'm even more happy with my decision.
Copilot very rarely copies code verbatim, and when it does it's very short snippets. When Oracle sued Google over allegedly copying short and fairly trivial snippets of code, they were justly derided.<p>I can't speak to the legal side, but I just don't understand the moral outrage over very occasionally copying such short snippets of code. The key innovations, and the actual value that licenses are intended to protect, aren't in these short snippets.<p>And what does copilot bring to the community? Free use by students, free use by open source maintainers, and a huge boost in productivity for a modest fee for professional devs, for a service that no doubt costs a lot to run, even on the margin.
On a side note, I do believe that short programs or functions should be copyright-free by law.<p>Or we as a community need to create a better BSD, a CC0 for everything.<p>Almost everything is nontrivial, and almost everything is copyrighted, at least with the pressure to name the original author (BSD, GPL, other major permissive licenses).<p>Say you want to use a library, so you check for examples in the documentation; now you have to note somewhere that the example is from the documentation (best if you put it in the source code, so you don't lure other people into copying what you copied and crediting you as the author).<p>It is a major PITA, at least for me.
When my last company got acquired, part of the due diligence process was a scan of our codebase for snippets from stack overflow. Every snippet found that wasn't posted with a clear license by the author was challenged and we rewrote it.<p>Now, I'm not entirely sure how necessary this was from a legal perspective. But introducing an AI into the mix will bring up a lot of uncertainty when it comes to how much change is required for something to no longer be considered a copy/derivative.
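For context, snippet scanners of this sort often boil down to n-gram fingerprinting: hash every window of n tokens in both codebases and look for overlapping hashes. A toy, purely illustrative Python sketch (real due-diligence tools are far more sophisticated, e.g. normalizing identifiers and whitespace first):

```python
import hashlib

def fingerprints(text, n=5):
    """Hash every n-token window of the text; overlapping hashes across
    two codebases flag likely copied snippets."""
    tokens = text.split()
    return {hashlib.sha1(" ".join(tokens[i:i + n]).encode()).hexdigest()
            for i in range(len(tokens) - n + 1)}

# Two near-identical snippets, tokenized crudely by whitespace.
ours = "for i in range ( 10 ) : print ( i )"
theirs = "for i in range ( 10 ) : print ( x )"
shared = fingerprints(ours) & fingerprints(theirs)  # windows both sides contain
```

Windows that cover the one differing token hash differently, so the overlap measures how much of the snippet is shared rather than demanding an exact file match.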
Copilot is a new way for corporations to break copyright while enforcing it for everyone else, this will be the first big use for AI when other corpos follow.
Technically, programmers search, copy and modify code all the time.<p>One might argue copilot puts into software an algorithm that humans are already doing. Software like that is usually inevitable.<p>Still, it sucks there's no benefit for the contributors.<p>The most ethical thing I can think of is some kinda 'Spotify-like' revenue sharing model, based on how often their code is used by others. Not that they'd ever implement that if they can get away with it!
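Mechanically, the split itself would be the easy part. A toy Python sketch of a pro-rata payout, with made-up repo names and usage counts (the hard part, measuring whose code was actually used, is exactly what Copilot doesn't expose):

```python
def pro_rata_payouts(usage_counts, revenue_pool):
    """Split a revenue pool in proportion to how often each repo's code
    was suggested, Spotify-style."""
    total = sum(usage_counts.values())
    return {repo: revenue_pool * count / total
            for repo, count in usage_counts.items()}

# Hypothetical usage counts for three repos sharing a $1000 pool.
payouts = pro_rata_payouts({"repo-a": 60, "repo-b": 30, "repo-c": 10}, 1000.0)
```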
Yes, though in a way so do stackoverflow & friends. A large chunk of the dev ecosystem is copy-paste, and I don't think this is inherently problematic. It is always a case of standing on the shoulders of giants.<p>It's more of a licensing issue to me. As far as I can tell it was trained on a blend of licenses, which to me makes it inherently non-compliant. At least some of it is going to be copyleft and will find its way into closed source.
I'm not a lawyer, nor very well versed in the vast world of licenses and their definitions in court contexts, but I've been wondering about something with the growing appeal ML-generated content has for the average person (and the "high" barrier for entry in the market) — are licenses in some form or another going to adapt to this phenomenon? From a brief search, I have not found any new license with a no-dataset-usage clause (assuming fair use does not apply, that's another big question). What are the chances anything of the sort will become an option for any "creative" work that's usually shared freely (such as artwork, code, et cetera) even despite copyright? What about the ownership of the dataset? It seemed to be questionable years ago already that possibly IP-protected content goes through the black box and resembling material gets on the other side, whose ownership is it really? I'm guessing some notable court cases in the future could define this in the following years if the popularity continues growing.
Artificial Intelligence is causing us to revisit the difference between free as in beer and free as in speech (<a href="https://en.wikipedia.org/wiki/Gratis_versus_libre" rel="nofollow">https://en.wikipedia.org/wiki/Gratis_versus_libre</a>).<p>It is putting a new spin on some traditional Open Source lessons (<a href="https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar#Lessons_for_creating_good_open_source_software" rel="nofollow">https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar#L...</a>).<p>People share and reuse unattributed snippets of MIT-licensed and GPL-licensed code on the internet all the time: StackOverflow, etc.<p>StackOverflow profits from that activity indirectly by facilitating it: passively through ad revenue, and actively through the Teams subscription offering.<p>But nobody seems too upset about that.<p>How is an AI which facilitates the same code sharing fundamentally any different? Because it's scraping the code itself, rather than humans contributing it?<p>Seems like a tenuous argument at best.
Traditional 'real' (as opposed to 'imaginary') programming is like writing in assembly code; it's outmoded because of generative models, in a way similar to C outmoding assembly code.
The most important thing, I think, is that free (libre) software developers are able to work with the language models directly, so that libre software is allowed to continue progressing into what I call Imaginary Programming.
That's because with a generative internet all you really need is blockchain + prompting.<p><a href="https://huggingface.co/spaces/mullikine/ilambda" rel="nofollow">https://huggingface.co/spaces/mullikine/ilambda</a><p>Language models are able to 'steal' the linguistic meaning-making 'essence' of the software, by modelling:<p>- How the software is used (mimicking its function) - external meaning<p>- How functions are 'inspired' - internal meaning (reflection)<p><a href="https://github.com/semiosis/imaginary-programming-thesis" rel="nofollow">https://github.com/semiosis/imaginary-programming-thesis</a><p>The models themselves should be clear about where the data came from.
However, this is only possible in a fair world, which we do not live in.
Compromise must be made to protect national interests.<p>Generative models are license-blind and there's very little that could be done to prevent progress. Like what the invention of the camera has done for art.<p>Large language models including Codex are a transformative technology.<p>Bi-directional fair use is probably the best result we can hope for.<p>So long as Microsoft and OpenAI are not selling back usage of the model to the open-source community, I think it's OK, though it's the bare minimum obligation.
I know this isn't really related to the whole copying ethics debate, but I definitely feel like there's some sort of foul play happening here. For all of the unlicensed projects out there, the license that is automatically granted to Github includes:<p>> the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time<p>It's insane how vague this is. Is Copilot a "Service"? Sure, by its definition:<p>> The “Service” refers to the applications, software, products, and services provided by GitHub, including any Beta Previews.<p>And since much of the code was published before Copilot's inception, this means Github can just arbitrarily add more "services" and milk the code for whatever it wants. Automatically service-ify any public repository? Sure, pay us for quotas. It's like a legal loophole to let Github just bypass any license restrictions you put on it.
1. You most likely agreed to that by using GitHub.<p>2. Copy&Pasting Code by manual search exists.<p>3. This is just a smart tool so you don't have to figure out yourself what to copy&paste (in the best case) and save a lot of time.<p>Sometimes I truly wonder how people can genuinely be upset about things like this. What is broken are copyright and patent laws in the 21st century.
Copilot is a fancy pattern bot.<p>Humans make original patterns, but since Copilot cannot think, then Copilot does not. It squashes together a bunch of small individual patterns, each under their own license, but at no stage does it do anything more than pick a line from here, and a line from there.<p>It doesn’t think, and it doesn’t create new IP.<p>It is like making a picture out of small snippets of a thousand other pictures, and then selling it.. clearly not OK. You still ripped off the original artists.<p>Or like plagiarising 100 of your class mates’ assignments. Are you less guilty because you went to the effort to steal just a few sentences from each?<p>A criminal who steals a cent from every account at the bank is a more sophisticated thief than someone who holds up a petrol servo.<p>If Copilot doesn’t create new IP (it doesn’t; we established this), then it uses existing IP. And in that case it is no different to any of the three analogies above.
I think this problem has no good solution until IP laws around the world are properly reimagined from the ground up. I'm of the quite radical stance that code, music, and art, in terms of their intellectual existence, should be free for anyone to take. (You can own a harddrive with code on it and claim no one should steal it, but not the idea of the code itself.)<p>If you have ideas, code, music or art which you wish for no one to partake in, do your best to keep them secret. Certainly, breaking into secret areas should be illegal, but once the cat gets out of the bag, it's out of the bag.<p>The creative people behind these ideas, I believe, will be able to find good compensation in society nonetheless; IP laws nowadays only serve to protect megacorporations, to the detriment of creativity and ideas.
I don't think any professional community is aligned on how to think about ML-generated content yet. We don't know how to apportion rights between the data owner, the model owner, and the end user, and I don't think existing copyright law is ready for it. At least for software, I think the way forward is for the next generation of software licenses to explicitly state whether the code can be used to train ML models and what those models can be used for. Without explicit language, we'll be squabbling over interpretations of fair use.<p>There's going to be some big cases here. It's going to end up in the Supreme Court sooner or later, and if it were to go there today I think I know what they'd say.
If the portion of code that Copilot lifts is the "heart" of the original work, that would be much less likely to be considered fair use[1], regardless of the length.<p>> For example, it would probably not be a fair use to copy the opening guitar riff and the words “I can’t get no satisfaction” from the song “Satisfaction.”<p>I wonder how this could be integrated into the system?<p>[1] <a href="https://fairuse.stanford.edu/overview/fair-use/four-factors/#the_amount_and_substantiality_of_the_portion_taken" rel="nofollow">https://fairuse.stanford.edu/overview/fair-use/four-factors/...</a>
Tough pill to swallow. Microsoft's actions don't seem fair, but fighting them with copyright could weaken <i>fair use</i>:<p><a href="https://felixreda.eu/2021/07/github-copilot-is-not-infringing-your-copyright/" rel="nofollow">https://felixreda.eu/2021/07/github-copilot-is-not-infringin...</a><p>There's a good argument that demanding copyright protections on scraped datasets and short snippets is a double-edged sword. It could harm search engines, distribution of news, and non-commercial ML research too.
At every turn, in every instance, for decades, all stories involving Microsoft end in "...and then Microsoft fucked people over." I've witnessed this firsthand since the 80s.
Should the snippets that Copilot is regurgitating be considered for copyright in the first place?<p>It seems akin to trying to copyright a certain drum pattern or chord progression.<p>Also, the history of the GPL, MIT, commercializing lisp machines, Symbolics, infighting, etc… seems a very different context than Copilot, so I am having difficulty seeing the systemic problems that tools like this encourage.<p>There is of course a surface-level similarity in that a corporation is profiting from IP in the public domain, but the devil is in the details.
Jaron Lanier's book "Who Owns the Future?" Is all about AI and compensating those that input in training these very valuable models.<p>I highly recommend everyone read it.
It'd be nice to see some proof here. Copyright is not absolute and does not extend, for example, to things that have no creativity in them. There are only so many ways to write a for loop or an if condition. Training an ML model from a large body of code IMHO violates copyright no more than any of us reading code and learning from it, as long as GH Copilot doesn't spit out code that's exactly the same as something already existing.
Programmers are fine with their creations, pretty much all of tech, reselling content that other people wrote for free. But no, not code; that one must be expensive.
It is incredible to use though. I pasted the return value of an API call into a comment, then started to write a schema class. Copilot just created the entire class for me. When I wanted to extract a subset of the data, I typed get_<_name_of_the_subset>(), and it wrote the code I would have written.<p>So even without using someone else's code, just the pattern understanding and the production of simple boilerplate code is great.
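To make the workflow concrete: inferring a schema class from a sample payload is mostly mechanical, which is presumably why Copilot handles it so well. A hand-written Python sketch of the idea (not actual Copilot output; the names are made up):

```python
from dataclasses import make_dataclass

def infer_schema(name, sample):
    """Build a dataclass whose fields mirror the keys and value types
    of a sample JSON-like dict."""
    fields = [(key, type(value)) for key, value in sample.items()]
    return make_dataclass(name, fields)

# Pretend this dict was pasted from an API response into a comment.
sample = {"id": 1, "name": "widget", "price": 9.99}
Widget = infer_schema("Widget", sample)
w = Widget(**sample)  # a typed schema object built from the sample
```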
Why is it a bad thing? Either you have people spending time reading code, learning every little thing, and producing the same work in days, or you have Copilot save hours of human time. Coding becomes more efficient; it's a win-win for everyone in this industry, right? I know people get attached to the code they write, but we all learn from books, and the resulting code is common enough.
> what github / microsoft is counting on here is that open source developers do not have enough collective power to do anything to stop this<p>I think it much more likely that they count on everyone liking it way too much to give a shit about their MIT code not being attributed correctly.<p>I certainly don’t. MIT just seems like the most convenient license for people that need licenses (corporations?), so that is what I use.
I somewhat agree with that. Yesterday I edited some exotic configuration (Kubernetes CSI driver for Cinder) and Copilot suggested a config which looked like someone else's. There were no values, so it was good at filtering those out, but it definitely looked like a cleaned-up part of a config that resides in some project.<p>I don't think that's bad though. Code sharing is good for overall productivity.
MS and Github are thieves: all their code is closed source, yet they sell copyrighted code they don't own. If they had told us years ago that our code would be automatically stolen by an "AI", most coders would not have created an account. The innovation here is that they have access to most of the world's open source code and have automated the stealing.
If GitHub could guarantee that the code Copilot had ingested was only made with OSS licenses, then I don't see what the problem is.<p>But as far as I understand, GitHub trained Copilot on any public repository on GitHub, meaning even if it doesn't have a license specified (so the user publishing it still has the copyright to it), then I don't see how it can be OK.
There needs to be an update to either licenses or GitHub (and other) software directly, or even software terms of services, that gives the user an opportunity to opt-out of their data being used to train proprietary AI models.<p>'I don't agree with having an AI trained on/with my data.'<p>IMHO, all other problems with copilot stem from this.
Sure, the concern is valid, but I feel like this tweet adds absolutely no substance to the discussion and just repeats the same opinion that was already rehashed to death when copilot originally launched. As such, especially with the tone that the tweet has, I don't expect constructive discussion to arise here.
Reading many of the comments here I feel like one important thing is being left out that is not related to legal, but to social issues:<p>Who is on the side of open source? Where are the big, powerful institutions and companies that deeply care about authors and communities providing free software that so many of us rely on?
There are a few reasons why this could be considered ethical. First, open-source code is typically free to use, so the company would not be taking advantage of anyone by using it to train their AI. Second, the company would be providing a service that people are willing to pay for, so they would be generating value for society. Third, the company would be transparent about what they are doing and would not be hiding anything from the public.<p>...the above was generated by GPT-3 (text-davinci-002). Prompt: Write an argument for why using open-source code to train an AI and then sell the code generating service (without open-sourcing it) is ethical.<p>The main argument against this is that it takes away from the open-source community that contributed to the development of the code in the first place. By selling a code-generating service without open-sourcing it, the company is profiting from the work of others without contributing back. This is unfair and takes away from the overall open-source ecosystem.<p>Added two characters to the prompt :P
This isn't how a language model works.<p>It's SO frustrating that even on HN people still fall for this naive and incorrect analysis. Pasting bits I've said before on this topic:<p>Language models do not work like this. They can copy content, but usually that's for something like the GPL license text.<p>Generally they work on a character-by-character basis, predicting what is the most likely character to appear next.<p>This very rarely results in copying text, and almost never rare text.<p>Mechanically, the model has learnt both the syntax of language and how concepts relate. So when it starts generating, it makes sentences that are syntactically valid but also make sense in terms of concepts.<p>That's really different to just combining bits of sentences, and it gives rise to abilities you wouldn't expect in something just cutting and pasting bits of sentences. For example, few-shot learning is mostly driven by its conceptual understanding and can't be done by something with no way to relate concepts.
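To give some intuition for "predicting what is the most likely character to appear next": here is a toy character-level bigram model in Python. It is nothing like a real transformer (no neural network, no long context window), but it shows that generation is repeated next-token prediction, not retrieval of stored text:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each character, which characters follow it and how often."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(corpus, corpus[1:]):
        counts[cur][nxt] += 1
    return counts

def generate(counts, seed, length):
    """Greedily emit the most likely next character, one step at a time."""
    out = seed
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # never seen this character before; nothing to predict
        out += followers.most_common(1)[0][0]
    return out

model = train_bigrams("abcabcabc")
print(generate(model, "a", 5))  # each step picks the likeliest next char
```

A real model predicts from a huge learned distribution over tokens rather than raw counts, but the generation loop is structurally the same.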
I'm going to make a bold prediction: no one will ever lose a copyright lawsuit due to usage of Github Copilot generated code. The code snippets it produces are too small or trivial to qualify for copyright infringement.
Seems like my original questions [1] are more relevant than ever!<p>[1] <a href="https://news.ycombinator.com/item?id=27677598" rel="nofollow">https://news.ycombinator.com/item?id=27677598</a>
MrDoob has an excellent point about this:<p><a href="https://twitter.com/mrdoob/status/1539740854956412929" rel="nofollow">https://twitter.com/mrdoob/status/1539740854956412929</a>
As the saying goes, "when a product is free to use, the real product is actually you".
In this case, our code is the product.
Now seriously considering swapping to another git provider...
Copilot sells the service of finding the code that makes sense for what you write. It would be better if it could correctly attribute the source(s), though; I hope they will solve this problem at some point.
Is GitHub Copilot using private repositories for the learning process?<p>If yes, how do they mitigate the risk of exposing private data when something is quoted verbatim?<p>If not, then why are repos with non-permissive licenses ok?
Beware geeks with gifts. This is Microsoft. The question isn't "is it good?" but "Why are Microsoft offering it and how is it undermining everyone else?"
What stops me from re-uploading copyrighted source where I remove the notices and push it with an MIT license? And if a model has already been trained on such a data set, how do you get it back out?
And social media sells ideas other people thought.<p>Copilot is limited to public code now, but it may easily be trained on non-public code too, though that probably won't be for sale to the public.
All I can think of is Steve Yegge [1]: "They have no right to do this. Open source does <i>not</i> mean the source is somehow 'open'."<p>My code is on Github so that people can read it, reuse it and learn from it. "The freedom to study how the program works", as the FSF says. If some of the people reading it are machines, why would that matter?<p>[1] <a href="http://steve-yegge.blogspot.com/2010/07/wikileaks-to-leak-5000-open-source-java.html" rel="nofollow">http://steve-yegge.blogspot.com/2010/07/wikileaks-to-leak-50...</a>
Github Copilot is selling code other people wrote as much as the author of this thread is profiting from words other people invented.<p>Absolute nonsense.
and Dalle2 sells art other people created<p>(I'm actually not being sarcastic; I think there needs to be some sort of pipeline for compensating the artists whose work is used to train these models.)
What AI is showing is the fuzzy line between creating and copying. The truth is they are both always present in everything we do; we've just been trying to hide it.<p>So it should be as simple as: if you're using other people's content for your own profit, you should properly compensate them.<p>Or we could just abolish copyright law and assume that everything humans create emanates from culture, so it's always collectively built and everything should be open source.<p>Or we just do the same we've been doing: create even more complex laws trying to define this fuzzy line in a way that lets companies keep profiting from it a lot more than individuals.
I've been using it for a day now and I'm really impressed. It is so aware of stuff in old code that it is scary. I'm working in an old application built with Zend Framework.
Isn't every programmer in history (except the gal who invents her own language and writes all her own code) simply an archeologist for other people's work?<p>We all Duck/Google for code anyway. Why not admit it and make it easier?
The code Copilot suggests from any given project is, most of the time, not substantial enough to warrant crediting that project. When I look up code in some GitHub repo and copy all or part of it, I don't credit that project either.<p>I do not see Copilot as useful anyway.
I disagree. Copilot is selling context-aware code suggestions, which are derived from code that other people wrote on their platform, and which in no way affect the work of those people.
I get the feeling this entire debate would have been non-existent had this been a Jetbrains product instead.<p>The whole thing is just bizarre when the vast majority of developers constantly look at OSS code daily and lift ideas/patterns/snippets from there regularly without once looking at whatever license is attached.
Well, this does invite an interesting comparison. If we imagine something like Copilot applied to music I believe the chances of ending up in court would be pretty high. There are a lot of examples of plagiarism lawsuits in popular music and the outcome seems to be entirely random.<p>One could argue that the information density in chord progressions, bass lines and beats is extremely small. And that any recognizable part of a musical idea that has been "borrowed" would necessarily make up a larger percentage of the complete work than would be the case for a typical application with borrowed snippets.<p>That's not a bad argument, but it is unsatisfactory because it means that at some point someone has to make a judgement on how much you can borrow.