TechEcho
GitHub Copilot is not infringing copyright

347 points by aarroyoc almost 4 years ago

74 comments

glitchc almost 4 years ago
I disagree with this article. GitHub Copilot is indeed infringing copyright and not only in a grey zone, but in a very clear black and white fashion that our corporate taskmasters (Microsoft included) have defended as infringement.<p>The legal debate around copyright infringement has always centered around the rights granted by the owner vs the rights appropriated by the user, with the owner&#x27;s wants superseding user needs&#x2F;wants. Any open-source code available on Github is controlled by the copyright notice of the owner granting specific rights to users. Copilot is a commercial product, therefore, Github can only use code that the owners make available for commercial use. Every other instance of code used is a case of copyright infringement, a clear case by Microsoft&#x27;s own definition of copyright infringement [1][2].<p>Github (and by extension Microsoft) is gambling on the fact that their license agreement granting them a license to the code in exchange for access to the platform supersedes the individual copyright notices attached to each repo. This is a fine line to walk and will likely not survive in a court of law. They are betting on deep lawyer pockets to see them through this, but are more likely than not to lose this battle. I suspect we will see how this plays out in the coming months.<p>[1] <a href="https:&#x2F;&#x2F;www.microsoft.com&#x2F;info&#x2F;Cloud.html" rel="nofollow">https:&#x2F;&#x2F;www.microsoft.com&#x2F;info&#x2F;Cloud.html</a><p>[2] <a href="https:&#x2F;&#x2F;github.com&#x2F;contact&#x2F;dmca" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;contact&#x2F;dmca</a>
codesections almost 4 years ago
Julia Reda&#x27;s analysis depends on the factual claim in this key passage:<p>&gt; In a few cases, Copilot also reproduces short snippets from the training datasets, according to GitHub’s FAQ.<p>&gt; This line of reasoning is dangerous in two respects: On the one hand, it suggests that even reproducing the smallest excerpts of protected works constitutes copyright infringement. This is not the case. Such use is only relevant under copyright law if the excerpt used is in turn original and unique enough to reach the threshold of originality.<p>That analysis may have been reasonable when the post was first written, but subsequent examples seem to show Copilot reproducing far more than the &quot;smallest excerpts&quot; of existing code. For example, the excerpt from the Quake source code[0] appears to easily meet the standard of originality.<p>[0]: <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27710287" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27710287</a>
phoe-krk almost 4 years ago
<i>&gt; On the other hand, the argument that the outputs of GitHub Copilot are derivative works of the training data is based on the assumption that a machine can produce works. This assumption is wrong and counterproductive. Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work. This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either. The output of a machine simply does not qualify for copyright protection – it is in the public domain. That is good news for the open movement and not something that needs fixing.</i><p>This is very good news. This line of thought implies that we can legally feed all proprietary code into GitHub Copilot in order to teach it all the patented and secret tricks of the companies we can see (since data mining is not copyright infringement) in order to have it print those secrets back when we ask it to (so they become public domain).<p>&#x2F;s
creshal almost 4 years ago
&gt; On the other hand, the argument that the outputs of GitHub Copilot are derivative works of the training data is based on the assumption that a machine can produce works. This assumption is wrong and counterproductive. Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work.<p>Cool. I&#x27;ll just train my new AI on 20 different copies of the same Disney movie and have it generate a new movie. Checkmate, lawyers!
treffer almost 4 years ago
Well, I have a hard time drawing a line between GitHub Copilot and a compression algorithm.<p>If you can reproduce a verbatim copy of the Quake source code after having taken that source code as input, then that&#x27;s compression. A really fancy kind, but still.<p>And given that it reproduces the source code: it has to hold that somewhere.<p>It would be very interesting if someone could reproduce the Quake example with AGPL code, then request the whole model + code because it clearly contains the AGPL code in some encoded form.
MattIPv4 almost 4 years ago
This seems to completely ignore the fact that we&#x27;ve seen Copilot regurgitating exact copies of existing code, and even with the incorrect license attached when it was asked for it. [0]<p>[0] <a href="https:&#x2F;&#x2F;twitter.com&#x2F;mitsuhiko&#x2F;status&#x2F;1410886329924194309" rel="nofollow">https:&#x2F;&#x2F;twitter.com&#x2F;mitsuhiko&#x2F;status&#x2F;1410886329924194309</a>
denton-scratch almost 4 years ago
&gt; The output of a machine simply does not qualify for copyright protection<p>&quot;Simply&quot;? If it were that simple, surely that would mean that the output of the Unix &quot;cp&quot; program would not qualify? What about a DVD copier?<p>I&#x27;m OK with copyright as it used to be, back when I was a teenager; the right expired with the author&#x27;s life. Corporations couldn&#x27;t own copyrights. There was no burden on the author to register their rights. And copyright was a civil matter; you sued for actual damages. Infringement wasn&#x27;t a crime.<p>I&#x27;m not OK with modern copyright law, with criminal penalties, rights that can be transferred to entities that are essentially immortal, and copyright terms that keep getting extended, just before Mickey Mouse and Elvis Presley become public domain.
robbrown451 almost 4 years ago
This is not the black and white issue that the article implies it is, with statements such as &quot;Machine-generated code is not a derivative work&quot;.<p>Imagine a web scraping robot that just grabbed textual news articles and spit them out verbatim to searchers (without giving credit or linking to the original). That is obviously copyright infringement, even though it is done by a robot.<p>Now imagine it does slight modifications to the text, using a thesaurus and maybe a bit of AI. It might substitute &quot;is able to&quot; for &quot;can&quot;, or &quot;frequently&quot; for &quot;often&quot;, but otherwise everything is left as is. Is that &quot;machine generated&quot;?<p>Same goes for a hypothetical bot that scrapes existing music, and after listening to &quot;He&#x27;s So Fine&quot;, comes up with the melody for &quot;My Sweet Lord.&quot; (as per the famous George Harrison copyright case from the 70s) It isn&#x27;t off the hook simply because a machine was involved. If it truly &quot;learns&quot; what makes a good melody, and uses that to generate a very different melody (that might be equally similar to a dozen different songs), that&#x27;s different.<p>There is a full spectrum between simple bots that copy verbatim, and something that &quot;deeply learns&quot; and then writes a new article, or generates new source code, or writes a new melody, or whatever.<p>I don&#x27;t have a strong opinion on GitHub Copilot since I haven&#x27;t really studied what it does and therefore I don&#x27;t know where it lies on that spectrum, but this article is not useful if the author doesn&#x27;t really explore the nuance, and treats everything as absolutes.<p>(and I should say, I am very much of the opinion that copyright law as it is, is hopelessly broken, and I am always glad to see things like Copilot just so we can see it demonstrated why. But that is veering off topic...)
dr_kiszonka almost 4 years ago
My less lofty personal gripe with Copilot is as follows. I worked hard to produce quality code. GitHub will make money off my code. Copilot users will make money using my code. I - the creator - will make nothing.<p>At the very least, I should have been asked whether my code can be used by Copilot and I should get at least a share of the profit Copilot generates every month, where the share equals my code &#x2F; all training code used by Copilot. The latter part could be gamed by other developers in the future, but it&#x27;s the best I could come up with.
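A minimal sketch of the proportional split proposed above. It assumes, hypothetically, that a contribution is measured in lines of code and that Copilot's monthly profit is a known figure. The function name revenue_share and every number below are illustrative only; nothing here reflects data GitHub has published.

```python
def revenue_share(my_lines: int, total_training_lines: int, monthly_profit: float) -> float:
    """Return one author's payout under a simple pro-rata split.

    share = my code / all training code, applied to the monthly profit.
    Measuring "code" in lines is an assumption made for this sketch.
    """
    if total_training_lines <= 0:
        raise ValueError("total_training_lines must be positive")
    if not 0 <= my_lines <= total_training_lines:
        raise ValueError("my_lines must be between 0 and total_training_lines")
    return monthly_profit * my_lines / total_training_lines


# Hypothetical figures: 1,000 lines out of 1,000,000, and 50,000 in monthly profit.
payout = revenue_share(1_000, 1_000_000, 50_000.0)
print(f"payout: {payout:.2f}")
```

As the comment notes, a line-count metric is easy to game (for example by padding a repository), so any real scheme would need a less trivially inflatable measure.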
moralestapia almost 4 years ago
&gt;it suggests that even reproducing the smallest excerpts of protected works constitutes copyright infringement<p>Actually, it is. It has to do with whether the small excerpt is copying what could be called &quot;the heart of the work&quot;; which in the case of code I would argue is almost always what you are after. No one&#x27;s gonna copy the indentation style, boilerplate around functions&#x2F;blocks, punctuation, etc. You always go for the &quot;functional&quot; part of the code, which is definitely &quot;the heart of the work&quot;.<p>The heart of Carmack&#x27;s fast inverse square root lies in its selection of a particular set of constants and operations that happen (i.e. were designed) to approximate the inverse square root without taking an expensive path. Copyright law would look at this novelty; I don&#x27;t think it would argue around &quot;the use of subtraction and multiplication in a computer program&quot;, as that would be plainly stupid.<p>I am surprised that someone who is supposedly an expert in copyright law does not know (or pretends not to know) about this and, beyond that, actually suggests the opposite. This is copyright 101, come on.
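For readers who have not seen the technique, here is an illustrative Python re-expression of the widely documented bit trick. It is a sketch, not the Quake III source: the function name fast_inv_sqrt is made up for this example, and 0x5F3759DF is the constant commonly cited for the 32-bit float version.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the 32-bit float bit trick."""
    # Reinterpret the float's IEEE 754 single-precision bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The "magic" step: a shift and a subtraction from a chosen constant
    # yield a rough first guess for 1/sqrt(x).
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits back as a float.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration refines the guess.
    return y * (1.5 - 0.5 * x * y * y)


print(fast_inv_sqrt(4.0))  # close to 0.5
```

The sketch shows what the comment points at: the arithmetic operations are ordinary, and the distinctive part is the specific constant combined with the reinterpret-shift-subtract sequence.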
temac almost 4 years ago
&gt; What is astonishing about the current debate is that the calls for the broadest possible interpretation of copyright are now coming from within the Free Software community.<p>It is not astonishing at all given:<p>* proprietary codebases have not been indexed by Copilot (at least not by a public version of it)<p>* arguably derived code will be used in proprietary programs
truffdog almost 4 years ago
If Microsoft is confident that Copilot is not a parrot, they should include their proprietary codebases in the training database.
betwixthewires almost 4 years ago
&gt; ...some commentators accuse GitHub of copyright infringement, because Copilot itself is not released under a copyleft licence...<p>This is not why. The issue at hand, as I understand it, is that people using Copilot will potentially have code snippets in their work that are already licensed, whose license they do not know, and that they will not license properly as a result.<p>That&#x27;s in the first paragraph. If you enter this discussion with an incorrect presumption from the outset I don&#x27;t see how you can form a valid defense.<p>&gt; However, by doing so, the copyleft scene is essentially demanding an extension of copyright to actions that have for good reason not been covered by copyright.<p>No. Nobody is asking for an extension of copyright protection, we are asking for the existing reach of copyright to be respected. We built our licenses based on a ruleset that we were told is fair. You don&#x27;t get to violate rules <i>you</i> made and then claim that copyleft people only made their licenses as a workaround to copyright and so are being hypocrites.<p>&gt; Others focus on Copilot’s ability to generate outputs based on the training data. One may find both ethically reprehensible, but copyright is not violated in the process.<p>The arguments I&#x27;ve heard are not that Microsoft is using publicly available information to train its AI. The argument is that people are potentially (and in some current cases demonstrably) getting <i>copy pasted code snippets from licensed software.</i> If you can&#x27;t see the plainly obvious problem here it&#x27;s because you&#x27;re trying not to.<p>Also a point made in the article, that machine generated things cannot be copyrighted because copyright requires a creator, brings up an interesting question as to whether works by people who used Copilot can be licensed at all.
noobermin almost 4 years ago
I&#x27;ve said this before, but I hope the issue isn&#x27;t infringement per se, but that the produced code isn&#x27;t automatically GPL&#x27;ed. The author argues that machine generated code isn&#x27;t copyrighted and this is good because it essentially fits the &quot;data wants to be free&quot; mentality, but I&#x27;d say tell that to the people who use it. Will they, after using something derived from open source, have to open source their code? No, they won&#x27;t. If anything, this finally provides closed source developers with what they&#x27;ve always wanted, a means to rip open source code without having to return contributions.<p>Julia Reda hints at that last bit as being an issue but only in a parenthetical. To the author, that literally is <i>the whole point</i>. Do people not remember the Free Software vs. Open Source debate? Or GPL vs BSD? The requirement that derived works also be free is literally the important bit in Free Software. This only fits the mentality of &quot;data wanting to be free&quot; if your model of that idea includes the permissive sensibility and doesn&#x27;t care about actually changing the state of things, which is making free software more widely used in the world over proprietary software.
mabbo almost 4 years ago
&gt; Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work. This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either. The output of a machine simply does not qualify for copyright protection – it is in the public domain<p>This is fantastic news.<p>I&#x27;m going to create a bot that crawls sites like GitHub searching for popular libraries. Then it will copy them, sans any license, to its own website where it will sell these libraries under a new name.<p>Since there is no creator here, just a piece of software, then there is no copyright violation. My system simply is &quot;inspired&quot; by the original source code using a proprietary algorithm that I call &quot;Copy and paste&quot;.<p>I&#x27;m open to accepting venture capital for this project.
dj_mc_merlin almost 4 years ago
I think a good deal of engineers here should familiarize themselves with Julia Reda and her work and ask themselves if they have the legal knowledge to debate on this matter. Common knowledge is not acceptable to determine truth.<p>Would you really respect the opinion of some dude who&#x27;s only used Excel about your profession?
zxcb1 almost 4 years ago
Open source developers deserve the same rights as corporations.<p>As a side note, in a not so distant future there may be decompilers enhanced by artificial intelligence.
ClumsyPilot almost 4 years ago
Julia is one of the few MEPs that properly engages with issues of copyright and is active in IT. I really appreciate it, even if I don&#x27;t always agree with her.
rektide almost 4 years ago
I&#x27;ve seen way too many screenshots of complete dozen-line XHR wrappers being suggested[1] to complete a function to imagine Copilot as a generative machine. It&#x27;s a somewhat fancy copy paste engine, with phenomenal search. But it&#x27;s smuggled through enough complexity &amp; machinery to obfuscate any legal obligations that might be attached to the original source material.<p>The article does not set itself up to address this at all:<p>&gt; Since Copilot also uses the numerous GitHub repositories under copyleft licences such as the GPL as training material, some commentators accuse GitHub of copyright infringement, because Copilot itself is not released under a copyleft licence, but is to be offered as a paid service after a test phase.<p>I&#x27;m all for discussion of whether Copilot itself has to be copyleft. But to me, the immediate concern is that Copilot seems like a way to take copyleft works and remove the copyleft license from those works.<p>[1] <a href="https:&#x2F;&#x2F;mastodon.social&#x2F;@cjd&#x2F;106513694972486353" rel="nofollow">https:&#x2F;&#x2F;mastodon.social&#x2F;@cjd&#x2F;106513694972486353</a>
hu3 almost 4 years ago
Please copy my code. Reality is I&#x27;ll be gone in 100 years tops and I&#x27;d be more than glad if my crappy code actually helps someone.<p>As for attribution, we all learn by looking at code from all kinds of licenses. Between Stack Overflow, projects hosted in GitHub, libraries that sit on our vendor directories and even closed source projects there&#x27;s a lot that is carried over to new projects without attribution.<p>We&#x27;re heading to a world where most projects are basically libraries glued together anyway. Standing on the shoulders of giants and all that.<p>The dream of an omniscient pair programming buddy is slowly coming to fruition and I for one welcome it.<p>Copilot is just a tool, a fancy search engine for the code that&#x27;s available online. Projects should be judged by the way they use Copilot just like I&#x27;m judged if I misuse my car.<p>I couldn&#x27;t care less whether my name is shoved in some ever increasing CONTRIBUTOR.md file that no one but machines will read.<p>I&#x27;m actually going to start documenting blocks of code more thoroughly so Copilot can better infer what each block does.
chrisseaton almost 4 years ago
But doesn’t Copilot generate verbatim copies of entire copyrighted methods that implement non-trivial novel algorithms, including comments?<p>The article doesn’t seem to address this?
aeturnum almost 4 years ago
I&#x27;m not a copyright expert but I wonder about an implication of two of this author&#x27;s points:<p>- Reading and remembering-about (like reading a book yourself) things does not infringe on copyright.<p>- Copyright does not apply to the output of mechanistic code generation (as opposed to the human-written code that generates the code).<p>So where does that leave the quake snippet (setting aside its own release as open source)? Assuming this technical description is correct, Copilot does not contain the code, just the correct weights to contextually reproduce it perfectly. Copyright does not apply to the chunk that Copilot produces, so does the code simply exist as Copilot created it without license? If that is correct, what are the limits? Could I train a ML algorithm to reproduce binaries from context and, if those produced binaries happen to be identical to other copyrighted products, then it&#x27;s fine?
kube-system almost 4 years ago
&gt; If it were not possible to prohibit the use and modification of software code by means of copyright, then there would be no need for licences that prevent developers from making use of those prohibition rights (of course, free software licenses would still fulfil the important function of contractually requiring the publication of modified source code).<p>The parenthetical backpedaling here is the <i>entire point</i> of copyleft. If it wasn&#x27;t, copyleft wouldn&#x27;t exist -- people would just release their software as public domain.<p>The opposite of &quot;copyleft&quot; isn&#x27;t &quot;copyright&quot;.<p>The opposite of &quot;copyleft&quot; is &quot;never published&quot;, in which case, copyright is irrelevant.<p>There is plenty of commercial closed-source software based on software released under permissive licenses like BSD, MIT or Apache, because they are not copyleft.
zeptonix almost 4 years ago
Really having trouble getting the unrelenting hatred here on this site for something that&#x27;s fundamentally new, clearly represents progress, and is obviously a trend that&#x27;s here to stay. &lt;CrazyIdea&gt;Maybe the laws, licensing, etc. need to, you know, adapt, change, and evolve a little bit with time also -- just as everything else changes with time.&lt;&#x2F;CrazyIdea&gt;
sascha_sl almost 4 years ago
I frankly think that the &quot;free culture&quot; label and extremely permissive licenses of many open source projects are nothing but a redistribution of wealth upwards. Those with existing capital can make profitable unfree derivative works without any benefit to original authors. This relationship must go both ways if you want actual free culture. Stop producing MIT&#x2F;BSD code in your non-work time.<p>This is not a research project, this is a commercial work that produces verbatim copies of code without disclosing its license (or having a license grant in many cases). It doesn&#x27;t matter how it manages to reproduce it either. It does.
sprafa almost 4 years ago
Amazing how this was never an issue when other “AI” systems use other people’s data to learn how to drive cars&#x2F;write text. But man you start messing with developer data and suddenly there are ethical issues! Amazing turnaround.<p>Face it - AI as we currently call it is just a very sophisticated data sorting algo in most cases (let’s ignore the AlphaZero non supervised learning type). Everyone was celebrating when Common Man was destroyed by devs commoditising their knowledge through data capture. But now suddenly it’s a problem! Mess with a man&#x27;s pocket.
wruza almost 4 years ago
These sorts of discussions always puzzle me. Copyright is not an objective thing, it’s a contract that supports the cash flow of a creator-distributor-consumer chain, given the former two assumed it will work to cover their expenses and return profits they expected. If your AI produces Beatles-like or even better music based on Beatles albums and&#x2F;or some more, an AI-aware <i>judge</i> may (or may not, depending on the lobbying activity) decide that it is a direct derivative work, and in case it is automated, all copyright rules apply as in “copy” “right”. There is no need for technical objectivity to exist in between, because this law is not about technicalities. What seems like a loophole may be closed easily by a court decision based on much higher matters than “similarity” or “reconstruction”. If anyone can take your album at the release date and “reshuffle” a free version not worse than the original with few clicks, it is obvious damage to the copyright holder and it demotivates creating it. AI couldn’t do any of that back then, and they didn’t include right terms to cover that, but now it can, and they will just add that, unless someone (MS in this case) has better lawyers, who are ready to create a wide-enough precedent and drag it through all instances.
visarga almost 4 years ago
Copilot is the moment when simple functions have been commoditized, you can have as many as you like almost for free, and adapted to any project. Just spend a moment to admire the transition, it&#x27;s a new stage of post-scarcity.<p>AI can recreate photos, paintings, sounds, voice, music, human faces, text, dialogue, math, proteins, and now code. It does all this while allowing humans to control and direct the whole process, and create original combinations. They all have no economic value to own and are free to use now, like words in a language. Enjoy!<p>Remember Karpathy&#x27;s Char-RNN? How far we&#x27;ve come.<p><a href="http:&#x2F;&#x2F;karpathy.github.io&#x2F;2015&#x2F;05&#x2F;21&#x2F;rnn-effectiveness&#x2F;" rel="nofollow">http:&#x2F;&#x2F;karpathy.github.io&#x2F;2015&#x2F;05&#x2F;21&#x2F;rnn-effectiveness&#x2F;</a>
malwrar almost 4 years ago
Who cares if they&#x27;re infringing copyright?<p>Microsoft bought the place that has a lot of our code and now is going to try and sell us a tool that will regurgitate it back on demand. The entire software industry is already largely based and advanced by the unpaid labor of open-source software project developers, GitHub as a popular open source ally could at least pretend to honor the gentleman&#x27;s agreement of at least agreeing to respect the open-source origins of a ton of its stack.<p>If the tool was also open we probably wouldn&#x27;t have nearly as big a problem, but I guess Microsoft has to recoup the cost of their completely unnecessary purchase.
ZoomZoomZoom almost 4 years ago
&gt; On the other hand, the argument that the outputs of GitHub Copilot are derivative works of the training data is based on the assumption that a machine can produce works. This assumption is wrong and counterproductive.<p>Wow, what a bunch of, ahem, logic of questionable quality.<p>&quot;On the other hand, the argument that the outputs of Copilot are works is based on the assumption that a machine can <i>produce</i>. This assumption is wrong and counterproductive. It just moves electrons and hums a bit.&quot;<p>This is a reductio ad absurdum. The argument is bogus, because what matters is <i>the result</i> of a person&#x2F;other legal entity using the machine and its software.
jfmc almost 4 years ago
Modern AI seems more like machine-assisted collage (of pictures, code, text, etc.) than anything else. Someone (or some other algorithm) needs to be added to ensure that the whole thing makes sense. The big problem here is that when an artist creates a collage he&#x2F;she knows the sources. Here provenance is lost.<p>[1] Collage (&#x2F;kəˈlɑːʒ&#x2F;, from the French: coller, &quot;to glue&quot; or &quot;to stick together&quot;) is a technique of art creation, primarily used in the visual arts, but in music too, by which art results from an assemblage of different forms, thus creating a new whole.
kalium-xyz almost 4 years ago
“If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.” I don’t see my license respected for code it regurgitates that I wrote, there is nothing more to this.
codelord almost 4 years ago
To understand if Copilot is infringing the licenses of the code used in its training data we have to get into the details of what it does and how it works. We can&#x27;t make a general statement for any code generation software that was trained on open source code.<p>It is possible that at some point, maybe even in the not so distant future, we will have ML models so good they can understand abstract concepts, learn and invent new algorithms and implementations by reading code. Such an ML model could be argued to be learning much as a human does, and hence it&#x27;s not infringing any copyrights because it&#x27;s not copying implementations, it&#x27;s learning concepts and ideas.<p>But we are not there yet. When we get there we will know. Because at that point Siri would be able to have seamless conversations with you. At least half the jobs would disappear in favor of robots in a short time. The world would be a different place.<p>Let&#x27;s talk about what Copilot actually can do. It can copy snippets of code from Github while changing variable names. It can autocomplete trivial boilerplate code. If it&#x27;s automagically generating a function for you that actually does something useful like sorting an array, you can be absolutely sure that it&#x27;s just copy pasting it from an existing repo with some cosmetic changes.
makecheck almost 4 years ago
Of <i>course</i> derivative works are being produced!! Whether you blame Copilot or the developer using it, the result is something that required the original developer of the code in order to be constructed.<p>Have we reached the point where every “class X” must become “class X_GPL2_CopyrightJohnQSmith_AllRightsReserved” in every code base out there? Do we need to go from header comments at the top of a file to reminder comments at the end of every line?
reilly3000 almost 4 years ago
How does one address the fact that 95% of software is based on the same basic tropes? At a certain level of density, all code trying to achieve a similar function to legally-protected code will converge on an implementation that is almost indistinguishable. With LOC accreting exponentially, only time will determine when we reach that threshold. The Copilots of the world serve to accelerate and monetize this reality.
turtletontine almost 4 years ago
The idea that the debate actually does a disservice to copyleft by relying on the strictest interpretations of copyright is an interesting perspective to me, but the rest of this seems pretty weak. (Caveat that I&#x27;m no lawyer.) Copilot can regurgitate verbatim chunks of other codebases: it seems absurd to me that that wouldn&#x27;t count as derivative work.
tzahifadida almost 4 years ago
I believe it is true that in most cases it won&#x27;t be infringing. Even though it can sometimes output some trivial code verbatim from the original, that fragment won&#x27;t run or compile on its own and therefore it is not even a work, just gibberish. Simply limiting the Copilot scraping to software with at least a few files will probably resolve that issue. Moreover, if I were GitHub I would simply change that statement regarding the legality and let the developer make the choice whether to use it or not. More often than not, that would probably make this an academic discussion that no one cares about. A few functions here and there are not a work. People almost never try to copy something like a microkernel, or anything else small enough to copy yet substantial enough to constitute a work. Personally I would rather treat this as a search engine. I don&#x27;t copy paste code but I may be old-fashioned and probably the exception here.
mrh0057 almost 4 years ago
Why is everyone ignoring what neural networks do? It is being used as a context-aware pattern-matching search that predicts what you will write next. Of course it&#x27;s going to return copyrighted works based on what you write.<p>It&#x27;s a pattern matching algorithm; what exactly did they think it was going to do?
jordigh almost 4 years ago
&gt; The output of a machine simply does not qualify for copyright protection<p>Wolfram disagrees, and he&#x27;s got lawyers and money too. Whom do we believe?<p><a href="http:&#x2F;&#x2F;www.groklaw.net&#x2F;article.php?story=20090518204959409" rel="nofollow">http:&#x2F;&#x2F;www.groklaw.net&#x2F;article.php?story=20090518204959409</a>
bennyp101 almost 4 years ago
Countdown to Oracle lawsuit in 3, 2 ...
vharuck almost 4 years ago
&gt;What would then stop a music label from training an AI with its music catalogue to automatically generate every tune imaginable and prohibit its use by third parties? What would stop publishers from generating millions of sentences and privatising language in the process?<p>The existing barrier we have is that, unless the music label can prove a human artist has listened to the specific song matching the artist&#x27;s, there&#x27;s no copyright violation. A copyright protects creators from having their work <i>copied</i>. It doesn&#x27;t give them ownership over matching works. I&#x27;m sure there are plenty of pairs of novels with the same first sentence despite each author never having read the other&#x27;s work.
captaincavemanalmost 4 years ago
If I understand what is being stated correctly: even if I assert a prohibition in the licence for my creative work (code) against it being used by Copilot (or any other machine learning model as training data), it wouldn&#x27;t matter, as it&#x27;s not covered by copyright?
dragonwriteralmost 4 years ago
&gt; Copyleft does not benefit from tighter copyright laws<p>Of course it does, at least as far as the goal that copyleft serves for RMS-style Free Software ideologues is concerned. While copyleft may be motivated by an ideology that prefers <i>no</i> copyright protections, at least for software, it relies on copyright maximalism to avoid nonfree derivatives. From the advocates&#x27; viewpoint, the worst situation is a copyright regime that is strong enough to allow nonfree software to exist but too weak to let them build an iron wall that keeps software written by ideological opponents of nonfree software from being used to advance nonfree software.
yakubinalmost 4 years ago
<i>&gt; The output of a machine simply does not qualify for copyright protection – it is in the public domain.</i><p>Does it mean that compiler output does not qualify for copyright protection and I may legally share copies of MS Word via torrent?
SXXalmost 4 years ago
I think it&#x27;s time for someone to train AI on leaked proprietary code and source-available code like Unreal Engine. It&#x27;s cool that we have so much of it right now.<p>Then we&#x27;ll see how fast Microsoft and others will shut it down.
CyberRabbialmost 4 years ago
&gt; Works licensed under copyleft may be copied, modified and distributed by all, as long as any copies or derivative works may in turn be re-used under the same license conditions. This creates a virtuous circle, thanks to which more and more innovations are open to the general public.<p>She claims that Copilot advances the goals of copyleft, but Copilot does not create a “virtuous circle” of generating more public IP. Copilot&#x27;s customers use it to extract public work for themselves and are not compelled to contribute anything back.<p>Copilot is anti-FOSS, plain and simple.
Cort3zalmost 4 years ago
I wonder how long it will take for the licenses to start explicitly disallowing this sort of usage. It is clearly something that many open source writers dislike, and in my opinion, rightly so.
nixpulvisalmost 4 years ago
Claiming that generated work isn&#x27;t work seems completely wrong. The hard-to-dispute fact is that, looking at the result, it doesn&#x27;t really matter who wrote it, just how it reads and what it does.<p>What gets lost in so many of the arguments about Copilot is that someone still needs to actually verify that the code does the right thing. I have a feeling this tool does little but increase the number of small bugs, like off-by-one errors, or cause all kinds of havoc, primarily because of false confidence in the autocomplete.
orthoxeroxalmost 4 years ago
Whether Copilot itself violates GPL or not is one issue.<p>Whether the code produced by Copilot violates GPL or not is a whole different independent issue.<p>If I am walking down the street, find a piece of paper with code on it, pick it up and add the code to my program and this code turns out to be licensed under the GPL then my program becomes a derivative work. It doesn&#x27;t matter who wrote it on that piece of paper, whether it&#x27;s a 100% correct copy of the GPLed code or not or if there are mistakes in it.
madroxalmost 4 years ago
Copilot, to me, feels like a faster Stack Overflow. We already copy code snippets from all kinds of places across the web without thinking about how it&#x27;s licensed. Sometimes, we copy whole functions and files. We&#x27;re responsible for understanding what&#x27;s going into our project. We don&#x27;t blame NPM when it allows us to import a package into a project that subsequently violates the license. I&#x27;m absolutely sure this happens more than anyone cares to admit.
flazxalmost 4 years ago
&quot;This is a slightly modified version of my original German-language article first published on heise.de under a CC-by 4.0 license.&quot;<p>Heise appears to be quite $bigcorp friendly recently.
nhumrichalmost 4 years ago
&gt;&gt; machine generated code is not derivative work<p>Even if true, that doesn&#x27;t indemnify Copilot here. There is no way to _prove_ whether the code was generated by Copilot or written by yourself. Copilot is just autocomplete, so it&#x27;s still a human checking in the code. While it might not be illegal for Copilot to generate those things, it&#x27;s illegal for a human to check them in and claim them as their own.
alkonautalmost 4 years ago
Whether Copilot infringes copyright is a muddy area. I personally would like to think that a world where machines can be trained on any data is easier to live in than one where trained machines are tainted by the license of their input.<p>The interesting question, however, isn&#x27;t whether Copilot infringes copyrights, but whether those who <i>use</i> Copilot do.
tyingqalmost 4 years ago
Guess round 2 will have Copilot dumping to AST, changing function and variable names, then dumping back to source.
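The round trip described here (source to AST, rename identifiers, AST back to source) can be sketched in a few lines. This is only an illustration, assuming Python as the language and its standard `ast` module; the function name `rename_identifiers` and the `v0`, `v1`, ... naming scheme are made up for the example, and nothing here reflects how Copilot actually works.

```python
import ast


def rename_identifiers(source):
    """Parse source, rename locally defined identifiers, and unparse it again.

    Hypothetical sketch: only function names, parameters and assigned
    variables are renamed, so builtins such as print or len stay intact.
    Keyword arguments at call sites, attributes, imports and classes are
    not handled.
    """
    tree = ast.parse(source)

    # Pass 1: collect every name that the snippet itself defines.
    mapping = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            name = node.name
        elif isinstance(node, ast.arg):
            name = node.arg
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id
        else:
            continue
        mapping.setdefault(name, f"v{len(mapping)}")

    # Pass 2: rewrite definitions and uses consistently.
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            node.name = mapping[node.name]
        elif isinstance(node, ast.arg):
            node.arg = mapping[node.arg]
        elif isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]

    # ast.unparse requires Python 3.9 or newer.
    return ast.unparse(tree)
```

The output behaves the same as the input but shares no identifier names with it, and comments and formatting are lost in the round trip as well, which is why a purely textual comparison against the original would no longer match.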
chxalmost 4 years ago
&gt; The short code snippets that Copilot reproduces from training data are unlikely to reach the threshold of originality.<p>I can only repeat myself: In light of Google v. Oracle going as far as the Supreme Court I find your confidence in this quite astonishing.
mawekialmost 4 years ago
The output of Copilot may not be a derivative work, but the trained model surely is, right?
marcodiegoalmost 4 years ago
Simple way to fix this mess: allow the user to choose which licenses the training data samples may come from.
alfiedotwtfalmost 4 years ago
Has anyone tried dumping the debugging symbols from a Microsoft binary, e.g. explorer.exe, and tried to autocomplete^Wcopilot its functions? It would be interesting to see how far Microsoft could be pushed before they ate their own hat.
boleary-glalmost 4 years ago
I’d agree with this conclusion if it wasn’t clear that it is very possible - if not common - for Copilot to just completely copy code. That isn’t fair use - that’s a clear violation of copyright regardless of license.
justshowpostalmost 4 years ago
Huh? The article is full of waffle, but in condensed form it shills for:<p>(a) treating <i>adopted</i> code as being as «trivial» as i++, which is pure demagogy, because what we&#x27;ve already seen in that Copilot video is NOT trivial<p>(b) dismissing (let me put it straight) piracy as a somewhat special case of fair use, which is valid only when the code in question stays on video as a prop; the real code ISN&#x27;T fair use<p>(c) accepting (a) and (b) above as the ultimate truth just because the bogeyman of stricter copyright laws hurts FOSS. And water is wet. This is absolutely meaningless filler, because we have known that <i>stricter copyright laws hurt everyone</i> since the Napster days.<p>So my overall impression from reading this is just... Huh?<p>&gt; My name is Julia, I&#x27;m the Pirate in the European Parliament.<p>Didn&#x27;t she split with the Piratenpartei?
scotty79almost 4 years ago
Don&#x27;t you think that our world would be a far more relaxed and flourishing place if lawyers kept their noses out of software the way they keep them out of math?
emrahalmost 4 years ago
Copilot itself may not be infringing copyright or GPL, but its users will be if they incorporate its suggestions into their commercial products.
Causality1almost 4 years ago
<i>The output of a machine simply does not qualify for copyright protection – it is in the public domain.</i><p>Is it just me or is that a patently ridiculous statement? The output of a machine belongs to the person owning&#x2F;using the machine. If I use a digital camera to take a picture of a copyrighted image I&#x27;m still committing copyright infringement despite the output being created by a machine and a bunch of image processing software.
oolonthegreatalmost 4 years ago
Such a weird argument: &quot;Copyleft people should not argue for better copyright&quot;. What does that even mean?
ksecalmost 4 years ago
So maybe it is best to have a separate license for machine learning? Let&#x27;s call it a Copilot license. (Maybe it is better to call it an exemption?)<p>You would need AGPL &#x2F; GPL &#x2F; LGPL &#x2F; MIT &#x2F; Apache &#x2F; BSD + the Copilot license before code can be used for training, knowing there is a very small possibility that some code snippet will end up in the output?<p>I mean, we could debate this endlessly with no resolution unless it is taken to court.
softwaredougalmost 4 years ago
This is kind of beside the point. Something can still be unethical and perfectly legal. The issue is that machine learning can whitewash a developer&#x27;s intended license.<p>Or put differently, as a GitHub customer, are you comfortable with your code being used this way? Instead of being a passive host, GitHub is now using your code to create tremendous value for itself and Microsoft. Do you feel your trust has been violated (regardless of legality)?
COMMENT___almost 4 years ago
TL;DR: GitHub will eventually add some kind of &quot;data usage reporting&quot; utility that shows which parts of your final code, made with the help of this CuckPilot, could potentially infringe copyright, with links to other known sources of those parts. Then they will tell you that it is your responsibility to ensure that your final code does not have copyright issues.
flippinburgersalmost 4 years ago
Embrace, extend, extinguish.
dominicjjalmost 4 years ago
&quot;(of course, free software licenses would still fulfil the important function of contractually requiring the publication of modified source code)&quot;<p>No no no. Licenses are NOT contracts. Someone who copies or makes derivative works of copylefted software which they then distribute is obliged to remain within the bounds of the license not because they voluntarily promised, but because they don&#x27;t have any right to act at all except as the license permits.<p><a href="https:&#x2F;&#x2F;www.gnu.org&#x2F;philosophy&#x2F;enforcing-gpl.en.html" rel="nofollow">https:&#x2F;&#x2F;www.gnu.org&#x2F;philosophy&#x2F;enforcing-gpl.en.html</a>
Sr_developeralmost 4 years ago
This is a supposedly progressive politician, young, in an advanced country, whose personal platform runs almost entirely on copyright issues, and yet she gets almost everything wrong. What can you expect from your usual dinosaurs?
swileyalmost 4 years ago
So copyright is dead then?<p>Can we merge all the leaked driver source into Linux and have decent OSes on handhelds yet?<p>If I train an &quot;ML autocomplete&quot; on the &quot;OpenNT&quot; source can I share it legally?
varispeedalmost 4 years ago
Mass processing, repackaging and then selling the data is an exploitative business these multi-billion companies run without paying anything to the people who produced the data.<p>This is wrong and should be stamped out.
yunohnalmost 4 years ago
Here we go again, a legal expert weighs in with a long and detailed post about Copilot;<p>And HN rallies to criticize it because Copilot can reproduce some snippets when forced to.
Syzygiesalmost 4 years ago
Whatever the law, when does learning from what we read devolve into plagiarism?<p>The poster child for this category would be those programs that generate nonsense English text that recognizably resembles a known author. They choose the next character at random, conditionally based on the previous characters. Too short a context, and the results are gibberish. Too long a context, and the results are plagiarism.
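The kind of generator described above can be sketched as a character-level Markov chain. This is a minimal illustration, not any particular historical program; the names `build_model`, `generate` and the `order` parameter (the context length in characters) are assumptions made for the example.

```python
import random
from collections import defaultdict


def build_model(text, order):
    """Map each context of `order` characters to the characters that follow it."""
    if order < 1:
        raise ValueError("order must be at least 1")
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate(text, order, length, seed=0):
    """Generate up to `length` characters in the style of `text`."""
    rng = random.Random(seed)
    model = build_model(text, order)
    out = text[:order]  # start from the opening context of the corpus
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            break  # this context only occurs at the very end of the corpus
        # Picking from the raw follower list samples in proportion to
        # how often each character followed this context.
        out += rng.choice(followers)
    return out
```

With `order=1` each character depends only on the one before it, and the output is mostly noise. Once `order` is large enough that every context occurs only once in the corpus, each context has exactly one possible follower and the generator reproduces the corpus verbatim, which is the plagiarism end of the spectrum.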