Copilot sells code other people wrote

713 points by joemanaco almost 3 years ago

97 comments

nickjj almost 3 years ago
This might be overreacting but is there a way to opt out of Copilot using your code in open source repos?<p>It feels morally wrong to me that I can spend thousands of hours working on projects of my own free will but then a company can sell the code I wrote to others in the form of snippet completion as a service. In fact they end up selling your code back to yourself if you plan to use the service.<p>If the answer is no, that moves the needle pretty far in the direction where I&#x27;d at least consider the idea of moving all of my repos to GitLab. I don&#x27;t care much about stars or popularity. I open source things that are interesting and useful to me and if other folks want to use it they can but I don&#x27;t gain motivation from others using the projects I release. I like GitHub and its UI and it&#x27;s no doubt &quot;the spot&quot; for open source but selling code written by others rubs me the wrong way a lot. It stinks because it also means no longer contributing to other code bases too. It&#x27;s moving us in the opposite direction of what open source is about.
Guid_NewGuid almost 3 years ago
I find this whole topic very annoying, this is like the 3rd variation to reach the front page today. But it has made me realize why I instinctively dislike Free Software as a movement.<p>Copyright and licensing are bad, actually. Stop getting worked up about the idea of using courts to punish theft. Stop getting into a frenzy of arousal about the police kicking down doors to drag Billy Gates to jail because 80 characters of fast square root is theft but 79 isn&#x27;t.<p>Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.<p>Sure in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules lawyering over license text.<p>Copy my MIT licensed code without attribution? I don&#x27;t give a shit, go ahead, I hope it helps, in fact I want a truly public domain license but copyright law is so hostage to corporate interests no such thing exists in many countries.<p>Free the code.
VoodooJuJu almost 3 years ago
It is now proven that copilot returns code from codebases with non-permissive licenses [1].<p>I&#x27;m curious - what are the legal implications of this going forward? I&#x27;ve so many questions.<p>1. Will Microsoft ever face lawsuits for these license violations?<p>2. If so, who&#x2F;how? Class-action?<p>3. Will copilot be forced to open-source in the future? Under which license? Some open source licenses are incompatible with others, but copilot uses code from probably every OSS license conceived.<p>4. If Microsoft faces no justice, will we start seeing more OSS license violations? Will Google start using AGPL-licensed code?<p>[1] <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27710287" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27710287</a> | Copilot regurgitating Quake code
antihero almost 3 years ago
I mean, if it&#x27;s autocompleting a fairly simple line, and can do that because it&#x27;s analysed a lot of lines, I don&#x27;t really see that as &quot;stealing anything&quot;.<p>If you are using it to write whole complex functions that are the same as other people&#x27;s, I guess that is copying.<p>But if you do the second thing you are not a great dev, and would have probably ended up copy pasting it anyway.<p>I think the first use case is far more common, and creating boilerplate that is so generic you could never really attribute it anyway.
coldtea almost 3 years ago
&gt;<i>Hector Martin: If you use Copilot, you are basically playing Russian Roulette that the random mashup of existing, copyrighted, heterogeneously licensed code that you get out of it qualifies as an original work, mostly by chance. Or that nobody will ever sue you otherwise.</i><p>Well, that&#x27;s already the case with Stack Overflow copypasta enterprise code. If anything, use of Copilot would be an improvement...
borishn almost 3 years ago
Copilot is fair use, get over it!<p>Copilot is not writing your code any more than Google search is writing your code. You are writing your code, and Copilot is just making suggestions.<p>The US Constitution secures limited copyright &quot;to promote the progress of science and useful arts&quot;. Copilot is just that, get over it!
pen2l almost 3 years ago
Bit of a stretch to fashion AI-derived&#x2F;AI-coauthored works as other people&#x27;s work. Are DALL-E portraits done Picasso-style unrightfully selling Picasso&#x27;s works? Is an individual selling portraits done Picasso-style unrightfully selling Picasso&#x27;s works?<p>No, of course not. Joyce&#x27;s literature was influenced by Ibsen, Mozart looked up to Haydn, Newton was humble enough that he openly professed he stood on the shoulders of his predecessors, Perelman refused the Millennium prize because it wasn&#x27;t also offered to his colleague Hamilton.<p>All human innovation is iterative, and derivative. <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=jcvd5JZkUXY" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=jcvd5JZkUXY</a><p>Our skills don&#x27;t grow in a vacuum, without outside mentorship and guidance. There are areas where I am upset about the application of AI, but this is not one of them. Consider Copilot a gentle guiding hand for those without access to a second pair of eyes nearby to give you reminders on what you may otherwise have on the tip of your tongue.<p>But just as it was unbecoming of Led Zeppelin to refuse to recognize how <i>heavily</i> their music was influenced by Delta blues artists, I can accept the argument that it is perhaps douchey of GitHub to sit on Copilot as squarely their creation.
shireboy almost 3 years ago
I do feel these arguments are valid if a little overstated. Most devs have googled, found some code, and pasted it in without thinking about attribution. Doesn’t make it right, but it is a question of how much code is being copied and how specific. For example, I peruse open repos to learn. I learned about the spread operator in JavaScript that way, but that doesn’t mean every time I use it I need to attribute whatever repo I saw it in. But, yeah, if I copied a larger chunk and the owner wants attribution, not giving it is probably wrong.<p>I like the idea of having the bot automatically update an attribution file if it detects it’s used licensed code. Seems like it would be fairly trivial. Also a robots.txt for repo owners to control automated use.<p>Also, they should totally pay back a portion of revenue to the community and support the repos used to train. That seems like it would be a good PR move if nothing else.
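The attribution-file idea in the comment above could look roughly like the sketch below. Everything in it is hypothetical: `KNOWN_SNIPPETS`, `fingerprint`, `find_attribution` and `record_attribution` are illustrative names, the one-entry index stands in for whatever provenance data a real tool would need, and none of it reflects how Copilot actually works. It only shows the shape of the idea: fingerprint an accepted suggestion, look it up, and append a line to an attribution file on a match.

```python
import hashlib
from pathlib import Path


def normalize(code: str) -> str:
    """Collapse all whitespace so formatting changes do not hide a match."""
    return " ".join(code.split())


def fingerprint(code: str) -> str:
    """Return a SHA-256 hex digest of the whitespace-normalized code."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


# Hypothetical index of known licensed snippets: fingerprint -> (repo, license).
# A real system would need a far larger index and fuzzier matching.
KNOWN_SNIPPETS = {
    fingerprint("def add(a, b):\n    return a + b"): ("example/repo", "MIT"),
}


def find_attribution(suggestion: str):
    """Return (repo, license) if the suggestion matches a known snippet, else None."""
    return KNOWN_SNIPPETS.get(fingerprint(suggestion))


def record_attribution(suggestion: str, attribution_file: Path) -> bool:
    """Append an attribution entry for a matched suggestion, skipping duplicates.

    Returns True if the suggestion matched a known snippet, False otherwise.
    """
    match = find_attribution(suggestion)
    if match is None:
        return False
    repo, license_name = match
    entry = f"{repo} ({license_name})\n"
    existing = attribution_file.read_text(encoding="utf-8") if attribution_file.exists() else ""
    if entry not in existing:
        with attribution_file.open("a", encoding="utf-8") as f:
            f.write(entry)
    return True
```

Exact-hash matching like this only catches verbatim copies modulo whitespace; renamed identifiers or reordered lines would slip through, which is part of why the problem is probably less trivial than it sounds.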
albertzeyer almost 3 years ago
So, how often does it actually happen? Does it happen more often than for a human? Does anyone actually have numbers on this?<p>Of course, if you already provide a copyrighted prefix, and it has seen that code, the chances are high that it would complete the copyrighted code (because that is what you would actually expect).<p>So, for real use cases in the wild, where you write your own genuinely novel code, how often would it suggest some copyrighted code? And how often would a human?<p>I have used Copilot for the last few months and I have never seen such a case (I can be pretty sure because all the identifier names are really unique, and the code was very custom).<p>However, I assume that I myself might have produced copyrighted code unknowingly because if you write common patterns (e.g. some tree or graph search, or some sort function, implement LSTM or Transformer, whatever), the chances are not so low.
JacobiX almost 3 years ago
It’s the same problem with those ML models. The other day someone generated a children’s book using GPT-3, and it turned out that there is a real children&#x27;s book with the same name and very similar content: The Very Lonely Firefly by Eric Carle.
Ciantic almost 3 years ago
I&#x27;m a bit mixed on this. The code Copilot usually autocompletes for me is not particularly novel; it&#x27;s just mundane stuff I would write anyway. Most of these snippets are not copyrightable in my opinion, because they were obvious in the first place. Examples are CSS nth-child odd &#x2F; even logic, or one case where it filled in ~10 lines of JS logic for filtering rows by a category stored in dataset, which I would have written anyway.<p>Then there are cases where it amazes me completely: it wrote 10 lines of C++ code for rendering monochrome glyphs with bits using the FreeType library. It did have an odd, subtle bug, though: the glyphs came out reversed, and it worked with only a certain font size, which it seemed to pick up from a different file altogether.
parhamn almost 3 years ago
Pretty soon the world is going to come to realize art&#x2F;creation is just blending, incrementing and repurposing prior art.<p>No book, painting, codebase, sonnet, design is theft-less.<p>The art is the space reduction, otherwise we’d just bruteforce away.
spupe almost 3 years ago
If you assigned a task to a junior dev, and he&#x2F;she used some code from open source projects and Stack Overflow to develop a custom program for the task, would you say that this person is selling you other people&#x27;s code? Is it common or expected for this type of use to be acknowledged?
captainbland almost 3 years ago
If we&#x27;re all standing on the shoulders of giants (specifically code that other people wrote) then really what Copilot is selling is a ladder to get onto those shoulders faster. I think that&#x27;s a legitimate aim, as such. However it should be careful about not including unlicensed code and should have a specific &#x27;GPL&#x27; option for a model trained with GPL code included.<p>I suppose it should also generate appropriate copyright notices to satisfy many open licenses. I&#x27;d be surprised if copilot could actually link back to the original code like that, though.
noisy_boy almost 3 years ago
Say, I want to write a getter method like below:<p><pre><code> String getName() { return name; } </code></pre> Let us also assume that this snippet, unsurprisingly, has been in several copyrighted repos that didn&#x27;t grant GitHub the right to share this code.<p>So I start typing &quot;getName&quot; and Copilot suggests the exact snippet above. If I use this snippet, is it plagiarism? Even though the above code is the most &quot;obvious&quot; way to write this getter and I would have written it this way even without Copilot&#x27;s suggestion? Or does the &quot;uniqueness&quot; or &quot;non-trivial quantity&quot; of the suggestions have any bearing in determining copyright violation? How&#x2F;where do we draw the line?
mojuba almost 3 years ago
Can I suggest a hypothesis that if you find Copilot useful it means the problem you are solving is a boring one? I might be wrong of course.
habibur almost 3 years ago
We stand on the shoulders of giants. That has been the way for decades: a newer stack built over the older one without much thought. And someone in the future will build an even newer stack over the current ones.
dgb23 almost 3 years ago
Is it smart enough to:<p>- respect attribution<p>- respect copyleft<p>- respect proprietary licences<p>- give the user appropriate hints about the above<p>Or does it just copy code without doing any of this?
bborud almost 3 years ago
My personal reasons for <i>not</i> using copilot are a bit simpler. I believe the act of researching which solutions to use for a given problem is not so much about time, or the code you end up with, but about developing a better understanding of what you are doing. You may end up just cutting, pasting and modifying a piece of code you found, but hopefully, you were exposed to a few different ways to accomplish the same thing, and it made you aware of other choices that could have been made.<p>You could think of the evolution of practical problem solving in software engineering like this:<p>1. I have to invent a solution (because nobody else in the world has a computer) 2. I have to know of a solution (education, word of mouth...) 3. I have to look up a solution in the books I have (commoditized knowledge) 4. I can look up solutions on the internet &lt;-- (we are here) 5. The computer suggests something and I accept (some are here too)<p>From 1 to 4 the amount of cleverness required to solve small problems drops a bit, but your productivity and exposure to knowledge probably goes up.<p>I&#x27;m not quite sure what happens from 4 to 5. Personally I&#x27;m actually more interested in the context solutions are presented in than just the solution. In fact, I rarely copy and paste code from the Internet, but I often look at multiple suggestions&#x2F;solutions and then borrow ideas or combine ideas from several sources.
tremon almost 3 years ago
I might start considering Copilot if Microsoft were to train it on their own internal codebases (Windows, Office, SQL Server). Until they do, it&#x27;s clearly a &quot;tool for thee but not for me&quot; type of situation.
HumanReadable almost 3 years ago
Sorry for the unproductive tone of this comment, but there&#x27;s something about the attitude of this tweet that really grinds my gears.<p>Any time someone invents something new and incredible, there&#x27;s always a crowd of negative nancies eager to discredit it and explain why the invention is nothing new and a detriment to society.<p>I don&#x27;t understand why someone would willingly share their code on GitHub where it is publicly available just to complain when others make use of that knowledge.<p>&#x27;co-pilot just sells code other people wrote&#x27; is such a ridiculous understatement of what Copilot does. Instead of marvelling at the human ingenuity that went into creating it, they sneer at the audacity of OpenAI to do something without first asking their permission.
GuB-42 almost 3 years ago
&gt; Copilot just sells code other people wrote<p>So what? Selling code other people wrote is the foundation of the free software movement. It is the entire business model of countless companies, and it is a good thing. Among them are most major Linux distro vendors like Red Hat and Canonical.<p>The value added by Copilot is that they sell you the lines of &quot;code other people wrote&quot; that you want, out of billions.<p>I still think it is derivative work, and that they should only process code under permissive licenses, or, if they want to include GPL code, make a GPL-only version, usable only for GPL projects. I thought that was what they did; there is so much code under permissive licenses that it should be enough to train their model. But apparently they don&#x27;t care: as long as it is public, it is included. To me, they are shooting themselves in the foot; several companies have already banned Copilot due to the potential issues with copyright.
floor_ almost 3 years ago
I started self-hosting when Microsoft bought GitHub, and with this mass theft of copyrighted material, which is then resold for money, I&#x27;m even happier with my decision.
rictic almost 3 years ago
Copilot very rarely copies code verbatim, and when it does it&#x27;s very short snippets. When Oracle sued Google over allegedly copying short and fairly trivial snippets of code they were justly derided.<p>I can&#x27;t speak to the legal side, but I just don&#x27;t understand the moral outrage over very occasionally copying such short snippets of code. The key innovations and the actual value that licenses are intended to protect aren&#x27;t in these short snippets.<p>And what does Copilot bring to the community? Free use by students, free use by open source maintainers, and a huge boost in productivity for a modest fee for professional devs, for a service that no doubt costs a lot to run, even on the margin.
k__ almost 3 years ago
Isn&#x27;t that what Web2 is all about?<p>Someone creates content for free, and companies monetize it.
bmacho almost 3 years ago
On a side note, I do believe that short programs or functions should be copyright free by law.<p>Or we as a community need to create a better BSD, a CC0 for everything.<p>Almost everything is nontrivial, and almost everything is copyrighted, at least with the pressure to name the original author (BSD, GPL, other major permissive licenses).<p>Say you want to use a library, so you check for examples in the documentation. Now you have to note somewhere that the example is from the documentation (best if you put it in the source code, so you don&#x27;t lure other people into copying what you copied and referring to you as the author).<p>It is a major PITA at least for me.
tpoacher almost 3 years ago
Does this mean I can steal stuff if I say I trained an AI to do it for me?
wolframhempel almost 3 years ago
When my last company got acquired, part of the due diligence process was a scan of our codebase for snippets from stack overflow. Every snippet found that wasn&#x27;t posted with a clear license by the author was challenged and we rewrote it.<p>Now, I&#x27;m not entirely sure how necessary this was from a legal perspective. But introducing an AI into the mix will bring up a lot of uncertainty when it comes to how much change is required for something to no longer be considered a copy&#x2F;derivative.
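For a sense of what a snippet scan like the one described above might involve, here is a minimal sketch. It is not the tool used in that due diligence process, whose workings the comment does not describe; `normalized_lines`, `windows` and `shared_windows` are hypothetical names, and the approach (comparing sliding windows of whitespace-normalized lines) is just one simple way to flag copied runs of code.

```python
def normalized_lines(code: str) -> list[str]:
    """Split code into non-blank lines with whitespace collapsed."""
    return [" ".join(line.split()) for line in code.splitlines() if line.strip()]


def windows(lines: list[str], size: int) -> set[tuple[str, ...]]:
    """Return every run of `size` consecutive lines as a tuple."""
    return {tuple(lines[i:i + size]) for i in range(len(lines) - size + 1)}


def shared_windows(source: str, snippet: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the runs of `size` normalized lines that appear in both inputs."""
    return windows(normalized_lines(source), size) & windows(normalized_lines(snippet), size)
```

A larger `size` reduces false positives on generic code such as a lone `for` loop, at the cost of missing shorter copies. This is the same "how much change is required" question the comment raises: a scan like this stops matching as soon as one line in every window is edited.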
nathias almost 3 years ago
Copilot is a new way for corporations to break copyright while enforcing it for everyone else, this will be the first big use for AI when other corpos follow.
yaseer almost 3 years ago
Technically, programmers search, copy and modify code all the time.<p>One might argue copilot puts into software an algorithm that humans are already doing. Software like that is usually inevitable.<p>Still, it sucks there&#x27;s no benefit for the contributors.<p>The most ethical thing I can think of is some kinda &#x27;Spotify-like&#x27; revenue sharing model, based on how often their code is used by others. Not that they&#x27;d ever implement that if they can get away with it!
Havoc almost 3 years ago
Yes, though in a way so does stackoverflow &amp; friends. A large chunk of the dev ecosystem is copy paste and I don&#x27;t think this is inherently problematic. It is always a case of standing on the shoulders of giants.<p>It&#x27;s more of a licensing issue to me. As far as I can tell it was trained on a blend of licenses, which to me makes it inherently non-compliant. At least some of it is going to be copyleft and find its way into closed source.
0x_rs almost 3 years ago
I&#x27;m not a lawyer, nor very well versed in the vast world of licenses and their definitions in court contexts, but I&#x27;ve been wondering about something with the growing appeal ML-generated content has for the average person (and the &quot;high&quot; barrier for entry in the market) — are licenses in some form or another going to adapt to this phenomenon? From a brief search, I have not found any new license with a no-dataset-usage clause (assuming fair use does not apply, that&#x27;s another big question). What are the chances anything of the sort will become an option for any &quot;creative&quot; work that&#x27;s usually shared freely (such as artwork, code, et cetera) even despite copyright? What about the ownership of the dataset? It seemed to be questionable years ago already that possibly IP-protected content goes through the black box and resembling material gets on the other side, whose ownership is it really? I&#x27;m guessing some notable court cases in the future could define this in the following years if the popularity continues growing.
thewoolleyman almost 3 years ago
Artificial Intelligence is causing us to revisit the difference between free as in beer and free as in speech (<a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Gratis_versus_libre" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Gratis_versus_libre</a>).<p>It is putting a new spin on some traditional Open Source Lessons (<a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;The_Cathedral_and_the_Bazaar#Lessons_for_creating_good_open_source_software" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;The_Cathedral_and_the_Bazaar#L...</a>).<p>People share and reuse unattributed snippets of MIT-licensed and GPL-licensed code on the internet all the time, on StackOverflow, etc.<p>StackOverflow is profiting from that activity indirectly by facilitating it. They profit passively through ad revenue, and actively through the Teams subscription offering.<p>But nobody seems too upset about that.<p>How is an AI which facilitates the same code sharing fundamentally any different? Because it’s scraping it itself, rather than humans contributing it?<p>Seems like a tenuous argument at best.
mullikine almost 3 years ago
Traditional &#x27;real&#x27; (as opposed to &#x27;imaginary&#x27;) programming is like writing in assembly code; It&#x27;s outmoded because of generative models, in a way similar to &#x27;C&#x27; outmoding assembly code. The most important thing, I think, is that free (libre) software developers are able to work with the language models directly, so that libre software is allowed to continue progressing into what I call Imaginary Programming. That&#x27;s because with a generative internet all you really need is blockchain + prompting.<p><a href="https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;mullikine&#x2F;ilambda" rel="nofollow">https:&#x2F;&#x2F;huggingface.co&#x2F;spaces&#x2F;mullikine&#x2F;ilambda</a><p>Language models are able to &#x27;steal&#x27; the linguistic meaning-making &#x27;essence&#x27; of the software, by modelling:<p>- How the software is used (mimicking its function) - external meaning<p>- How functions are &#x27;inspired&#x27; - internal meaning (reflection)<p><a href="https:&#x2F;&#x2F;github.com&#x2F;semiosis&#x2F;imaginary-programming-thesis" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;semiosis&#x2F;imaginary-programming-thesis</a><p>The models themselves should be clear about where the data came from. However, this is only possible in a fair world which we do not live in. Compromise must be made to protect national interests.<p>Generative models are license blind and there&#x27;s very little that could be done to prevent progress. Like what the invention of the camera has done for art.<p>Large language models including Codex are a transformative technology.<p>Bi-directional fair-use is probably the best result we can hope for.<p>So long as Microsoft and OpenAI are not selling back usage of the model to the open-source community, I think it&#x27;s OK, though it&#x27;s the bare minimum obligation.
iptq almost 3 years ago
I know this isn&#x27;t really related to the whole copying ethics debate, but I definitely feel like there&#x27;s some sort of foul play happening here. For all of the unlicensed projects out there, the license that is automatically granted to Github includes:<p>&gt; the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time<p>It&#x27;s insane how vague this is. Is Copilot a &quot;Service&quot;? Sure, by its definition:<p>&gt; The “Service” refers to the applications, software, products, and services provided by GitHub, including any Beta Previews.<p>And since much of the code was published before Copilot&#x27;s inception, this means Github can just arbitrarily add more &quot;services&quot; and milk the code for whatever it wants. Automatically service-ify any public repository? Sure, pay us for quotas. It&#x27;s like a legal loophole to let Github just bypass any license restrictions you put on it.
ThereIsNoWorry almost 3 years ago
1. You most likely agreed to that by using GitHub.<p>2. Copy&amp;Pasting Code by manual search exists.<p>3. This is just a smart tool so you don&#x27;t have to figure out yourself what to copy&amp;paste (in the best case) and save a lot of time.<p>Sometimes I truly wonder how people can genuinely be upset about things like this. What is broken are copyright and patent laws in the 21st century.
aetherspawn almost 3 years ago
Copilot is a fancy pattern bot.<p>Humans make original patterns, but since Copilot cannot think, then Copilot does not. It squashes together a bunch of small individual patterns, each under their own license, but at no stage does it do anything more than pick a line from here, and a line from there.<p>It doesn’t think, and it doesn’t create new IP.<p>It is like making a picture out of small snippets of a thousand other pictures, and then selling it.. clearly not OK. You still ripped off the original artists.<p>Or like plagiarising 100 of your class mates’ assignments. Are you less guilty because you went to the effort to steal just a few sentences from each?<p>A criminal who steals a cent from every account at the bank is a more sophisticated thief than someone who holds up a petrol servo.<p>If Copilot doesn’t create new IP (it doesn’t; we established this), then it uses existing IP. And in that case it is no different to any of the three analogies above.
maxbaines almost 3 years ago
Initially I hadn’t thought about Copilot and other AI generators this way, but now that I have, I’m finding it hard to ignore.
jarenmf almost 3 years ago
I guess the question is where you draw the line between a derivative work and &quot;learnt by an AI algorithm&quot;
rosmax_1337 almost 3 years ago
I think this problem has no good solution until IP laws around the world are properly reimagined from the ground up. I&#x27;m of the quite radical stance that code, music, and art, in terms of their intellectual existence, should be free for anyone to take. (You can own a hard drive with code on it, and claim no one should steal it, but not the idea of the code itself.)<p>If you have ideas, code, music or art which you wish for no one to partake in, do your best to keep them secret. Certainly, breaking into secret areas should be illegal, but once the cat gets out of that bag it gets out of the bag.<p>The creative people behind these ideas will, I believe, be able to find good compensation in society nonetheless. IP laws nowadays only serve to protect megacorporations, to the detriment of creativity and ideas.
madrox almost 3 years ago
I don&#x27;t think any professional community is aligned on how to think about ML-generated content yet. We don&#x27;t know how to apportion rights between the data owner, the model owner, and the end user, and I don&#x27;t think existing copyright law is ready for it. At least for software, I think the way forward is for the next generation of software licenses to explicitly state whether the code can be used to train ML models and what those models can be used for. Without explicit language, we&#x27;ll be squabbling over interpretations of fair use.<p>There&#x27;s going to be some big cases here. It&#x27;s going to end up in the Supreme Court sooner or later, and if it were to go there today I think I know what they&#x27;d say.
tsujp almost 3 years ago
Copilot produces verbatim GPL&#x27;d code. It&#x27;s also a closed box.<p>Source: <a href="https:&#x2F;&#x2F;twitter.com&#x2F;mitsuhiko&#x2F;status&#x2F;1410886329924194309" rel="nofollow">https:&#x2F;&#x2F;twitter.com&#x2F;mitsuhiko&#x2F;status&#x2F;1410886329924194309</a>
ewalk153 almost 3 years ago
If the portion of code that Copilot lifts is the &quot;heart&quot; of the original work, that would be much less likely to be considered fair use[1], regardless of the length.<p>&gt; For example, it would probably not be a fair use to copy the opening guitar riff and the words “I can’t get no satisfaction” from the song “Satisfaction.”<p>I wonder how this could be integrated into the system?<p>[1] <a href="https:&#x2F;&#x2F;fairuse.stanford.edu&#x2F;overview&#x2F;fair-use&#x2F;four-factors&#x2F;#the_amount_and_substantiality_of_the_portion_taken" rel="nofollow">https:&#x2F;&#x2F;fairuse.stanford.edu&#x2F;overview&#x2F;fair-use&#x2F;four-factors&#x2F;...</a>
pornel almost 3 years ago
Tough pill to swallow. Microsoft&#x27;s actions don&#x27;t seem fair, but fighting them with copyright could weaken <i>fair use</i>:<p><a href="https:&#x2F;&#x2F;felixreda.eu&#x2F;2021&#x2F;07&#x2F;github-copilot-is-not-infringing-your-copyright&#x2F;" rel="nofollow">https:&#x2F;&#x2F;felixreda.eu&#x2F;2021&#x2F;07&#x2F;github-copilot-is-not-infringin...</a><p>There&#x27;s a good argument that demanding copyright protections on scraped datasets and short snippets is a double-edged sword. It could harm search engines, distribution of news, and non-commercial ML research too.
stakkur almost 3 years ago
At every turn, in every instance, for decades, all stories involving Microsoft end in &quot;...and then Microsoft fucked people over.&quot; I&#x27;ve witnessed this firsthand since the 80s.
williamcotton almost 3 years ago
Should the snippets that Copilot is regurgitating be considered for copyright in the first place?<p>It seems akin to trying to copyright a certain drum pattern or chord progression.<p>Also, the history of the GPL, MIT, commercializing lisp machines, Symbolics, infighting, etc… seems a very different context than Copilot so I am having difficulty seeing the systemic problems that tools like this encourage.<p>There is of course a surface level similarity in that a corporation is profiting from IP in the public domain but the devil is in the details.
sirsinsalot almost 3 years ago
Jaron Lanier&#x27;s book &quot;Who Owns the Future?&quot; is all about AI and compensating those whose input goes into training these very valuable models.<p>I highly recommend everyone read it.
janosdebugs almost 3 years ago
It&#x27;d be nice to see some proof here. Copyright is not absolute and does not extend, for example, to things that have no creativity in them. There are only so many ways to write a for loop or an if condition. Training an ML model from a large body of code IMHO violates copyright no more than any of us reading code and learning from it, as long as GH Copilot doesn&#x27;t spit out code that&#x27;s exactly the same as something already existing.
seydor almost 3 years ago
Programmers are fine when their creations, pretty much all of tech, resell content that other people wrote for free. But no, not code; that one must be expensive.
BiteCode_dev almost 3 years ago
It is incredible to use though. I pasted the return value of an API call in a comment, then started to write a schema class. Copilot just created the entire class for me. When I wanted to extract a subset of the data, I typed get_&lt;_name_of_the_subset&gt;(), and it wrote the code I would have written.<p>So even without using someone else&#x27;s code, just the pattern understanding and the production of simple boilerplate code is great.
powerapple almost 3 years ago
Why is it a bad thing? Either you have people spending time reading code, learning every little thing, and producing the same work in days, or you have Copilot save hours of human time. Coding would be more efficient; it is a win-win for everyone in this industry, right? I know people get attached to the code they write, but we all learn from books, and the result is common enough.
Aeolun almost 3 years ago
&gt; what github &#x2F; microsoft is counting on here is that open source developers do not have enough collective power to do anything to stop this<p>I think it much more likely that they count on everyone liking it way too much to give a shit about their MIT code not being attributed correctly.<p>I certainly don’t. MIT just seems like the most convenient license for people that need licenses (corporations?), so that is what I use.
vbezhenar almost 3 years ago
I somewhat agree with that. Yesterday I edited some exotic configuration (Kubernetes CSI driver for Cinder) and Copilot suggested a config that looked like someone&#x27;s config. There were no values, so it was good at filtering them out, but it definitely looked like a cleaned-up part of code that resides in some project.<p>I don&#x27;t think that&#x27;s bad though. Code sharing is good for overall productivity.
c01n almost 3 years ago
MS and Github are thieves: all their code is closed source, yet they sell copyrighted code they don&#x27;t own. If they had told us years ago that our code would be automatically stolen by an &quot;AI&quot;, most coders would not have created an account. The innovation here is that they have access to most of the world&#x27;s open source code and automated the stealing.
capableweb almost 3 years ago
If GitHub could guarantee that Copilot had only ingested code published under OSS licenses, then I wouldn&#x27;t see what the problem is.<p>But as far as I understand, GitHub trained Copilot on any public repository on GitHub, including ones with no license specified (where the user publishing it still holds the copyright), and I don&#x27;t see how that can be OK.
eline43 almost 3 years ago
There needs to be an update to either licenses or GitHub (and other) software directly, or even software terms of service, that gives the user an opportunity to opt out of their data being used to train proprietary AI models.<p>&#x27;I don&#x27;t agree with having an AI trained on&#x2F;with my data.&#x27;<p>IMHO, all other problems with Copilot stem from this.
zokier almost 3 years ago
Sure, the concern is valid, but I feel like this tweet adds absolutely no substance to the discussion and just repeats the same opinion that has already been rehashed to death since Copilot originally launched. As such, especially given the tone of the tweet, I don&#x27;t expect constructive discussion to arise here.
dgb23 almost 3 years ago
Reading many of the comments here I feel like one important thing is being left out that is not related to legal, but to social issues:<p>Who is on the side of open source? Where are the big, powerful institutions and companies that deeply care about authors and communities providing free software that so many of us rely on?
andrelaszlo almost 3 years ago
There are a few reasons why this could be considered ethical. First, open-source code is typically free to use, so the company would not be taking advantage of anyone by using it to train their AI. Second, the company would be providing a service that people are willing to pay for, so they would be generating value for society. Third, the company would be transparent about what they are doing and would not be hiding anything from the public.<p>...the above was generated by GPT-3 (text-davinci-002). Prompt: Write an argument for why using open-source code to train an AI and then sell the code generating service (without open-sourcing it) is ethical.<p>The main argument against this is that it takes away from the open-source community that contributed to the development of the code in the first place. By selling a code-generating service without open-sourcing it, the company is profiting from the work of others without contributing back. This is unfair and takes away from the overall open-source ecosystem.<p>Added two characters to the prompt :P
pabs3 almost 3 years ago
I wonder if FOSS folks could copyleft originally public&#x2F;leaked but proprietary code using Copilot.
nl almost 3 years ago
This isn&#x27;t how a language model works.<p>It&#x27;s SO frustrating that even on HN people still fall for this naive and incorrect analysis. Pasting bits I&#x27;ve said before on this topic:<p>Language models do not work like this. They can copy content, but usually that&#x27;s for something like the GPL language text.<p>Generally they work on a character-by-character basis, predicting the most likely character to appear next.<p>This very rarely results in copying text, and almost never rare text.<p>Mechanically, it has learnt both the syntax of the language and how concepts relate. So when it starts generating, it makes sentences that are syntactically valid but also make sense in terms of concepts.<p>That&#x27;s really different from just combining bits of sentences, and it gives rise to abilities you wouldn&#x27;t expect in something just cutting and pasting bits of sentences. For example, few-shot learning is mostly driven by its conceptual understanding and can&#x27;t be done by something with no way to relate concepts.
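To make the "predict the most likely next character" idea above concrete, here is a toy sketch. It is only an illustration under loud assumptions: it uses simple bigram counts rather than a neural network, the names `train_bigram`, `predict_next`, and `generate` are hypothetical, and production models such as the one behind Copilot operate on tokens with far more context. It is not a description of how Copilot is implemented.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each character, which characters follow it in the text."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, char):
    """Return the most frequent follower of `char`, or None if unseen."""
    followers = counts.get(char)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, seed, length):
    """Greedily extend a non-empty `seed` one character at a time."""
    out = seed
    for _ in range(length):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out += nxt
    return out


# Hypothetical two-line training corpus, chosen only for illustration.
corpus = "for i in range(10): print(i)\nfor j in range(20): print(j)\n"
model = train_bigram(corpus)
print(generate(model, "f", 2))  # for
```

Even this toy reproduces "for" from its training text, because that is the only continuation it has ever seen. How often a much larger model does the same on rare text is exactly what the thread is arguing about.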
olalonde almost 3 years ago
I&#x27;m going to make a bold prediction: no one will ever lose a copyright lawsuit due to usage of Github Copilot generated code. The code snippets it produces are too small or trivial to qualify for copyright infringement.
stefanos82 almost 3 years ago
Seems like my original questions [1] are more relevant than ever!<p>[1] <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27677598" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=27677598</a>
tiborsaas almost 3 years ago
MrDoob has an excellent point about this:<p><a href="https:&#x2F;&#x2F;twitter.com&#x2F;mrdoob&#x2F;status&#x2F;1539740854956412929" rel="nofollow">https:&#x2F;&#x2F;twitter.com&#x2F;mrdoob&#x2F;status&#x2F;1539740854956412929</a>
lfrigodesouza almost 3 years ago
As the saying goes, &quot;when a product is free to use, the real product is actually you&quot;. In this case, our code is the product. I&#x27;m now considering switching to another git provider...
oytis almost 3 years ago
Copilot sells the service of finding the code that makes sense for what you write. It would be better if it could correctly attribute the source(s) though; I hope they will solve this problem at some point.
thih9 almost 3 years ago
Is GitHub Copilot using private repositories for the learning process?<p>If yes, how do they mitigate the risk of exposing private data when something is quoted verbatim?<p>If not, then why are repos with non-permissive licenses OK?
sirsinsalot almost 3 years ago
Beware geeks with gifts. This is Microsoft. The question isn&#x27;t &quot;is it good?&quot; but &quot;Why are Microsoft offering it and how is it undermining everyone else?&quot;
mawadev almost 3 years ago
What stops me from re-uploading copyrighted source with the notices removed and pushing it under an MIT license? If a model has been trained on such a data set, how do you get the code back out?
LeonTheremin almost 3 years ago
And social media sells ideas other people thought.<p>Copilot is limited to public code now, but it may easily be trained on non-public code - albeit this probably won&#x27;t be for sale to the public.
FeepingCreature almost 3 years ago
All I can think of is Steve Yegge [1]: &quot;They have no right to do this. Open source does <i>not</i> mean the source is somehow &#x27;open&#x27;.&quot;<p>My code is on Github so that people can read it, reuse it and learn from it. &quot;The freedom to study how the program works&quot;, as the FSF says. If some of the people reading it are machines, why would that matter?<p>[1] <a href="http:&#x2F;&#x2F;steve-yegge.blogspot.com&#x2F;2010&#x2F;07&#x2F;wikileaks-to-leak-5000-open-source-java.html" rel="nofollow">http:&#x2F;&#x2F;steve-yegge.blogspot.com&#x2F;2010&#x2F;07&#x2F;wikileaks-to-leak-50...</a>
iLoveOncall almost 3 years ago
Github Copilot is selling code other people wrote as much as the author of this thread is profiting from words other people invented.<p>Absolute nonsense.
presentation almost 3 years ago
Google just sells content other people wrote.
AtNightWeCode almost 3 years ago
Copilot will be that bandmate who plays a new riff and leaves you wondering where it was borrowed from.
acuozzo almost 3 years ago
This is, in part, why I will continue to use the original 4-clause BSD license for the code I write.
blitz_skull almost 3 years ago
Man, people really do be angry that the public code they put on a public platform is being used publicly.<p>Wild.
boomer_joe almost 3 years ago
We need a licence that forbids use in ML and the people willing to sue github for it ASAP.
shahar2k almost 3 years ago
And DALL-E 2 sells art other people created.<p>(I&#x27;m actually not being sarcastic; I think there needs to be some sort of pipeline for compensating the artists whose work is used to train these models.)
fimdomeio almost 3 years ago
What AI is showing is the fuzzy line between creating and copying. The truth is they are both always present in everything we do; we&#x27;ve just been trying to hide it.<p>So it should be as simple as: if you&#x27;re using other people&#x27;s content for your own profit, you should properly compensate them.<p>Or we could just abolish copyright law and assume that everything humans create emanates from culture, so it&#x27;s always collectively built and everything should be open source.<p>Or we just do the same as we&#x27;ve been doing: create even more complex laws trying to define this fuzzy line in a way that lets companies keep profiting from it a lot more than individuals.
marstall almost 3 years ago
Most of the code I write is glue sticking together 8 proprietary systems nobody&#x27;s ever heard of. How is Copilot gonna help me with that?
tiku almost 3 years ago
I&#x27;ve been using it for a day now and I&#x27;m really impressed. It is so aware of stuff in old code that it is scary. I&#x27;m working in an old application with Zend Framework.
whywhywhywhy almost 3 years ago
Same deal for DALL-E if they ever productize it.
pvaldes almost 3 years ago
Sounding more like Zopilote each day, it seems.
sytelus almost 3 years ago
Google just sells content other people wrote.
SMAAART almost 3 years ago
Once again Innovation challenges IP.
HeavyStorm almost 3 years ago
So much bullshit my head hurts.
lysecret almost 3 years ago
Don&#x27;t we all.
honkler almost 3 years ago
License issues will save many thousands of jobs.
amelius almost 3 years ago
&quot;Good artists copy. Great artists steal.&quot;<p>:)
abdulhaq almost 3 years ago
That&#x27;s like saying a plumber just sells parts that other people made.
janandonly almost 3 years ago
Isn&#x27;t every programmer in history (except the gal who invents her own language and writes all her own code) simply an archeologist of other people&#x27;s work?<p>We all Duck&#x2F;Google for code anyway. Why not admit it and make it easier?
danamit almost 3 years ago
The code Copilot suggests from any given project is, most of the time, not enough to warrant crediting that project. When I look up code in some GitHub repo and copy all or part of it, I do not credit that project either.<p>I do not see Copilot as useful anyway.
Separo almost 3 years ago
GitHub provides the repo hosting and tools for free on public projects. I&#x27;m happy with this deal.
spupe almost 3 years ago
I disagree. Copilot is selling content-aware code suggestions, which are a result of code that other people wrote on their platform, and which in no way affect the work of these people.
lakomen almost 3 years ago
I don&#x27;t understand what&#x27;s going on there.<p>I don&#x27;t use github. Can someone explain what the author means?<p>Edit: in detail
skc almost 3 years ago
I get the feeling this entire debate would have been non-existent had this been a JetBrains product instead.<p>The whole thing is just bizarre when the vast majority of developers look at OSS code daily and regularly lift ideas&#x2F;patterns&#x2F;snippets from there without once looking at whatever license is attached.
bborud almost 3 years ago
Well, this does invite an interesting comparison. If we imagine something like Copilot applied to music I believe the chances of ending up in court would be pretty high. There are a lot of examples of plagiarism lawsuits in popular music and the outcome seems to be entirely random.<p>One could argue that the information density in chord progressions, bass lines and beats is extremely small. And that any recognizable part of a musical idea that has been &quot;borrowed&quot; would necessarily make up a larger percentage of the complete work than would be the case for a typical application with borrowed snippets.<p>That&#x27;s not a bad argument, but it is unsatisfactory because it means that at some point someone has to make a judgement on how much you can borrow.