GitHub Copilot investigation

1847 points by john-doe, over 2 years ago

178 comments

schoen, over 2 years ago
Here are a few thoughts I haven't formulated before:

It seems clear enough to me that training AIs on copyrighted works is typically or commonly a fair use under existing law, because the AIs can and commonly do learn non-copyrightable elements and aspects of those works. It's very obvious from enormous numbers of examples that current AI systems are capable of learning much more abstract features of human culture (grammar, concepts, facts, cultural tropes, and many others).

A human being doesn't violate copyright in learning from a copyrighted work, including when that human being is later more able to produce other works based on that learning (e.g. reading fantasy novels and learning concepts, tropes, or vocabulary that one uses to produce other fantasy novels; reading a newspaper and learning facts that one incorporates into an essay; learning artistic techniques or stylistic conventions from studying existing artworks and using them when producing new artworks). Current AI systems are (amazingly) becoming capable of all of these things and may do them in ways that are somewhat akin to how human beings do them. (although I guess Jaron Lanier would object "that's what they want you to think")

*But* there are also examples in existing copyright doctrine where people accidentally repeat enough of a prior work to get in trouble for infringement -- most often with song composition (like George Harrison's "My Sweet Lord") because relatively small pieces of melody (which a person might easily memorize) may be considered copyrightable.

If human beings had much more accurate memories, copyright would be quite a bit more intrusive (and/or quite a bit less effective) because, following any exposure to some kinds of works, we could use our own memories to reproduce those entire works from scratch for our own use or pleasure without obtaining authorized copies from elsewhere.

Computers do have such accurate memories, and machine learning systems, which are optimized for things like maximum likelihood estimation, can and do reproduce *both* copyrightable and non-copyrightable elements of works that they've been trained on. After all, the maximum likelihood continuation of a fragment of a text or a song is ... the complete original work. And the ability to reproduce the complete original work would, other things being equal, reduce loss in training. After all, that's something someone might specifically ask for, and if the system could oblige, it would be doing a better job of providing what the user wanted.

It's relatively foreseeable that machine learning systems would potentially be able to reproduce both copyrightable and non-copyrightable elements of various works, because the distinction between the two isn't especially clear from an algorithmic or mechanical point of view. (For instance, facts aren't copyrightable, but the notion of what constitutes a "fact" for this purpose is a culturally-bound legal notion and not at all straightforward to make precise.)

But if you had a human author or artist or scholar or programmer who was "trained on" exposure to an enormous body of works, *and* that person had an exceptional eidetic memory, you could imagine that he or she *would* be perfectly capable of recreating many of those works from memory (and that other people might request such recreations). (Again, in music in particular, it's already routine that someone could have unambiguously copyrightable material memorized and be subject to copyright restrictions on performing songs. Like if a singer or band performs a cover from memory.)

If you wanted to avoid this ability then you might need to build in an explicit notion of copyright that limits the accuracy or level of detail inside of the model in some way. This is tricky because (1) I don't think people have really tried to do this much so far, (2) copyright applies very differently to different categories of work, (3) it obviously wouldn't satisfy critics even if it mitigated the most extreme examples of "regurgitation", and (4) it would be kind of weird because you would be intentionally limiting the quality and extent of learning that the system was allowed to do. (I imagine Jaron Lanier getting mad again about my repeated comparison between human learning and machine learning, and between human memory and machine memory)

Some of the weirdness in point (4) is that accurate prediction is *usually* cool / great / impressive / accepted as an appropriate goal or capability, but if it's *too* accurate in certain contexts, it may be deemed a copyright infringement. Like if you said "what word comes next? FOUR SCORE AND SEVEN YEARS AGO OUR FATHERS", there's a clear correct answer *and knowing it requires having a certain text memorized*. OK, if you said "what word comes next? MR. AND MRS. DURSLEY OF NUMBER FOUR PRIVET DRIVE WERE PROUD TO SAY" ... same thing, but Bloomsbury Publishing may be unhappy if you have a system that can get all such questions right.

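The point about maximum-likelihood continuation can be made concrete with a toy model. What follows is a minimal sketch, not a description of how Copilot or any production system works: a word-level trigram counter (the names `train_ngram` and `greedy_continue` are made up for illustration) trained on a single public-domain sentence. Because every two-word context in that sentence has exactly one observed successor, always picking the most likely next word reproduces the training text verbatim.

```python
from collections import Counter, defaultdict


def train_ngram(text, n=3):
    """Count which word follows each (n-1)-word context in the text."""
    words = text.split()
    counts = defaultdict(Counter)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        counts[context][words[i + n - 1]] += 1
    return counts


def greedy_continue(counts, prompt, n=3, max_words=50):
    """Repeatedly append the single most likely next word (greedy decoding)."""
    out = prompt.split()
    for _ in range(max_words):
        context = tuple(out[-(n - 1):])
        if context not in counts:
            break  # the model has never seen this context, so it stops
        out.append(counts[context].most_common(1)[0][0])
    return " ".join(out)


# Illustrative training data: the opening of the Gettysburg Address,
# lowercased and stripped of punctuation.
TRAINING_TEXT = (
    "four score and seven years ago our fathers brought forth "
    "on this continent a new nation conceived in liberty"
)

model = train_ngram(TRAINING_TEXT)
completion = greedy_continue(model, "four score")
print(completion == TRAINING_TEXT)
```

A real language model generalizes across a huge corpus instead of storing literal counts, so this only illustrates the narrow case the comment describes: when a context is rare enough that the training data contains one continuation, the most likely output is the original.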
woah, over 2 years ago
It would be sad if someone succeeded in shutting down CoPilot for this kind of copyright stuff. It is genuinely useful. I don't care that it reproduces copyrighted content. The only way you can get it to do that is to bait it with the function names of functions that have already been copied and pasted thousands of times onto GitHub without proper licenses.

Luckily, someone will probably come out with a "renegade" version trained on whatever makes it a useful assistant to my coding. I won't be afraid of accidentally violating copyright myself, because I won't be trying to bait it into reproducing heavily copy&pasted cherrypicked examples, and I won't use 20 lines of its output with zero modification.

mkr-hn, over 2 years ago
A sizable, possibly plurality cohort of fully adult tech people is young enough to not know about United States v. Microsoft Corp. This would explain a lot of comments I see on this topic.

If you don't know Microsoft's history, a lot of what more informed people are worried about seems overblown. Copilot was Microsoft's first test of people's trust after the GitHub acquisition. It's going very, very, very poorly. There were ways to do this with consent and collaboration with the people and projects it takes code from, but they're acting like classic Microsoft here.

Too many people are focused on what's legal. It's fine to think of, but law is the last stop before the breakdown of society. Microsoft skipped society and went straight to sparking an inevitable test of and possible reshaping of copyright law.

armchairhacker, over 2 years ago
One issue I see with Copilot is that they get free access to all open-source data on GitHub, but using GitHub APIs to download the data yourself isn't possible (rate limiting). This is an unfair advantage. Copilot is not only making money off of open-source, they are making money off of open-source in a way others can't.

I would love to see a lawsuit which requires GitHub to provide their full Copilot dataset.

meesles, over 2 years ago
I'm in favor of this. You can't ingest code that says "you cannot use this without attribution", put it through a bunch of if statements that strip the license, and then say it's "AI-generated". I don't care about most of our generic CRUD apps or the 15th rewrite of a sorting algorithm, but I do care about those smart enough to advance the field and come up with novel solutions. If we take away the incentive for attribution and recognition, people won't be as willing to share and we'll all be worse for it.

Like someone else said, there was a version of this where they asked people to opt-in and got community involvement. In true MS fashion, they just did it without asking and people are rightfully pissed.

bo1024, over 2 years ago
There are two issues -- (1) feeding copyrighted material *in* to an AI model, and (2) getting copyrighted material *out*.

The latter is obviously a violation of copyright, full stop.

The former, to me, is obviously not a violation. If it were, that would massively tilt the playing field in *favor* of large corporations. It would become very hard to independently train your own models. Philosophically, I go by the principle that if it's (il)legal to do yourself, then it should be (il)legal to do the same thing with an AI's assistance.

The massive complicating factor is that nobody knows how to do (1) without also doing (2) as a side effect, because we don't understand how deep learning works well enough to control it.

rickydroll, over 2 years ago
RSI took away my ability to write any significant amount of code 30 yrs ago. co-pilot plus speech recognition restored that ability. what impressed me most was that from a textual description, co-pilot gave me code that could have been written by my mind and pre-injury hands.

from the comments here, if I push copilot into giving me code that I would have written for a given problem and that code violates licenses, then who is responsible for the copyright violation? co-pilot for giving me code that looks like copyrighted code or me for tweaking co-pilot commands to give me the code I envisioned which looks like copyrighted code?

also consider that the very tools used for solving problems in code lead coders to a small number of solutions for a given problem. is it plagiarism or parallel original thought?

also consider that when I wrote code, if I was solving a similar problem to what I solved before, I recreated that previously used code fragment (or larger) and used it to solve the problem at hand. I had zero issues leaving a trail of duplicate code behind me especially if the code was a major part of a software patent.

I didn't care, my code was lauded for its readability and reliability. reuse the same concepts in multiple variations, you get real good at writing code correctly.

maybe co-pilot like programs could scan existing code bases and find examples of code fragment plagiarism with the goal of showing that software copyrights are useless.

bigiain, over 2 years ago
This is a bit off-topic, but I wonder if there are people/teams right now creating git repos, doing the source code equivalent of "SEO" on them, and embedding backdoors in code that is stupidly overoptimized for the training process?

I wonder when we'll hear about the first big hack that gets traced back to production code pushed live after CoPilot "suggested" eval(base64decode({webshell}))

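One way a team might guard against the kind of suggestion described above is to flag any `eval` or `exec` call whose argument contains a base64-style decode before the code is merged. The following is only a rough sketch for Python source, built on the standard-library `ast` module; the function names and the `DECODERS` list are assumptions chosen for illustration, and a determined attacker could evade a check this simple.

```python
import ast

# Hypothetical list of decoder names worth flagging; extend as needed.
DECODERS = {"b64decode", "base64decode", "decodebytes"}


def call_name(node):
    """Return the bare function or method name of a Call node, if it has one."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def find_suspicious_evals(source):
    """Return line numbers of eval/exec calls that wrap a decode call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and call_name(node) in {"eval", "exec"}:
            for inner in ast.walk(node):
                if (
                    inner is not node
                    and isinstance(inner, ast.Call)
                    and call_name(inner) in DECODERS
                ):
                    findings.append(node.lineno)
                    break
    return findings


SAMPLE = (
    "import base64\n"
    "x = 1\n"
    "eval(base64.b64decode(payload))\n"
    "print(eval('1 + 1'))\n"
)
flagged = find_suspicious_evals(SAMPLE)
print(flagged)
```

The sample is only parsed, never executed, so the undefined `payload` name is harmless. A check like this catches the literal pattern from the comment but not obfuscated variants, so it would complement code review and not replace it.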
19h, over 2 years ago
I'd be rather saddened if Copilot was shut down or neutered because of a vocal few protesting against it.

It's been a massive productivity improvement to our senior devs, and I got so used to it that it's an annoyance when Copilot doesn't respond.

elfatizer, over 2 years ago
There are lots of comments arguing for or against Copilot as a value judgment, and opinions on whether it is ethical or legal aren't going to be the same for everyone. But I think regardless of where you stand, there should be some sort of legal ruling to clarify the gray areas that Butterick breaks down.

cool-RR, over 2 years ago
While the moral and legal discussions here are interesting and worth exploring, I find this text hyperbolic. Its premise is that the main way that people currently interact with open-source projects is by digging into their source code, copy-pasting away a snippet of code that solves a particular problem, and then of course giving the authors the required attribution.

This is far from the truth. The main usage of most open-source projects isn't as code, but as a product. The median user of an open-source project wants to think about the project as little as possible. They want to be as unaware as possible of the code that makes up the project. They're happy to add the project to their `requirements.txt`, add a few lines to import and use it and then never think about it again.

skytrue, over 2 years ago
It's always interesting to see the buzz that occurs when Copilot is brought up as a topic. This place is called "HackerNews", yet routinely people forget that a "hacker" is somebody using technology to overcome novel problems. Doesn't GitHub Copilot fall into this category? Why is there such an outcry over a technology that has been in the public's hands for less than a year? I'm almost certain that the team responsible for Copilot is going to try to figure out how to avoid spitting out code verbatim, as that's obviously not a good look.

It's most likely the case that in 1, 3, 5 years, Copilot won't be spitting out code blocks verbatim. It will generate rightsize code, trained on lots of publicly available code, and start reducing the surface area required to code/develop.

Stable Diffusion doesn't get in trouble right now because the artwork looks like permutations of different works; text is easy to copyright, style is more challenging, but artists are facing up against the same reality. There's no rolling this back; ML models are going to remove a ton of cruft from creative/labor based endeavors, and people are going to need to evolve to stay relevant.

rockemsockem, over 2 years ago
What do people think the future looks like where publicly available resources on the Internet (art, code, etc) aren't fair use for training ML models? Where you have to opt into models or can opt out (and many wind up doing so)?

OpenAI, Microsoft, Google, et al will STILL train such models that can do all the same things, but it will be much harder for non-industry-backed individuals to navigate the legal minefield where you must ensure you properly attribute your model outputs, only train on opt-in data, etc, etc. Surely no one really thinks that a court case against Microsoft/OpenAI (even if they lose) would stop CoPilot?

Most of these complaints seem to be extremely emotional and cherry-picked. "People's legal rights are being violated!" (you definitely don't know that, no one knows that, the article is 100% right about that), "look I prompted CoPilot for this piece of code that I already knew about and it spit it right out" (that's not how it's going to be used in practice).

It seems to me that the longer-term implications of the outcome of a lawsuit like this are far more interesting, yet almost all the comments I see are nitpicking and whining about how the world isn't the way they want it to be. I wish the conversations around generative AI could be...just better.

invig, over 2 years ago
What's with the default to "if it's not explicitly legal, it must be illegal"?

Imagine if every new piece of software you wrote had to be tested for legality because you don't know that it's explicitly legal. Oh there aren't laws for this new thing, so I guess you should challenge yourself all the way to the supreme court?

I get the author not liking Copilot, but I don't see that GitHub/Microsoft have any kind of obligation to figure this out just because they're GitHub/Microsoft.

If I as an individual had this obligation placed upon me I'd just never write any more code.

Ultimately I think, like open source, Copilot and the tools that will follow advance human progress in novel ways. Software getting easier to make is a good thing. If you don't like this particular implementation of something helpful, feel free to start an open source alternative without challenging yourself in the supreme court.

kmeisthax, over 2 years ago
If Copilot itself is infringing then so is GPT-3, DALL-E 2, NovelAI, and Stable Diffusion. There's no legal argument that would *solely* target one application of this technology, and you can't build generative AI using current ML tools without relying on a very large corpus of public data. All AI is built on free-riding[0].

While there is no US case law that explicitly says "training AI is fair use", the Second Circuit says that scanning books to make a search engine for them *is*. And the absolute worst interpretation of AI is that it's just a very well-compressed search engine index for its training set data[1]. So I'm not entirely sure if we can even thread the needle to *only* ban Copilot or AI training as a whole without also creating harmful precedent for search engines. Actual judges may try, I'm not sure if they'll succeed.

Internationally, the EU already legalized training AI on copyrighted works[2]. So if we do win against Copilot in court, all we've really done is shift AI research over to the EU where laws are already more favorable.

I fully agree that Microsoft is shoving too much liability onto their users, though. And this, again, also applies to all generative AI. My personal opinion with generative AI is that it's a nice curio, but not anywhere close to "production-ready", and Microsoft and OpenAI are trying to sell us on a lie that it's better than it really is.

[0] This also implies that all y'all playing around with image generators are just as much of a freeloader as Microsoft is.

[1] This viewpoint is also called "compressionism".

[2] This was part of the most recent EU Copyright Directive update - the one that added a *de facto* upload filtering requirement. It also added a copyright exception for museums and historical preservation.

ClassAndBurn, over 2 years ago
My view is that Copilot is not stealing open source code. It is learning from it just as a human reader would. People's disgust comes from seeing what they thought was a human trait being machine-derived from their work.

The Copilot service backed by an army of actual humans wouldn't be a story at all. Nor would anyone be angry if an individual offered coding skills as a service, and had gone through the exercise of learning from a great amount of open source software to do so.

No open source license was written with this in mind, because previously learning was something only humans could do and no one had an issue with sharing that knowledge. Until licenses take machine learning use into account I see no problems with Copilot.

Source cannot be open if you restrict any viewing of it.

nmilo, over 2 years ago
Good bye and good riddance. Even just the idea that GitHub should be allowed to train their proprietary AI on other people's work is insane. Much less distribute that AI in a paid package which lets you spit out other people's code verbatim. Anyone who supports open-source and the (ab)use of copyright law to create free works should be vehemently opposed to Copilot.

rwalle, over 2 years ago
I don't understand why GitHub decided to run the project this way. This is a great idea but they messed the whole thing up. They could have made it opt-in from the very beginning and asked people to waive their rights, and I'm sure lots of people and lots of big projects would still be interested in joining the initiative. They could reward participants with, say, 3 years of Copilot access after it is officially launched, and people would love that. But instead they just take code without asking or attribution and keep pushing it, and now we are in this situation.

boomskats, over 2 years ago
Everything else aside, the design on this site is among the best I've ever seen. Amazing typography, great to read on a phone.

samhuk, over 2 years ago
Although I'm aware that this tool is a boon to many, particularly those with impediments like RSI, I still have to echo what a number of other comments say: There really is a *very large* proportion of adult software developers in the market who are simply too young to have lived through the EEE Microsoft era. Add on to that the proportion of old-enough Microsoft-brand *"dotnetter"* software developers who simply don't care as long as they get to sit comfortably within C#, Visual Studio and Azure.

After that, what are you left with? A small enough proportion of developers, and Microsoft evidently thinks so, who don't know, and/or don't care, and/or don't have the time to fight their Extend-Embrace phase of take-over of Github.

One could argue that the purchase of Github was *Extend*, and their involvement with OpenAI, the Codex, and the potentially illegal use of OSS (subject to the legal investigations) is *Embrace*.

It's my own personal view that Microsoft held back the progress of software development by probably a decade or so with their shady commingling with academia, blatant crippling of C# .NET to sell Visual Studio, and endlessly so forth. So I am, along with many, upset to see a business like this EEE their way into OSS, something which is dear and special to so many.

In the end, and I must state in my own opinion (since there is an element of speculation here), I am just pleased that there are still people out there who are not letting Microsoft continue their old ways.

bloppe, over 2 years ago
I think it's important to realize the exact implications here:

- MS absolutely has the authority to copy, use, and even train their models on your GPL-licensed code, because you agreed to let them do that when you signed their EULA when you decided to host your code on GitHub.

- This authority does not extend to CoPilot users, who cannot republish your GPL-licensed code without respecting the license. But remember that people have always had the ability (not authority) to copy and use open source code in violation of the license. This simply makes it embarrassingly easy for a person to do so unknowingly (although, legally, this would probably be considered negligence, not ignorance).

IANAL but I wonder if the extreme facilitation of copyright infringement here could be considered gross negligence on the part of MS, as they're almost entrapping their own customers in a minefield of copyright concerns. Can't wait to find out.

The logical next step in this arms race is for the GPL camp to build tools to automatically search for copyright infringement in large codebases. Copyright holders could set up hotlines for insiders to blow the whistle on infringement in exchange for compensation, since AFAICT all litigation precedent in the US has so far resulted in settlement.

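A tool of the kind proposed above could start from token-level fingerprinting: break each file into overlapping runs of tokens and measure how many runs from a suspect snippet also occur in a licensed reference file. The sketch below is a deliberately simple illustration under stated assumptions; the function names, the tokenizer regex, and the window size `k=5` are invented here, and it does not catch renamed identifiers or reordered statements.

```python
import re


def shingles(source, k=5):
    """Split source into tokens and return the set of k-token windows."""
    # Identifiers, digit runs, and single punctuation characters become
    # tokens; whitespace is ignored, so reformatting does not hide a copy.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def containment(candidate, reference, k=5):
    """Fraction of the candidate's k-token windows also found in the reference."""
    cand = shingles(candidate, k)
    if not cand:
        return 0.0
    return len(cand & shingles(reference, k)) / len(cand)


# Illustrative inputs: a "licensed" function, a reformatted copy, and
# an unrelated line of code.
REFERENCE = (
    "def fast_inv_sqrt(x):\n"
    "    half = 0.5 * x\n"
    "    return x * (1.5 - half * x * x)\n"
)
COPIED = "def fast_inv_sqrt(x): half = 0.5*x\nreturn x*(1.5 - half*x*x)"
UNRELATED = "print('hello world')"

copied_score = containment(COPIED, REFERENCE)
unrelated_score = containment(UNRELATED, REFERENCE)
print(copied_score, unrelated_score)
```

A high containment score is only a signal for human review, not proof of infringement: short idiomatic snippets overlap by coincidence, and whether a match is legally significant is a separate question.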
cmrdporcupine, over 2 years ago
Unfortunately the tone I'm getting from many of these comments makes me feel that people see open source projects as a resource to be mined rather than as a product to be respected. A very entitled attitude *("I really don't want to lose my lovely tool")*

There seems to be -- on the whole -- little respect for the spirit of the GPL and LGPL and it really is quite a change from, say, 20 years ago, when the 'free software' movement was I think more ascendant.

I think we have a generation of software developers who have only known a world where copious quantities of high quality source code have been made available to them under very liberal licenses -- which they in turn make careers and companies out of using / exploiting.

I, too, do this, and I generally open my modest projects under Apache or MIT or Mozilla style licenses. I do this because I want people to use my things, or to be able to use them as resume / portfolio material. Or because my employer at the time has helped fund construction of them.

But I *also* occasionally use the GPL/LGPL/AGPL, when I want to explicitly prevent corporate entities from exploiting said material without either consulting with me or in turn making their efforts free.

And in turn, I respect the value and power of the GPL for that purpose.

So many of the comments here are trivializing the value of free software and the licenses which make it possible, and acting like there's just this... natural right... to go out there and build on other people's work without recognition / compensation / contribution.

There are *too many* examples of CoPilot violating the spirit -- if not the actual legal letter -- of the GPL. This is unacceptable. I'm glad that someone is attempting a legal test.

*Free software is not your data to mine.* It is the blood sweat and tears of thousands of developers who do their work in community spirit, but under explicitly free software principles.

Putting something out under a free software copyleft-style license is not the same as saying *"You can do with this what you want."* It's "I made this, you can build on it, but what you made *also has to be free*. Or you negotiate with me."

And what I'm getting from the whole CoPilot fiasco is: GPL / free software does not belong on GitHub. And it might end up having to be put, generally, behind barriers that explicitly (technically and legally) prevent CoPilot & similar systems from getting access to it.

EDIT: I also fully expect a new version of the GPL to be published that includes clauses against this kind of datamining.

a254613e, over 2 years ago
> But how will you feel if Copilot erases your open-source community?

Jesus Christ, dramatic much? Are people that stumble upon a piece of code while googling how to do something, and end up copying and pasting the code from the repo, really building the open source community? Because that's essentially what it is. Whether I use copilot to generate a tedious function, or I copy it from your open source repo I'm on the same level of being a member of your open source community.

This whole thing feels like artists screaming how AI generated art is horrible, trying to figure out how to sabotage it, or how to start lawsuits - just because their value went down just a bit. Same thing with developers.

ipaddr, over 2 years ago
Doesn't really explain how co-pilot is stealing your community. I've used co-pilot and it works great until you are past boilerplate, then it falls apart.

cercatrova, over 2 years ago
Does GitHub not have the right to view and train from your content when you agree to their Terms of Service and upload your code?

People are conflating their open source license with the one they give GitHub when making a GitHub account, but they are two entirely separate and parallel licenses. The former is for other people to use your code, the latter is for GitHub to host your code.

If you don't like it, you are free to host your code on your own servers.

And anyway, as noted the other day about AI, it is often funny to see people not care about (or even enjoy) AI in other fields that they don't work in, but when it comes for their own field, they are suddenly very worried. See programmers on HN who argue for Stable Diffusion but against Copilot, and vice versa with artists on Twitter. As I commented then, it's an act of cowardice to think our own profession should be immune from AI while we enjoy the fruits of AI in other fields [0]:

> Yes, many of us will turn into cowards when automation starts to touch our work, but that would not prove this sentiment incorrect - only that we're cowards.

>> Dude. What the hell kind of anti-life philosophy are you subscribing to that calls "being unhappy about people trying to automate an entire field of human behavior" being a "coward". Geez.

>>> Because automation is generally good, but making an exemption for specific cases of automation that personally inconvenience you is rooted is cowardice/selfishness. Similar to NIMBYism.

We *should* want AI. That we then try to use outdated models like copyright to enforce holding back human progress is a true shame. In my view, *so what* if GitHub uses people's code for training data, we are all getting a better product because of that.

[0] https://news.ycombinator.com/item?id=33226515#33228948

ok123456, over 2 years ago
Maybe I'm in the minority, but I think the prospect of someone autocompleting and getting a snippet that came from me, finding it useful, and incorporating it is great. It means my thoughts and logic are shaping culture in a mimetic feedback loop.

gjsman-1000, over 2 years ago
"[W]e inquired privately with Friedman and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub’s public legal positions … They provided none."

Well... DUH. Why would they? You want to possibly sue them. Why in the hell would they, *or anyone*, provide crucial evidence for your lawsuit before you've sued them, regardless of the case and circumstances? Of course they aren't going to provide evidence, because you are obviously going to then try to prove hypocrisy, whereas you might not have enough to go on if they don't talk. No corporate lawyer in their right mind would ever grant such a request. (Edit: You are quite literally asking what their legal strategy is going to be, before the lawsuit has occurred, and then trying to spin the refusal as a proof of guilt.)

That's like claiming that an alleged drug dealer who didn't talk without a lawyer present is obviously a criminal, because if he wasn't he would have talked. What a nothing of a point.

PAMANOCH, over 2 years ago
The problems have almost nothing to do with deep learning itself. They lie with the companies that develop such products.<p>If a company uses someone&#x27;s code in a commercial product (a normal app), it has to follow the license. If a company uses someone&#x27;s code in a commercial product (model training), it doesn&#x27;t need to follow anything.<p>If a company uses someone&#x27;s art in a commercial product (a normal game), it has to get consent and, if the work is not royalty-free, pay the hosting platform or the artists for the right to use it. If a company uses someone&#x27;s art in a commercial product (model training), it doesn&#x27;t need to get consent or pay for anything.<p>All the problems arise before the technical details come into play, which makes the entire pipeline questionable.
jasone, over 2 years ago
I really don&#x27;t care if my code gets ingested and regurgitated by Copilot, but it seems rather a stretch to imagine that this is fair use, in part because it separates me from the legal protections afforded by the licenses I released my software under. In my ideal world, Copilot would be legally viable, and releasing my software without restriction wouldn&#x27;t be risky.<p>As a long-time open source software developer, I have favored the 2-clause BSD and MIT licenses because they are the simplest licenses that provide me some liability protection. I would release code into the public domain if that didn&#x27;t increase the likelihood of being sued, whether for liability, or for someone else claiming intellectual rights to code I actually wrote.
mgraczyk, over 2 years ago
All this discussion of legality is interesting to me, because I&#x27;m pretty sure that if Github ran a search in the background, found the corresponding license for the code snippet, then showed it to the user in some cookie-banner like annoyance, it would be completely legal. This is what Github already does on their website with a search bar.<p>Yet somehow I think most people upset about Copilot would not like that outcome.
Barrin92, over 2 years ago
100% correct takes in the piece, this is just ridiculous<p>&gt;<i>&quot;Tim Davis gave numerous examples of large chunks of his code being copied verbatim by Copilot, including when he prompted Copilot with the comment &#x2F;* sparse matrix transpose in the style of Tim Davis *&#x2F;.&quot;</i><p>Copilot regurgitates code and blatantly violates licenses; I&#x27;m not even sure what there is to argue about. Not only does it seem straight-up illegal and sideline open source communities, I think the next logical step is that people who want to avoid having their work vacuumed up and their rights violated will simply move to proprietary software, which would be a huge disaster for open source.
endisneigh, over 2 years ago
Copilot is trained on and returns AGPL code verbatim. It’s game over. If these licenses are not enforced it defeats the entire purpose.
gamekathu, over 2 years ago
A bit of a controversial opinion: to those who defend CoPilot by saying it &quot;boosted my productivity&quot; and who would miss it if it were discontinued, maybe you are not a productive developer to begin with. I fail to see how searching for the same snippets on Google or saving commonly used macros in your favorite editor would not yield the same amount of productivity. I used CoPilot for several months and then deliberately stopped, because I was afraid I would become dependent on it, and that it would actually reduce my ability to do critical code-building. I&#x27;m happy without it - sure, it takes a few microseconds more to type out my code instead of autogenerating it, but I feel much more confident in my own coding skills.<p>CoPilot is a great piece of research - it is indeed spectacular to see how pre-training can achieve such impressive code completion results. However, in my honest opinion, it should not be a tool for a serious developer.
DannyBee, over 2 years ago
To set people&#x27;s expectations, it is likely to take a bunch of lawsuits and a bunch of cases here to get to anywhere useful. The problem with lawsuits on copyright is that they are rarely precedential. I get that what people see is the large cases that try to tackle big topics. But for every single one of those, there are probably 10x or 100x as many equally large cases that did precisely none of that.<p>This is particularly true of fair use, which is very fact-specific. A court is much more likely to answer a very fact-specific question about Copilot, tied to the very specific facts of the case (i.e., how is this exact thing used&#x2F;etc.) than more broad, abstract questions.<p>In fact, standard Article III courts in the US are literally <i>not allowed</i> to issue advisory opinions.
orsenthil, over 2 years ago
Oh man. I want to continue using Copilot. It has improved my productivity and made me excited to do things that previously felt like a chore.<p>Also, programmers, please do not hinder other programmers&#x27; work. If you do, someone higher up the ladder will eat your cake at every opportunity.
mdswanson, over 2 years ago
I&#x27;ve been trained on open source code, and there are likely many algorithms that I&#x27;ve internalized that are very similar to the &quot;standard&quot; way of performing an operation.<p>Is there a reason why an AI being trained on the same open source code isn&#x27;t a similar situation? I agree that wholesale pasting of code chunks is an issue, but that hasn&#x27;t been my experience with Copilot.<p>I&#x27;m not arguing for Copilot here...I&#x27;m genuinely curious why this would be considered any different.
tomphoolery, over 2 years ago
Does reading software code count as &quot;using software&quot;? I personally don&#x27;t consider myself subject to a license when I&#x27;m reading public code on GitHub. GitHub Copilot and Codex AI seem to be doing nothing more than reading a bunch of source code, not reusing that code to incorporate its functionality into a different product.
nilshauk, over 2 years ago
So happy to learn of this and I wish them the best of luck in their efforts. And I&#x27;m surprised to find so many people clinging to Copilot.<p>We shouldn&#x27;t shed any tears for a megacorporation which shows such blatant disregard for the licensed works of people&#x27;s labour.<p>Yes, AI is here to stay but we should be able to build AI that respects copyright. Yes, it&#x27;s easier to just steal data and call it fair use. Whether or not that&#x27;s stealing will be interesting to try in court.
bluenose69, over 2 years ago
Two things.<p>First, it would be nice to have a copilot variant that searched <i>only</i> my own work, so I wouldn&#x27;t need to grep through other code I&#x27;ve written to get a reminder of how I solved a problem in the past.<p>And, speaking of the past ...<p>Second, I am old enough to have seen slide rules being replaced by calculators. This was a great addition to the toolbox, but it also had its downside: I&#x27;ve seen many students who have very clouded notions of significant digits, and many more who get quite confused with where to put a decimal point, when I ask them to compute something simple by hand.<p>Similarly, coding has been transformed with the advent of stack-like systems. There are two communities of coders now: those who learn a language and then can solve problems based on a solid foundation, and those who shorten the learning phase and code by web-search. The latter, it seems, are in danger of creating code that is brittle, limited, or downright wrong.<p>To the extent that copilot amplifies this habit of searching instead of thinking, I think it may lead to unreliable code.<p>So, sure, there are copyright issues. I think they have been well-discussed here and elsewhere. And courts may weigh in with new ideas. But my concern is with the reduction in code quality that may ensue. I&#x27;d love to see a discussion of the groups that are using copilot. If they are working on something I don&#x27;t care about, then this is just a copyright issue. But if they are working on the &quot;smarts&quot; behind drug discovery, the control of dangerous machines, etc., then we have another issue, besides copyright.
Imnimo, over 2 years ago
&gt;Copi­lot intro­duces what we might call a more self­ish inter­face to open-source soft­ware: just give me what I want! With Copi­lot, open-source users never have to know who made their soft­ware. They never have to inter­act with a com­mu­nity. They never have to con­tribute.<p>&gt;Mean­while, we open-source authors have to watch as our work is stashed in a big code library in the sky called Copi­lot. The user feed­back &amp; con­tri­bu­tions we were get­ting? Soon, all gone.<p>I don&#x27;t see how you square the above complaint with this:<p>&gt; First, the objec­tion here is not to AI-assisted cod­ing tools gen­er­ally, but to Microsoft’s spe­cific choices with Copi­lot. We can eas­ily imag­ine a ver­sion of Copi­lot that’s friend­lier to open-source devel­op­ers—for instance, where par­tic­i­pa­tion is vol­un­tary, or where coders are paid to con­tribute to the train­ing cor­pus.<p>Is an AI that was trained on opt-in or paid-for training data any less damaging? How would these choices have alleviated the problems described above?
esskay, over 2 years ago
I do have to wonder if Copilot will last. It&#x27;s going to become a legal minefield and I can&#x27;t imagine for a second that Microsoft will want to be in the crosshairs for another antitrust case.
commitpizza, over 2 years ago
Great, I hope it is tried in court. It should be. But unfortunately I don&#x27;t have much hope that the courts will come to understand the issue well enough.
snarfy, over 2 years ago
If there were no copyright problems then why didn&#x27;t Microsoft train Copilot on its own source code, like Windows, Visual Studio, SQL Server, etc.?
Andys, over 2 years ago
If you&#x27;re against Copilot as a developer, you&#x27;re shooting yourself in the foot.<p>Locking up code under non-permissive licenses stymies the pace of code development and increases the costs of progress dramatically.<p>We all stand on the shoulders of others before us. Including the organisations that stand to benefit the most from aggressive licensing.
aetherspawn, over 2 years ago
I wonder if people realize that letting GitHub train Copilot on their open source contributions is effectively de-valuing your own time, which (if repeated at a larger scale) devalues your experience, and that eventually has the effect of reducing the correlation between your experience and your salary.<p>For example, if an overseas firm can just as easily use Copilot as I can write original code (or use Copilot myself), why would any company hire me locally?
beefman, over 2 years ago
Open source? They used everything on github with no regard for license, which would have included plenty of code under conventional copyright. Microsoft is now profiting from that code.
Uhhrrr, over 2 years ago
The example given is &quot;sparse matrix trans­pose in the style of Tim Davis&quot;, but someone who wanted something with such specificity would be able to just take it from Github anyway, perhaps with a little more searching.
modernerd, over 2 years ago
I love that Matthew is investigating this and agree that Copilot warrants more scrutiny. His suggestions that Microsoft let developers opt-in to having source used for training purposes, to pay for source it uses, and to attribute or credit it appropriately all seem reasonable.<p>Can someone help me to imagine a reality in which these points are viable concerns?<p>&gt; …how will you feel if Copi­lot erases your open-source com­mu­nity?<p>&gt; …Copi­lot will become not just a sub­sti­tute for open-source code on GitHub, but open-source code every­where.<p>&gt; …Copi­lot is merely a con­ve­nient alter­na­tive inter­face to a large cor­pus of open-source code.<p>&gt; With Copi­lot, open-source users never have to know who made their soft­ware. They never have to inter­act with a com­mu­nity. They never have to con­tribute.<p>Is the author suggesting that Copilot will be used in place of `npm install next react react-dom` or `cargo add tokio --features full` or `raco pkg install pollen` — that developers will be content to use augmented autosuggest in place of large, well-tested, well-documented open source libraries?<p>Does he see Copilot&#x27;s final form as some kind of AI package manager that drops a library of untested unattributed undocumented files into our projects?<p>Or is it more that he thinks those libraries won&#x27;t exist because open source contributors will grow to feel more abused than they already do, perhaps quitting the scene or developing in private, like certain artists have already done in response to the AI art movement?<p>There is already such a huge disparity between paid package consumers and unpaid package contributors. I haven&#x27;t seen that change since Copilot launched in beta or under general availability. I see the same ratio of help&#x2F;feature requests compared to code and documentation contributions that I always have. 
And package usage has not declined so far for the open source things I work with.<p>It would be nice to learn more about the “Copilot will lead to the death of open source communities” line of reasoning — what is the author&#x27;s perceived timeline to open source&#x27;s decline and fall as a result of Copilot&#x27;s current path?
w10-1, over 2 years ago
Don&#x27;t confuse what you want with what the law says<p>&quot;Your work is under copyright protection the moment it is created and fixed in a tangible form that it is perceptible either directly or with the aid of a machine or device&quot; [<a href="https:&#x2F;&#x2F;www.copyright.gov&#x2F;help&#x2F;faq&#x2F;faq-general.html" rel="nofollow">https:&#x2F;&#x2F;www.copyright.gov&#x2F;help&#x2F;faq&#x2F;faq-general.html</a>]<p>A copy is made whenever that text is displayed, e.g., in GitHub&#x27;s UI. Even that copy is subject to copyright.<p>Is there an excuse&#x2F;exception? In this case, there is no &quot;fair use&quot; exception, because exceptions have to be litigated case-by-case to be recognized, and there are no remotely similar situations. Don&#x27;t forget: Lexis is a multi-billion-dollar business built on protecting the copyright to the page numbers in the otherwise public court opinions.<p>Does the law actually protect people if it&#x27;s too costly to enforce? Not really; hence the blase attitude. Congress is considering a &quot;small claims&quot; system for copyright, to remedy the big-firm bias. [<a href="https:&#x2F;&#x2F;www.copyright.gov&#x2F;title17&#x2F;92appm.html" rel="nofollow">https:&#x2F;&#x2F;www.copyright.gov&#x2F;title17&#x2F;92appm.html</a>]<p>In the ML era, data is the new gold. Many, many firms nowadays get a good chunk of their revenues from selling their private view of &quot;public&quot; data: Facebook, LinkedIn, credit reporting companies, ADP, etc. Microsoft has gone all-in on stealing that gold from open-source developers.<p>It&#x27;s not just that the code replication reduces any need to get the code from the source. But removing any link to the source destroys the value most-commonly sought in open-source software: recognition.<p>Salaries are the biggest expense of tech companies. 
They do everything they can to increase labor competition and reduce reputational rents: outsource, cross-train, promote open-source (for competition) and destroy any reputation networks or systems that justify higher rates. And, of course, standardize on containerized copy-paste or AI-generated software if they can.<p>So, no: Copilot is not legal, it&#x27;s socially and economically destabilizing, and it presents structural challenges to developers.<p>It&#x27;s not good, but most will keep using it because although the vast, vast majority of developers are wage laborers, they aspire to be founders. They see that it can produce code fast, and they&#x27;ll think it makes them better.
wkz, over 2 years ago
If they really don&#x27;t think that they need to comply with any license, then why not include all private repos in the training set? Could it be that they&#x27;re worried about legal repercussions, whereas OSS is easier to (ab)use for this purpose because there&#x27;s much less legal muscle behind it?<p>It is also very telling that they have not included any of their own proprietary code in the training set. If it&#x27;s merely suggestions that are generated, why not also train on the NT kernel? Office?
shpx, over 2 years ago
It seems like there&#x27;s no good license that places absolutely no restrictions or requirements on people using your code (such as attribution and respecting patent rights) worldwide.<p>I want my code to be used the way people treated text in the old days. There are texts that have been rewritten, added to and edited by thousands of people over the centuries and yet they don&#x27;t come with thousands of pages of attribution notices because why would they?
tapia, over 2 years ago
I think that Microsoft should train Copilot on its own code (it certainly owns enough lines of code, after all). If they think that would not be fair use, then why should it be fair use to use somebody else&#x27;s code?
angusturner, over 2 years ago
I get the impression that many people&#x27;s grievance with generative AI (text, code, images etc.) isn&#x27;t _really_ about the data provenance. Or at least, it feels secondary, compared to the general disruptive nature of the tech.<p>If tomorrow someone released a Stable Diffusion, Copilot, etc. with the same functionality, but respecting the provenance of the data (i.e. licensing etc), what concrete difference would this make? Programmers and other creative professionals would still (reasonably) be nervous about the implications for their livelihoods and communities.<p>At some point it will be possible to prompt a model for music in the style of &lt;random artist&gt;, and having never heard &lt;random artist&gt;, the model will generate a convincing emulation, based purely on statistical knowledge gleaned from millions of unrelated songs and text pairs. (I give it 5 years).<p>Now what? &lt;random artist&gt; should still be concerned (or not), but at least we&#x27;re talking about the correct issue: How do we co-exist with generative models that massively disrupt&#x2F;alter the process of doing creative or intellectual work?
siliconc0w, over 2 years ago
You&#x27;re missing the big picture: first you create a lot of licensing violations littered throughout internal code, and next they can sell you an Azure-hosted open-source licensing annotation AI to fix it.
pr337h4m, over 2 years ago
It&#x27;s tragically beautiful how the copyleft crowd is putting so much effort into drastically expanding the scope of copyright.<p>&quot;I used the copyright to destroy the copyright.&quot;<p>That sort of plot never works in practice.
kazinator, over 2 years ago
&gt; <i>Arguably, Microsoft is creating a new walled garden that will inhibit programmers from discovering traditional open-source communities.</i><p>This is extremely far-fetched.<p>User bases (let&#x27;s avoid one of the four dirty C words) are organized around something which builds, executes and is documented, not searches for snippets.
krick, over 2 years ago
I don&#x27;t like that opensource code is being used in a commercial product. I feel concerned about NNs learning about stuff they aren&#x27;t really &quot;supposed to&quot; learn, because somebody published something by mistake a long time ago. But this general argument about reproducing copyrighted code is stupid, and actively trying to shut Copilot down because of that is why lawyers are cancer.<p>Basically, what Copilot (or anything like that) is supposed to do is to speed up your work, i.e., ideally, to write exactly what you&#x27;d write, but orders of magnitude faster. How do you write code? Well, you may have a solution in mind — if it&#x27;s something really original, rest assured, Copilot won&#x27;t guess it. It can only hope to guess something that, in a sense &quot;has a correct answer&quot; to it. In fact, it does it worse, than it should be: graph traversals, matrix operations and such should be guessed flawlessly (in a perfect world every PL would have some primitives implementing them in the best possible way, but ours is not perfect). If you don&#x27;t know how to traverse a graph, you&#x27;ll go and look for a reference. 15 years ago it was likely a book, then looking up on the Wikipedia or StackOverflow became way more likely. For the last 5 or so years literally searching it on GitHub became viable because of better search engines and the sheer size of it.<p>Now, if I found a matrix transpose function in an open-source project, which I cannot include as a library for some (usually technical, but maybe not) reason, so I memorize it, close the page and re-type it in my IDE, do I have to be restricted by its license? Then, doing so is obviously stupid, so how about me just copy-pasting it, while renaming some variables so that the teacher wouldn&#x27;t notice? And, given that this is not my homework, there&#x27;s no teacher and variables are named perfectly as they are — doing that is also really stupid, so I might have just copy-pasted it. 
So, how about now, do I have to publish my code under GPL3 now? Is this theft? If any lawyers say yes — fuck these lawyers. It is nonsense.
mark_l_watson, over 2 years ago
First, the author’s book Beautiful Racket is very cool, recommended.<p>I largely disagree with this article, at least for MIT, BSD, etc. training code examples. The small autocompletions, even if they are several lines long, sort of seem like fair use to me.<p>I do think that CoPilot should have an option to use a smaller model trained only on code that has very liberal use licenses, because I think the use of GPL, etc. licensed code is problematic - at least for me.<p>For what it is worth, I have a lot of Apache 2 licensed repos on GitHub (largely examples from my books) and I am pleased if my code contributed a small bit to the CoPilot training data. I also publish my recent books under Creative Commons licenses that allow reuse, even commercially: basically anything I do that might help someone, I am all in for sharing.
not2b, over 2 years ago
It seems that Copilot could address this issue by searching for matches in its source repositories for the strings it generates, with appropriate criteria, and give the user a link describing the origin of the code, who wrote it, and what the license is for cases where a match length exceeds a threshold. So, you wouldn&#x27;t just get the Quake fast integer square root routine, you&#x27;d get a pointer to the Quake repository and license info from which it came. A separate model could be trained up that would find the closest match in source code repositories. A user could then use Copilot safely, attribute code correctly, and avoid code with incompatible licenses.<p>This would be a better approach than &quot;shut it down&quot;.
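A minimal sketch of the kind of lookup described above, offered as an illustration only and not as how Copilot or GitHub actually works. The names `Source`, `tokenize`, `longest_shared_run`, and `find_attribution`, the whitespace tokenization, and the 8-token threshold are all assumptions made for this example. It scans a small in-memory corpus and reports any source that shares a long enough contiguous run of tokens with a generated suggestion, along with that source's repository and license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One indexed file: where it came from and under what license (hypothetical schema)."""
    repo: str
    license: str
    text: str


def tokenize(code: str) -> list[str]:
    # Assumption: plain whitespace splitting; a real tool would use a language-aware lexer.
    return code.split()


def longest_shared_run(a: list[str], b: list[str]) -> int:
    """Length of the longest contiguous token run that appears in both a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def find_attribution(
    suggestion: str, corpus: list[Source], threshold: int = 8
) -> list[tuple[int, str, str]]:
    """Return (match_length, repo, license) for every source whose shared
    run with the suggestion meets the threshold, longest match first."""
    tokens = tokenize(suggestion)
    hits = []
    for src in corpus:
        run = longest_shared_run(tokens, tokenize(src.text))
        if run >= threshold:
            hits.append((run, src.repo, src.license))
    return sorted(hits, reverse=True)
```

Calling `find_attribution(generated_code, corpus)` yields the candidate origins to show next to a suggestion. The pairwise scan here is quadratic per file and would not scale to a real code host; a production system would presumably need an n-gram or suffix index, and exact token matching would miss copies with renamed variables.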
bruhhh, over 2 years ago
This does more harm than good. If you set a precedent, then things like Stable Diffusion will also be illegal, since it&#x27;s trained on public data. OP just wants to make money from Microsoft using fearmongering and a false sense of righteousness.
blackoil, over 2 years ago
From all the discussions, it seems people are rooting for an MPAA-like organization and a Content ID-like system for code.
Kiro, over 2 years ago
Abolish all copyright. We&#x27;re all happily pirating movies and music but code is for some reason sacred.
cryptonector, over 2 years ago
Maybe MSFT should have one instance of Copilot for each common license, and then the user gets to pick which licenses they want to deal with when using Copilot. If you&#x27;re writing code for a BSD-licensed codebase, you might accept Copilot trained on BSD- and MIT-licensed code, as well as any other license that&#x27;s compatible with BSD. If you&#x27;re writing code for a proprietary codebase you might want to exclude Copilot trained on any copyleft licenses. And so on.
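As a rough sketch of the per-license idea above, the training corpus could be filtered by license before a model variant is built. Everything here is hypothetical: `Repo`, `training_subset`, and the `COPYLEFT` set are names invented for this example, and the set is an illustrative list of SPDX identifiers, not a legal statement about which licenses are compatible with which.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repo:
    """A repository tagged with its SPDX license identifier (hypothetical schema)."""
    name: str
    license: str


# Illustrative only: a few copyleft SPDX identifiers, not a complete or authoritative list.
COPYLEFT = frozenset({"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"})


def training_subset(
    repos: list[Repo],
    allowed: Optional[set[str]] = None,
    exclude_copyleft: bool = False,
) -> list[Repo]:
    """Select the repositories a given model variant may be trained on.

    allowed: if given, keep only repos whose license is in this set.
    exclude_copyleft: if True, drop repos whose license is in COPYLEFT.
    """
    selected = []
    for repo in repos:
        if allowed is not None and repo.license not in allowed:
            continue
        if exclude_copyleft and repo.license in COPYLEFT:
            continue
        selected.append(repo)
    return selected
```

A BSD-licensed project might request `training_subset(repos, allowed={"MIT", "BSD-2-Clause"})`, while a proprietary codebase might use `training_subset(repos, exclude_copyleft=True)`. Which licenses actually belong in each set is a legal question that this sketch does not answer.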
naikrovek, over 2 years ago
Here&#x27;s the thing about GitHub that most people do not realize. I find this funny because of all the talk of following license agreements, very few have taken the time to read the terms of service for GitHub.<p>from their terms of service: &quot;Short version: You own content you create, but you allow us certain rights to it, so that we can display and <i>share the content you post.</i>&quot; emphasis mine.<p>that&#x27;s what they call the &quot;Short version&quot; of the following paragraphs, which are found here: <a href="https:&#x2F;&#x2F;docs.github.com&#x2F;en&#x2F;site-policy&#x2F;github-terms&#x2F;github-terms-of-service#d-user-generated-content" rel="nofollow">https:&#x2F;&#x2F;docs.github.com&#x2F;en&#x2F;site-policy&#x2F;github-terms&#x2F;github-t...</a><p>they allow themselves the right to display content you upload to others. GitHub does not seem to really put a cap on that in terms of what intentions it needs to have or for what purposes it needs to share your content.<p>this seems to me that, by putting your code on github.com, that you are granting GitHub license to show it to others. period. IANAL, but it seems like all code anyone puts on github.com is dual-licensed, at least. GitHub gets their own rights to your code.<p>I read this before I signed up, and while I can&#x27;t remember if this exact passage was present at the time, I was ok with everything GitHub wanted at the time, and I continue to be.<p>githubcopilotinvestigation.com doesn&#x27;t seem to have much hope of doing anything except getting people mad. but you all were already mad anyway, weren&#x27;t ya?
minhazm, over 2 years ago
There&#x27;s been a lot of discussion around licenses but I&#x27;m not even sure if they matter for Copilot. I was reading their terms and conditions and there&#x27;s a paragraph that basically says they have the right to display and share your code with other users. So even in the case where people are directly prompting Copilot with specific function names, I think the terms and conditions still cover them.<p>&gt; We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.<p><a href="https:&#x2F;&#x2F;docs.github.com&#x2F;en&#x2F;site-policy&#x2F;github-terms&#x2F;github-terms-of-service#4-license-grant-to-us" rel="nofollow">https:&#x2F;&#x2F;docs.github.com&#x2F;en&#x2F;site-policy&#x2F;github-terms&#x2F;github-t...</a>
hsbauauvhabzb, over 2 years ago
How can I, as the lead of a small team, make sure none of my code ends up in Copilot (or is otherwise submitted to third parties)? We use Devops internally, and the choice of IDE is up to the developer.<p>I’m unsure whether VS Code etc. submit samples or just interact with GitHub.<p>Edit: and furthermore, make sure it doesn’t import code from third parties. I don’t want my code being infringed upon, but also don’t want to accidentally infringe on others’ work. Legal or not.
blibble, over 2 years ago
in countries where there is no fair use (most of the world outside the US) it seems quite likely copilot is willful, commercial scale copyright infringement
asdff, over 2 years ago
I wish code didn&#x27;t have any copyright at all. It should just belong to our species for the benefit of our species. If your entire business model depends on having some private code that you lord over, versus, you know, having some expertise in the field you are in and the ability to generate more code to solve ongoing problems, it seems like you are built on shaky ground to begin with.<p>For example, there are plenty of academics these days who are at the tops of their fields and open source all their code. They end up considered experts not because of a black-box code base they apply to problems, but because they can think of potential solutions to the problems at all, and one of the tools they use is writing up some code. The code is a shovel or a hammer; it&#x27;s not the one wielding it. They have competitors too, of course; it&#x27;s just that the secret sauce isn&#x27;t the code but what goes on in your actual brain.<p>It&#x27;s too bad most business leaders fail to understand this, and think it&#x27;s a black-box code base that makes a decent business. It&#x27;s the ability to solve problems that matters.
typon, over 2 years ago
This Copilot saga is another good reminder of why nothing is free. Developers have been using Github for free for years - now the chickens have come home to roost. The copyright licenses are just a formality - a form of kayfabe. If you aren&#x27;t hosting your own code (GNU style), you should assume Microsoft owns it, for all intents and purposes.
NicoleJO, over 2 years ago
Those who want to insist there are no instances of infringement or evidence thereof should take a look at this link first.<p>It&#x27;s face-saving.<p><a href="https:&#x2F;&#x2F;justoutsourcing.blogspot.com&#x2F;2022&#x2F;03&#x2F;gpts-plagiarism-links.html?m=1" rel="nofollow">https:&#x2F;&#x2F;justoutsourcing.blogspot.com&#x2F;2022&#x2F;03&#x2F;gpts-plagiarism...</a>
qwerty456127, over 2 years ago
It makes sense to copyright a book, but it doesn&#x27;t make sense to copyright a phrase (unless you are using it as a trademark motto or something like that); normally phrases are free for anybody to re-use. It makes sense to copyright a program, but it doesn&#x27;t make sense to copyright a piece of code.
kanonade, over 2 years ago
Oh my god, round and round on this topic. Leave it alone. Copilot is an amazing tool and demo of what AI can do. I will happily pay for good ML products, which is a notoriously hard area to monetize.<p>Copilot may produce results from the training set, but if you&#x27;re letting it do that, that says more about you than about Copilot.<p>All of these claims use the example &quot;Write me a function to foo the bar that takes baz as an argument&quot;. If you prompt it to write entire functions and classes for you, then it will lean on its training set.<p>But if you actually just write code, then it will complete small single lines in exactly the style you&#x27;ve previously written, with code that is unique to your program, because it can synthesize new code.<p>In this role Copilot is no different from a search engine. If you prompt it lazily, Copilot isn&#x27;t the one stealing the code, you are.
welder, over 2 years ago
This reminds me of pirating music. Lawyers tried futilely to stop it, but if something is technically possible people will find a way to keep doing it. Maybe you set some legal precedent on fair use with AI, but it won&#x27;t prevent the real world usages if there&#x27;s a benefit to the technology.
flkiwi, over 2 years ago
Lots and lots and lots and lots of people confusing copyright (an inherent property right granted and protected by the government) and license (a privately granted privilege to use). Butterick (who is no IP fundamentalist; just go look at the license he used for his typefaces) is doing two things: looking at the enforcement of open source licenses so that they are not invalidated by nonenforcement and, relatedly, asking Microsoft to respect the community. I didn’t see him suggest that Copilot is bad or should be shut down, just that they play by the rules. A lot of the reactions here echo a lot of non-developer middle managers who insist that open source code is free and freely usable by anyone for any reason, which simply isn’t the case if FOSS licenses have meaning and value.
benced over 2 years ago
1. New player shows up, changes value chain and creates abundance 2. People who benefitted from old value chain whine 3. New player throws them a bone with a small fund or maybe a setting box, doesn’t change 4. (A few years later) no one cares about the kooks who whined<p>I’m not even 30 yet and I’ve seen this happen again and again - it’s frankly boring at this point. We’ve seen this with Spotify and music, newspapers and the internet etc.<p>The practical truth is that Copilot is a useful tool for humanity to have. It is exceedingly unlikely it will be stopped because a small percentage of programmers - themselves a small percentage of people who benefit from code - feel their interests have been hurt. Change or get left behind (but make sure to enrich some lawyers on a pointless suit in the meantime).
TheMiddleMan over 2 years ago
There&#x27;s a big difference between learning and memorizing.<p>If the AI is &quot;learning&quot; how it works by studying public code then using its knowledge to create, that&#x27;s okay.<p>But if it&#x27;s just memorizing code and reciting it back, not okay. Just like if a human were doing this.<p>Of course we don&#x27;t currently have ways to know the difference [that I know of] since AI is a black box.<p>Interestingly, current AI is not capable of truly understanding how code works and how it will execute, so it has to learn in its own way. I suspect it can learn what valid syntax is, but I doubt it is aware of how the code will execute.<p>It&#x27;s possible this is just a case of overfitting. <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Overfitting" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Overfitting</a>
echelon over 2 years ago
BSD 5-Clause<p>1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.<p>2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and&#x2F;or other materials provided with the distribution.<p>3. All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by the organization.<p>4. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.<p>5. Use of this source code for the research or training of machine learning models is permitted.
nonrandomstring over 2 years ago
Funnily, for an article all about copying (lots) everywhere the author writes Copilot it appears as &quot;Copi lot&quot; in text browsers. Also the HN title appears the same (check it in hex dump)<p>For example, from TFA file:<p>0005e10 o f C o p i 302 255 l o t
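For readers wondering what `302 255` in that dump means: those look like two bytes printed in octal, and a small Python check (a sketch, not something from the comment itself) suggests they are the UTF-8 encoding of a single invisible character. The variable names here are just for illustration.

```python
import unicodedata

# The two octal values shown in the dump between "Copi" and "lot".
raw = bytes([0o302, 0o255])

# Decode the pair as UTF-8 to find the code point they encode.
char = raw.decode("utf-8")

code_point = f"U+{ord(char):04X}"
name = unicodedata.name(char)

# Show the hex form of the bytes, the code point, and its Unicode name.
print(raw.hex(), code_point, name)
```

A text browser that does not treat this character as invisible would plausibly render it as a gap, which matches the "Copi lot" observation.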
amelius over 2 years ago
Has anyone spotted licenses in the wild that specifically prohibit AI tools like Copilot?
williamcotton over 2 years ago
Copyright only covers the expressive parts and not the utilitarian parts:<p><a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Abstraction-Filtration-Comparison_test" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Abstraction-Filtration-Compari...</a><p><a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Idea–expression_distinction" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Idea–expression_distinction</a><p><a href="https:&#x2F;&#x2F;h2o.law.harvard.edu&#x2F;cases&#x2F;5004" rel="nofollow">https:&#x2F;&#x2F;h2o.law.harvard.edu&#x2F;cases&#x2F;5004</a><p>Most of your code is probably not subject to copyright in the first place, regardless of license.
henvic over 2 years ago
Reality: *GPL licenses are proprietary licenses.<p>I hope Copilot and similar technologies weaken the copyright establishment.<p>Do Business WITHOUT Intellectual Property - Stephan Kinsella <a href="http:&#x2F;&#x2F;www.stephankinsella.com&#x2F;wp-content&#x2F;uploads&#x2F;publications&#x2F;kinsella-do-business-without-ip-2014.pdf" rel="nofollow">http:&#x2F;&#x2F;www.stephankinsella.com&#x2F;wp-content&#x2F;uploads&#x2F;publicatio...</a><p>Against Intellectual Property - Stephan Kinsella <a href="https:&#x2F;&#x2F;mises.org&#x2F;library&#x2F;against-intellectual-property-0" rel="nofollow">https:&#x2F;&#x2F;mises.org&#x2F;library&#x2F;against-intellectual-property-0</a>
salford over 2 years ago
A good solution might be to add a new license clause stipulating whether the owner is okay with their code being used to train AI models.<p>Part of the clause would explain that if you are okay with your code being trained on, then you&#x27;re also accepting being okay with it being copied verbatim at some point down the line during code completion.<p>You do get a bit of tragedy of the commons where everybody wants to use the AI model but nobody wants their own code trained on.<p>I don&#x27;t like the idea of a world where licensing and copyright law prevents us from enjoying the progress of AI. Caveat: I am not an expert on open source.
burgerguyg over 2 years ago
There&#x27;s a &quot;Pictures in Boxes&quot; comic about the internet stealing content on his page. It doesn&#x27;t name the author or link his site on the image or in text.<p>But since the use is not for the page author to comment on the comic itself, but the comic is used to support his discussion of another misuse of IP, does it constitute fair use?<p>The page author is going deep on the content misappropriation theme and on what constitutes fair use, so it seems oddly ironic he&#x27;d be so <i>seemingly</i> cavalier about using someone else&#x27;s content on that page.
perryizgr8 over 2 years ago
I am glad all the legal bs didn&#x27;t stop MS from making the product. Copilot is surprisingly effective, it truly makes life easier for me, as a developer. The fact is that if you give your code away publicly, you cannot finely control what the world does with it. If this is not acceptable to you, keep your IP private.<p>If these guys manage to shut down or cripple Copilot using legal mechanisms, you can bet there will be a Chinese&#x2F;Russian alternative that will be even more indifferent to your LICENSE.md, and you won&#x27;t be able to get it shut down using the courts.
BoppreH over 2 years ago
Most of these points can also be raised against DALL-E 2, but software has one extra thorn: patents.<p>It&#x27;s common advice not to read software patents[1] because the infringement penalties are lower if you infringed unwittingly, that is, by reinventing the patented technique yourself.<p>I wonder if using Copilot doesn&#x27;t push the penalties back again to wilful infringement. Or worse, patent trolls poisoning the training data with patented algorithms.<p>[1]: <a href="https:&#x2F;&#x2F;queue.acm.org&#x2F;detail.cfm?id=3489047" rel="nofollow">https:&#x2F;&#x2F;queue.acm.org&#x2F;detail.cfm?id=3489047</a>
naikrovek over 2 years ago
&gt; Why couldn’t Microsoft produce any legal authority for its position?<p>Absence of proof is not proof of absence.<p>They don&#x27;t owe anyone anything beyond what they agree to provide to users of Copilot via its license agreement or to GitHub users whose code it has used in accordance with that license agreement. Those agreements define what they owe. That&#x27;s it.<p>The only way those license agreements don&#x27;t hold up in court is if they are somehow deemed invalid. I do not see Microsoft making that kind of mistake.<p>This website is designed to get people angry, and that&#x27;s all it is going to accomplish.
627467 over 2 years ago
Hey, I despise bait and switch from large corps. But I also find this idea unsustainable: that societies and legal resources are wasted fighting for IP.<p>The code is out there. Millions of people are being trained and writing code based on what they learned from open data.<p>Designers have &quot;mood boards&quot;. Developers have open source. Right now I don&#x27;t have sympathy for MS, but in a few years any developer could just do what MS is doing with Copilot in their bedroom. Why would you care about the kid in their bedroom training an AI with free (as in public) information?
quickthrower2 over 2 years ago
Maybe I will start writing open source code intended to trick copilot. Stuff that just about works in the given context, but will fail badly if copypasta&#x27;d into another program. If we all did that.
chronolitus over 2 years ago
To stay sane: for myself as a developer, I consider github copilot as a (much) faster google&#x2F;code search work-flow. I can copy &#x2F; or re-mix code I find in a google search, but it&#x27;s my responsibility to figure out the copyright situation of that code.<p>Imagine if something like google didn&#x27;t exist, and then it suddenly did. People would be saying: &quot;This newfangled computer algorithm is giving everyone copies of my code with a misattributed licence, just by typing the function name and site:github.com !&quot;
javajosh over 2 years ago
It&#x27;s too bad we can&#x27;t experiment with interesting things like Copilot without worrying about remuneration and the respecting of rights. But that&#x27;s the way of the world - we must think of these things. MS&#x2F;Github should give code copyright holders a simple and easy way to opt-out of contributing their code to the Copilot corpus. Currently the only way to opt-out is to make your repo private. That&#x27;s not good enough.<p>It would be better, of course, if Copilot was opt-in, but they&#x27;d never go for that.
Springtime over 2 years ago
A bit meta but anyone know why the submission title contains unicode between various characters?<p>It&#x27;s hidden on both Chromium&#x2F;Firefox when viewing the page but when saving the page it reveals them in the text field, eg: `GitHub Copi_lot inves_ti_ga_tion`<p>Plugging the title into a unicode converter shows they&#x27;re &#x27;soft hyphen&#x27; characters<p>GitHub Copi [0x00AD] lot inves [0x00AD] ti [0x00AD] ga [0x00AD] tion<p>Edit: apparently they&#x27;re for indicating to formatters where character breaks should be, though I can&#x27;t understand the consistency here.
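To reproduce this locally, a short Python sketch can list where the soft hyphens sit and strip them out. The helper names are hypothetical, and the title string is reconstructed from the `[0x00AD]` positions listed in the comment rather than copied from the page source.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, an invisible hint for where a word may break


def soft_hyphen_positions(text: str) -> list[int]:
    """Return the index of every soft hyphen in the text."""
    return [i for i, ch in enumerate(text) if ch == SOFT_HYPHEN]


def strip_soft_hyphens(text: str) -> str:
    """Return the text with all soft hyphens removed."""
    return text.replace(SOFT_HYPHEN, "")


# Title rebuilt from the positions quoted in the comment above.
title = "GitHub Copi\u00adlot inves\u00adti\u00adga\u00adtion"

print(soft_hyphen_positions(title))
print(strip_soft_hyphens(title))
```

Browsers only draw a soft hyphen when they actually break a line at that point, which would explain why the characters stay hidden on the rendered page but show up in a saved copy.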
chiefalchemist over 2 years ago
&gt; Arguably, Microsoft is creating a new walled garden that will inhibit programmers from discovering traditional open-source communities. Or at the very least, remove any incentive to do so.<p>The walled garden bit I get. But I&#x27;m lost making the leap to &quot;remove any incentive to do so.&quot; Is Butterick suggesting that someone is going to put aside their code and do a deep dive on GitHub looking for a snippet that might not exist?<p>I&#x27;m not trolling. I&#x27;m sincerely trying to grasp the argument being made.
Dave3of5 over 2 years ago
Question for all those who are pro Copilot in this argument and are claiming fair use. Do these same rules apply if I manually copy someone else&#x27;s copyrighted code into my codebase?
layer8 over 2 years ago
It seems to me that in principle it should be possible to maintain attributions through the training process, so that Copilot outputs could come with a list of weighted sources, possibly discarding those that fall below a certain weight threshold. Doing so would likely be much more expensive in terms of the computational power needed for training, and probably also in the size of the model. But it would be great to actually be able to see what went into a specific Copilot output.
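The reporting half of that idea is easy to sketch, even though the hard part (getting per-source scores out of training) is not. The Python below assumes some attribution method has already produced a raw score per source; the function name, the repository names, and the scores are all hypothetical, and nothing here reflects how Copilot actually works.

```python
def weighted_sources(
    raw_scores: dict[str, float], threshold: float = 0.1
) -> list[tuple[str, float]]:
    """Normalize raw attribution scores and keep sources at or above the threshold."""
    total = sum(raw_scores.values())
    if total <= 0:
        return []
    # Turn raw scores into weights that sum to 1.
    weights = {source: score / total for source, score in raw_scores.items()}
    # Discard sources whose weight falls below the threshold.
    kept = [(source, w) for source, w in weights.items() if w >= threshold]
    # Strongest contributors first.
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical scores for one generated snippet.
scores = {"repo-a": 6.0, "repo-b": 3.5, "repo-c": 0.5}
report = weighted_sources(scores)
print(report)
```

With these made-up numbers, the third source falls under the threshold and is dropped from the report, which is the "discarding" step the comment describes.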
rapht over 2 years ago
All this just shows one thing: copyrighting &#x2F; licensing &quot;code&quot; is meaningless... but of course that was already known by all those people who think that the US laws about copyright should not have been propagated to the rest of the world. &quot;Code&quot; is merely an algorithm put to work. There should be nothing inherently copyrightable about this, no more so than a recipe to make a chocolate, which is just a way to put chocolate and a few other ingredients to work.
hbarka over 2 years ago
Do the same copyright issues arise with AI-generated videos learned from Shutterstock?<p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=33239706" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=33239706</a><p><a href="https:&#x2F;&#x2F;waxy.org&#x2F;2022&#x2F;09&#x2F;ai-data-laundering-how-academic-and-nonprofit-researchers-shield-tech-companies-from-accountability&#x2F;" rel="nofollow">https:&#x2F;&#x2F;waxy.org&#x2F;2022&#x2F;09&#x2F;ai-data-laundering-how-academic-and...</a>
jarek83 over 2 years ago
Most of the controversies posted on HN end with a feeling that nothing will change, because only we, the tech community, know the details of an issue and we are too few to have an impact.<p>But this one solely affects a product where we are the target audience, so if we oppose it, things should change. Now I wonder whether it will show that we actually care enough to act, or that we are just regular consumers, like non-tech people in all other cases.
gareth_untether over 2 years ago
An interim update to Copilot could link to where the code was pulled from. Or maybe there&#x27;s a way for open source devs to add a comment to the code that links to their community&#x2F;repo. If it was standardised then any data gathering would need to follow the collection rule.<p>I agree with the article&#x27;s long term outlook about community and code quality, it is a very long term outlook though. It makes me wonder if humans will actually be writing code.
standup over 2 years ago
My biggest concern regarding GitHub Copilot is that it is cloud based and opens up our previously private coding activities to continuous surveillance by third-parties.<p>It&#x27;s only a matter of time before intelligence agencies will get their hands on the data. And if use of Copilot becomes an industry wide practice then those who wish to preserve their privacy will become uncompetitive.<p>I really hope we have some decent offline alternatives eventually.
version_five over 2 years ago
New tech creates winners and losers, and losers inevitably complain. See looms, VHS, Napster, etc. The more of this complaining I see, the more it falls flat. The only interesting thing is which side different communities end up being on.<p>To be fair, record companies were not in the least bit sympathetic. Open source contributors are easier to identify with, though imo it doesn&#x27;t actually make their concerns more valid
kybernetikos over 2 years ago
Learned weights should be considered a derived work of all the things the model was trained on.<p>I think &#x27;training an AI&#x27; is actually a distinctly new use of IP, and should probably be considered under a specific kind of &#x27;AI-use&#x27; license. Open Source licenses should be updated to indicate whether they allow or do not allow AIs to be trained on covered work as well as the other rights they allow.
dwgebler over 2 years ago
For me, as a (granted very minor) contributor to some open source, I couldn&#x27;t care less about attribution. The ethos of open source is specifically about sharing stuff (probably for free) for the benefit of everyone, take a penny, leave a penny. It&#x27;s more of an interesting question if Copilot is suggesting code verbatim from source-available rather than open source repos though.
vletal over 2 years ago
Imagine you are a history or philosophy teacher in 2100.<p>How cool is it to discuss these kinds of issues? What do you think about the &quot;erasing open source community&quot; argument from the historical perspective? What does it have in common with the industrial revolution?<p>Even though the real life implications are real, I find it fascinating and not so simple to unravel.
i_like_apis over 2 years ago
This is such a disingenuous use of the word “investigation”.<p>They are “investigating” whether they should start a lawsuit. So this is not an investigation, it’s somewhere between “due diligence” and a PR stunt.<p>I very much disagree with the idea of a lawsuit that seeks to establish ML training as not being fair use. It is an utterly foolish thing for them to wish for.
nojvek over 2 years ago
Is GH Copilot only on public repos? My assumption is that code from private repos was also showing up. I feel like I read an HN article about this previously. Don&#x27;t have evidence but that seems like a much bigger issue and trust violation if that is true.<p>Kinda like how gmail was reading everyone&#x27;s emails and showing ads based on them.
makestuff over 2 years ago
I wonder in court if they will rule that this is no different than a human reading open source code to learn how to code. I guess the main difference here is the human would not be able to be used in parallel where Copilot can be used by millions of people at one time.<p>It will be interesting to see where this goes.
Roark66 over 2 years ago
Well I am &quot;a fan&quot; of Copilot and I do think AI is the future, but I think the author has a valid point.<p>I think the fair use violation he describes doesn&#x27;t happen during training. I do think training AI on anything that is publicly accessible is fair use, just as in the example of a person learning by reading&#x2F;watching the same materials.<p>However, this fair use rule is being violated the moment the resulting AI starts suggesting verbatim copied code from licensed works without attribution.<p>So one could argue the source code is not being used in a transformative way but Copilot is just a more efficient method of retrieval of licensed code. This misses the fact Copilot actually is capable of writing new code. I&#x27;ve used it as &quot;an autocomplete on steroids&quot;, letting it suggest maybe half a line, or 1 line of code at a time (or trivial stuff we automate even without Copilot, like getters&#x2F;setters in Java). But when actual licensed code is suggested, yes, this is IMO a license violation.<p>Therefore one way of resolving this would be to pair Copilot with a tool that scans the resulting code for the presence of licensed code and then makes a list of &quot;credits&quot; or references. Also there should be measures taken (perhaps during training) to penalise generation of verbatim (or extremely similar) code. Would this make Copilot less of a useful tool? I&#x27;m not sure.<p>One thing that&#x27;s not going to happen is putting tools like Copilot back &quot;in the bottle&quot;. We now have similar models anyone can download (faux pilot) and I as well as many others have found those tools to speed up mundane tasks a lot. This translates into monetary advantage for users. Therefore there is no way this will disappear, lawsuit or no lawsuit.
nfw2 over 2 years ago
The narrative here seems to be a David and Goliath story as Microsoft profits by stomping on the defenseless open-source communities. There&#x27;s two problems with this story.<p>First, the huge majority of open-source projects are at no real risk because Copilot offers something totally different from what they offer. Open-source projects generally take highly-complex domains and expose them as simple interfaces or executable programs. This encapsulation is where the value lies.<p>In contrast, Copilot just dumps code. Never once doing front-end work have I thought &quot;if only there was a way to dump verbatim React internals directly into my codebase.&quot; In general, Copilot only replaces tasks I would have otherwise done myself.<p>The second problem is the biggest loser if Copilot gets shut down is not Microsoft, who can easily take the loss in stride. The real loser is the community of developers, many of them bootstrapping their own projects or trying to develop open-source in their precious off-hours, for whom every minute counts, and for whom tools like Copilot can be the difference between success and failure.
mmastrac over 2 years ago
I think the test for whether an AI is infringing or not should be:<p>Can this AI regurgitate the vast majority of the creative aspects of an original&#x2F;novel piece of software with minimal prompting, to the point where the output code looks mostly and directly cloned to a reasonable person trained in the art?
fregante over 2 years ago
If this becomes illegal, it will pretty much mark the death of free&#x2F;open ML and its sets.<p>If you can&#x27;t train on data before asking for permission, the data set becomes sparse. The only people who will be able to afford this will be, you guessed it, established giants who can build their own sets.
vaxman over 2 years ago
Getty Images already handled this issue with graphics. Most of their catalog was scraped early on by AI art generators. &#x27;Errbody knows this because the Getty Images watermark appears in a lot of AI generated art. Getty Images, in turn, banned the sale of AI generated art because it is legally tainted.<p>The same thing will happen to source code produced by AI code generators. Github itself, or some entrepreneur, will come up with a way to identify and flag projects containing AI generated code based on models constructed from open source projects, so that those derivative works will not inadvertently be incorporated into other software that is concerned with such a flag. (They probably will also come up with an NFT-based mechanism of some sort to allow open source project rights holders to authorize incorporation of their code into AI models such that derivative works containing those fragments would not be subject to flagging.)<p>Hey YCombinator, give me $10M to make a billion dollar company that &quot;lives at the intersection of&quot; blockchain and open source. (Haha, No.)
scotty79 over 2 years ago
&gt; how will you feel if Copilot erases your open-source community<p>How will you feel if the greed of a lawyer erases the progress of your tools?<p>Lawyers are a detriment to anything they touch. Letting them into software was the biggest mistake we ever made. We should have kept them away the same way they are kept away from math.
truth_seeker over 2 years ago
I am feeling very greedy but ....<p>With all that intelligence, if GitHub Copilot can&#x27;t yet produce an easy-to-use, easy-to-manage full-stack framework with a built-in distributed database, in either an existing programming language or perhaps a new one it created itself, then it&#x27;s not useful for me.
Defitio over 2 years ago
I&#x27;m against software patents for the most part.<p>Especially with algorithms.<p>I was rooting for Google when the JVM topic happened and I&#x27;m rooting for GitHub with Copilot.<p>And yes there is src from me on GitHub too but use it! I used so much other code in the last 15 years.<p>Copyright on algorithms or basic code should be a no go.
steve_taylor over 2 years ago
Open source has trained me and countless others. We have learned from it. Why shouldn’t machines learn from it too? Is co-pilot copy-pasting slabs of code verbatim?<p>I see Copilot as a net positive. Open source is for sharing and learning. Copilot is sharing and learning on steroids.
captainmuon over 2 years ago
Oh god please no. GitHub Copilot is a wonderful technology. I am not taking anything away from you if Copilot suggests code that is similar or identical to your copyrighted code. You were not going to sell it to me anyway.<p>The following is supposed to be OK: somebody reads your GPLed code, learns abstract concepts from it, teaches it to me, I write code that uses the same algorithm. But it&#x27;s not OK to abbreviate the process and reach the same result directly with Copilot. That is some Talmudic level reasoning. In a sane legal system, one would note that it is legal to do when jumping through pointless hoops, so it should be legal per se, and the system should be adjusted.<p>Copyright is increasingly at odds with technological development. Not just since AI applications, at least since Napster or since floppy disks. Of course Matthew Butterick as a lawyer would disagree - &quot;It is difficult to get a man to understand something, when his salary depends on his not understanding it.&quot;
renewiltord over 2 years ago
Honestly, Github Copilot seems fine. It&#x27;s just a tool that you&#x27;re responsible for using responsibly. If I Google something, and copy and paste that, then Google is not responsible for my infringing. It&#x27;s just &quot;intelligent autocomplete&quot;.
braingenious over 2 years ago
I’m really interested in seeing how this gets litigated. I imagine it will involve a lot of philosophical arguments about attribution and what the software is <i>actually doing</i>.<p>I’m also curious to see if&#x2F;how Amazon CodeWhisperer takes advantage of this whole debacle.
brigandish over 2 years ago
Perhaps the only way out of this is to start suing the users of Copilot, much as some jurisdictions target the users of a product (e.g. drugs, prostitution) as a means to shut it down when the providers are too difficult or numerous to challenge effectively.
low_tech_punk over 2 years ago
Sadly, I think this marks the beginning of a winner-takes-all economy fueled by AI.<p>Just imagine how in a lawsuit like this, OpenAI can use GPT-3 to generate eloquent court speech with statistical confidence that it can defeat human lawyers? It just comes down to TPU power.
WrtCdEvrydy over 2 years ago
I wonder what will happen when a company pays some overseas developers $50 for some code, they copy it from Copilot and it copies a bug from a US developer and that company gets hacked for $10 million.<p>Will the lawsuit fall on the overseas developer, US developer or Github?
throwaway2037 over 2 years ago
Sorry to ask a shallow question. His &quot;photo&quot; is so interesting. It feels exactly like old school Wall Street Journal &quot;photos&quot; from 1990s. Is there a plug-in or service to create this type of image from a photograph?
sergiotapia over 2 years ago
Is there a license that explicitly forbids corporations from ingesting my code and making a billion dollars off of my work for free? The AGPL? I&#x27;ve been using the MIT license for more than a decade, but it&#x27;s time to change that.
hsuduebc2 over 2 years ago
In this next episode of &quot;<i>corporation name</i> seems like a cool corp but is revealed to be a selfish and malicious inc.&quot;, we see <i>corporation name</i> act selfishly and maliciously, as in every other episode. See you next time, kids.
smegsicle over 2 years ago
&gt; GitHub Copi-lot inves-ti-ga-tion<p>lol rarely see such aggressive use of soft hyphens in page titles
suyash over 2 years ago
This investigation should not stop at GitHub Copilot; large language models that are trained on huge amounts of data should also be investigated, as I&#x27;m sure there are lots of problems to be found there.
NautilusWave over 2 years ago
Can someone clarify if copyright violation is actually considered &quot;illegal&quot;? As far as I know it&#x27;s a civil matter and not something a state or federal government would attempt to prosecute.
scombridae over 2 years ago
But this has long been the deal. In order to offer their services gratis, Big Tech makes money on your data, which you&#x27;ve freely provided. Welcome to the last twenty years of the software economy?
amai over 2 years ago
I have a badge on GitHub showing that I am an Arctic Code Vault Contributor. Why can&#x27;t Microsoft do something similar for Copilot training data contributors? That would at least be a start.
e-clinton over 2 years ago
Part of me feels like this will help big tech and hurt potential startups that’d compete in this space. Microsoft has the resources to make this issue “go away” while smaller incumbents will not.
slugiscool99 over 2 years ago
Maybe the software engineers are worried they&#x27;re being made redundant, but it is super fair for them to not allow their <i>own work</i> to make them redundant without permission
kioleanu over 2 years ago
Uhm, strangely I get a Connection Reset error in the browser when I try to access the URL from the corporate network, but it works without problems from my phone
ynbl_ over 2 years ago
&gt; the biggest concern of the decade is that some stupid autocomplete can violate your license which never existed in the first place<p>this is why hapas are superior to wh*tes.
pmayrgundter over 2 years ago
Crazy idea.. have the automatic code generator check if the code is too similar to a source it was trained on, and if so, automatically include attribution as well.<p>Ta da!
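A rough Python sketch of what such a check could look like, assuming the simplest possible similarity measure: the share of the generated snippet's token 5-grams that also appear in a known source. The function names, the threshold, and the tiny two-entry corpus are all hypothetical; a real system would need a far larger index and a more robust notion of "too similar".

```python
import re


def token_ngrams(code: str, n: int = 5) -> set[tuple[str, ...]]:
    """Split code into crude tokens and return the set of n-token windows."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap(generated: str, source: str, n: int = 5) -> float:
    """Fraction of the generated snippet's n-grams that also occur in the source."""
    gen = token_ngrams(generated, n)
    if not gen:
        return 0.0
    return len(gen & token_ngrams(source, n)) / len(gen)


def attributions(
    generated: str, corpus: dict[str, str], threshold: float = 0.8
) -> list[str]:
    """Names of corpus entries the generated code overlaps with too heavily."""
    return [
        name for name, source in corpus.items()
        if overlap(generated, source) >= threshold
    ]


# Hypothetical training sources, keyed by a made-up repository name.
corpus = {
    "alice/mathlib": "def add(a, b):\n    return a + b\n",
    "bob/strings": "def shout(s):\n    return s.upper() + '!'\n",
}

copied = "def add(a, b):\n        return a + b"  # same tokens, different indentation
novel = "print('hello world')"

print(attributions(copied, corpus))
print(attributions(novel, corpus))
```

Tokenizing before comparing means whitespace changes alone do not hide a copy, but renamed variables would, so this is only a starting point for the idea in the comment.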
amai over 2 years ago
I have a badge on GitHub showing that I am an Arctic Code Vault Contributor. Why can&#x27;t Microsoft do something similar for Copilot contributors?
random_kris over 2 years ago
Let us just work on cool technical things without having to worry about this kind of bullshit.<p>Knowledge data should be free to copy and do whatever we want with it
pennaMan over 2 years ago
I&#x27;ll scream this at the top of my lungs whenever I get the chance: If you attribute copyright to open source code you are a patent troll.
kurtreed over 2 years ago
To me the whole point of open source is selfless giving and sharing. You build something and release the source code in case it&#x27;s useful for whatever purpose people might have: learning, understanding, contributing, forking, copying, etc. And companies might build on it, train models from it, use it internally, who knows. Great. Other companies can do the same and compete. So can other open source projects.<p>For some reason when a company benefits from your work instead of some other entity that&#x27;s bad? Please explain.
fartsucker69 over 2 years ago
could this be solved by MS brute-force shipping all the licenses (w&#x2F; references to their original projects) of all the repos they used to train to copilot along with copilot itself?<p>it wouldn&#x27;t cover cases where people illegally copy pasted some code into their projects with dubious &#x2F; not explicit licenses, but this is the same as using any open source project in general.
spookyuser over 2 years ago
This is so stupid. I can’t believe how hostile this community has become toward some of the most inspiring new technology I’ve seen in a decade.
mjan22640 over 2 years ago
Soon machines will step up from being an aid to doing the creative work themselves and copyright will be an artefact of the past.
pmarreck over 2 years ago
Would an opt-in system fix this? Where your code is only learned from if you opt into using Copilot to help you develop faster?
cabaalis over 2 years ago
It seems a benchmark of a transformative technology is whether or not people attempt to use the legal system to stop it.
redog over 2 years ago
Copyright is the problem. The rest of this is just dancing around the legal framework built to support the bullshit.
frankjr over 2 years ago
I wouldn&#x27;t be surprised if Microsoft lawyers didn&#x27;t like the word &quot;github&quot; in the domain name...
mjan22640 over 2 years ago
I think the infringement that is relevant in practice comes from users of Copilot rather than from its authors.
melonmouse over 2 years ago
As a joke, I made a webpage where you can do attribution to ALL GitHub repositories:<p><a href="http:&#x2F;&#x2F;thanksforthecode.com" rel="nofollow">http:&#x2F;&#x2F;thanksforthecode.com</a><p>It scrolls past all the repos movie-credits-style. Doing it that way takes several days! It shows how abstract and absurd giving attribution to such a large body of works is.
welder over 2 years ago
Time for a new open source license specifically allowing fair use for machine learning?
jacooper over 2 years ago
This would for sure have effects on anything related to AI content generation.
wkdneidbwf over 2 years ago
it seems like copilot is simply a search engine in this context. when i search gh or google or &lt;insert tool&gt; i can get code snippets without seeing the license.<p>how is copilot doing something fundamentally different?
swhalen over 2 years ago
Why can&#x27;t Tim Davis (or another software author whose code is emitted verbatim by Copilot) demand that Microsoft take down Copilot, or at least the part of Copilot that contains his code?<p>Microsoft is distributing his software without a license, isn&#x27;t it?
obiefernandez over 2 years ago
How can I personally and proactively fight against this effort?
semireg over 2 years ago
Is there an AI system for those dot woodcut prints ie WSJ?
birthday over 2 years ago
I wonder if it emits stable diffusion samples? ;-)
zhander, over 2 years ago
The discussion has been &quot;cleaned up&quot; massively. All Copilot discussions are heavily manipulated.<p>I don&#x27;t know why one can freely pile on, e.g., Airbnb here but Copilot is a sacred cow.
Aeolun, over 2 years ago
This right here is why we can’t have good things.
airstrike, over 2 years ago
ITT: armchair lawyers go after GitHub
albertizzley, over 2 years ago
suing Github for this seems like a neat idea to make money on our open source projects
glouwbug, over 2 years ago
Big money here. Good luck
Havoc, over 2 years ago
I find it hard to see a scenario where MS doesn’t get absolutely wrecked in court.
olliej, over 2 years ago
I feel &quot;stealing your community&quot; is lawyer hyperbole, but people also seem ok with what MS is doing with copilot, and I am not.<p>If you think what copilot is doing is ok, and there is nothing wrong with it, I&#x27;d love it if you could go through this small thought exercise, and see if it impacts your view at all:<p>Say you write a bunch of code, and release it under GPL. For the sake of argument imagine it is something complicated that you care about.<p>Now say another person is trying to do what your code does, and they find your code, and say &quot;excellent&quot;. They then copy and paste it into their project, and release their code under a BSD license instead.<p>Would you consider this theft of your IP? The law certainly would, and I think most devs would as well.<p>What would you say if they instead release &quot;their&quot; code as public domain?<p>Now we&#x27;ll go a bit further. Another person is trying to solve this problem in some commercial software. They find your code, copy-paste it into their project, then sell their software and don&#x27;t release the source, or even acknowledge you.<p>Would you consider _this_ theft? Again, the law would.<p>Now, what if instead they found your code through the invalid BSD relicense? Or the invalid public domain one?<p>To me every one of these would be theft, and every one would be required to release the source of projects that made use of my GPL&#x27;d code, under the GPL. That is literally the whole point of the GPL.<p>But let&#x27;s imagine a different route.<p>A person is writing some code and can&#x27;t work out how to solve a problem, so they ask on StackOverflow. Now another person comes along and answers the question by copy-pasting from your project into SO. 
The first person says &quot;yay!&quot; and then copies that code, and we repeat the above scenarios.<p>In an even more extreme case, imagine both of the above people work at the same large company - so neither knows or is even aware of the other - how does this impact what is going on? It&#x27;s two people, but fundamentally the company is copying the original GPL code into SO, then copying it from SO into its proprietary code.<p>I get that MS and GitHub try to position it as if copilot is &quot;creating code&quot;, but it is simply doing a statistical code completion that is demonstrably happy to copy and paste from the original source into the recipient code. To my mind all it is doing is providing a mechanism to launder GPL (or whatever) code into your own without the license, by slapping &quot;ML&quot; and &quot;AI&quot; on the process and requiring more than 3 keys to be involved.
toombowoombo, over 2 years ago
Daaaamn, this post was on my top almost since it was published.
belter, over 2 years ago
TL;DR: GitHub (Microsoft) declared that “training [machine-learning] systems on public data is fair use”. When asked for the relevant jurisprudence to support its position, it could not provide any.
Move37, over 2 years ago
Great article!
shp0ngle, over 2 years ago
Training AI on copyrighted works is literally what Google always did.<p>Look how Google News enraged news orgs.<p>Now they come for the programmers. So now it’s a problem.
Brian_K_White, over 2 years ago
Fair use is about more than just the size of the excerpt, and even open source software still has a copyright and terms.<p>If you write an article about good writing, and quote a choice paragraph from someone else&#x27;s work to show an example, and credit that quote, that is fair use.<p>Is it fair use if you read an awesome paragraph, something that really is the result of the author&#x27;s unique intellect and effort and craftsmanship, and makes you think &quot;damn&quot;, and then drop that same jewel into your book?<p>You can probably get away with it, because you probably just won&#x27;t be able to convince a judge that any single paragraph is that big of a theft.<p>But I don&#x27;t mean to ask if you can get away with it, I mean to ask if it should be considered fine honorable behavior.<p>The difference is, the paragraph isn&#x27;t being included for examination or comment or transformation, it&#x27;s being included to directly copy and perform its original function as part of what makes a work a great work, and, it&#x27;s not being credited in any bibliography or footnotes or directly.<p>The reader reads the paragraph and is impressed by <i>your</i> deep insight, which you never had, and the original author did.<p>How about if your new book has many such uncredited snips from other authors, such that your new work is denser and richer than any of the other individual authors?<p>This is what copilot is doing, or rather it&#x27;s facilitating people doing it, as far as I can tell.<p>The original snippets are functional, not there for examination, copied verbatim, not transformed (sometimes), and not credited.<p>Most of it comes from open source works anyway and most authors would probably be fine with it if the stuff was simply credited.<p>I think as a tool, in the context of software vs literature, the tool is probably more good than bad for everyone as a whole. It probably results in the generation of more, and more correct software. 
Since software is more like a machine than a novel, it benefits all of humanity when machines work well.<p>But it needs to somehow credit the original authors, or if that&#x27;s not possible then users do not get to claim credit for any work it was used on. Or, they can only claim a sort of tainted credit.<p>Maybe it needs a combination of policies that together make a fair system. One element would be, the training set must be composed of strictly open source software (pick some definition). Then another element would be, any work that uses it, is tagged as such. You only get to say &quot;I wrote this, with copilot.&quot; not merely &quot;I wrote this&quot;. And any work that uses it is itself GPL. The individual snips maybe don&#x27;t have to be credited because the theory will be the training set as a whole was credited, and those are all available somewhere. You as a contributor won&#x27;t get credit for being in someone&#x27;s mp3 transcoder app, but that app WILL declare that it used the training set, and the training set WILL declare all of your material that is in it.<p>Maybe there can be a special version that only includes code where the original terms did not require anything at all, not even preserving the author&#x27;s name or the license that says it&#x27;s free, and that version&#x27;s output can be used without credit.<p>If proprietary software wants to benefit from a tool like that, they can pay for licenses from other proprietary software developers to include their software in their AI&#x27;s training set, just like with normal software licensing for inclusion and re-sale in a new product.<p>But right now, as copilot currently exists, as far as I can tell it&#x27;s blowing past and ignoring ANY considerations like that and GitHub are simply outlaws.
comfypotato, over 2 years ago
[deleted]
egypturnash, over 2 years ago
It&#x27;s hilarious how when I express displeasure about AI image generators looking likely to take a huge bite out of <i>my</i> profession of &quot;artist&quot; and playing extremely fast and loose with fair use, I get told that it&#x27;s completely inevitable now and I should either retrain as a prompt engineer or go join the buggy whip manufacturers, but now that this is clearly violating <i>programmer</i> copyrights, you folks are starting to get angry.<p>I&#x27;ll just leave y&#x27;all with my favorite of the things you keep telling me to STFU about art AI with: If you&#x27;re the kind of programmer who feels threatened by this, then you&#x27;re not a <i>real</i> programmer.
nonasktell, over 2 years ago
Why anyone would want to stop Copilot is beyond me.<p>Reinventing the wheel, millions of times a day, is an atrocity.<p>Millions of (wo)man hours, wasted, every single day, on writing solutions to problems that have already been solved. There is a partial solution to this, and it&#x27;s making people angry, it&#x27;s crazy.<p>If you put your code publicly on the internet, you should expect that people will reuse your code at some point, no one broke into your private repositories.<p>Why anyone would waste their time to make other people waste more of their time is really beyond me.<p>Let go of your egos for once.
verisimilitudes, over 2 years ago
Never forget this is how people who dare to reverse engineer Windows are treated: <a href="https:&#x2F;&#x2F;www.theregister.com&#x2F;2019&#x2F;07&#x2F;03&#x2F;reactos_windows_research_kernel_claim&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.theregister.com&#x2F;2019&#x2F;07&#x2F;03&#x2F;reactos_windows_resea...</a> <a href="https:&#x2F;&#x2F;marc.info&#x2F;?l=ros-dev&amp;m=118775346131654&amp;w=2" rel="nofollow">https:&#x2F;&#x2F;marc.info&#x2F;?l=ros-dev&amp;m=118775346131654&amp;w=2</a><p>I don&#x27;t use Github, but fuckers upload my code there anyway.<p>Copyright is evil, but only large corporations having copyright, even more than they already do, is even worse.
iostream25, over 2 years ago
Either the user-base of HN suddenly became a bunch of unethical folks who don&#x27;t CARE about copyrights, usage licenses, authorship, or the future of open-source projects,<p>OR<p>This place is currently crawling with Micro$oft employees who have been instructed to swamp the place with disingenuous comments basically amounting to:<p>1) &quot;fair use&quot; is anything I want it to be<p>2) gimme your code NOW, because I want it, and it&#x27;s MINE<p>3) get used to habitual violation of licenses as the new normal<p>4) you are ruining progress! harming kittens!<p>I can&#x27;t see the actual HN crowd all suddenly being copilot users and fans, so that leaves me to conclude the latter.<p>I find Microsoft&#x27;s continual business model of evil to be rather threatening and annoying and they need to be checked, as they have only gotten worse with the decades. They abuse their market position to stifle any and all tech innovation. Break them up already.
authpor, over 2 years ago
I&#x27;m more worried about the status of freedom in software, open source feels like a mirage to divert the attention away from the original issues from the FSF.
trasz, over 2 years ago
One way to fix the problem would be to somehow feed Copilot a corpus of closed source code. This would either force Microsoft to add necessary copyright protections, or - which is imho more likely - would prove that those protections are already in place, but disabled for open source code.<p>A good start would be to take leaked Windows code, mechanically adjust all the names, constant values, and code formatting, and then publish it and observe.
RunSet, over 2 years ago
&gt; Microsoft characterizes the output of Copilot as a series of code &quot;suggestions&quot;. Microsoft &quot;does not claim any rights&quot; in these suggestions. But neither does Microsoft make any guarantees about the correctness, security, or extenuating intellectual-property entanglements of the code so produced. Once you accept a Copilot suggestion, all that becomes your problem:<p>&gt; &quot;You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself. These precautions include rigorous testing, intellectual property scanning, and tracking for security vulnerabilities.&quot;<p>I can&#x27;t help but recall:<p>&quot;Linux is a cancer that attaches itself in an intellectual property sense to everything it touches.&quot;<p>- Steve Ballmer, while CEO of Microsoft
rafaelero, over 2 years ago
Copilot is great and this is a waste of time.
Scoundreller, over 2 years ago
&gt; No match for domain &quot;GITHUBCOPILOTCLASSACTIONLAWSUITSETTLEMENT.COM&quot;.<p>&gt; Last update of whois database: 2022-10-17T23:07:12Z &lt;&lt;&lt;<p>Just sayin&#x27;...
0xferruccio, over 2 years ago
Sad to see people trying to make copilot illegal<p>Using it is exactly like using Google. Google scrapes the internet and trains a model that gives you results for search queries on their website. The results may be copyright protected<p>Copilot scraped the internet to train a model that gives you results for code snippets in your code editor. The results may be copyright protected
z9znz, over 2 years ago
MS needs to give up and terminate Copilot.<p>The potential legal issues are there, but that&#x27;s not why Copilot should die.<p>Copilot should die for any (or a combination) of these reasons (and more which I don&#x27;t mention):<p>- the operator has to already understand the emitted code to be able to determine if it is what is needed, or to modify it if it is close but not quite right<p>- the operator may have a false sense of capability, leading to bugs and other problems that would appear later (in production?)<p>- wrong suggestions are a distraction from the careful mental structures which one maintains while writing software<p>- any problem that Copilot can solve with guaranteed correctness is probably trivial or already met by a (battle tested) library<p>Forgive the analogy, but effective automated code generation is like autonomous driving systems. Anything less than 100% accuracy is a risk, and in these examples risk of incorrect behavior is not acceptable.<p>Copilot seems like a pointy-haired boss fantasy where they can hire only junior programmers and expect successful software products.