Let Microsoft first publish a Copilot model that's trained on the internal codebases of Azure, Windows and Office. That's the only way Microsoft can convince me that they truly believe Copilot is non-infringing technology.
It's likely that generative AI in general will be deemed fair use, due to its (generally) transformative nature. Sure, if you really coax it, you can get code or images out that look similar to existing ones, but the courts may well find that, generally speaking, it produces new content that has not been seen before, especially in the case of images.

Google Books literally copied entire books into its online database and that was deemed fair use, so something far more transformative like generative AI will likely get even broader consideration for fair use. Google Books was, admittedly, not a clean-cut case on commerciality, but the courts generally hold that the more transformative a use is, the less strictly it needs to satisfy the other factors for determining fair use.

https://ogc.harvard.edu/pages/copyright-and-fair-use
Are there any actual details on this? I get that this is a blog post, but the only links I see on the page are to other blog posts. It leaves a lot of questions.

Is this blog post a legally enforceable contract? Is Microsoft specifically indemnifying all users of Copilot against claims of copyright infringement that arise from use of Copilot?

The blog post says that "there are important conditions to this program", and it lists a few, but are those conditions exhaustive, or are there more that the blog post doesn't cover? For example, does it apply only in specific countries, or in every legal system worldwide?

What guarantees do users have that Microsoft won't discontinue this program? If Microsoft gets kicked in the teeth repeatedly by courts ruling against them, and they realize that even they can't afford to pay out every time Copilot license-launders large chunks of copyrighted code, what means do users have to hold Microsoft to its promises?
This is a very clever move by Microsoft. In essence they are painting a giant bullseye on their own back for any lawsuits that may arise, the idea being that they have the resources to fight them (they aren't wrong).

The way AI is going, I'm sure we'll see some landmark cases very soon. It is very much in Microsoft's interest to grow this market as fast as possible and be at the center of it. This removes one of the key impediments to adopting generated code for smaller orgs: "Will I get sued if this product generates code that is copyrighted?"
With a big asterisk--
"customers... must not attempt to generate infringing materials..."<p>It hinges on what *Microsoft* decides "attempting to generate infringing materials" means. You'd like it to mean that it only excludes use when you're doing something you know would infringe copyright, like "reproduce the entire half life 2 source code." But who knows.
It may not be that simple: Microsoft may assume liability, but an infringer can still be sued separately, and MS may then be on the hook for the court costs. You can't just categorically shield the users of a product from being sued.

This is the key bit:

"Specifically, if a third party sues a commercial customer for copyright infringement for using Microsoft’s Copilots or the output they generate, we will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters we have built into our products."

The 'we will defend' is one important part; I assume that means you will be using their lawyers rather than your own (they have lawyers in house, who are cheaper than the ones that bill you, the would-be defendant, by the hour).

The second part that matters is that there are conditions on how you are supposed to use the product, and crucially: you will have to document that this is how you used it.

But: interesting development. Clearly enterprise customers are a bit wary of accidentally engaging in copyright infringement by using the tool, and that may well have slowed down adoption.
Only so long as you have the guardrails enabled, one of the guardrails being that Copilot will not output any code that exists in any GitHub repo.

We tested Copilot with those guardrails enabled and it completely lobotomizes it.

This, by the way, is not a change. They already had this "Microsoft will assume liability if you get sued" clause in the Copilot Product Specific Terms: https://github.com/customer-terms/github-copilot-product-specific-terms
I've received a lot of flak for this answer in other communities, but if a statistical model is producing purely derivative works using a mathematical model that's basically a next-best-token predictor, is it really "stealing"?

Is it "stealing" to have a working understanding of the next best token, or even simply the token that shows up most often (e.g. on GitHub)?

I'm sure the argument could be made that all AI should be illegal because all ideas worth having have already been had and all text worth writing has already been written, but where would that leave us?

(E.g. your function for converting a string from uppercase to lowercase will probably look like a function that someone else on Earth has written, and the same goes for your error handling code, your state-of-the-art technique for centering a div, etc.)
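To make the "next-best-token" framing concrete, here is a deliberately toy sketch of greedy next-token prediction. This is my own illustration, not Copilot's actual architecture, which conditions on far longer contexts and samples from a learned probability distribution:

    from collections import Counter, defaultdict

    # Toy "next best token" model: count which token follows which in a
    # corpus, then always emit the most frequent successor. Real models
    # are vastly more sophisticated, but the statistical framing is the same.

    def train(corpus_tokens):
        successors = defaultdict(Counter)
        for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
            successors[prev][nxt] += 1
        return successors

    def generate(successors, start, max_len=8):
        out = [start]
        for _ in range(max_len):
            candidates = successors.get(out[-1])
            if not candidates:
                break
            # Greedy decoding: always pick the most common next token.
            out.append(candidates.most_common(1)[0][0])
        return out

    corpus = "def to_lower ( s ) : return s . lower ( )".split()
    print(" ".join(generate(train(corpus), "def")))

The point of the toy: the model stores no training file verbatim, only statistics over it, which is why its output is usually a statistical blend rather than a copy unless a sequence was heavily memorized or deliberately coaxed out.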
I wonder how binding this kind of public commitment is. It's the same way Musk recently said publicly that he'd cover the costs of anyone having work or legal trouble over something they said on the platform, and now refuses to honor the commitment.
If a codebase infringes the GPL, the remedy is to publish the offending source code or terminate distribution. Neither is an outcome I suspect Microsoft cares about when talking about third-party code.

I don't know what the case history is like for damages with open source projects, but I suspect it wouldn't be that big of a concern for Microsoft.

Otherwise stated: Microsoft's downside to this is committing their lawyers, and the upside is improving their code generation tools.

IANAL though.
I'm just curious why everyone is talking about transformative nature while so little focus is given to:

*4. the effect of the use upon the potential market for or value of the copyrighted work* (wiki)

I don't know if this particular case is good for exploring all angles of fair use, but to me this is certainly a greater hurdle for commercial generative AI.
Wouldn't you have to first prove that your content came from Microsoft services? Hopefully you track & certify the provenance of every line of code and content you paste? Microsoft surely won't just take your word for it that your content came from them, so how would this play out in practice, exactly?
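One hypothetical answer: record provenance at commit time. A minimal sketch, assuming a per-repo log file and a Git trailer of my own invention (neither is an official Microsoft or GitHub mechanism):

    import datetime
    import json
    import subprocess

    # Hypothetical provenance record for AI-assisted hunks; the log file
    # name and the "Assisted-by" trailer are illustrative conventions,
    # not any kind of standard.

    def record_provenance(file_path, line_ranges, tool="github-copilot"):
        entry = {
            "file": file_path,
            "lines": line_ranges,  # e.g. [[10, 42]]
            "tool": tool,
            "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }
        with open(".provenance.jsonl", "a") as log:
            log.write(json.dumps(entry) + "\n")

    def commit_with_trailer(message):
        # Git trailers are machine-readable via `git interpret-trailers`,
        # so an auditor could later enumerate AI-assisted commits.
        subprocess.run(
            ["git", "commit", "-m", message, "-m", "Assisted-by: github-copilot"],
            check=True,
        )

Whether Microsoft would accept such a log as evidence is exactly the open question here.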
I just had a horrible thought: what happens when there's a DMCA takedown request to remove an infringement in a widely used LLM? I've seen requests against training data, but never against the output of an LLM.
Copyright-related stuff is annoying. I can't see why anyone would care. If you publish something to the public domain, I don't understand why you get rights to your content that you can self-declare. It's completely ludicrous and only works at the corporate money level, because corporations have the liability and resources to sue. I wish people would use a little more common sense and understand the words 'public domain'. Regardless of what people say, I can tell you that no one really cares about copyright, and in terms of AI, it's an unmovable mountain. Good luck wasting time on an issue that provides nothing to humanity.
Another way to look at this:

Microsoft just became a code copyright insurance company. The premium is paid for with individual Copilot accounts for each developer. And the policy has its exceptions, of course.

This is interesting.
Has anyone noticed that Copilot will shade out its answers more often when it's writing code now? Usually I'll paste in React components and ask it to fix the Tailwind styling, but once it starts writing, the output gets blocked by some secondary filter about halfway through. I thought maybe the code it was outputting was too similar to copyrighted code and it triggered a liability filter of some sort.

In any case, it's super annoying to have that happen so consistently these days that I just use ChatGPT to fix my Tailwind styling now.
Maybe it is just me, but I find the quality of Copilot suggestions so low that it is generally usable only in the most mundane and repetitive contexts. Why all the enthusiasm about it?
It used to be "Embrace, extend, and extinguish": <a href="https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguish" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...</a><p>Now it is "Train, Task, Transform, and Transfer":<p>Train - Feed copyrighted works into machine learning model or similar system<p>Task - Machine learning model is tasked with an input prompt<p>Transform - Machine learning model generates hybrid output derived from copyrighted works, but usually not directly traceable to a given work in the training set<p>Transfer - Generated output provides the essence of the copyrighted works, but is legally untraceable to the originals
Having dealt with Microsoft for 30 years as both a power user and developer, *"we believe in standing behind our customers when they use our products"* is a lie.
A very relevant and recent post:

GitHub Copilot and open source laundering

https://drewdevault.com/2022/06/23/Copilot-GPL-washing.html

Previously on HN, in case you missed it:

https://news.ycombinator.com/item?id=31848433
Meanwhile they strike deals with news agencies to use their content for training... This is going to be a hard fight, but I really hope it ends up costing MS.
Yeah, is it becoming clear enough to some people yet that you can't replace software engineers, let alone really *help* them, with AI? This is only going to get worse, not better.

Copilot is such a flawed product from the start. It's not even a matter of its ability to write "good" code; the concept is just dumb.

Code is necessarily consumed by people first, before it's executed by a computer in a production environment. There are many ways to get a computer to do something, but the approval process by experienced humans is vastly more important than the drafting itself. Software development is already incredibly cheap, and it's the last place to cut costs.

There is no AI threat other than the one posed by grifters trying to convince you that there is.
This is one of the things people on this site have been saying Microsoft should do if they really stand behind Copilot, and now that they've done it, you have again moved the goalposts and this announcement is entirely insufficient.

How dare they? amirite?