Thomson Reuters wins first major AI copyright case in the US

392 points by johnneville 3 months ago

28 comments

EnderWT 3 months ago

<a href="https://archive.is/mu49I" rel="nofollow">https://archive.is/mu49I</a>

JackC 3 months ago

Here's the full decision, which (like most decisions!) is largely written to be legible to non-lawyers: <a href="https://storage.courtlistener.com/recap/gov.uscourts.ded.72109/gov.uscourts.ded.72109.770.0.pdf" rel="nofollow">https://storage.courtlistener.com/recap/gov.uscourts.ded.721...</a>

The core story seems to be: Westlaw writes and owns headnotes that help lawyers find legal cases about a particular topic. Ross paid people to translate those headnotes into new text, trained an AI on the translations, and used those to make a model that helps lawyers find legal cases about a particular topic. In that specific instance the court says this plan isn't fair use. If it were fair use, one could presumably just pay people to translate headnotes directly and make a Westlaw competitor, since translating headnotes is cheaper than writing new ones. And conversely, if it isn't fair use, where's the harm (the court notes no copyright violation was necessary for interoperability, for example) -- one can still pay people to write fresh headnotes from caselaw and create the same training set.

The court emphasizes "Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today." But I'm not sure "generative" is that meaningful a distinction here.

You can definitely see how AI companies will be hustling to distinguish this from "we trained on copyrighted documents, and made a general purpose AI, and then people paid to use our AI to compete with the people who owned the documents." It's not quite the same, the connection is less direct, but it's not totally different.

veggieroll 3 months ago

> Thomson Reuters prevailed on two of the four factors, but Bibas described the fourth as the most important, and ruled that Ross “meant to compete with Westlaw by developing a market substitute.”

Yep. That's what people have been saying all along. If the intent is to substitute for the original, then copying is not fair use.

But the problem is that the current method for training requires this volume of data. So the models are legitimately not viable without massive copyright infringement.

It'll be interesting to see how a defendant with a larger wallet will fare. But this doesn't look good.

Though big-picture, it seems to me that the moneyed interests will ensure that even if the current legal landscape doesn't allow LLMs to exist, they will lobby HARD until it does. This is inevitable now that it's at least partially framed in national security terms.

But I'd hope that this means there is a chance that if models have to train on all of human content, the weights will be available for free to all humans. If it requires massive copyright infringement on our content, we should all have an ownership stake in the resulting models.

Animats 3 months ago

This isn't really about "AI". It's about copying summaries. Google was fined for this in France for copying news headlines into their search results, and now has to pay royalties in the EU.[1] Westlaw is a summarizing and indexing service for court case results. It's been publishing that info in book form since 1872.

Ross was trying to compete with Westlaw, but used Westlaw as an input. West's "Key Numbers" are, after a century and a half, a de-facto standard.[2] So Ross had to match that proprietary indexing system to compete. Their output had to match Westlaw's rather closely. That's the underlying problem. The court ruled that the objective was to directly compete with Westlaw, and using Westlaw's output to do that was intentional copyright infringement.

This looks like a narrow holding, not one that generally covers feeding content into AI training systems.

[1] <a href="https://apnews.com/article/google-france-news-publishers-copyright-7a7e484f55297e9803d17f736ff923a0" rel="nofollow">https://apnews.com/article/google-france-news-publishers-cop...</a>

[2] <a href="https://guides.law.stanford.edu/cases/keynumbersystem" rel="nofollow">https://guides.law.stanford.edu/cases/keynumbersystem</a>

preinheimer 3 months ago

Great. The stated goal of a lot of these companies seems to be “train the model on the output of humans, then hire us instead of the humans”.

It’s been interesting that media where watermarking has been feasible (like photography) have seen creators get access to some compensation, while text-based creators get nothing.

simonw 3 months ago

Interesting to note from this 2020 story (when ROSS shut down) that the company was founded in 2014 and went out of business in 2020: <a href="https://www.lawnext.com/2020/12/legal-research-company-ross-to-shut-down-under-pressure-of-thomson-reuters-lawsuit.html" rel="nofollow">https://www.lawnext.com/2020/12/legal-research-company-ross-...</a>

The fact that it took until 2024 for the case to resolve shows how long the wheels of justice can take to turn!

jll29 3 months ago

Note this case is explicitly NOT about large language model type AI - Ross' product is just a traditional search engine (information retrieval system), not a neural transformer a la ChatGPT.

About Judge Bibas: <a href="https://en.wikipedia.org/wiki/Stephanos_Bibas" rel="nofollow">https://en.wikipedia.org/wiki/Stephanos_Bibas</a>

dkjaudyeqooe 3 months ago

The fair use aspect of the ruling should send a chill down the spines of all generative AI vendors. It's just one ruling but it's still bad.

aurizon 3 months ago

At the heart of this is a very greedy racket: court reporters who 'own' the copyright to every word spoken by anyone in court that they transcribe, in a transcript whose source they do not own (in truth, the judges/witnesses/lawyers/defendants own it). They then milk huge fees for these transcripts and limit use/access/derivative works with huge fees. An AI verbatim transcriber would upend them, so that will be prevented, as will anything that shakes the tree.

jug 3 months ago

I spontaneously feel like this is bad news for open AI, while playing into the hands of corporate behemoths able to strike expensive deals with major publishers and top it off with the public domain.

I’m not sure this signals the end of AI and a victory for the human, but rather a question of who gets to train the models.

varsketiz 3 months ago

Great decision for humans.

Is this type of risk the reason why OpenAI masquerades as a non-profit?

oidar 3 months ago

Ross Intelligence was creating a product that would directly compete against Thomson Reuters. Pretty clearly not fair use.

ars 3 months ago

It would be quite an interesting result if we could have true General AI, but we don't simply because of copyright.

I'm aware this isn't a concern yet, but imagine if the future played out this way....

Or worse: Only those with really deep pockets can pay to get AI, and no one else can, simply because they can't afford the copyright fees.

gradientsrneat 3 months ago

Westlaw is to the legal profession what ResearchGate and others are to science research. They profit from information from the commons, and charge as much as the market will bear.

Only one of the many reasons the legal profession is so expensive.

nickpsecurity 3 months ago

Almost every article I read on fair use said that I could only use small amounts, and only while not competing with the original. AI people focus on a tiny number of precedents that they stretch very far. A reasonable person wouldn’t come up with their interpretation of fair use after looking at how most examples play out in court.

It shouldn’t surprise the writer that the AI companies’ versions of fair use didn’t hold much weight. They should assume that would be true. Then, be surprised any time a pro-AI ruling goes against common examples in case law. The AI companies are hoping to achieve that by throwing enough money at the legal system.

MonkeyClub 3 months ago

From p. 6:

"But a headnote can introduce creativity by distilling, synthesizing, or explaining part of an opinion, and thus be copyrightable."

Does this set a precedent, whereby AI-generated summaries are copyrightable by the LLM owners?

teruakohatu 3 months ago

Ross Intelligence was more a search interface with natural language and, probably, vector based similarity. So I suspect they were hosting and using the corpus in production, not just training a model on it.

NewsaHackO 3 months ago

How does this affect LLM systems that already have their corpus integrated?

mmooss 3 months ago

Thomson Reuters chose to sue Ross Intelligence, not a company like Google or even OpenAI. I wonder how deeper pockets would have affected the outcome.

I wonder how the politics played out. The big AI companies could have funded Ross Intelligence, who could have threatened to sabotage their legal strategies by tanking and settling their own case in TR's favor.

vaadu 3 months ago

Does anyone think Deepseek or other non-Western AIs will respect copyright?

This is going to make Deepseek and its kin much more valuable.

afarviral 3 months ago

If those 4 factors are used to judge whether something is "fair use", I'd say that's the nail in the coffin, because of course it isn't fair use and that's totally fair. Here I was thinking "transformative" was somehow a sticking point in all this.

biohcacker84 3 months ago

If copyright forces a diversity of AIs, that would be good.

Every AI company using its own self-created training data, resulting in AIs that are similar but not identical, is in my opinion much better than one or very few AIs.

iandanforth 3 months ago

Establishing precedent by defeating an already dead company in court is neither impressive nor likely to hold up for other companies.

rvz 3 months ago

See. The fair-use excuses that the AI proponents here were trying to hang on to for dear life have fallen flat with this ruling.

This is going to be one of many cases that lead to licensing deals, stopping AI grifters from claiming 'fair use' to side-step copyright laws just because they are using a gen AI system.

OpenAI ended up paying up for the data with Shutterstock and other news sources. This will be no different.

xyzal 3 months ago

I can't understand how some commenters frame such a result as not good. The big players will have no problem licensing large corpora to train their models, while my tiny site won't be vacuumed up (legally at least) by scrapers if I don't agree.

My willingness to upload my projects anywhere is at a historic low given the current state, honestly.

2OEH8eoCRo0 3 months ago

Fantastic news!

lazycog512 3 months ago

seems like Delaware can't scare tech companies out of re-incorporating any faster

YesBox 3 months ago

Thanks. The article wasn't loading for me, just the headline and image and footer. I was about to leave thinking that's all there is.
