科技回声

11 comments

lesuorac 3 months ago

I still find it very (depressingly) hilarious how everybody sees this as a lawsuit about if training on copyrighted context is legal or not.

Literally, the NYT claimed that OpenAI maintained a database of NYT's works and would just verbatim surface the content. This is not an AI issue, it's settled copyright law.

rustc 3 months ago

I hope they don't settle early and we finally get an answer to whether training AI on copyrighted content is fair use or not.

n0rdy 3 months ago

I like following the OpenAI vs. NYT case, as it's a great example of a controversial situation:

- OpenAI created their models by parsing the internet while disregarding copyrights, licenses, etc., or looking for legal loopholes
- by doing that, OpenAI (alongside others) developed a new progressive tool that is shaping the world, and seems to be the next “internet”-like (impact-wise) thing
- NYT is not happy about that, as their content is their main asset
- less democratic countries can apply even less ethical practices for data mining, as the copyright laws don't work there, so one might claim that it's a question of national defense, considering the fact that AI is actively used in miltech these days
- while the ethical part is less controversial (imho, as I'm with NYT there), the legal one is more complicated: the laws might simply say nothing about this use case (think GPL vs. AGPL license), so the world might need new ones.

And so on...

screye 3 months ago

I can't imagine a scenario where pre-training on someone else's works is fair-use, but distilling from a proprietary LLM isn't.

pkamb 3 months ago

Is anyone building a public domain repository / AI training ground for old newspapers? Anything before 1930 has no restrictions. Newspapers.com has pretty good content but the interface and search is extremely lacking. Google News was abandoned a decade ago. This seems like something where AI could really help, for once. Not in training chatbots or whatever but actually just providing great search for articles in books, newspapers, and magazines.

ViktorRay 3 months ago

Would anyone here be able to explain to me where this money is going? Are the lawyers working for the New York Times really this expensive? If so these lawyers must be getting massive amounts of money...

nimish 3 months ago

NYT will lose:

Copyright only protects the actual text. LLMs have weights, not exact copies. In any case, saying "if I put in some input and get copyrighted output" is tantamount to copyright violations; if I use a generative tool and generate copyrighted info is it the tool's fault?

An LLM is a dump of effectively arbitrary numbers that, when hooked up to a command line, uses one of the world's most awful programming languages to evaluate and execute.

OpenAI at most broke an EULA or some technicality on copyright w.r.t. local ephemeral copies. What's the damage to the NYT though?

gotoeleven 3 months ago

Are they paying the lawyers with government money? I'm seriously asking. Why is the government paying 10s of millions of dollars/year to the New York Times? How can they still claim to be a news organization without having disclosed this? If the government is paying the NYT, then don't their productions belong in the public domain?

https://x.com/stillgray/status/1887191056074350690

SebFender 3 months ago

"OpenAI asserts that training AI models using publicly accessible content, including material from The New York Times, is protected under longstanding fair use principles."

Incredible.

The foundation of fair use is a transformative and non-consumptive use of copyrighted material.

tester756 3 months ago

Why is it THAT expensive?

user3939382 3 months ago

My ideal solution would be to public domain anything NYT has written in the past, turn it over to archive.org, and dismantle NYT so it’s no longer an issue in the future.

The New York Times Has Spent $10.8M in Its Legal Battle with OpenAI So Far
