OpenAI Pleads It Can't Make Money Without Using Copyrighted Materials for Free

129 points by janandonly 9 months ago

33 comments

TaylorAlexander 9 months ago

This is because they expanded upon an existing sample-inefficient technology, commercialized that sample-inefficient technology using copyrighted data, fundraised and expanded operations using this legally questionable technology, and are now complaining that they can’t balance their business expenses if they can’t keep using other people’s copyrighted works to feed their extremely sample-inefficient data monster.

What they could have done was stay an open research org when the tech started to work, and focus on sample efficiency and cultivating copyright-free data sets. But they were too impatient to commercialize.

Whoops.

I don’t actually think intellectual property restrictions are good, but I don’t want a world where small creators have their rights stomped on by multi-billion-dollar corporations. Either we have copyright or we don’t, but unless OpenAI is also going to give up their copyrights, this seems deeply unfair.

disposition2 9 months ago

Looks like this is from Jan. 2024, and there doesn't seem to be an update on the article itself.

One might be able to find more information on the Committee's webpage (although I'm not very familiar with the UK government, so this might not be accurate): https://committees.parliament.uk/work/7827/large-language-models/publications/

bmitc 9 months ago

I can't stand corporations with excuses like this. "But we have too much scale to fix that." "But we need this data to operate." "We can't be responsible at our scale." I say too bad for all of it. If you're not a viable business without bending rules and laws, then you're not a viable business.

TheCleric 9 months ago

I'd like to frame this another way. OpenAI is saying they can't survive without copyright owners consenting to the use of their property to make OpenAI money. And honestly, if your business model is dependent on that, then that's your problem.

We can argue over whether you should need consent or not, but personally I find nothing wrong with someone being unable to use things I've created to make a buck without my permission (unless otherwise indicated by an explicit license).

gwbas1c 9 months ago

IMO: I think this is a very strong case for copyright reform; and a very strong indicator that our public domain isn't healthy enough.

justsomeshmuck 9 months ago

I think it is important for America and Europe to take the position that using copyrighted works to train LLMs is fair use, or at least not illegal. Advancement in this space in the West will be hindered otherwise, and nations that don’t respect IP law will have an enormous advantage.

seizethecheese 9 months ago

The alarmism in this thread is misguided. By advocating for excluding copyrighted works from LLM training, you are advocating an expansion of copyright protection. This attitude is the opposite of my understanding of the hacker ethos.

elliottkember 9 months ago

A very misleading title; that's not what they're saying at all. They're saying that training does not constitute a breach of copyright: "legally copyright law does not forbid training."

blacksmith_tb 9 months ago

I wouldn't want to come to their defense, but the argument reminds me a little of earlier fights about whether search engines owe the sites they crawl anything. That led to things like Canada's Online News Act [1], which doesn't seem to me to have been very good for users in Canada (but I am not in Canada; maybe it has upsides?).

From my perspective, before OpenAI used/stole these copyrighted works, the public had to pay the original creators to get access to them, and now that they've been absorbed into ChatGPT and friends, we have to pay someone else... seems like a wash for end users?

1: https://www.nbcnews.com/tech/tech-news/google-canada-law-online-news-c-18-bill-news-links-rcna91882

dcwca 9 months ago

It is totally legal to train on this stuff, but illegal to reproduce copyrighted works. Interestingly, Google's business model could have been criticized the same way. They construct a big index of copyrighted works, reproduce them, and monetize it.

Workaccount2 9 months ago

The big question is whether a judge (or judges) will consider a vector space many orders of magnitude smaller than its training set, and not really containing anything that resembles legible data, to be an archive of copyrighted works.

To me it makes way more sense to just censor outputs. I can draw Batman from memory, but I wouldn't go out and start selling Batman drawings. I can easily self-censor.

The solution for transformers is plainly obvious, but I can understand the fear of training something that might well displace you.

tim333 9 months ago

> OpenAI Pleads It Can't Make Money Without Using Copyrighted Materials for Free

is of course not true. It said that if it can't use the materials then its product would be bad and it would lose out to Chinese competitors who did not have the restrictions.

Not quite sure what the answer is, but I spent a fair bit of time today trying to get access to a paper not produced by Elsevier, but to which they have managed to gate the world's access to make a few bob. There's a lot to be said for information being free.

robryan 9 months ago

It is interesting that they both licence content and say it is fair use. Seems like those who complain the loudest will get something for their content and everyone else will get nothing.

ChrisArchitect 9 months ago

Not a new story.

Some discussion in January: https://news.ycombinator.com/item?id=38912259

burnte 9 months ago

My response is: Ok. Not my problem. OpenAI isn't entitled to free profit. No one is.

skeledrew 9 months ago

It makes sense, at a base level. Where would the cost of training (and thus the cost of accessing the service, if it got to that point without going bankrupt) be if all copyrighted works had to be paid for? What would the model quality be (and thus what would the model be worth) if only public domain content could be utilized? The only reasons this is an issue are that they got to the point of making money (and people don't like seeing money being made if they can't get in on the action), and that there are content creators afraid that this thing will make them obsolete (which will be the case for many). Typical clash of interests.

GiorgioG 9 months ago

Let's face it: current AI/LLMs are nice tinker-toys, but they are the tiny building blocks that the real power will come from some day, when we can run tens of thousands of these models to solve real problems and not... AutoComplete++. The hardware will need to be exponentially more powerful, efficient and cheap before the real AI-powered future we've been hyped/promised can be realized.

jmclnx 9 months ago

> OpenAI is begging the British Parliament to allow it to use copyrighted works because it's supposedly "impossible" for the company to train its artificial intelligence models

If it was up to me, I would allow OpenAI access only if they license every single line of source under the GPLv3 (yes, v3).

Under any other license, "tough to be you".

I expect OpenAI to go proprietary once they hit a certain level of market strength.

OutOfHere 9 months ago

AI shows the many flaws in various poorly thought-out IP and information-related laws which should never have existed in the first place.

jasonlfunk 9 months ago

Can someone help me understand why it's a problem for companies to train these huge LLMs on your copyrighted material? What exactly is the harm that is being done to the copyright holder?

I can understand why the New York Times (for example) wants to claim that a couple of billion-dollar companies have done it actual harm, but I am struggling to identify what that harm is.

leobg 9 months ago

One more reason why this is B.S.:

They can obviously license that content.

No rights to publish. Just the right to use the content as part of their training data for the AI.

NemoNobody 9 months ago

I think this is the best argument I've ever seen that AI ought to be a public good.

ChuckMcM 9 months ago

Okay, that is just fucking hilarious. (sorry for the profanity).

quantum_state 9 months ago

would sound very funny if it is generalized …

babyshake 9 months ago

"I drink your milkshake" is the best four word summary of the situation I can imagine.

exe34 9 months ago

awh diddums. I also have to remain poor if I obey the law.

fungiblecog 9 months ago

So big corporations love copyright laws when they can use them to make enormous profits - but then want exemptions when the exact same laws don’t allow them to make enormous profits. Welcome to the world we get when we let rich arseholes make all the rules.

biglyburrito 9 months ago

Oh no. Anyway…

findthewords 9 months ago

Let us put it bluntly: the AI bubble is built on piracy.

tivert 9 months ago

In other news: burglar says he can't make money from his burglary skills without stealing, and pleads that the laws against theft be repealed.

dkersten 9 months ago

I would also be able to make money if I could use copyrighted material for free, but that doesn’t make it OK for me to do.

If they can’t exist without doing so, then maybe they shouldn’t exist. They don’t have any inherent right to make money.

_heimdall 9 months ago

Isn't this the exact same business model that Eric Schmidt bragged about at Stanford?

1) steal IP and build a thing
2a) if it fails, rinse and repeat with a "new" step 1
2b) if it succeeds, hire a fleet of lawyers to clean up the mess
3) get rich

OutOfHere 9 months ago

There is no violation because it can be argued that the AI is a sentient entity, and sentient entities have a right to read and remember texts borrowed from the library.
