Related: <a href="https://news.ycombinator.com/item?id=35617763" rel="nofollow">https://news.ycombinator.com/item?id=35617763</a> ("Reddit Wants to Get Paid for Helping to Teach Big A.I. Systems"; as an aside, I much prefer the title of the post I'm commenting on, as it describes the actual change). It's hard to find this particularly disagreeable, especially considering:<p>> Reddit’s API will remain free to developers who want to build apps and bots that help people to use Reddit, as well as to researchers who wish to study Reddit for strictly academic or noncommercial purposes.<p>> But companies that “crawl” Reddit for data and “don’t return any of that value” to users will have to pay up, Reddit co-founder and CEO Steve Huffman told The Times.
Yesterday Stack Overflow, today Reddit. A clear pattern emerges where open web content/communities face existential issues if the current AI paradigm continues.<p>It's daylight robbery. The sum of 18 years of Reddit is an enormous capital investment as well as an immeasurable number of hours spent by its users to create the content.<p>It's absolutely baffling how a single entity (OpenAI, Google Bard) can just take it all without permission or compensation, and then centrally and exclusively monetize these stolen goods.<p>The fact that we barely even blink when this happens, and that founders confidently execute on an idea like this, tells you everything there is to know about our industry. It doesn't even pretend to do good anymore. Anything goes, really.<p>Anyway, get ready for an "open" web that will consist of ever more private places with ever higher walls. Understandably so: doing anything on the open web is not only pointless now, it actively helps to feed a giant private brain.
We drastically need copyright reform for text, imagery, video. It was never designed for this AI era.<p>Take a concept like "fair use". Let's say I embed your photo and express an opinion about it. That's what fair use was designed for: in-context, relatively harmless usage of the content of others, for the sake of expression, culture and education.<p>That's not the same thing as "let me suck up all content ever created without permission, attribution or compensation, mangle it and sell it via the backdoor whilst making you obsolete".<p>You can't call that fair use; they are wildly different usages at wildly different scales with wildly different impact.<p>We need a new copyright category specifically for AI usage. If nothing is expressed, no training permission is given. One can opt in and allow for training, allow for training under conditions, etc.
> “The Reddit corpus of data is really valuable,”<p>Totally agree, no question about that. But data comes from users. Shouldn't they also get paid?
What a twist of fate! The social media generation companies built their success on aggregating other people’s content and offering new ways to interact with it. “No we’re just linking to what other sites provide for free”. Now, there’s a new leather jacket in town. “We’re just training on data that you already provide for free”.<p>Of course it’s fun to watch a turf war, and we can all cheer for our favorite team and quibble about who deserves a punch in the gut.<p>But, we also need to keep an eye on the horizon. This will change the world, even and especially the spaces that we currently rely on. Just look at what happened to legacy media when the aggregators came: it largely turned into blogspam and clickbait. Comment sections (like this one) aren’t perfect, but they’re a damn good pressure valve for regular people to interact with the world. What will happen to those, for instance?
The cynic in me thinks this will slowly morph into charging access for third-party apps too.<p>Third-party apps don't show ads; there's no reason ads couldn't be included in the feed and required to be shown as a condition of using the API, but I imagine it makes tracking impressions etc far more difficult. Any new features they add also need to either be incorporated into the API or remain unavailable for those users.<p>My only hope is that third-party apps remain niche enough that Reddit leaves them be; the first-party experiences are all awful to the point where I would probably just stop using Reddit if third-party offerings become unavailable.
The proposed changes to Reddit's API are strongly implied to also affect third-party apps, such as Apollo for iOS: <a href="https://www.reddit.com/r/reddit/comments/12qwagm/an_update_regarding_reddits_api/" rel="nofollow">https://www.reddit.com/r/reddit/comments/12qwagm/an_update_r...</a><p>As noted in the comments, the API changes will also affect the quick .json representations of Reddit pages, which were an easy way for beginners learning coding/data science to play with real-world data.
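For anyone who never tried the .json trick: a minimal sketch of what playing with that data looks like, assuming the listing keeps the shape it has had (a "Listing" wrapper with posts under data.children). The sample below is made up and trimmed to a few fields, top_titles is a hypothetical helper, and nothing is fetched from Reddit.

```python
import json

# Hypothetical, trimmed sample shaped like a Reddit listing response.
# Real listings carry many more fields per post; these values are invented.
SAMPLE = """
{"kind": "Listing", "data": {"after": null, "children": [
  {"kind": "t3", "data": {"title": "Second post", "score": 5, "num_comments": 0}},
  {"kind": "t3", "data": {"title": "First post", "score": 42, "num_comments": 7}}
]}}
"""


def top_titles(listing: dict, min_score: int = 0) -> list[str]:
    """Return post titles with at least min_score points, highest score first."""
    # Each child wraps the actual post fields in its own "data" object.
    posts = [child["data"] for child in listing["data"]["children"]]
    kept = [post for post in posts if post["score"] >= min_score]
    kept.sort(key=lambda post: post["score"], reverse=True)
    return [post["title"] for post in kept]


listing = json.loads(SAMPLE)
print(top_titles(listing))
print(top_titles(listing, min_score=10))
```

Swapping SAMPLE for a saved copy of a real listing should work the same way, as long as the fields used here are present.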
Reddit wants to get paid for content it hosts.<p>Soon Reddit's users will want a cut of it for content they create.<p>Then all the places these users are copying content from will want their share.<p>There's no solution here. Either the web stays (mostly) open and free-for-all like it is now or everyone sets up their own little walls and ends the party.
“The world's entire scientific ... heritage ... is increasingly being digitized and locked up by a handful of private corporations....<p>The Open Access Movement has fought valiantly to ensure that scientists do not sign their copyrights away but instead ensure their work is published on the Internet, under terms that allow anyone to access it.” - Aaron Swartz<p>The irony here.
So the title is not correct? API access is free. Crawling is not?<p>The title here is a major change from that of the New York Times source, which is "Reddit Wants to Get Paid for Helping to Teach Big A.I. Systems". That one makes it much clearer what this is all about and why it is happening right now.
Closing the barn door after the horse has already bolted.<p>Pulling data off Reddit now will likely give you a very large amount of data polluted by LLMs. I mean, yeah, it could be useful for some broad topics at this point, but it's still likely to contain a lot of GPT's own feedback.<p>It's likely that companies like OpenAI will just use their old Reddit dataset, and then move to scraping things like YouTube for not just text, but audio and imagery too.
I can already feel the outrage dying. Yesterday Elon was tech hitler for it, today Reddit is just doing 'what was inevitable anyway'.<p>One of the best parts about social media is watching swarms of people who know nothing pivot around things you know something about.
> The company said on Tuesday that it planned to begin charging companies for access to its application programming interface, or A.P.I.<p>If this also extends to independent third-party clients then that's basically going to be the end of Reddit.
Is Reddit going to pay users? Or are they just going to collect the content generated by its users and then turn around and charge people to access it?<p>I think we all know that it's more column B than column A.<p>And while I'm not entirely comfortable with LLMs consuming all of that content without reimbursing the creators of that content, I don't see how Reddit charging for its API is different on any meaningful level.
So this is how SkyNet or the Matrix starts, I guess. Any AI trained using the content from Reddit would obviously conclude that humankind deserves to be eradicated. /s
Reddit should consider paying its moderators. Or employ moderators who don't use their vast unchecked powers to astroturf the site on behalf of shadowy companies.
Totally understandable.<p>That said, on subreddits I see people who post content without attribution all the time. I recall in /r/aww you can't directly link to an Instagram post but you can "steal" the image and post it, and it's optional as to whether or not you link to the Instagram post within the comments. Likewise, people take videos from YouTube/TikTok and re-host them on Reddit.<p>In smaller subreddits people will post entire pay-walled articles as if writers only get paid in likes.
It's hilarious to see Reddit's inept attempts to monetize the content gold mine they've squandered after a decade of devaluing product and engineering.
Ha-ha. Loving these double-faced stories here and there. “Crawling Reddit, generating value, and not returning any of that value to our users is something we have a problem with.” Very well, Mr. Huffman, but what about “Posting on Reddit, generating free content which brings multi-hundred-million advertisement profits for the company, and not getting any of that value back is something which your users don't have the slightest problem with.”
The API is just a convenience to get the data, but surely you can get all the data you want for free without any additional API, just by using plain HTTP as any ordinary user would. Of course, filling up an enormous proxy pool to avoid various ingenious "protections" could cost you some 10-20 bucks, and solving captchas automatically could cost you another $1 per 1,000, but from there it's even easier and more enjoyable to use than an API. I feel like launching a scrape-it-all service to avoid greedy IP-protocol customs officers could be a profitable venture these days.
A sensible choice. Now if only open source developers would update their licenses, perhaps with a new GPL license, to restrict reselling of IP through AI models. These folks need to adhere to rules if we are to have a healthy ecosystem.
The developer of the iOS app Apollo got some more details from Reddit today:<p><a href="https://redd.it/12ram0f" rel="nofollow">https://redd.it/12ram0f</a><p>> Reddit is moving to a paid API model for apps. The goal is not to make this inherently a big profit center, but to cover both the costs of usage, as well as the opportunity costs of users not using the official app (lost ad viewing, etc.)
I’m amazed they are willing to charge for their abomination of an API. The search functionality is terrible, returns unreliable results, and can only return 100 at once. I would happily pay for a great version of the Reddit API. I doubt anyone doing huge scraping jobs on Reddit is using their API to do so.
Seems like the downfall of Reddit is imminent between this decision and nerfing the mobile web experience for no good reason other than to vacuum up mobile user data. What do others here think?
The New York Times' style guide is starting to look pretty dated. This has gotta be the first time I've seen "L.L.M." as opposed to just "LLM".
ChatGPT is on a trajectory to overtake Reddit in popularity.<p>And every interaction from users with ChatGPT is valuable content provided to OpenAI.<p>Most people don't realize this, but every question contains information. When a user asks "Which city is better for digital nomads, Berlin or Lisbon?", they have given out a bunch of information. That there is something called "digital nomads". That there are cities called "Berlin" and "Lisbon". That those seem to be considered good for "digital nomads".<p>And even more so when the chat continues. If ChatGPT praises how nice a city is for studying and the user replies "I don't study. I need a cheap apartment with fast internet", the user has provided information about the preferences of "digital nomads": that apartments can be cheap or expensive, that apartments have internet, that internet can be faster or slower.
"There’s a lot of stuff on the site that you’d only ever say in therapy" Yes, that is indeed Reddit in a nutshell. You may not want Reddit content in your next ChatGPT model, so this is not necessarily a bad thing.
i.reddit.com is gone, and they want to kill the awesome 3rd party apps next instead of improving theirs. They are definitely killing off 3rd party apps; my prediction is that they will be killed within a year.
Here’s an extension to delete all of your Reddit history: <a href="https://github.com/j0be/PowerDeleteSuite">https://github.com/j0be/PowerDeleteSuite</a>
Some more details here: <a href="https://www.reddit.com/r/reddit/comments/12qwagm/an_update_regarding_reddits_api/" rel="nofollow">https://www.reddit.com/r/reddit/comments/12qwagm/an_update_r...</a>
Technically speaking, there's already a very competent open-source and federated replacement for Reddit: <a href="https://github.com/LemmyNet/lemmy">https://github.com/LemmyNet/lemmy</a><p>Socially speaking, perhaps not so much.
How do they plan to keep Google from using its search index of Reddit for training? Or keep OpenAI from using Common Crawl? Do they simply add "No AI" to their TOS?
Gift link for paywall bypass: <a href="https://www.nytimes.com/2023/04/18/technology/reddit-ai-openai-google.html?unlocked_article_code=TNoqm1jRuETVpKaEs3HNHf64jitU0Sh5ze4mNj4isRRrOLPVu33NR4pR_1Fur4qxMJyk2o1wxfnET3z81LARJCOj7GPD4hiF5Nc5-y2DtDg9mquujeImb-BSZld2SoGYtr8C1tj0JQbwTn3SsMHOcr2H-R5UCV-XaPOVwE_oIcScQe5-atz2kS9jS0x10BVdLD11KVkCOMVaf5SjB7NTeFTTEaMdWLqqSSS2yQAAPZSEWALpl2GPKyF2rXEmmFh05n8ubXj4oUeLLn-MQBLtS2nlywlW57bZHRp6lsbeSvFffH9cWVCaClzU2CMfi8eJ3UlScsvAm8zXxv6Lpf3_1WpzqIw&giftCopy=1_CurrentCopy&smid=url-share" rel="nofollow">https://www.nytimes.com/2023/04/18/technology/reddit-ai-open...</a>
Reddit already derived their remuneration from making their site public. Now they want to be paid again?<p>Maybe they should lock their entire site behind a paywall.
Interesting… I guess Google won't be charged because of the backlinks, but ChatGPT will be, because they just show an answer to one’s query and don't actually show any of the “original content” in context, and therefore there's no back-traffic for Reddit.