TechEcho
Google cut a deal with Reddit for AI training data

165 points by bx376 over 1 year ago

29 comments

steveBK123 over 1 year ago
$60M/year for GOOG to access all their data when they purport to be targeting a $5B valuation at IPO is really cheap.

Arguably Reddit's value is its data, and GOOG is renting it for 1.2%/year?
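For reference, the 1.2% figure is the quoted annual fee divided by the quoted target valuation. A minimal sketch of that arithmetic, assuming the $60M and $5B numbers from the comment (neither is verified here, and the function name is only illustrative):

```python
def annual_fee_share(annual_fee_usd: float, valuation_usd: float) -> float:
    """Return the annual fee as a fraction of the valuation."""
    if valuation_usd <= 0:
        raise ValueError("valuation must be positive")
    return annual_fee_usd / valuation_usd


# Figures as quoted in the comment, not independently verified.
share = annual_fee_share(60_000_000, 5_000_000_000)
print(f"{share:.1%}")  # prints "1.2%"
```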
jiveturkey42 over 1 year ago
AI is about to become really sarcastic, pedantic, and absurdly moralistic
jpalawaga over 1 year ago
I know Reddit has hired a lot. This sort of feels like a fundraising round to keep operations afloat.

This must be the real reason for destroying public API access.
Mo3 over 1 year ago
Just a reminder this exists:

https://github.com/j0be/PowerDeleteSuite

https://github.com/andrewbanchich/shreddit
klipklop over 1 year ago
What incentive is there for posters on Reddit to continue to produce new content that is sold off for a profit? Sure they agreed to it in the TOS, but I suspect people might leave the platform.

Also how many of the posts are already generated by AI? Seems like a raw deal to pay for data that could be a large percentage of bot activity.

Why visit Reddit at all for information when you can just ask Gemini?
lolpanda over 1 year ago
Reddit blocks all search engines from crawling their comments, essentially blocking all user-generated content.

See this directive in https://www.reddit.com/robots.txt `Disallow: /r/*/comments/*/*/*/*`

But I can still search Reddit on Google. How does Google manage to get the data?
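To see which paths a wildcard rule like this would cover, here is a rough sketch of robots.txt pattern matching, assuming the directive is `Disallow: /r/*/comments/*/*/*/*` and the common convention that `*` matches any run of characters and a trailing `$` anchors the end of the path. It ignores `Allow` rules, per-user-agent groups, and longest-match precedence, the helper names and example paths are hypothetical, and it says nothing about how Google actually obtains the data.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters,
    a trailing '$' anchors the end; otherwise it is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches the path (no Allow handling)."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


PATTERN = "/r/*/comments/*/*/*/*"

# Hypothetical example paths, made up for illustration.
thread_path = "/r/python/comments/abc123/some_title/"
deeper_path = "/r/python/comments/abc123/some_title/def456/"

print(is_disallowed(thread_path, [PATTERN]))  # False
print(is_disallowed(deeper_path, [PATTERN]))  # True
```

Under these assumptions the pattern only matches paths with at least three more slashes after `/comments/`, so the shorter thread-style path in the example is not covered by this one rule.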
huitzitziltzin over 1 year ago
$60M for absolutely cutting edge dick joke generation in next gen LLMs.

I question whether Reddit data is worth anything at all.
lijok over 1 year ago
On one hand a measly 60 mill for Reddit's data

On the other hand they bought a poison pill

Is whoever is behind the decision making here blindsided by Reddit being a massive trove of data, and not realizing most of that data is shit?
PedroBatista over 1 year ago
Only $60M? Damn.. Reddit is for the streets.

And that's a sweet deal for Google, paying pennies to partially patch their deteriorating reputation about their search engine becoming useless.
iteratethis over 1 year ago
When I first learned that AI companies are vacuuming up all internet content without any regard for permission, attribution or compensation for the content creator, I found that deeply immoral.

I figured they should pay for it. But now that they do (in this instance), I'm realizing this might be even worse. They can just buy the entire market in the same way they buy Google search users by paying Apple billions a year.

And still the actual content creator, a Reddit user in this case, is not compensated.

It's truly wild how lax regulation is. This is probably the most important technology ever created and we just let 2 companies have it all: the data and the compute.
LorenDB over 1 year ago
Hm, now I have to wonder if it'd be possible to create a tool that poisons your Reddit posts so they look fine to a human but completely trash LLM output.

I'm not saying we should do it, but it's a fascinating thought.
izyda over 1 year ago
I think the lack of revocability, marginal temporal value, and downstream governance makes the prospect of more such data deals slim -- or at least, slim without regret.

I wrote an essay on this here: https://magis.substack.com/p/llm-data-sales-a-market-for-lemons
gremlinsinc over 1 year ago
Do people still use reddit? I haven't since RelayForReddit went subscription only thanks to the new Reddit API rules.
rebelde over 1 year ago
Google is paying Reddit instead of just taking it for free like they do from all other websites?
purpleblue over 1 year ago
Someone should create a bot that responds to highly upvoted comments and reminds the redditor that Reddit is making money off the comment, that the information is being used by AI, and that they aren't seeing a dime of it.
crazysim over 1 year ago
Amazing. It used to be a free BigQuery dataset. Apparently it was worth $60M.
RecycledEle over 1 year ago
I hope they get all versions of edited and moderated comments. An AI might be able to determine when the moderation was valid and when it was playing games.

I would love to see a leak of that dataset, sorted by username.
jfghi over 1 year ago
I wonder how well they identify posts written via AI (and what proportion of posts that is). Also, there is a ton of misleading astroturfing for some types of businesses.
mediumsmart over 1 year ago
oh cool so the contributors of the data are all getting paid?
Scoundreller over 1 year ago
Reddit should be paying Google.

Some of Reddit's success can be attributed to the difficulty of getting your personal blog about your '87 Nissan Sentra indexed on Google anymore.
ChrisArchitect over 1 year ago
[dupe]

Discussion on official post here: https://news.ycombinator.com/item?id=39471317
Fervicus over 1 year ago
An AI company with an agenda getting their data from a social platform with an agenda. What could go wrong?
xyst over 1 year ago
Reddit is really pumping their P&L sheet in preparation for IPO.

The parasites at the top have $ in their eyes rn
silisili over 1 year ago
I hope they're filtering it heavily and specifically. While Reddit does still have some valuable discussion in the more niche subs, I've noticed the main subs moving further left, hatefully and almost violently so.

I can't wait to search for who the Republican nominee is, and have Google tell me to kill myself.
rldjbpin over 1 year ago
i like how one half of the discussion is about getting your share of pay from the deal, while the other disregards whatever is posted on the site.
monkeydust over 1 year ago
This can't be exclusive for that amount, or could it?
whoopdedo over 1 year ago
> Google Search is currently expanding the test of a "forums" filter that lets you browse through results from sites with human discussion, like Reddit

Yeah, about that whole "human discussion" thing...
7373737373 over 1 year ago
Copying this right from the Reddit TOS for all those who consider posting anything of any value there, present or future:

> You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

> When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

(https://www.redditinc.com/policies/user-agreement-february-15-2024)

tl;dr: "We will be able to do anything we fucking want with anything you contribute to our site"
we_love_idf over 1 year ago
This is dystopian. A company like Google will be able to surveil more and the only outcome is more invasion of privacy and more targeted ads. Why doesn't the FTC stand up to this?