Big Tech's underground race to buy AI training data

152 points by twilightzone about 1 year ago

16 comments

htrp about 1 year ago
> Rates vary by buyer and content type, but Braga said companies are generally willing to pay $1 to $2 per image, $2 to $4 per short-form video and $100 to $300 per hour of longer films. The market rate for text is $0.001 per word, she added.

This is high enough that there should be a market to compensate the end users who created these
neolefty about 1 year ago
This market is troubling. But I have a different question:

What does the long game look like for raw training data? How will AIs maintain the quality of their diet?

To compare, web search started — in the early days of Google — as a huge win because so much valuable information that was scattered around became findable. But over time it has become whac-a-mole with spam and AI copypasta, and now it's a struggle to keep returning good results, for *any* search engine.
bilsbie about 1 year ago
I wonder if they've considered hiring people to write. A lot of people might do it for cheap just to have their imprint on AI.

Or, another twist: pay people to submit ten years of emails (upload the backup file), or just pay small amounts for works they've made: college essays, journals, etc.
layer8 about 1 year ago
This will be a fun reminiscence once we find out how humans are able to learn with just a tiny fraction of that data volume.
cdme about 1 year ago
All the more reason for comprehensive privacy/data protection legislation and a refusal to provide data to these companies wherever possible.
1024core about 1 year ago
Companies like Quest Diagnostics (a lab testing firm) are sitting on a goldmine of clean data. It's only a matter of time before a firm like Amazon (who already bought One Medical) gobbles them up.

Disclaimer: Long on $DGX
Shrezzing about 1 year ago
> in talks with multiple tech companies to license Photobucket's 13 billion photos and videos

> Photobucket declined to identify its prospective buyers, citing commercial confidentiality.

> tech companies are also quietly paying for content locked behind paywalls and login screens, giving rise to a hidden trade in everything from chat logs to long forgotten personal photos from faded social media apps

In this market, ethics seem to exist when it comes to corporate clients, but not when it comes to end-users.

It's immediately and self-evidently obvious that no end-user in 2007 consented to photos of their 2007-era teenage self being used to train an AI how to identify an emo kid.
nico about 1 year ago
They talk about voice samples, but they don't mention prices for them.

Would it be attractive for a company like Twilio or Aircall to offer free phone calls and sell anonymized recordings?
asattarmd about 1 year ago
Google having so many private photos in Google Photos must be a goldmine for them.
JohnFen about 1 year ago
I am incredibly thankful that I never used any of those services. I'm angry enough at the thought that my own websites may have been scraped to train LLMs, but at least I could remove that content. I'd be beside myself if I couldn't do at least that much.
1vuio0pswjnm7 about 1 year ago
No Datadome JavaScript:

https://www.usnews.com/news/top-news/articles/2024-04-05/inside-big-techs-underground-race-to-buy-ai-training-data
xnx about 1 year ago
I assume some of the more shady/no-name dashcam units with Wi-Fi capability are uploading their video and internal microphone recordings. Distributed surveillance: The Panopticar
spxneo about 1 year ago
Nobody's going to mention Worldcoin?
sylware about 1 year ago
I wonder when one of the richest corps will manage to get exclusive access to such data and lock out the others.
ganzuul about 1 year ago
GDPR covered data should be worth a lot less.
mostlysimilar about 1 year ago
Who could have guessed giving away all of our data to corporations wholly focused on profit would be a bad thing?