Big Tech's underground race to buy AI training data

152 points by twilightzone, about 1 year ago

16 comments

htrp, about 1 year ago
> Rates vary by buyer and content type, but Braga said companies are generally willing to pay $1 to $2 per image, $2 to $4 per short-form video and $100 to $300 per hour of longer films. The market rate for text is $0.001 per word, she added.

This is high enough that there should be a market to compensate the end users who created these

neolefty, about 1 year ago
This market is troubling. But I have a different question:

What does the long game look like for raw training data? How will AIs maintain the quality of their diet?

To compare, web search started, in the early days of Google, as a huge win because so much valuable information that was scattered around became findable. But over time it has become whac-a-mole with spam and AI copypasta, and now it's a struggle to keep returning good results, for *any* search engine.

bilsbie, about 1 year ago
I wonder if they've considered hiring people to write. A lot of people might do it for cheap just to have their imprint on AI.

Or, another twist: pay people to submit ten years of emails (upload the backup file), or just pay small amounts for works they've made: college essays, journals, etc.

layer8, about 1 year ago
This will be a fun reminiscence once we find out how humans are able to learn with just a tiny fraction of that data volume.

cdme, about 1 year ago
All the more reason for comprehensive privacy/data protection legislation and a refusal to provide data to these companies wherever possible.

1024core, about 1 year ago
Companies like Quest Diagnostics (a lab testing firm) are sitting on a goldmine of clean data. It's only a matter of time before a firm like Amazon (which already bought One Medical) gobbles them up.

Disclaimer: Long on $DGX

Shrezzing, about 1 year ago
> in talks with multiple tech companies to license Photobucket's 13 billion photos and videos

> Photobucket declined to identify its prospective buyers, citing commercial confidentiality.

> tech companies are also quietly paying for content locked behind paywalls and login screens, giving rise to a hidden trade in everything from chat logs to long forgotten personal photos from faded social media apps

In this market, ethics seem to exist when it comes to corporate clients, but not when it comes to end-users.

It's self-evident that no end-user in 2007 consented to photos of their 2007-era teenage self being used to train an AI how to identify an emo kid.

nico, about 1 year ago
They talk about voice samples, but they don't mention prices for them.

Would it be attractive for a company like Twilio or Aircall to offer free phone calls and sell anonymized recordings?

asattarmd, about 1 year ago
Google having so many private photos in Google Photos must be a goldmine for them.

JohnFen, about 1 year ago
I am incredibly thankful that I never used any of those services. I'm angry enough at the thought that my own websites may have been scraped to train LLMs, but at least I could remove that content. I'd be beside myself if I couldn't do at least that much.

1vuio0pswjnm7, about 1 year ago
No Datadome Javascript:

https://www.usnews.com/news/top-news/articles/2024-04-05/inside-big-techs-underground-race-to-buy-ai-training-data

xnx, about 1 year ago
I assume some of the more shady/no-name dashcam units with Wi-Fi capability are uploading their video and internal microphone recordings. Distributed surveillance: The Panopticar

spxneo, about 1 year ago
Nobody's going to mention Worldcoin?

sylware, about 1 year ago
I wonder when one of the richest corps will manage to get exclusive access to such data and lock out the others.

ganzuul, about 1 year ago
GDPR-covered data should be worth a lot less.

mostlysimilar, about 1 year ago
Who could have guessed giving away all of our data to corporations wholly focused on profit would be a bad thing?