>Rates vary by buyer and content type, but Braga said companies are generally willing to pay $1 to $2 per image, $2 to $4 per short-form video and $100 to $300 per hour of longer films. The market rate for text is $0.001 per word, she added.<p>This is high enough that there should be a market to compensate the end users who created these
This market is troubling. But I have a different question:<p>What does the long game look like for raw training data? How will AIs maintain the quality of their diet?<p>To compare, web search started — in the early days of Google — as a huge win because so much valuable information that was scattered around became findable. But over time it has become whac-a-mole with spam and AI copypasta, and now it's a struggle to keep returning good results, for <i>any</i> search engine.
I wonder if they’ve considered hiring people to write. A lot of people might do it for cheap just to have their imprint on AI.<p>Or, another twist: pay people to submit ten years of emails (upload the backup file), or just pay small amounts for works they’ve made: college essays, journals, etc.
Companies like Quest Diagnostics (a lab testing firm) are sitting on a goldmine of clean data. It's only a matter of time before a firm like Amazon (who already bought One Medical) gobbles them up.<p>Disclaimer: Long on $DGX
>in talks with multiple tech companies to license Photobucket's 13 billion photos and videos<p>>Photobucket declined to identify its prospective buyers, citing commercial confidentiality.<p>>tech companies are also quietly paying for content locked behind paywalls and login screens, giving rise to a hidden trade in everything from chat logs to long forgotten personal photos from faded social media apps<p>In this market, ethics seem to exist when it comes to corporate clients, but not when it comes to end users.<p>It's self-evident that no end user in 2007 consented to photos of their 2007-era teenage self being used to train an AI to identify an emo kid.
They talk about voice samples, but they don’t mention prices for them.<p>Would it be attractive for a company like Twilio or Aircall to offer free phone calls and sell anonymized recordings?
I am incredibly thankful that I never used any of those services. I'm angry enough at the thought that my own websites may have been scraped to train LLMs, but at least I could remove that content. I'd be beside myself if I couldn't do at least that much.
No DataDome JavaScript:<p><a href="https://www.usnews.com/news/top-news/articles/2024-04-05/inside-big-techs-underground-race-to-buy-ai-training-data" rel="nofollow">https://www.usnews.com/news/top-news/articles/2024-04-05/ins...</a>
I assume some of the more shady/no-name dashcam units with Wi-Fi capability are uploading their video and internal microphone recordings. Distributed surveillance: The Panopticar