We have all seen the recent large deals made by tech companies to purchase access to various types of data for training their models (e.g. Reddit, Photobucket). I have also seen some articles about the industry’s ever-growing need for unique media and data, which seem to suggest that there is a market, with brokers looking for new sources that are not online. They seem willing to pay, but I don’t see an obvious way to sell.<p>I believe I have access to troves that have never been and will never be online. Some quick research has not turned up any obvious marketplace or anyone to talk to.<p>Is anyone here in this business, or does anyone have advice or resources for people like me who want to explore offering training data for sale or license?
Related today:<p><i>Cloudflare's new marketplace lets websites charge AI bots for scraping</i><p><a href="https://news.ycombinator.com/item?id=41625903">https://news.ycombinator.com/item?id=41625903</a>
The sales process is the same as with any other B2B product. You need to figure out its value and who the customers are.<p>And make sure you're confident about the value. For example, in many workflows, having only 10% coverage of the population makes the data useless.<p>I wouldn't worry about the licensing details as a startup. They won't matter until you can afford the lawyers, and the reputational damage, that come with pursuing someone who's broken the license.