If the moat to building ChatGPT-like apps is training on up-to-date primary source data from sites like Reddit, and the algorithms quickly get copied and commodified (as Stable Diffusion etc. did to DALL-E), doesn’t all the value accrue to primary information sources like Reddit? They can sell access to their information, ban or sue anyone who tries to scrape their data wholesale, and essentially determine which AI is best?
Personally, I like the idea of something like ChatGPT replacing Google Search.
But, and it's a huge but, I miss seeing the sources it's trained on. Are these copyrighted sources, and who owns the rights to the content it spews out? I like to know whose content I am reading and/or using (when open source) so I can reference them when needed.