It's worth noting that in this case "blocking" means "asking nicely for it not to index them" - so how effective this is depends on how well behaved the bots are.<p>There is a danger, though, if certain types of sites are more likely to block GPTBot than others, because that would skew the data set it trains on, which could have longer-term impacts on all the content generated with it. If all the good-quality sites block it and the sites full of AI-generated junk don't, then that sounds like a downward spiral.
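To make the "asking nicely" point concrete, here is a minimal sketch of what a well-behaved crawler would check. It assumes a site publishes a robots.txt that disallows the `GPTBot` user agent; the example.com URL and the `SomeOtherBot` name are made up for illustration. It uses Python's standard `urllib.robotparser`, and nothing in it stops a bot that simply skips the check.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for illustration: block GPTBot everywhere.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # hypothetical URL
    for bot in ("GPTBot", "SomeOtherBot"):  # SomeOtherBot is hypothetical
        print(bot, "allowed:", is_allowed(ROBOTS_TXT, bot, page))
```

The check runs on the crawler's side, which is the whole point: the site only states a preference, and compliance is voluntary.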
This is not a problem. They will just buy the data in bulk from a third party that does the scraping for them.<p>I have heard of many instances of this.
I don't own any revenue-generating websites, but I do feel like the content I create is useful. I'd rather have GPT slurp it up with the hope of some small piece of me being emitted now or in the future for others to benefit from.<p>I'm sure my perspective would be different if I were paying my employees to create unique content for our brand, though.<p>I'm not sure, though. At the end of the day, I think I'd rather information be free. But that's not a sustainable model in many industries.
I would love to sit down with 10 random website owners / managers from this list and ask them the following questions:<p>- Why did you block GPTBot?<p>- Are you aware that your content is scraped, directly copied and otherwise repurposed by other websites that don't block GPTBot?<p>- What are your plans if future iterations of the GPT model turn out to contain information that you wrote or produced? Are you going to fight it, and if so, how?<p>I think these are legitimate questions, and I would love to hear the answers, because I would love nothing more than to see OpenAI hamstrung over the bullshit that they pulled last year with ChatGPT.<p>Never forget that OpenAI stole the web, has had $11.3B in funding[0], and is seeking another round that would place it at an $80-90 billion valuation[1].<p>[0]: <a href="https://www.crunchbase.com/organization/openai/company_financials" rel="nofollow noreferrer">https://www.crunchbase.com/organization/openai/company_finan...</a><p>[1]: <a href="https://techcrunch.com/2023/09/26/openai-is-reportedly-raising-funds-at-a-valuation-of-80-billion-to-90-billion/" rel="nofollow noreferrer">https://techcrunch.com/2023/09/26/openai-is-reportedly-raisi...</a>
As long as ChatGPT offers few or no financial incentives to creators of media, this percentage will increase.<p>As they are at the moment, OpenAI are parasites.
I really don't understand why sites would do this. To each their own, but it currently lowers my opinion of the site. I was disappointed to see NPR and Ars on the list.
OpenAI is trying to set a precedent of default approval for crawling copyrighted content and training AIs on it. Unlike search crawling, it doesn't appear to offer anything in return.<p>More:
<a href="https://tomaszs2.medium.com/ai-may-pirate-music-and-movies-1e931402bd20" rel="nofollow noreferrer">https://tomaszs2.medium.com/ai-may-pirate-music-and-movies-1...</a>