TechEcho

1 comment

cratermoon over 1 year ago

To the extent that robots.txt still works, see "Block the Bots that Feed “AI” Models by Scraping Your Website": https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/

You can also block user agents directly, for example in nginx: https://www.xmodulo.com/block-specific-user-agents-nginx-web-server.html

The UAs I'm aware of:

    CCBot
    ChatGPT-User
    GPTBot

Does anyone know of a resource that tracks these crawlers' UA strings?
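For anyone who wants to try the robots.txt route, here is a rough sketch (not taken from either linked article) that generates a blocking robots.txt for the three UAs listed above and checks it with Python's standard `urllib.robotparser`. The helper names `build_robots_txt` and `is_blocked`, and the `https://example.com/` URL, are made up for illustration. It only confirms that the rules parse as intended; whether a given crawler actually honors them is a separate question.

```python
from urllib.robotparser import RobotFileParser

# User agents named in the comment above; extend as new crawlers appear.
AI_CRAWLER_AGENTS = ["CCBot", "ChatGPT-User", "GPTBot"]


def build_robots_txt(agents):
    """Return robots.txt text that disallows the whole site for each agent.

    Hypothetical helper: one 'User-agent' group per crawler, separated by
    blank lines.
    """
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def is_blocked(robots_txt, agent, url="https://example.com/"):
    """Return True if the given robots.txt text forbids `agent` from `url`.

    The URL is a placeholder; only its path matters for rule matching.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


robots_txt = build_robots_txt(AI_CRAWLER_AGENTS)
print(robots_txt)

for agent in AI_CRAWLER_AGENTS + ["Googlebot"]:
    print(agent, "blocked" if is_blocked(robots_txt, agent) else "allowed")
```

The same list could also drive a server-side rule, such as an nginx `if ($http_user_agent ~* ...)` check that returns 403, which does not depend on the crawler cooperating.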

It's time to brush up robots.txt
