TechEcho

It's time to brush up robots.txt

4 points by greatNespresso over 1 year ago

1 comment

cratermoon over 1 year ago
To the extent that robots.txt still works, see "Block the Bots that Feed “AI” Models by Scraping Your Website": https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/

You can also block user agents directly, for example in nginx: https://www.xmodulo.com/block-specific-user-agents-nginx-web-server.html

The UAs I'm aware of:

    CCBot
    ChatGPT-User
    GPTBot

Does anyone know of a resource that tracks these crawlers' UA strings?
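A minimal sketch of the robots.txt approach, assuming you want to disallow the whole site for the three UAs listed in the comment. It uses Python's standard-library `urllib.robotparser` only as a stand-in for a compliant crawler, to check that the generated rules do what you expect. The helper names `build_robots_txt` and `is_blocked` and the URL `https://example.com/some/page` are hypothetical. robots.txt is advisory: this only affects crawlers that choose to honor it, which is why the comment also points at blocking UAs in the web server itself.

```python
import urllib.robotparser

# UA tokens named in the comment; this list is not exhaustive.
AI_CRAWLERS = ["CCBot", "ChatGPT-User", "GPTBot"]


def build_robots_txt(agents):
    """Return robots.txt text with one group per agent that disallows every path."""
    lines = []
    for agent in agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # a blank line separates groups
    return "\n".join(lines)


def is_blocked(robots_txt, agent, url="https://example.com/some/page"):
    """Check whether a crawler that honors robots.txt would skip `url` (hypothetical URL)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_CRAWLERS)
    print(robots)
    for agent in AI_CRAWLERS + ["Googlebot"]:
        print(agent, "blocked" if is_blocked(robots, agent) else "allowed")
```

Agents without a matching group (and with no `User-agent: *` group present) remain allowed, so ordinary search crawlers are unaffected by these rules.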