TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

Ask HN: Is there a robots.txt equivalent for LLMs? like LICENSEME.txt?

4 points by devrob about 1 year ago
If you run a blog and don't want to allow LLM crawlers to train on your content, do you have options?

2 comments

raxxorraxor about 1 year ago
I guess if you selectively allow crawlers that promise not to use the data in such a way, robots.txt is still the way to go. Otherwise you need to selectively allow certain bots. However, as with web crawlers, respecting a robots.txt is optional.

What is insidious about AI models is that it is difficult, or practically impossible, to prove that one trained on your data.

It is hard to establish a standard like robots.txt. There was also .well-known/security.txt, which Google proposed. Some sites serve it, but it hasn't really become a standard.
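A minimal sketch of the selective-allow approach described above, expressed as robots.txt rules. GPTBot (OpenAI) and CCBot (Common Crawl) are real crawler user agents; the list is illustrative, and as noted, compliance is entirely voluntary on the bot's side:

```txt
# Disallow known AI training crawlers; allow everything else.
# Only bots that choose to respect robots.txt will honor these rules.

# OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Common Crawl, whose dataset is widely used for training
User-agent: CCBot
Disallow: /

# All other crawlers
User-agent: *
Allow: /
```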
legrande about 1 year ago
Ironically, my blog is written with the help of an LLM, so AI scraper bots are trained on their own output.

But if you are concerned, there's a good resource here for blocking them: https://darkvisitors.com/
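Since robots.txt is only advisory, blocking at the server usually means matching User-Agent strings like the ones that resource catalogs. A hypothetical nginx sketch (the bot names are real user agents, but the list, domain, and port here are placeholders):

```nginx
# Hypothetical sketch: refuse requests whose User-Agent matches known
# AI crawlers. Bots can spoof their User-Agent, so this is best-effort.
map $http_user_agent $is_ai_crawler {
    default        0;
    ~*GPTBot       1;   # OpenAI
    ~*CCBot        1;   # Common Crawl
    ~*ClaudeBot    1;   # Anthropic
}

server {
    listen 80;
    server_name blog.example.com;   # placeholder domain

    if ($is_ai_crawler) {
        return 403;                 # reject matched crawlers outright
    }
}
```

The `map` block keeps the matching logic in one place, so adding a newly discovered crawler is a one-line change.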