I was hoping this would be about putting an orphan path in your robots.txt and then black-listing clients who tried to fetch it -- nobody should know about it except robots who are told not to go there, so anyone who visits the link is an adversary.
Honestly, I think it couldn't hurt, <i>if done appropriately</i>. If crawlers are indexing those pages, then they're publicly available anyway and could be crawled by a determined attacker - so nothing in robots.txt ought to be truly sensitive. But if there are pages that <i>ought</i> to be secure, yet might contain an exploitable vulnerability, putting their path in robots.txt at least limits their exposure to attackers determined enough to look, rather than any lazy script kiddie using Google to search your site.<p>Obviously you shouldn't <i>rely</i> on it, but defense in depth, as always.
<i>It would even be a million times better to place the sensitive files inside /TOP_SECRET_FOLDER and disallow the entire path, avoiding explicitly naming the individual paths, at least.</i><p>This is the only way to use robots.txt for semi-sensitive info, and obviously not for info so sensitive that it would be awful for it to get out. URLs can leak through proxy logs and shared browser history.
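For reference, the pattern being suggested is just a single directory prefix in robots.txt, so none of the individual file names ever appear in the file (the folder name here is taken from the quote above):

  User-agent: *
  Disallow: /TOP_SECRET_FOLDER/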
Just put a fake /wp/admin/login URL (or similar) in the disallow rules, then IP-ban everyone who tries to access it for 24 hours. That's how you do robots.txt Security.
Why in the world would you put things that shouldn't be downloaded ON THE INTERNET to begin with? And if you then proceed to tell the whole wide world that you did it... it's difficult to feel any empathy.
You can also use robots.txt as a honeypot. Simply add some realistic-looking URLs to the Disallow rules and create rules in haproxy, nginx, or your own custom scripts to catch anyone hitting those URLs and put them in a hamster wheel, i.e. give them a static "Having problems?" page, or just outright block them. On my own personal systems, I use "silent-drop" with a stick table in haproxy.
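A minimal sketch of the "custom scripts" variant in PHP, dropped at the top of a front controller - the trap paths, ban file, and 24-hour window are invented for illustration, and this is not the haproxy setup mentioned above:

  <?php
  // Ban any client that requests a robots.txt trap path; serve banned clients a dead end.
  // Trap paths, ban file location, and ban duration are placeholder values.
  $trapPaths = ['/wp-admin-old/', '/backup/', '/private-api/']; // also listed under Disallow in robots.txt
  $banFile   = '/tmp/robots-trap-bans.json';
  $banSecs   = 86400; // 24 hours

  $ip   = $_SERVER['REMOTE_ADDR'];
  $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
  $bans = is_file($banFile) ? (json_decode(file_get_contents($banFile), true) ?: []) : [];

  // Still inside the ban window? Give them the dead-end page and stop.
  if (isset($bans[$ip]) && time() - $bans[$ip] < $banSecs) {
      http_response_code(403);
      exit('Having problems?');
  }

  // First hit on a trap path: record the ban and serve the same dead end.
  foreach ($trapPaths as $trap) {
      if (strpos($path, $trap) === 0) {
          $bans[$ip] = time();
          file_put_contents($banFile, json_encode($bans), LOCK_EX);
          http_response_code(403);
          exit('Having problems?');
      }
  }
  // ...otherwise fall through to the real application.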
On a similar note, tools like grab-site (<a href="https://github.com/ArchiveTeam/grab-site" rel="nofollow">https://github.com/ArchiveTeam/grab-site</a>) wisely use robots.txt as a method of finding additional paths to archive when crawling sites.
Storytime!<p>At my previous job we were developing an e-commerce system, which was super-old|big|messy 0-test PHP trash. After two years of actively working on it, I still couldn't form a clear picture of its subsystems in my head.<p>One day there is a call from a client, saying that he is missing many of his orders. The whole company is on its feet and we are searching for what went wrong. We examine the server logs just to find out that someone is making thousands of requests to our admin section and is linearly incrementing the order IDs in the URL. Definitely some kind of attack.. Our servers are managed by a different company, so we open a ticket to blacklist that IP. A quick search tells me that the requests are coming from AWS servers, and the IP leads me to an issue on GitHub for some nginx "bad bots" blocking plugin, saying that this thing is called the Maui bot and we are not the first ones to experience it. Nice. Anyway, this thing is still deleting our data and we can't even turn off the servers because of SLAs and how the system was architected. So we try to find out how it is even possible that an unauthorized request can delete our data. We examine our auth module, but everything looks right. If you are not logged in and visit the order detail (for example), you are correctly redirected to the login screen. So how? We read the documentation of the web framework the application is using. There it is: $this->redirect('login');. According to the documentation, there should have been a return before that statement - ours was missing it. Without the return, everything after that point was still executed. And "everything", in our case, was the action from the URL. No one ever noticed, because there were no tests, and when you tried it in the browser, you were "correctly" presented with the login screen. Unfortunately, with a side effect..
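For anyone who hasn't seen this class of bug, here's a minimal self-contained sketch of the pattern - the function names and the header()-based redirect are invented for illustration, not the actual framework code:

  <?php
  // Sketch of a redirect that only queues a header and does NOT stop execution.
  function redirectToLogin(): void {
      header('Location: /login'); // sends the redirect header, but the script keeps running
  }

  function deleteOrderAction(int $orderId, bool $isLoggedIn): void {
      if (!$isLoggedIn) {
          redirectToLogin();   // BUG: missing "return;" here
          // return;           // <- the one-line fix
      }
      // Everything below still runs for anonymous requests,
      // even though the browser only ever sees the login redirect.
      deleteOrder($orderId);
  }

  function deleteOrder(int $orderId): void {
      // placeholder for the real database delete
  }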
The guy who wrote that line did it 5-6 years before this incident, and had been out of the company for years before I even joined. I don't blame him..<p>Fix. Push. Deploy.
No more deleted orders.<p>POST MORTEM:<p>The Maui bot went straight to the disallowed /admin paths in robots.txt and tried incrementing the numbers (IDs) in the paths.<p>I remember that, because the Maui bot's actions were (to the system) indistinguishable from normal user actions, someone had to manually fix the orders in the database using only the server logs and comparing them somehow.<p>Sorry for my English, and yeah, (obviously) don't use robots.txt as a security measure of any kind...
i move my not-to-be-indexed stuff around a lot: renaming, archiving, etc. & i've got a bit of shell scripting and a common lisp program that automatically add things to robots.txt, so almost everything listed there 404s, and the few paths that don't are protected via htaccess.<p>not sure why i did this aside from that it was fun!