Ask HN: A good list of bots/crawlers ip addresses?

2 points by rksprst over 14 years ago
I have a URL shortener and I don't want to count visits by bots. Is there a good, comprehensive list of bots/crawlers?

Possibly in CSV format?

Either IP addresses, user agents, or both.

2 comments

madhouse over 14 years ago
Another option would be to use robots.txt to stop bots from accessing a particular URL (for example, a 1x1 image or some such). Hide that somewhere in every page, and only count visits where the image was shown.

This does require that the URL expansion works as a display + redirect, so an intermediate page is shown. If it doesn't work like that...

Well, you can simply exclude the bots and crawlers with robots.txt. The downside is that they then won't index your shortened links either, which may or may not be a problem.
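
A rough sketch of the pixel idea in Python, under some assumptions: the pixel lives at a hypothetical `/pixel.gif`, robots.txt disallows it for all agents, and the request log can be reduced to `(client_id, path)` pairs, where `client_id` might be an IP address or a session cookie. The function name and log shape are made up for illustration. Note that this only filters bots that actually honor robots.txt.

```python
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Hypothetical pixel path; any URL that only real page views load would do.
PIXEL_PATH = "/pixel.gif"

# Assumed robots.txt content that keeps compliant bots away from the pixel.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pixel.gif
"""


def robots_blocks_pixel(robots_txt, user_agent="ExampleBot"):
    """Check that the given robots.txt text forbids fetching the pixel."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, PIXEL_PATH)


def count_pixel_confirmed_visitors(requests):
    """Count, per short link, the distinct clients that also loaded the pixel.

    `requests` is an iterable of (client_id, path) tuples.
    """
    pixel_clients = set()
    clients_by_link = defaultdict(set)
    for client_id, path in requests:
        if path == PIXEL_PATH:
            pixel_clients.add(client_id)
        else:
            clients_by_link[path].add(client_id)
    # Keep only the clients that were seen fetching the pixel.
    return {
        path: len(clients & pixel_clients)
        for path, clients in clients_by_link.items()
    }
```

For example, if client A requests `/abc` and the pixel while client B requests `/abc` only, `/abc` gets a count of 1. This counts distinct clients rather than raw hits, which is a design choice of the sketch and not something the approach requires.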

jedberg over 14 years ago
Lists like this aren't generally shared, because then the nefarious bots would know they had been caught.

Well-behaved bots tend to use user agents that make themselves fairly obvious.

Your best bet is to watch your logs for an IP or user agent that seems to hit more URLs than anyone else, and then investigate by hand.
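
Both of those checks could be sketched in Python roughly as follows. The log format, `(ip, user_agent, url)` tuples, is an assumption, as are the function names. The keyword pattern is only an illustrative guess at what self-identifying bots put in their user agent and is nowhere near a comprehensive list.

```python
import re
from collections import defaultdict

# Illustrative keywords only; an assumption, not an authoritative bot list.
BOT_UA_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_declared_bot(user_agent):
    """Return True if the user agent openly identifies itself as a bot."""
    return bool(BOT_UA_PATTERN.search(user_agent or ""))


def heaviest_clients(log_entries, top_n=5):
    """Rank IPs by how many distinct URLs they requested.

    `log_entries` is an iterable of (ip, user_agent, url) tuples.
    Returns a list of (ip, distinct_url_count), largest first.
    """
    urls_by_ip = defaultdict(set)
    for ip, _user_agent, url in log_entries:
        urls_by_ip[ip].add(url)
    ranked = sorted(
        urls_by_ip.items(), key=lambda item: len(item[1]), reverse=True
    )
    return [(ip, len(urls)) for ip, urls in ranked[:top_n]]
```

The output of `heaviest_clients` is a shortlist to investigate by hand, not a verdict: a busy office behind one NAT address can look just as greedy as a crawler.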