I have a URL shortener and I don't want to count visits by bots. Is there a good, comprehensive list of bots/crawlers, possibly in CSV format?

Either IP addresses, user-agents, or both would work.
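For context, the kind of filtering I have in mind is roughly this sketch (the token list is only illustrative, and the function names are placeholders; a maintained, comprehensive list is exactly what I'm hoping already exists):

```python
# Minimal sketch: skip counting a visit when the User-Agent contains a
# known crawler token. The token list below is illustrative, not
# comprehensive -- ideally it would be loaded from a maintained CSV.
KNOWN_BOT_TOKENS = [
    "googlebot", "bingbot", "slurp", "duckduckbot",
    "baiduspider", "yandexbot", "facebookexternalhit",
    "bot", "crawler", "spider",
]

def looks_like_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header matches a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)

def record_visit(short_code: str, user_agent: str, counts: dict) -> None:
    """Increment the visit counter only for traffic that doesn't look automated."""
    if not looks_like_bot(user_agent):
        counts[short_code] = counts.get(short_code, 0) + 1
```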
Another option would be to use robots.txt to stop bots from accessing a particular URL (for example, a 1x1 image or some such). Hide that somewhere in every page, and only count visits where the image was actually fetched.

This does require that the URL expansion works as a display + redirect, so an intermediate page is shown. If it doesn't work like that...

Well, you can simply exclude the bots and crawlers with robots.txt. The downside is that they then won't index your shortened links either, which may or may not be a problem.
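A rough sketch of that pixel approach, assuming a Flask front end (the framework, route names, in-memory counter, and the hard-coded URL table are all placeholders, not a real implementation):

```python
# Sketch of the "hidden pixel" idea: robots.txt disallows only the pixel
# path, so well-behaved crawlers never fetch it, and only pixel requests
# get counted. The short links themselves stay indexable.
from flask import Flask, Response, render_template_string

app = Flask(__name__)
SHORT_URLS = {"abc123": "https://example.com/landing"}  # placeholder store
visit_counts: dict[str, int] = {}                       # short code -> human visits

# Standard 1x1 transparent GIF payload for the tracking pixel.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01"
         b"\x00\x00\x02\x02D\x01\x00;")

@app.route("/robots.txt")
def robots():
    # Keep crawlers away from the pixel only.
    return Response("User-agent: *\nDisallow: /pixel/\n", mimetype="text/plain")

@app.route("/<code>")
def expand(code: str):
    # Intermediate page: load the pixel, then redirect via meta refresh.
    target = SHORT_URLS.get(code, "/")
    return render_template_string(
        '<html><head><meta http-equiv="refresh" content="0;url={{ t }}"></head>'
        '<body><img src="/pixel/{{ c }}" width="1" height="1" alt=""></body></html>',
        t=target, c=code)

@app.route("/pixel/<code>")
def pixel(code: str):
    # Only reached by clients that load images and respect nothing in robots.txt's Disallow.
    visit_counts[code] = visit_counts.get(code, 0) + 1
    return Response(PIXEL, mimetype="image/gif")
```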
Lists like this aren't generally shared, because then the nefarious bots would know they had been caught.

Well-behaved bots tend to use user-agent strings that make them fairly obvious.

The best bet is to watch your logs for an IP or agent that hits more URLs than anyone else, and then investigate by hand.
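As a rough example, something like this over an access log (combined log format, the file path, and the top-10 cutoff are all assumptions) surfaces the heavy hitters worth a manual look:

```python
# Count requests per client IP from a combined-format access log and
# print the busiest ones with their last-seen user agent.
import re
from collections import Counter

LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
                      r'\d+ \S+ "[^"]*" "(?P<agent>[^"]*)"')

hits_per_ip = Counter()
agent_per_ip = {}

with open("access.log") as log:            # placeholder path
    for line in log:
        m = LOG_LINE.match(line)
        if m:
            hits_per_ip[m.group("ip")] += 1
            agent_per_ip[m.group("ip")] = m.group("agent")

# The ten busiest clients are the ones to investigate by hand.
for ip, count in hits_per_ip.most_common(10):
    print(f"{count:6d}  {ip:15s}  {agent_per_ip[ip]}")
```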