I've had solid success detecting bots with a really simple signal: request frequency. Humans don't make request after request for long periods of time, but almost all bots do. The time between a bot's requests is usually pretty consistent too; not many humans wait exactly X seconds between actions. Bots also don't take breaks (what are the odds a human has made a request every hour for 48 hours straight?).
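A rough sketch of that heuristic, in case it helps. The function name and every threshold here (`min_requests`, `max_cv`, `min_span`, `max_break`) are my own guesses for illustration, not tuned values; it flags a client if the gaps between its requests are suspiciously regular, or if it has been active for a long stretch without any real break.

```python
from statistics import mean, pstdev

def looks_automated(timestamps, min_requests=20, max_cv=0.1,
                    min_span=48 * 3600, max_break=6 * 3600):
    """Guess whether one client's request times (in seconds) look scripted.

    All thresholds are illustrative assumptions, not measured values.
    """
    ts = sorted(timestamps)
    if len(ts) < min_requests:
        return False  # too little data to judge either way

    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)

    # Signal 1: gaps are nearly constant (low coefficient of variation).
    regular = avg > 0 and pstdev(gaps) / avg <= max_cv

    # Signal 2: active over a long span with no pause long enough to sleep.
    span = ts[-1] - ts[0]
    no_breaks = span >= min_span and max(gaps) <= max_break

    return regular or no_breaks
```

You would call this once per client (per IP, session, or API key) with that client's request timestamps. A request every hour for 48 hours trips both signals; a bursty human session trips neither.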
Seems to be more 'filtering access logs by a blacklist' than actually detecting bots.<p>I run a VPN through Hetzner, so requests from my IP are not from a bot (I hope!). Really you want to look at the paths (filtering out all the /w00tw00t requests) and, above all, the user agents, which the author touches on. However, a whitelist approach is better than a blacklist IMO.<p>Also, in `in_block` you really want to hoist the `IPAddress(ip)` call out of the `any()` loop, so the address is parsed once rather than on every iteration!
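To illustrate the hoisting point, here is roughly what I mean. I don't have the article's code in front of me, so this is a sketch using the standard library `ipaddress` module as a stand-in for whatever `IPAddress` class the post uses; `BLOCKS` is a made-up list built from documentation-only ranges.

```python
from ipaddress import ip_address, ip_network

# Hypothetical blocklist; these are reserved documentation ranges,
# not real hosting-provider networks.
BLOCKS = [ip_network("203.0.113.0/24"), ip_network("198.51.100.0/24")]

def in_block(ip, blocks=BLOCKS):
    """Return True if `ip` falls inside any network in `blocks`."""
    addr = ip_address(ip)  # parse once, outside the any() loop
    return any(addr in block for block in blocks)
```

The result is the same as constructing the address inside the generator expression; you just skip re-parsing the string for every block, which adds up when you run it over a whole access log.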
You may also want to add the Amazon IP ranges: <a href="http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html" rel="nofollow">http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.h...</a>
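For what it's worth, AWS publishes those ranges as a JSON document, and as I understand the format it has a `prefixes` list with `ip_prefix` entries and an `ipv6_prefixes` list with `ipv6_prefix` entries. A minimal sketch of turning that into something you can match against, with `SAMPLE` standing in for the real file (the prefixes in it are documentation ranges I made up, not actual AWS networks) and both function names being my own:

```python
import json
from ipaddress import ip_address, ip_network

# Stand-in for the downloaded JSON; the prefixes are documentation
# ranges chosen for illustration, not real AWS allocations.
SAMPLE = """
{
  "prefixes": [
    {"ip_prefix": "203.0.113.0/24", "region": "example", "service": "EC2"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "example", "service": "EC2"}
  ]
}
"""

def load_aws_networks(doc):
    """Parse the JSON text and return a list of IPv4/IPv6 network objects."""
    data = json.loads(doc)
    nets = [ip_network(p["ip_prefix"]) for p in data.get("prefixes", [])]
    nets += [ip_network(p["ipv6_prefix"]) for p in data.get("ipv6_prefixes", [])]
    return nets

def is_aws(ip, networks):
    """Return True if `ip` is inside any of the loaded networks."""
    addr = ip_address(ip)  # parse once, then test each network
    return any(addr in net for net in networks)
```

In practice you would fetch the published file periodically and pass its text to `load_aws_networks`, since the ranges change over time.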