TechEcho

jonknee about 8 years ago

I've had solid success detecting bots with a really easy pattern: usage frequency. Humans don't make request after request for long periods of time, but almost all bots do. The time between requests is usually pretty consistent too; not many humans wait exactly X seconds between actions, or go without breaks (what are the odds a human has made a request every hour for 48 hours straight?).
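A rough sketch of that idea in Python, assuming you have already grouped request timestamps (in seconds) per client. The function name `looks_automated` and the thresholds `min_requests` and `max_cv` are made up for illustration and would need tuning against real logs:

```python
from statistics import mean, pstdev


def looks_automated(timestamps, min_requests=10, max_cv=0.1):
    """Flag a client whose gaps between requests are suspiciously regular.

    timestamps: request times in seconds (e.g. Unix epoch) for one client.
    min_requests and max_cv are illustrative thresholds, not tuned values.
    """
    if len(timestamps) < min_requests:
        return False  # too little data to judge
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    if avg == 0:
        return True  # every request arrived at the same instant
    # Coefficient of variation: spread of the gaps relative to their mean.
    # Near zero means the client waits almost exactly the same time each time.
    return pstdev(gaps) / avg <= max_cv


# One request every hour for 48 hours straight.
hourly = [i * 3600 for i in range(48)]
print(looks_automated(hourly))
```

This only covers the "consistent interval" signal; the "never takes a break" signal would need a separate check on how many hours of the day a client is active.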

languagehacker about 8 years ago

I was hoping there would be some machine learning in here. This just seems to be cross-referencing a couple of different data sources.

orf about 8 years ago

Seems to be more 'filtering access logs by a blacklist' than actually detecting bots.

I run a VPN through Hetzner, so requests from my IP are not from a bot (I hope!). Really you want to look at the paths (filtering out all the /w00tw00t requests) and the user agents above all, which the author touches on. However, a whitelist approach is better than a blacklist, IMO.

Also in the `in_block` you really want to hoist the `IPAddress(ip)` call out of the `any()` loop!
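To illustrate the hoisting point: the article's `in_block` isn't reproduced in this thread, so the signature below is a guess, and this sketch uses the standard library's `ipaddress` module rather than whatever provides `IPAddress` in the original. The idea is just to parse the address once instead of once per network:

```python
import ipaddress


def in_block(ip, blocks):
    """Return True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)  # parsed once, outside the any() loop
    return any(addr in net for net in blocks)


# Parse the CIDR strings once up front too (documentation ranges, not real hosts).
blocks = [ipaddress.ip_network(c) for c in ("192.0.2.0/24", "198.51.100.0/24")]

print(in_block("192.0.2.10", blocks))
print(in_block("203.0.113.5", blocks))
```

Constructing the address object inside the generator expression would repeat the same parse for every network in the list, which adds up over a large access log.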

guillem_lefait about 8 years ago

You may also want to add the Amazon IP ranges: http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
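A minimal sketch of loading those ranges, assuming you have downloaded the `ip-ranges.json` file that page describes and that it has the `prefixes` / `ip_prefix` and `ipv6_prefixes` / `ipv6_prefix` layout. The helper name `aws_networks` is hypothetical, and the sample below is hand-written with documentation ranges, not real AWS prefixes:

```python
import ipaddress
import json


def aws_networks(ip_ranges_json):
    """Turn the text of an ip-ranges.json file into a list of network objects."""
    data = json.loads(ip_ranges_json)
    nets = [ipaddress.ip_network(p["ip_prefix"])
            for p in data.get("prefixes", [])]
    nets += [ipaddress.ip_network(p["ipv6_prefix"])
             for p in data.get("ipv6_prefixes", [])]
    return nets


# Hand-written sample in the assumed shape (NOT real AWS data).
sample = json.dumps({
    "prefixes": [{"ip_prefix": "192.0.2.0/24", "service": "EC2"}],
    "ipv6_prefixes": [{"ipv6_prefix": "2001:db8::/32", "service": "EC2"}],
})

networks = aws_networks(sample)
print(any(ipaddress.ip_address("192.0.2.7") in net for net in networks))
```

The resulting list can be appended to whatever hosting-provider blocks you are already checking against.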

Detecting Bots in Apache and Nginx Logs Using Python

4 comments
