科技回声

jonknee, about 8 years ago

I've had solid success detecting bots with a really simple pattern: usage frequency. Humans don't make request after request for long periods of time, but almost all bots do. The time between requests is usually pretty consistent too; not many humans wait exactly X seconds between actions. Humans also take breaks (what are the odds a human has made a request every hour for 48 hours straight?).
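A minimal sketch of that frequency heuristic, assuming you have already grouped request timestamps (in seconds) per client. The function names and the thresholds `min_requests` and `max_cv` are made up for illustration and would need tuning against real logs.

```python
from statistics import mean, pstdev


def looks_automated(timestamps, min_requests=10, max_cv=0.1):
    """Flag a client whose gaps between requests are suspiciously regular.

    timestamps: request times in seconds (e.g. Unix epoch), in any order.
    min_requests and max_cv are illustrative thresholds, not tuned values.
    """
    if len(timestamps) < min_requests:
        return False  # too little data to judge
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    if avg == 0:
        return True  # every request landed in the same instant
    # Coefficient of variation: a low value means near-constant spacing.
    return pstdev(gaps) / avg <= max_cv


def active_hours_streak(timestamps):
    """Longest run of consecutive clock hours with at least one request."""
    hours = sorted({int(t // 3600) for t in timestamps})
    best = run = 1 if hours else 0
    for prev, cur in zip(hours, hours[1:]):
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best
```

A client that hits the site once a minute like clockwork would trip `looks_automated`, and a long `active_hours_streak` (say 48) covers the "no breaks" case. Neither check is conclusive on its own; they are signals to combine with others.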

languagehacker, about 8 years ago

I was hoping there would be some machine learning in here. This just seems to be cross-referencing a couple of different data sources.

orf, about 8 years ago

Seems to be more 'filtering access logs by a blacklist' than actually detecting bots.

I run a VPN through Hetzner, so requests from my IP are not from a bot (I hope!). Really you want to look at the paths (filtering out all the /w00tw00t requests) and, above all, the user agents, which the author touches on. However, a whitelist approach is better than a blacklist IMO.

Also, in `in_block` you really want to hoist the `IPAddress(ip)` call out of the `any()` loop!
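To illustrate the hoisting point, here is a rough sketch. It uses the standard-library `ipaddress` module as a stand-in for whatever `IPAddress` class the article relies on, and the `BLOCKS` list holds placeholder documentation ranges, not the article's real data.

```python
from ipaddress import ip_address, ip_network

# Hypothetical blocklist; these are reserved documentation ranges.
BLOCKS = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]


def in_block(ip, blocks=BLOCKS):
    """Return True if the address string falls inside any listed network."""
    addr = ip_address(ip)  # parsed once, outside the any() generator
    return any(addr in block for block in blocks)
```

Parsing inside the generator expression would repeat the same work for every network checked, which adds up when each log line is tested against thousands of ranges.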

guillem_lefait, about 8 years ago

You may also want to add the Amazon IP ranges: http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
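A sketch of how those ranges could be folded into the same kind of check, assuming the JSON layout AWS publishes (a top-level `prefixes` list whose entries carry an `ip_prefix` CIDR string). The `SAMPLE` document below is invented with placeholder documentation ranges; in practice you would load the real file referenced on the page above.

```python
import json
from ipaddress import ip_address, ip_network

# Invented sample in the assumed shape of the AWS ranges file.
SAMPLE = """
{"prefixes": [
  {"ip_prefix": "192.0.2.0/24", "region": "example-1", "service": "EC2"},
  {"ip_prefix": "198.51.100.0/24", "region": "example-2", "service": "EC2"}
]}
"""


def load_aws_networks(raw_json):
    """Parse the IPv4 prefixes out of an AWS-style ranges document."""
    data = json.loads(raw_json)
    return [ip_network(p["ip_prefix"]) for p in data.get("prefixes", [])]


def is_aws(ip, networks):
    """Return True if the address string falls inside any loaded network."""
    addr = ip_address(ip)
    return any(addr in net for net in networks)
```

Only IPv4 prefixes are handled here; IPv6 entries would need the same treatment.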

Detecting Bots in Apache and Nginx Logs Using Python

4 comments
