Detecting Bots in Apache and Nginx Logs Using Python

66 points by marklit about 8 years ago

4 comments

jonknee about 8 years ago
I've had solid success detecting bots with a really easy pattern: usage frequency. Humans don't make request after request for long periods of time, but almost all bots do. The time between requests is usually pretty consistent too; not many humans wait exactly X seconds between actions. Humans also take breaks (what are the odds a human has made a request every hour for 48 hours straight?).
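
A minimal sketch of that frequency heuristic, under stated assumptions: the log has already been parsed into `(ip, unix_timestamp)` pairs, and "consistent" is measured as a low coefficient of variation of the gaps between requests. The function name `regular_clients` and both thresholds are hypothetical, chosen for illustration rather than taken from the article.

```python
from collections import defaultdict
from statistics import mean, pstdev


def regular_clients(requests, min_requests=5, max_cv=0.1):
    """Return the IPs whose gaps between requests are suspiciously uniform.

    requests: iterable of (ip, unix_timestamp) pairs (assumed input shape).
    max_cv:   upper bound on stdev(gaps) / mean(gaps); an arbitrary default.
    """
    by_ip = defaultdict(list)
    for ip, ts in requests:
        by_ip[ip].append(ts)

    flagged = set()
    for ip, stamps in by_ip.items():
        if len(stamps) < min_requests:
            continue  # too few requests to judge regularity
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        avg = mean(gaps)
        # avg == 0 means every request shares one timestamp; skipped here.
        if avg > 0 and pstdev(gaps) / avg <= max_cv:
            flagged.add(ip)
    return flagged


# One client polling every 60 seconds, one browsing irregularly.
bot = [("203.0.113.5", t) for t in range(0, 600, 60)]
human = [("198.51.100.7", t) for t in (0, 5, 300, 310, 4000, 4100)]
suspects = regular_clients(bot + human)
```

The "no breaks" signal could be checked in a similar way, for example by counting how many distinct hours in a 48-hour window contain at least one request.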
languagehacker about 8 years ago
I was hoping there would be some machine learning in here. This just seems to be cross referencing a couple of different data sources.
orf about 8 years ago
Seems to be more 'filtering access logs by a blacklist' than actually detecting bots.

I run a VPN through Hetzner, so requests from my IP are not from a bot (I hope!). Really you want to look at the paths (filtering out all the /w00tw00t requests) and the user agents above all, which the author touches on. However, a whitelist approach is better than a blacklist IMO.

Also, in `in_block` you really want to hoist the `IPAddress(ip)` call out of the `any()` loop!
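
To illustrate the hoisting point, here is a sketch of what such a function might look like. The article's own `in_block` is not reproduced in this thread, so this version is an assumption: it uses the standard library `ipaddress` module rather than the `IPAddress` class mentioned above, and the `BLOCKS` list is a hypothetical blocklist made of reserved documentation ranges.

```python
import ipaddress

# Hypothetical blocklist, parsed once at import time.
BLOCKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24")
]


def in_block(ip, blocks=BLOCKS):
    """Return True if ip falls inside any network in blocks."""
    # Parse the address once, outside the generator, instead of
    # re-parsing the same string for every network that is checked.
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in blocks)
```

Parsing inside the generator expression would give the same answer; it would just repeat the same string-to-address conversion once per network.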
guillem_lefait about 8 years ago
You may also want to add the Amazon IP ranges: http://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html
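
A rough sketch of how those ranges could be folded into a blocklist, assuming the published JSON file lists IPv4 entries under `prefixes` (key `ip_prefix`) and IPv6 entries under `ipv6_prefixes` (key `ipv6_prefix`). Downloading the file is left out; `SAMPLE` is a tiny hand-written stand-in that uses reserved documentation ranges, not real AWS data, and the function names are hypothetical.

```python
import ipaddress
import json


def aws_networks(ip_ranges_json):
    """Parse the text of an ip-ranges.json document into network objects."""
    data = json.loads(ip_ranges_json)
    cidrs = [p["ip_prefix"] for p in data.get("prefixes", [])]
    cidrs += [p["ipv6_prefix"] for p in data.get("ipv6_prefixes", [])]
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_aws(ip, networks):
    """Return True if ip belongs to any of the parsed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Stand-in document in the assumed shape (not real AWS ranges).
SAMPLE = json.dumps({
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"},
    ],
    "ipv6_prefixes": [
        {"ipv6_prefix": "2001:db8::/32", "region": "us-east-1", "service": "EC2"},
    ],
})

networks = aws_networks(SAMPLE)
```

As noted elsewhere in the thread, a hit against a hosting provider's range is only a hint: VPN users on those ranges are people too.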