User-agent aside, there are usually small details bots leave out, unless they are using headless Chrome of course. Most bots can't do HTTP/2, yet all common browsers support it. Most bots do not send a Sec-Fetch-Mode header (cors, no-cors, navigate), whereas browsers do. Some bots do not send an Accept-Language header. Those are just a few things one can look for and handle in simple web server ACLs (a rough sketch of the idea is at the end of this comment). Some bots do not support HTTP keep-alive, though dropping connections that lack keep-alive can also knock out some poorly behaved middleboxes.<p>At the TCP layer, some bots do not set the MSS option or use very strange values. This can produce false positives, so I simply don't publish IPv6 records for my web servers and then limit IPv4 to an MSS range of 1280 to 1460, which knocks out many bots.<p>There is always the possibility of false positives, but they can be logged and reviewed, <i>acceptable losses</i> should the load on the servers get too high. Another mitigating control is to analyze previous logs and use maps to exclude people who post on a regular basis or have logins to the site, assuming none of them are part of the problem. If a registered user is part of the problem, give them an error page after {n} requests.
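<p>To make the header heuristics concrete, here is a minimal Python sketch of the scoring logic. In a real deployment these checks live in the web server or reverse proxy itself (nginx maps, haproxy ACLs); the function name, the two-signal threshold, and the exact header values are my own choices for illustration, not anything server-specific.<p><pre><code># Minimal sketch of the header heuristics above; not a drop-in ACL.
# Function name and the two-signal threshold are illustrative assumptions.

def looks_like_bot(protocol: str, headers: dict) -> bool:
    """Return True when a request is missing the small details browsers always send."""
    h = {k.lower(): v for k, v in headers.items()}
    score = 0

    # Browsers negotiate HTTP/2 (or HTTP/3); many bots still only speak HTTP/1.x.
    if protocol.upper() in ("HTTP/1.0", "HTTP/1.1"):
        score += 1

    # Browsers send Sec-Fetch-Mode (navigate, cors, no-cors, ...); most bots do not.
    if h.get("sec-fetch-mode") not in ("navigate", "cors", "no-cors", "same-origin", "websocket"):
        score += 1

    # Browsers always send Accept-Language; some bots omit it.
    if "accept-language" not in h:
        score += 1

    # An explicit "Connection: close" on HTTP/1.1 means no keep-alive, another weak signal.
    if protocol.upper() == "HTTP/1.1" and h.get("connection", "").lower() == "close":
        score += 1

    # Require at least two missing details before acting, to limit false positives.
    return score >= 2

# A curl-like request trips three signals; a normal browser request trips none.
print(looks_like_bot("HTTP/1.1", {"User-Agent": "curl/8.5.0", "Accept": "*/*"}))  # True
print(looks_like_bot("HTTP/2", {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US",
                                "Sec-Fetch-Mode": "navigate"}))                   # False
</code></pre>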
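<p>And a rough sketch of the log-analysis side: build an allowlist map from earlier access logs so regulars and logged-in users are never caught by the bot rules, plus the per-user request cap. The log format, field positions, cap value, and map file format here are assumptions; adjust them to whatever your server actually writes.<p><pre><code># Sketch of the "maps from previous logs" idea: allowlist IPs that have
# posted while logged in, and cap a registered user after N requests.
from collections import Counter

ALLOWLIST_FILE = "allowlist.map"   # consumed by the web server (e.g. an nginx geo/map include)
REQUEST_CAP = 500                  # the {n} requests before a registered user gets an error page

def build_allowlist(log_lines):
    """Collect client IPs that made authenticated POSTs in earlier logs."""
    allow = set()
    for line in log_lines:
        # Assumed combined-log-style line: ip ident user [time tz] "METHOD path proto" status size
        parts = line.split()
        if len(parts) < 6:
            continue
        ip, user, method = parts[0], parts[2], parts[5].lstrip('"')
        if method == "POST" and user != "-":
            allow.add(ip)
    return allow

def over_cap(per_user_counts: Counter, user: str) -> bool:
    """True once a registered user has exceeded the request cap."""
    per_user_counts[user] += 1
    return per_user_counts[user] > REQUEST_CAP

if __name__ == "__main__":
    # Regenerate the map from rotated logs on a schedule, then have the
    # web server include it so regulars bypass the stricter bot ACLs.
    with open("access.log") as f:
        allowlist = build_allowlist(f)
    with open(ALLOWLIST_FILE, "w") as out:
        for ip in sorted(allowlist):
            out.write(f"{ip} 1;\n")   # nginx geo-style "key value;" entry, purely illustrative
</code></pre>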