>It means that more than one percent of the IPv4 real estate on the Internet (and probably much more) is occupied by people and organizations who are either clueless or just do not care how much the rest of us are paying to keep our websites on line

There's a significant mental leap here. "I block these IPs to conserve my resources, therefore they belong to clueless or malicious organisations". It's wrong in both directions:

* I don't think Google, Bing and other crawlers are inherently malicious, and certainly not clueless. Search engines serve a very important role on the internet. Ditto archive.org, and probably dozens of other bots.

* IP-based blocklists work well for honest bots (not malicious, or at least not illegal). Malicious bot operators just buy SIM cards and use regular mobile internet for the crawling (basically unblockable, because the IP may be renewed every day or every hour). And the really malicious actors use residential proxies, i.e. botnets that proxy traffic through normal users' computers. Anyway, I wonder how many of those 56 million IP addresses are regular dynamically allocated consumer-grade ISP ranges.

>1-5-2024

For the love of all that is holy, what is this date format.
After reading his three-part, multi-month series about how he can't set up a firewall, I don't think this guy is someone who should be providing any useful information on how to use the internet (or anything attached to it).
Using http://nginx.org/r/deny is a very inefficient way to block a large number of IPs/networks. It's mentioned right in the documentation:

> In case of a lot of rules, the use of the ngx_http_geo_module module variables is preferable.
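For reference, a minimal sketch of the geo-based approach the documentation points to (the variable name and include path here are arbitrary placeholders, not from the article). geo loads the prefixes into an in-memory lookup structure when the configuration is read, so each request is matched against a single variable instead of walking millions of deny directives:

    # http {} context
    geo $blocked {
        default  0;
        include  /etc/nginx/blocklist.conf;   # one "CIDR value;" entry per line
    }

    server {
        # ...
        if ($blocked) {
            return 403;
        }
    }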
> I don't care.

Just do your own thing, learn from it, and repeat. That's all we can really want in our projects, on our limited lease on planet Earth. Kudos that you found something to work on.
There are some real bad actors behind IP blocks, or hosting providers that have no problem hosting them and take no action on abuse reports. Referrer spamming, searching for vulnerabilities (some of them with very big URL lists to try), misbehaving crawlers, or just plain DoS are some of the ways they may hit sites, especially the ones serving dynamic content. This space is usually fixed and used by servers or VPN exit points. Blocking all the ranges associated with their autonomous systems saves you from putting a lot of individual /24s in the rules.

But then there are residential IP blocks, especially ones with dynamic IPs or NATed ISPs. Some people in those blocks may behave in hostile or clueless ways, and some may be used as proxies because of malware or because they intentionally installed one of the residential proxy agents. There you may be blocking legitimate visitors; if a few clients of some ISP are very active, you may end up blocking a lot of innocent people. And, in this case too, you can target the IP blocks of its autonomous system if you feel that from there you only get bad traffic.

But in the end, it's your site. You are free to decide to block whatever you consider a bad neighbourhood.
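As an illustration of the "block the whole autonomous system" idea, a blocklist file for the geo approach sketched earlier might label whole allocations instead of piling up individual /24s. The prefixes below are RFC 5737 documentation ranges standing in for an AS's real, larger allocations, and the labels are made up:

    # /etc/nginx/blocklist.conf (hypothetical entries)
    192.0.2.0/24      as64500-abuse-host;   # hosting provider that ignores abuse reports
    198.51.100.0/24   as64501-resi-proxy;   # residential proxy exit range
    203.0.113.0/24    as64502-ref-spam;     # referrer spammer

    # any non-zero value still triggers the "if ($blocked) { return 403; }" check,
    # and the label can be written to the access log to record why a hit was refused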
What is the issue they are trying to solve?

It seems to be a static site. Bots should cause only a negligible amount of traffic per month. My guess would be less than $1.

And aren't there free CDNs for static sites these days? I guess you can just push the whole frontend data (html+assets) into a public git repo, put it behind a GitHub Pages site with a custom domain and call it a day?
I block as well, with geoblocking and based on source behavior.

As long as you understand the limitations, ramifications, and futility of doing this, I have no problem with it as one of the many tactics to defend your footprint, de-noise your logs, etc.

It's a never-ending endeavor, and you will see the attack sources shift as the baddies play games with stub blocks, prefix-broker IP block swaps, and more.

Just know there is automation and horsepower behind that entire attack infrastructure that you can't possibly compete with, but maybe you can mitigate with the limited time and resources you have, and that will be enough to get you through.
Could it be that the slight delay between opening this page and my browser receiving the first bytes is nginx checking these 50 million IPs? How is this delay so small if there are really 50 million deny statements?

Is there a reason why they don't use a firewall?
Someone should tell this guy about bogons so he can block 500 million more IPs.

And if you want to do the same? For the love of god, get a firewall and subscribe to some RBLs like a sane person.
I wouldn’t put something on the public internet without geoblocking China, Russia, and the UAE. You should too! Stop their bad behavior by removing them from the internet.
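For what it's worth, a sketch of country blocking in nginx, assuming the third-party ngx_http_geoip2_module and a MaxMind GeoLite2 database are installed (paths and variable names are placeholders):

    # http {} context
    geoip2 /etc/nginx/GeoLite2-Country.mmdb {
        $geoip2_country_code country iso_code;
    }

    map $geoip2_country_code $geoblocked {
        default  0;
        CN       1;   # China
        RU       1;   # Russia
        AE       1;   # United Arab Emirates
    }

    server {
        # ...
        if ($geoblocked) {
            return 403;
        }
    }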
Blocking bots is always going to be an uphill battle. But if the owner is worried about wasting meagre resources, why not serve static HTML files instead of running a PHP server for a simple blog?
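A minimal sketch of what that would look like, assuming the pages are pre-rendered into a directory (the server_name and paths are placeholders):

    server {
        listen       80;
        server_name  example.org;          # placeholder
        root         /var/www/blog;        # pre-rendered HTML + assets
        index        index.html;

        location / {
            try_files $uri $uri/ =404;     # no PHP/FastCGI round-trip per request
        }
    }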
> I imagine that if this article makes its way onto Hacker News, I will be criticized.

> Maybe they will call me naive or compare me to Don Quixote fighting windmills.

> Maybe they will call me stupid or paranoid for not using some centralized block list.

> Maybe they will object to how I characterize those who employ web-crawling robots.

Ha. He is so wrong. We're going to criticize his nginx config and his use of PHP.

I mean, yeah. We'll probably get after that other stuff too, but still.
> It means that more than one percent of the IPv4 real estate on the Internet (and probably much more) is occupied by people and organizations who are either clueless or just do not care how much the rest of us are paying to keep our websites on line.

Oh, tell me, how much? A whopping $5/month? Oh, maybe this is a high-load WordPress-like CMS running on a LAMP stack... so $8/month?

> I wrote the following small PHP script to search through my Nginx configuration file and tally up the number of IP addresses that I am blocking.

Holy shit. Blocking bots through the nginx configuration; worse, blocking 56M addresses through the nginx configuration...

Okay, for those of you who have never done this kind of thing or have no idea:

Just use the firewall (most of the time it is built into your OS), use some way to tell the firewall about the 'offenders' (e.g. fail2ban, though there are other options), and don't ever block anything indefinitely; it's totally meaningless, just use timeouts (a minimal sketch follows this comment).

If some Bob got his computer infected in 2015 and that computer tried to access /wp-admin.php, then there is absolutely no reason to assume that in 2024 the *IP address Bob's computer had in 2015* is still 'malicious'.

Automated activity like scans, bruteforcing and whatever is all about opportunity. They are searching for easy opportunities to exploit, and scanning a server that actively blocks you *even for 30 minutes at a time* is just pointless; there are way, way more opportunities in other places than wasting ~4 weeks trying to scan this server.

> I have custom 403 and 404 error pages that explain to those who may care why they are being blocked and how to regain access to the website

https://cheapskatesguide.org/custom404really.html
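The fail2ban-with-timeouts idea mentioned above, as a minimal jail.local sketch; the [nginx-probing] jail name, the filter it references, and the paths are assumptions for illustration, not anything from the article:

    # /etc/fail2ban/jail.local (sketch)
    [nginx-probing]
    enabled   = true
    port      = http,https
    filter    = nginx-probing            # needs a matching filter for /wp-admin.php-style probes
    logpath   = /var/log/nginx/access.log
    maxretry  = 5
    findtime  = 10m
    bantime   = 30m                      # temporary ban with a timeout, not a permanent list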