>It means that more than one percent of the IPv4 real estate on the Internet (and probably much more) is occupied by people and organizations who are either clueless or just do not care how much the rest of us are paying to keep our websites on line

There's a significant mental leap here. "I block these IPs to conserve my resources, therefore they belong to clueless or malicious organisations". It's wrong in both directions:

* I don't think Google, Bing and other crawlers are inherently malicious, and certainly not clueless. Search engines serve a very important role on the internet. Ditto archive.org, and probably dozens of other bots.

* IP-based blocklists work well for honest bots (not malicious, or at least not illegal). Malicious bot operators just buy SIM cards and use regular mobile internet for the crawling (basically unblockable, because the IP may be renewed every day or every hour). And the really malicious actors use residential proxies, i.e. botnets that proxy traffic through normal users' computers. Anyway, I wonder how many of those 56 million IP addresses are regular dynamically allocated consumer-grade ISP ranges.

>1-5-2024

For the love of all that is holy, what is this date format.
After reading his three-part, multi-month series about how he can't set up a firewall, I don't think this guy is someone who should be providing any useful information on how to use the internet (or anything attached to it).
Using http://nginx.org/r/deny is a very inefficient way to block a large number of IPs/networks. It's mentioned right in the documentation:

> In case of a lot of rules, the use of the ngx_http_geo_module module variables is preferable.
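For reference, a minimal sketch of the geo-based approach the documentation points to (the variable name and include path here are arbitrary placeholders, not from the article). geo loads the prefixes into an in-memory lookup structure when the configuration is read, so each request is matched against a single variable instead of walking millions of deny directives:

    # http {} context
    geo $blocked {
        default  0;
        include  /etc/nginx/blocklist.conf;   # one "CIDR value;" entry per line
    }

    server {
        # ...
        if ($blocked) {
            return 403;
        }
    }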
> I don't care.

Just do your own thing, learn from it, and repeat. That's all we can really want in our projects, on our limited lease on planet Earth. Kudos that you found something to work on.
There are some real bad actors behind IP blocks, or hosting providers that have no problem hosting them and take no action on abuse reports. Referrer spamming, searching for vulnerabilities (some of them with very big URL lists to try), misbehaving crawlers, or just plain DoS are some of the ways they may hit sites, especially the ones serving dynamic content. This space is usually fixed and used by servers or VPN exit points. Blocking all the ranges associated with their autonomous systems saves you from putting a lot of individual /24s in the rules.

But then there are residential IP blocks, especially ones with dynamic IPs or NATed ISPs. Some people in those blocks may behave in hostile or clueless ways, and some may be used as proxies because of malware or because they intentionally installed one of the residential proxy agents. There you may be blocking legitimate visitors; if a few clients of some ISP are very active, you may end up blocking a lot of innocent people. And, in this case too, you can target the IP blocks of its autonomous system if you feel that from there you only get bad traffic.

But in the end, it's your site. You are free to decide to block whatever you consider a bad neighbourhood.
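As an illustration of the "block the whole autonomous system" idea, a blocklist file for the geo approach sketched earlier might label whole allocations instead of piling up individual /24s. The prefixes below are RFC 5737 documentation ranges standing in for an AS's real, larger allocations, and the labels are made up:

    # /etc/nginx/blocklist.conf (hypothetical entries)
    192.0.2.0/24      as64500-abuse-host;   # hosting provider that ignores abuse reports
    198.51.100.0/24   as64501-resi-proxy;   # residential proxy exit range
    203.0.113.0/24    as64502-ref-spam;     # referrer spammer

    # any non-zero value still triggers the "if ($blocked) { return 403; }" check,
    # and the label can be written to the access log to record why a hit was refused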
What is the issue they are trying to solve?

It seems to be a static site. Bots should cause only a negligible amount of traffic per month. My guess would be less than $1.

And aren't there free CDNs for static sites these days? I guess you can just push the whole frontend data (html+assets) into a public git repo, put it behind a GitHub Pages site with a custom domain and call it a day?
I block as well, with geoblocking and based on source behavior.

As long as you understand the limitations, ramifications, and futility of doing this, I have no problem with it as one of the many tactics to defend your footprint, de-noise your logs, etc.

It's a never-ending endeavor, and you will see the attack sources shift as the baddies play games with stub blocks, prefix-broker IP block swaps, and more.

Just know there is automation and horsepower behind that entire attack infrastructure that you can't possibly compete with, but maybe you can mitigate with the limited time and resources you have, and that will be enough to get you through.
Could it be that the slight delay between opening this page and my browser receiving the first bytes is nginx checking these 50 million IPs? How is this delay so small if there are really 50 million deny statements?

Is there a reason why they don't use a firewall?
Someone should tell this guy about bogons so he can block 500 million more IPs.

And if you want to do the same? For the love of god, get a firewall and subscribe to some RBLs like a sane person.
I wouldn’t put something on the public internet without geoblocking China, Russia, and the UAE. You should too! Stop their bad behavior by removing them from the internet.
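For what it's worth, a sketch of country blocking in nginx, assuming the third-party ngx_http_geoip2_module and a MaxMind GeoLite2 database are installed (paths and variable names are placeholders):

    # http {} context
    geoip2 /etc/nginx/GeoLite2-Country.mmdb {
        $geoip2_country_code country iso_code;
    }

    map $geoip2_country_code $geoblocked {
        default  0;
        CN       1;   # China
        RU       1;   # Russia
        AE       1;   # United Arab Emirates
    }

    server {
        # ...
        if ($geoblocked) {
            return 403;
        }
    }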
Blocking bots is always going to be an uphill battle. But if the owner is worried about wasting meagre resources, why not serve static HTML files instead of running a PHP server for a simple blog?
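A minimal sketch of what that would look like, assuming the pages are pre-rendered into a directory (the server_name and paths are placeholders):

    server {
        listen       80;
        server_name  example.org;          # placeholder
        root         /var/www/blog;        # pre-rendered HTML + assets
        index        index.html;

        location / {
            try_files $uri $uri/ =404;     # no PHP/FastCGI round-trip per request
        }
    }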
> I imagine that if this article makes its way onto Hacker News, I will be criticized.

> Maybe they will call me naive or compare me to Don Quixote fighting windmills.

> Maybe they will call me stupid or paranoid for not using some centralized block list.

> Maybe they will object to how I characterize those who employ web-crawling robots.

Ha. He is so wrong. We're going to criticize his nginx config and his use of PHP.

I mean, yeah. We'll probably get after that other stuff too, but still.
> It means that more than one percent of the IPv4 real estate on the Internet (and probably much more) is occupied by people and organizations who are either clueless or just do not care how much the rest of us are paying to keep our websites on line.

Oh, tell me, how much? A whopping $5/month? Oh, maybe this is a high-load WordPress-like CMS running on a LAMP stack... so $8/month?

> I wrote the following small PHP script to search through my Nginx configuration file and tally up the number of IP addresses that I am blocking.

Holy shit. Blocking bots through the nginx configuration; worse, blocking 56M addresses through the nginx configuration...

Okay, for those of you who have never done this kind of thing or have no idea:

Just use the firewall (most of the time it is built into your OS), use some way to tell the firewall about the 'offenders' (e.g. fail2ban, though there are other options), and don't ever block anything indefinitely; it's totally meaningless, just use timeouts (a minimal sketch follows this comment).

If some Bob got his computer infected in 2015 and that computer tried to access /wp-admin.php, then there is absolutely no reason to assume that in 2024 the *IP address Bob's computer had in 2015* is still 'malicious'.

Automated activity like scans, bruteforcing and whatever is all about opportunity. They are searching for easy opportunities to exploit, and scanning a server that actively blocks you *even for 30 minutes at a time* is just pointless; there are way, way more opportunities in other places than wasting ~4 weeks trying to scan this server.

> I have custom 403 and 404 error pages that explain to those who may care why they are being blocked and how to regain access to the website

https://cheapskatesguide.org/custom404really.html
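The fail2ban-with-timeouts idea mentioned above, as a minimal jail.local sketch; the [nginx-probing] jail name, the filter it references, and the paths are assumptions for illustration, not anything from the article:

    # /etc/fail2ban/jail.local (sketch)
    [nginx-probing]
    enabled   = true
    port      = http,https
    filter    = nginx-probing            # needs a matching filter for /wp-admin.php-style probes
    logpath   = /var/log/nginx/access.log
    maxretry  = 5
    findtime  = 10m
    bantime   = 30m                      # temporary ban with a timeout, not a permanent list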