To be fair, there are tens of thousands of content farms filling the web with AI slop. That's far more likely to harm AI scrapers than these hijinks.

Most crawlers use some form of timeout mechanism, usually informed by priority scheduling. That deals reasonably well with crawler traps.

Since Nepenthes-like traps are getting so common now (and in particular, not always behind robots.txt), I added a clause to Marginalia's crawler that prevents it from extracting links from pages that are smaller than 2 KB and take more than 9 seconds to load. It's 4 lines of code and means the crawler doesn't get stuck at all. (A rough sketch of the idea is below.)

I totally get the frustration, though. My sites get an insane amount of bot traffic as well. I think roughly 1% of the search traffic to the HTML endpoint is human, and that's while providing a free API they could use instead. ... I just don't think this is going to fix anything.
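
For illustration, a minimal sketch of that kind of guard, assuming the crawler records the response body and the fetch time; this is not Marginalia's actual code, and the class and method names are made up:

    import java.time.Duration;

    class TarpitGuard {
        // Thresholds from the comment above: under 2 KB and over 9 seconds.
        private static final int MIN_BODY_BYTES = 2 * 1024;
        private static final Duration MAX_FETCH_TIME = Duration.ofSeconds(9);

        /** Returns false when the page looks like a tarpit (tiny body,
         *  very slow response), so the crawler skips link extraction. */
        boolean shouldExtractLinks(byte[] body, Duration fetchTime) {
            boolean looksLikeTarpit = body.length < MIN_BODY_BYTES
                    && fetchTime.compareTo(MAX_FETCH_TIME) > 0;
            return !looksLikeTarpit;
        }
    }

The point is that the check is cheap and purely local to each fetched document, so it doesn't need any global state or scheduling changes to keep the crawler out of generated link mazes.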