TechEcho

Anubis: Proof-of-work proxy to prevent AI crawlers

100 points by techknowlogick about 2 months ago

16 comments

jchw about 2 months ago
I'm really curious to see how this evolves as time goes on. Hashcash was originally conceived to stop e-mail spam, and a lot has changed since then; namely, compute has become absolutely dirt cheap. Despite that, PoW-based anti-bot remains somewhat enticing because it doesn't *necessarily* harm accessibility the way that solutions like Cloudflare or reCAPTCHA can: it should be possible to pass even on a VPN or Tor, even on less-used web browsers like Ladybird or Servo, and even if you're not on a super powerful device (provided you are willing to wait for the PoW challenge to pass; as long as you don't have all of these conditions at once, you should get an "easy" challenge and it should be quick).

The challenge is definitely figuring out whether this solution actually works at scale. I've played around with an implementation of Hashcash myself, using WebCrypto, but I worry because even using WebCrypto it is *quite* a lot slower than cracking hashes in native code. But seeing Anubis seemingly have *some* success makes me hopeful. If it gains broad adoption, it might just be enough of a pain in the ass for scrapers, while still being possible for automation to pass provided they can pay the compute toll (e.g. hopefully anything that's not terribly abusive).

On a lighter note, I've found the reception of Anubis, and in particular the anime-style mascot, to be predictably amusing.

https://discourse.gnome.org/t/anime-girl-on-gnome-gitlab/27689

(Note: I'd personally suggest not going and replying there. I don't want to encourage brigading of any sort; I just found this mildly amusing.)
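For readers unfamiliar with the Hashcash idea mentioned in the comment above, here is a minimal sketch of a proof-of-work challenge. It assumes SHA-256 and a difficulty counted in leading zero hex digits; the function names and the challenge string are made up for illustration and are not taken from Anubis or from any Hashcash implementation.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Return the first nonce whose SHA-256 hex digest starts with `difficulty` zeros.

    Hypothetical helper: the client has to try nonces one by one,
    so the cost grows by a factor of 16 per extra digit of difficulty.
    """
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Check a claimed solution with a single hash (the cheap, server-side step)."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    nonce = solve("example-challenge", 3)
    print(nonce, verify("example-challenge", nonce, 3))
```

Solving needs many hashes on average while checking needs exactly one, which is the asymmetry the comment relies on when it talks about a "compute toll".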
kmeisthax about 2 months ago
Is there a way to alter text to poison AI training sets? I know there's Glaze and Nightshade for images, but I've heard of nothing to poison text models. To be clear, this wouldn't be a defensive measure to stop scraping; it'd be an offensive honeypot: you'd want to make pages that have the same text but mutated slightly differently each time, so that AI scrapers preferentially load up on your statistically different text and then yield a poisoned model. Ideally the scraper companies will realize what's going on and stop scraping.
avodonosov about 2 months ago
Ideas:

- Make it generate cryptocurrency, so that the work is not wasted: either to compensate for the server expenses of hosting the content, or for some noble non-profit cause, with all installations collecting the currency into a single account. Wasting the work is worse than both of these options.

- An easy way for good crawlers (like the Internet Archive) to authenticate themselves, e.g. TLS client-side authentication or simply an HTTP request header containing a signature for the request (the signature in the header may be based, for example, on their domain name and the TLS cert for that domain).
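The signed-header idea in the second bullet could look roughly like the sketch below. Note the substitution: it uses a shared-secret HMAC rather than the certificate-based signature the comment suggests, only because Python's standard library has no public-key signing. The function names, the signed fields, and the idea of carrying the result in a request header are all assumptions made for illustration.

```python
import hashlib
import hmac


def sign_request(secret: bytes, method: str, path: str, date: str) -> str:
    """Crawler side: compute a hex HMAC-SHA256 over the parts of the request.

    Hypothetical scheme: the crawler would send this value in a request header.
    """
    message = f"{method}\n{path}\n{date}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_trusted_crawler(
    secret: bytes, method: str, path: str, date: str, signature: str
) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    expected = sign_request(secret, method, path, date)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    secret = b"shared-secret-for-illustration-only"
    sig = sign_request(secret, "GET", "/index.html", "2025-03-20T12:00:00Z")
    print(is_trusted_crawler(secret, "GET", "/index.html", "2025-03-20T12:00:00Z", sig))
```

A real deployment along the lines of the comment would verify a public-key signature against the crawler's certificate, so the server never holds a secret that could be used to impersonate the crawler.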
Trung0246 about 2 months ago
For a no-JS solution, I think some sort of optical illusion used as a captcha could work, especially https://en.wikipedia.org/wiki/Magic_Eye or something like https://www.youtube.com/watch?v=Bg3RAI8uyVw, which could cleverly hide the captcha answer within an animated noise mess.

However, these methods are not really accessibility-friendly.
nikisweeting about 2 months ago
Doesn't seem to noticeably slow down my test bot. Headful crawling already takes ~10 sec/page, so an extra 0.5 sec is hardly that big a deal.
pvg about 2 months ago
Discussion here: https://news.ycombinator.com/item?id=43422929
yjftsjthsd-h about 2 months ago
> to stop AI crawlers

It'll do that too, but it's really more of a general-purpose anti-bot, right? A generic PoW-wall.
barlog about 2 months ago
When I visited Xe-sann's page, I was curious to see Jackal-chann challenges in action. This is Anubis.
akoboldfrying about 2 months ago
Regarding the problem of how to let "good" bots through:

You could use PKI: Drop the PoW if the client provides a TLS *client* certificate chain that asserts that <publicKey> corresponds to a private key that is controlled by <fullNamesAndAddressesOfThesePeople> (or just by, say, <peopleWhoControlThisUrl>, for Let's Encrypt-style automatable cert signing). This would be a slight hassle for good bot operators to set up, but not a very big deal. The result is that bad bots couldn't spoof good bots to get in.

(Obviously this strategy generalises to handling human users too -- but in that case, the loss of privacy, as well as admin inconvenience, makes it much less palatable.)
bno1 about 2 months ago
What stops a scraper from detecting Anubis and just removing "Mozilla" from the user-agent string?
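The question above presumes a rule that challenges only clients whose User-Agent contains "Mozilla". A minimal sketch of such a rule, and of the bypass being asked about, might look like this; `should_challenge` is a hypothetical name and the rule itself is inferred from the comment, not taken from Anubis's code.

```python
def should_challenge(user_agent: str) -> bool:
    """Hypothetical rule: only clients that claim to be browsers get the PoW page."""
    return "Mozilla" in user_agent


if __name__ == "__main__":
    browser_like = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/128.0"
    stripped = "ExampleScraper/1.0"  # made-up scraper UA without "Mozilla"
    print(should_challenge(browser_like))  # challenged
    print(should_challenge(stripped))      # waved through
```

Under this assumed rule, the string check is the whole gate, so a scraper that drops "Mozilla" skips the challenge unless the site filters non-browser user agents in some other way.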
xg15 about 2 months ago
It's a great idea, but I fear that if this keeps going viral like it did in the last few days, more bot authors will be motivated to add special handling for it and e.g. change the user agent to a non-Mozilla one.
Trung0246 about 2 months ago
The performance on mobile kind of sucks, though: it took about 30 seconds to wait for the PoW at difficulty 4 on Firefox for Android. By that time I had to resist the urge to switch to doing something else.
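To put the "difficulty 4" figure in perspective, here is a rough back-of-the-envelope sketch. It assumes that difficulty means the number of leading zero hex digits required in the hash, which the thread does not confirm; both function names are made up for illustration.

```python
def expected_attempts(difficulty: int) -> int:
    """Average number of hashes needed when each of `difficulty` leading
    hex digits must be zero (each digit has 16 possible values)."""
    return 16 ** difficulty


def implied_hash_rate(difficulty: int, seconds: float) -> float:
    """Hashes per second implied by an average-luck solve of the given duration."""
    return expected_attempts(difficulty) / seconds


if __name__ == "__main__":
    print(expected_attempts(4))        # 65536
    print(implied_hash_rate(4, 30.0))  # about 2,185 hashes per second
```

Under that assumption, a 30-second solve corresponds to only a couple of thousand hashes per second. Solve times vary a lot between runs, though, so a single slow attempt may also just have been an unlucky draw.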
iszomer about 2 months ago
This reminded me of an article I printed (yes, on paper) at my college more than 20 years ago, titled Parasitic Computing. I don't remember where it was originally published, but I think I might have stumbled upon it via kuro5hin (maybe); a quick search turned up the publication in Nature (though paywalled).

- https://www.nature.com/articles/35091039
Alifatisk about 2 months ago
This is like wehatecaptchas.com
ranger_danger about 2 months ago
I would say it doesn't prevent anything; it just makes computers warm the planet more.
drpossum about 2 months ago
[flagged]