I'm really curious to see how this evolves. Hashcash was originally conceived to stop e-mail spam, and a lot has changed since then; namely, compute has become absolutely dirt cheap. Despite that, PoW-based anti-bot measures remain enticing because they don't <i>necessarily</i> harm accessibility the way solutions like Cloudflare or reCAPTCHA can: it should be possible to pass even on a VPN or Tor, on less-used browsers like Ladybird or Servo, and on a weak device, provided you're willing to wait for the PoW to finish. As long as you don't hit all of those conditions at once, you should get an "easy" challenge that completes quickly.<p>The real question is whether this approach actually works at scale. I've played around with a Hashcash implementation myself using WebCrypto (a sketch of the idea is at the end of this comment), but I worry that even WebCrypto is <i>quite</i> a lot slower than hashing in native code. Still, seeing Anubis have <i>some</i> success makes me hopeful. If it gains broad adoption, it might be just enough of a pain for scrapers, while remaining passable for automation that's willing to pay the compute toll (hopefully anything that isn't terribly abusive).<p>On a lighter note, I've found the reception of Anubis, and in particular the anime-style mascot, predictably amusing.<p><a href="https://discourse.gnome.org/t/anime-girl-on-gnome-gitlab/27689" rel="nofollow">https://discourse.gnome.org/t/anime-girl-on-gnome-gitlab/276...</a><p>(Note: I'd suggest not going and replying there. I don't want to encourage brigading of any sort; I just found it mildly amusing.)
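For anyone curious, here's roughly what I mean by a Hashcash-style challenge in WebCrypto. This is a minimal sketch, not Anubis's actual algorithm; the challenge format and the difficulty rule (leading zero hex digits) are my own assumptions. It also shows why WebCrypto is slow for this: every attempt is an async call into crypto.subtle.

    // Hedged sketch of a Hashcash-style PoW solver in the browser.
    // Assumption: the server sends a `challenge` string and the client must
    // find a nonce such that SHA-256(challenge + nonce) starts with
    // `difficulty` zero hex digits. Not Anubis's exact scheme.
    async function solveChallenge(
      challenge: string,
      difficulty: number,
    ): Promise<number> {
      const encoder = new TextEncoder();
      const target = "0".repeat(difficulty);
      for (let nonce = 0; ; nonce++) {
        const data = encoder.encode(challenge + nonce);
        // One async round-trip into crypto.subtle per attempt -- a big part
        // of why this is much slower than a native hashing loop.
        const digest = await crypto.subtle.digest("SHA-256", data);
        const hex = Array.from(new Uint8Array(digest))
          .map((b) => b.toString(16).padStart(2, "0"))
          .join("");
        if (hex.startsWith(target)) return nonce; // submit nonce to the server
      }
    }

Under that leading-zero-hex assumption, difficulty 4 needs about 16^4 ≈ 65,000 attempts on average, which is where the long waits on slow devices come from.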
Is there a way to alter text to poison AI training sets? I know there's Glaze and Nightshade for images but I've heard of nothing to poison text models. To be clear, this wouldn't be a defensive measure to stop scraping; it'd be an offensive honeypot: you'd want to make pages that have the same text but mutated slightly differently each time, so that AI scrapers preferentially load up on your statistically different text and then yield a poisoned model. Ideally the scraper companies will realize what's going on and stop scraping.
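I don't know of an established tool for this, but a crude version of what you describe (the same page, mutated slightly on every request) could be as simple as per-request synonym swapping. Everything below (the word list, the function name) is invented for illustration, and a real honeypot would need mutations that still read naturally to humans:

    // Hypothetical per-request mutation of a honeypot page. Each fetch swaps
    // random words for rough synonyms, so repeated scrapes of the "same" URL
    // yield statistically different text.
    const SWAPS: Record<string, string[]> = {
      quick: ["rapid", "speedy", "swift"],
      lazy: ["idle", "listless", "sluggish"],
    };

    function mutate(text: string): string {
      return text.replace(/[A-Za-z]+/g, (word) => {
        const options = SWAPS[word.toLowerCase()];
        if (!options || Math.random() < 0.5) return word; // keep most words
        return options[Math.floor(Math.random() * options.length)];
      });
    }

    // Every call returns a slightly different variant of the page text.
    console.log(mutate("the quick brown fox jumps over the lazy dog"));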
Ideas:<p>- Make it generate cryptocurrency, so that the work is not wasted: either to offset the server expenses of hosting the content, or for some noble non-profit cause, with all installations collecting the currency into a single account. Wasting the work is worse than both of these options.<p>- An easy way for good crawlers (like the Internet Archive) to authenticate themselves, e.g. TLS client-side authentication, or simply an HTTP request header containing a signature for the request (the signature could be based, for example, on their domain name and the TLS cert for that domain; a sketch of this is below).
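To make the signed-header idea concrete, here is one possible shape for it. The header name and key handling are invented; I'm assuming Ed25519 keys, Node.js, and that the server discovers the crawler's public key via its domain (DNS record or well-known URL):

    // Hedged sketch of a signed-request header for well-behaved crawlers.
    // The crawler publishes a public key under its domain; the server fetches
    // it once and verifies a per-request signature. All names are illustrative.
    import { generateKeyPairSync, sign, verify } from "node:crypto";

    const { privateKey, publicKey } = generateKeyPairSync("ed25519");

    // Crawler side: sign "METHOD path timestamp" and attach it, e.g.
    //   X-Crawler-Signature: domain=archive.org; sig=<base64>
    const message = Buffer.from("GET /some/page 2025-03-20T00:00:00Z");
    const sig = sign(null, message, privateKey).toString("base64");

    // Server side: rebuild the message from the request, look up the public
    // key for the claimed domain, and verify before waiving the PoW.
    const ok = verify(null, message, publicKey, Buffer.from(sig, "base64"));
    console.log(ok ? "good crawler: skip PoW" : "unverified: serve PoW");

Including a timestamp in the signed message is one simple way to stop bad bots from replaying a captured header.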
For a no-JS solution, I think some sort of optical-illusion CAPTCHA could work, especially <a href="https://en.wikipedia.org/wiki/Magic_Eye" rel="nofollow">https://en.wikipedia.org/wiki/Magic_Eye</a> or something like <a href="https://www.youtube.com/watch?v=Bg3RAI8uyVw" rel="nofollow">https://www.youtube.com/watch?v=Bg3RAI8uyVw</a>, which could cleverly hide the CAPTCHA answer within animated noise.<p>However, these methods are not really accessibility-friendly.
Discussion here <a href="https://news.ycombinator.com/item?id=43422929">https://news.ycombinator.com/item?id=43422929</a>
Regarding the problem of how to let "good" bots through:<p>You could use PKI: Drop the PoW if the client provides a TLS <i>client</i> certificate chain that asserts that <publicKey> corresponds to a private key that is controlled by <fullNamesAndAddressesOfThesePeople> (or just by, say, <peopleWhoControlThisUrl>, for Let's Encrypt-style automatable cert signing). This would be a slight hassle for good bot operators to set up, but not a very big deal. The result is that bad bots couldn't spoof good bots to get in.<p>(Obviously this strategy generalises to handling human users too -- but in that case, the loss of privacy, as well as the admin inconvenience, makes it much less palatable.)
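As a rough illustration of the server side of this (assuming Node.js; the file names and CA arrangement are placeholders I made up), the PoW could be waived whenever the client presents a cert chain that verifies against a CA the site trusts for bot-operator identity:

    // Hedged sketch: waive the PoW for clients presenting a verifiable
    // TLS client certificate. File names and CA setup are placeholders.
    import { readFileSync } from "node:fs";
    import { createServer } from "node:https";
    import type { TLSSocket } from "node:tls";

    const server = createServer(
      {
        key: readFileSync("server-key.pem"),
        cert: readFileSync("server-cert.pem"),
        ca: readFileSync("bot-operator-ca.pem"), // CA that vouches for operators
        requestCert: true,          // ask every client for a certificate
        rejectUnauthorized: false,  // humans without certs still get through
      },
      (req, res) => {
        const tls = req.socket as TLSSocket;
        if (tls.authorized) {
          // Identity details are available via tls.getPeerCertificate().
          res.end("verified bot operator: PoW waived\n");
        } else {
          res.end("no valid client cert: serve the PoW challenge\n");
        }
      },
    );
    server.listen(8443);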
It's a great idea, but I fear that if this keeps going viral like it did in the last few days, more bot authors will be motivated to add special handling for it, e.g. changing the user agent to a non-Mozilla one.
The performance on mobile kinda sucks, though: it took about 30 seconds to get through the difficulty-4 PoW on Firefox for Android. In that time I had to resist the urge to switch to doing something else.
This reminded me of an article I printed (yes, on paper) at my college more than 20 years ago, titled Parasitic Computing. I don't remember where it was originally published, but I think I might have stumbled upon it via kuro5hin; a quick search turned up the publication in Nature (though paywalled).<p>- <a href="https://www.nature.com/articles/35091039" rel="nofollow">https://www.nature.com/articles/35091039</a>