I fear for the unauthenticated web

117 pointsby SethMLarson2 months ago

18 comments

cxr2 months ago

Perversely, this submission is essentially blogspam. The article linked in the second paragraph, to which this "1 minute" read adds almost nothing of value, is the main story:<<a href="https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/" rel="nofollow">https://thelibre.news/foss-infrastructure-is-under-attack-by...</a>>394 comments. 645 points. Submitted 3 hours ago: <<a href="https://news.ycombinator.com/item?id=43422413">https://news.ycombinator.com/item?id=43422413</a>>

评论 #43425371 未加载

评论 #43425559 未加载

评论 #43425849 未加载

评论 #43425839 未加载

评论 #43425810 未加载

hugs2 months ago

I might be naive, but I think it's time we seriously start implementing "HTTP status code 402: Payment Required" across the board."L402" is an interesting proposal. Paying a fraction of a penny per request. <a href="https://github.com/l402-protocol/l402" rel="nofollow">https://github.com/l402-protocol/l402</a>

评论 #43425745 未加载

评论 #43425711 未加载

评论 #43425923 未加载

fewsats2 months ago

There's a real economic problem here: when someone scrapes your site, you're literally paying for them to use your stuff. That's messed up (and not sustainable)It seems like a good fit for micropayments. They never took off with people but machines may be better suited for them.L402 can help here.<a href="https://l402.org" rel="nofollow">https://l402.org</a>

评论 #43426031 未加载

评论 #43426056 未加载

Aurornis2 months ago

Rate limiting is the first step before cutting everything off behind forced logins.> This practice started with larger websites, ones that already had protection from malicious usage like denial-of-service and abuse in the form of services like Cloudflare or FastlyFYI Cloudflare has a very usable free tier that’s easy to set up. It’s not limited to large websites.

评论 #43425124 未加载

评论 #43424998 未加载

评论 #43425122 未加载

评论 #43425164 未加载

评论 #43426169 未加载

评论 #43425523 未加载

parliament322 months ago

Linked in the article that this article links to is a project I found interesting for combatting this problem, a (non-crypto) proof-of-work challenge for new visitors <a href="https://github.com/TecharoHQ/anubis" rel="nofollow">https://github.com/TecharoHQ/anubis</a>Looks like the GNOME Gitlab instance implements it: <a href="https://gitlab.gnome.org/GNOME" rel="nofollow">https://gitlab.gnome.org/GNOME</a>

评论 #43426066 未加载

hubraumhugo2 months ago

We should try separating good bots from bad bots:Good bots: search engine crawlers that help users find relevant information. These bots have been around since the early days of the internet and generally follow established best practices like robots.txt and rate limits. AI agents like OpenAI's Operator or Anthopic's Computer Use probably also fit into that bucket as they are offering useful automation without negative side effects.Bad bots: bots that have a negative affect website owners by causing higher costs, spam, or downtime (automated account creation, ad fraud, or DDoS). AI crawlers fit into that bucket as they disregard robots.txt and spoof user agent. They are creating a lot of headaches for developers responsible for maintaining heavily crawled sites. AI companies don't seem to care about any crawling best practices that the industry has developed over the past two decades.So the actual question is how good bots and humans can coexist on the web while we protect websites against abusive AI crawlers. It currently feels like an arms race without a winner.

评论 #43425185 未加载

kmeisthax2 months ago

> How long until scrapers start hammering Mastodon servers?Mastodon has AUTHORIZED_FETCH and DISALLOW_UNAUTHENTICATED_API_ACCESS which would at least stop these very naive scrapers from getting any data. Smarter scrapers could actually pretend to speak enough ActivityPub to scrape servers, though.

jmclnx2 months ago

I would think all you need to do is add a copyright statement of some kind.Sad things are getting to this point. Maybe I should add this to my site :)(c) Copyright (my email), if used for any form of LLM processing, you must contact me and pay 1000USD per word from my site for each use.

评论 #43425052 未加载

评论 #43425015 未加载

评论 #43425029 未加载

评论 #43425013 未加载

评论 #43425268 未加载

评论 #43424947 未加载

评论 #43425146 未加载

评论 #43425259 未加载

charcircuit2 months ago

Crawlers visiting every page on your website is not the main problem with the unauthenticated web.The amount of spam that happens when you let people freely post is a much bigger problem.

renegat0x02 months ago

To be honest I feel that web2 is overrated.Most of content, blogs could be static sites.For mastodon, forums I think user validation is ok and a good way to go.

0x1ceb00da2 months ago

Do I need to be worried about my bill if I've rented a simple EC2 instance without any fancy autoscaling stuff?

评论 #43425142 未加载

MontgomeryPy2 months ago

Could an answer here be for smaller websites to convert their sites into chatbots which could prevent AI scrapers from slurping up all their content/drive up their hosting costs?

评论 #43425813 未加载

napolux2 months ago

> I suggest everyone that uses cloud infrastructure for hosting set-up a billing limit to avoid an unexpected bill in case they're caught in the cross-hairs of a negligent company. All the abusers anonymize their usage at this point, so good luck trying to get compensated for damages.This is scary

评论 #43425224 未加载

anovikov2 months ago

Pretty soon virtually everything will be paywalled. Ironically, it will provide us with a good metric that lets us find out whether AGI has arrive or not: when it does, paywalling will stop working because AGI could derive more value from accessing things and will thus outbid us.

woah2 months ago

If you don't want someone to access your website, don't put it online

isoprophlex2 months ago

Everyone is (rightfully) outraged, but this is essentially nothing new. Asshat capitalists have been externalizing the costs of their asshat moneymaking schemes on the little guy since approximately forever.Deregulation is ultimately antithetical to our personal freedom.I just hope the spirit of the internet that I grew up with can be rescued, or reincarnated somehow...

ToucanLoucan2 months ago

Yet another entry in the long and shameful history of Silicon Valley abusing the public square for their own profit (or in this case, fantasies of profit) and the rest of us just have to learn to live with it because the justice system simply will not even try and give us recourse.Move fast and break things apparently has a bonus clause for the things you break not being your responsibility to fix.

评论 #43425020 未加载

评论 #43425047 未加载

JKCalhoun2 months ago

For some reason I am not really moved by a lot of the hand wringing I am seeing lately.It's a not a binary thing to me: LLMs are not god, but even without AGI, they have proven wildly useful to me. Calling them "shitty chat bots" doesn't sway me.Further I have always assumed that everything that I post to the web is publicly accessible to everyone/everything. We lost any battle we thought we could wage some 2+ decades ago when web crawlers started hoovering up data from our sites.

评论 #43425083 未加载

评论 #43425127 未加载

评论 #43425093 未加载

评论 #43425077 未加载

评论 #43425496 未加载

评论 #43425150 未加载