The Cost of Being Crawled: LLM Bots and Vercel Image API Pricing

112 points by navs, about 1 month ago

20 comments

greatgib, about 1 month ago
A single $5 VPS should easily be able to handle tens of thousands of requests...

Simple thumbnails don't add much on top of that. It is so sad that the trend of "fullstack" engineers being just frontend JS/TS devs took off, leaving thousands of companies with no clue at all about how to serve websites, backends and server engineering...
leerob, about 1 month ago
(I work at Vercel) While it's good our spend limits worked, it clearly was not obvious how to block or challenge AI crawlers¹ from our firewall (which it seems you found manually). We'll surface this better in the UI, and also have more bot protection features coming soon. Also glad our improved image optimization pricing² would have helped. Open to other feedback as well, thanks for sharing.

¹: https://vercel.com/templates/vercel-firewall/block-ai-bots-firewall-rule

²: https://vercel.com/changelog/faster-transformations-and-reduced-pricing-for-image-optimization
bhouston, about 1 month ago
The issue is that the Vercel Image API is ridiculously expensive and also not efficient.

I would recommend using Thumbor instead: https://thumbor.readthedocs.io/en/latest/. You could have ChatGPT write up a React image wrapper pretty quickly for this.
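To make the Thumbor suggestion concrete, here is a minimal sketch of the URL an image wrapper would need to produce. It assumes a Thumbor instance running in its unsigned `/unsafe/` mode; the server address, the helper name `thumbor_url`, and the example image are hypothetical, and a production setup would normally sign URLs with a security key instead.

```python
from urllib.parse import quote


def thumbor_url(server: str, image_url: str, width: int, height: int) -> str:
    """Build an unsigned Thumbor URL asking for a smart-cropped resize.

    Assumption: the server allows the /unsafe/ route. The source image URL
    is percent-encoded so that its own slashes and query string cannot be
    confused with Thumbor's path segments.
    """
    encoded = quote(image_url, safe="")
    return f"{server.rstrip('/')}/unsafe/{width}x{height}/smart/{encoded}"


# Hypothetical server and image, for illustration only.
example = thumbor_url(
    "http://localhost:8888", "https://example.com/cover.jpg", 300, 300
)
print(example)
```

A frontend wrapper would then only need to place this string in an `img` element's `src` attribute; the resizing itself happens on the Thumbor server.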
gngoo, about 1 month ago
I once sat down to calculate the costs of my app if it ever went viral while being hosted at Vercel. That has put me off hosting anything on Vercel ever, or even touching Next.js. It feels like total vendor lock-in once you have something running there, and you kind of end up paying them 10x more than if you had taken the extra time to deploy it yourself.
jhgg, about 1 month ago
$5 to resize 1,000 images is ridiculously expensive.

At my last job we resized a very large number of images every day, and did so for significantly less (a fraction of a cent per thousand images).

Am I missing something here?
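A rough back-of-the-envelope sketch of what that price gap means. The $5 per 1,000 rate comes from the comment above and the 66,500 requests per day is the figure quoted from the article elsewhere in this thread. Two things are assumptions made purely for illustration: that every request is billed as a separate transformation (a worst case), and the value of 0.5 cents per 1,000 standing in for "a fraction of a cent".

```python
def optimization_cost(requests: int, usd_per_thousand: float) -> float:
    """Cost in USD if every request is billed as one image transformation."""
    return requests / 1000 * usd_per_thousand


DAILY_REQUESTS = 66_500          # figure quoted from the article in this thread
MANAGED_RATE = 5.0               # USD per 1,000 images, from the comment above
SELF_HOSTED_RATE = 0.005         # hypothetical: half a cent per 1,000 images

managed = optimization_cost(DAILY_REQUESTS, MANAGED_RATE)
self_hosted = optimization_cost(DAILY_REQUESTS, SELF_HOSTED_RATE)

print(f"managed: ${managed:.2f} per day")
print(f"self-hosted (hypothetical rate): ${self_hosted:.4f} per day")
```

Under these assumptions the managed figure works out to $332.50 for a single day of crawler traffic, which is the scale of difference the comment is pointing at.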
ashishb, about 1 month ago
As someone who maintains a Music+Podcast app as a hobby project, I intentionally have no servers for it.

You don't need one. You can fetch RSS feeds directly on mobile devices; it is faster, less work to maintain, and has a smaller attack surface for rogue bots.
VladVladikoff, about 1 month ago
Death by stupid microservices. Even at 1.5 million pages, with the traffic they are talking about, this could easily be hosted on a fixed $80/month Linode.
ramesh31, about 1 month ago
The cost of getting locked into Vercel.
nullorempty, about 1 month ago
Yeah, AI crawlers - add that to my list of phobias. Though for a bootstrapped startup, why not look to cut all recurring expenses and just deploy ImageMagick, which I am sure will do the trick for less?
GodelNumbering, about 1 month ago
Wow, this is interesting. I launched my site about a week ago and only submitted it to Google. But all the crawlers mentioned in the article (especially the SEO bots) were heavily crawling it within a few days.

Interestingly, the OpenAI crawler visited over 1,000 times, many of them as "ChatGPT-User/1.0", which is supposed to be for when a user searches ChatGPT. Not a single referred visitor though. Makes me wonder if it's at all beneficial to content publishers to allow bot crawls.

I ended up banning every SEO bot in robots.txt and a bunch of other bots.
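For readers unfamiliar with how such a ban is expressed, here is a minimal sketch using Python's standard `urllib.robotparser` to show how a compliant crawler would interpret a robots.txt that disallows a few bots. The bot list and the `/episodes/42` path are illustrative assumptions, not the commenter's actual file, and robots.txt is advisory only: it does nothing against crawlers that ignore it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block three named crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: AhrefsBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for agent in ("GPTBot/1.0", "ChatGPT-User/1.0", "AhrefsBot/7.0", "SomeBrowser/1.0"):
    # can_fetch() answers: would a rule-following client with this
    # user agent be allowed to request this path?
    print(agent, parser.can_fetch(agent, "/episodes/42"))
```

The named bots are denied and the catch-all group admits everything else, but only a crawler that chooses to consult the file is affected.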
outloudvi, about 1 month ago
Vercel has a fairly generous free quota and a non-negligibly high pricing scheme - I think people still remember https://service-markup.vercel.app/ .

For the crawl problem, I want to wait and see whether robots.txt proves to be enough to stop GenAI bots from crawling, since I confidently believe these GenAI companies are too "well-behaved" to respect robots.txt.
randunel, about 1 month ago
> Optimizing an image meant that Next.js downloaded the image from one of those hosts to Vercel first, optimized it, then served to the users.

So Metacast generates bot traffic on other websites, presumably to "borrow" their content and serve it to its own users, but doesn't like it when others do the same to it.
sergiotapia, about 1 month ago
Another story for https://serverlesshorrors.com/

It's crazy how these companies are really fleecing their customers who don't know any better. Is there even a way to tell Vercel: "I only want to spend $10 a month max on this project, CUT ME OFF if I go past it."? This is crazy.

I spend $12 a month on BunnyCDN. And $9 a month on BunnyCDN's image optimizer, which lets me add HTTP params to the URL to modify images.

1.33TB of CDN traffic. (PS: can't say enough good things about BunnyCDN, such a cool company, does exactly what you pay for, nothing more, nothing less)

This is nuts dude
mediumsmart, about 1 month ago
Don’t feed the bots. Why a pixel image? Take an svg and make it pulse while playing.
CharlieDigital, about 1 month ago
Is there no CDN? This feels like it's a non-issue if there's a CDN.
dylan604, about 1 month ago
I guess it goes to show how jaded I am, but as I was reading this, it felt like an ad for Vercel. I'm so sick of marketing content being submitted as actual content that when I read a potentially genuine blog post or post-mortem, my spidey senses get all tingly about potential advertising. However, I feel like if I turn down the sensitivity knob, I'll be worse off than with the knee-jerk reaction of thinking things like this are ads.
bitbasher, about 1 month ago
$5 for 1,000 image optimizations? Is Vercel not caching the optimization? Why would it be doing more than one per image on a fresh deploy?
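One possible reading of this question, sketched under stated assumptions: caching helps when the same image is requested repeatedly, but not when a crawler walks many pages that each have their own image. The `optimize` function below is a hypothetical stand-in for an expensive resize and says nothing about how Vercel's cache actually works.

```python
from functools import lru_cache

transform_count = 0  # how many times the expensive step really ran


@lru_cache(maxsize=None)
def optimize(image_url: str, width: int) -> str:
    """Hypothetical stand-in for an expensive resize, cached per (url, width)."""
    global transform_count
    transform_count += 1
    return f"{image_url}@{width}w"


# 1,000 requests for the same image hit the cache after the first call.
for _ in range(1000):
    optimize("https://example.com/cover.jpg", 300)
repeat_transforms = transform_count

# 1,000 requests for 1,000 different images each miss the cache once.
for i in range(1000):
    optimize(f"https://example.com/episode-{i}.jpg", 300)
unique_transforms = transform_count - repeat_transforms

print(repeat_transforms, unique_transforms)
```

If the site in question really has on the order of the 1.5 million pages mentioned elsewhere in this thread, a crawl across them would look like the second loop rather than the first.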
cratermoon, about 1 month ago
"Step 3: robots.txt"

Will do nothing to mitigate the problem. As is well known, these bots don't respect it.
andrethegiant, about 1 month ago
It’s a shame that the knee-jerk reaction has been to outright block these bots. I think in the future, websites will learn to serve pure markdown to these bots instead of blocking. That way, websites prevent bandwidth overages like in the article, while still informing LLMs about the services their website provides.

[disclaimer: I run https://pure.md, which helps websites shield from this traffic]
cachedthing0, about 1 month ago
"Together they sent 66.5k requests to our site within a single day."

Only script kiddies get into problems with such low numbers. I'm sure security is your next 'misconfiguration'. Better search for an offline job in the entertainment industry.