How we blocked TikTok's Bytespider bot and cut our bandwidth by 80%

34 points by chptung 12 months ago

6 comments

gizmo686 12 months ago
I don't get what Bytedance is doing here. Clearly they are not actively trying to evade blocks, as they are identifying their bot with a user agent sites can block.

However, surely they have enough smart engineers there to realize that running a bot at full speed (and, based on other reports, completely ignoring robots.txt) will get them blocked by a lot of sites.

If they just had a well-behaved spider, almost no one would mind. Getting crawled is a fact of life on the internet, and most website owners recognize it as an essential cost of doing business. Once you get a reputation as a bad spider, though, that is very hard to shake.
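For concreteness, here is a minimal Python sketch (not from the article, purely illustrative) of the kind of well-behaved spider described above: it consults the site's robots.txt before fetching and paces its requests instead of crawling at full speed. The bot name and crawl delay are hypothetical placeholders.

```python
import time
import urllib.parse
import urllib.request
import urllib.robotparser

USER_AGENT = "ExampleBot/1.0"  # hypothetical crawler identity, for illustration only
CRAWL_DELAY = 5                # seconds between requests; an illustrative value

def polite_fetch(url):
    """Fetch a URL only if robots.txt allows it, then pause before the next request."""
    parts = urllib.parse.urlsplit(url)
    robots = urllib.robotparser.RobotFileParser(
        f"{parts.scheme}://{parts.netloc}/robots.txt"
    )
    robots.read()
    if not robots.can_fetch(USER_AGENT, url):
        return None  # respect the site's opt-out instead of ignoring it
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        body = response.read()
    time.sleep(CRAWL_DELAY)  # throttle so the crawl does not eat the site's bandwidth
    return body
```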
jd20 12 months ago
I didn't see it mentioned, but why not just use robots.txt? Does Bytespider ignore it?
Comment #40450412 not loaded
Comment #40444042 not loaded
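For reference, the opt-out jd20 is asking about is just a declarative rule in the site's robots.txt, sketched below ("Bytespider" is the user-agent token named in the article's title). The catch, as gizmo686 notes above, is that other reports say the bot ignores robots.txt entirely, which is why the article resorts to blocking at the server instead.

```
# Illustrative robots.txt rule asking Bytespider not to crawl anything.
User-agent: Bytespider
Disallow: /
```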
chasd00 12 months ago
Is returning a 403 based on the user agent worth a blog post? Also, can't Bytespider just change their user agent to Byte-Spider? Or just make their user agent a random string? It will be a forever arms race and require constant code updates to keep chasing that bot by user agent. You're probably better off whitelisting the known user agents and blocking everything else.

Also, does it really require a specific "gem"? This is HTTP request filtering; the router (as in the real router, like the metal box with network cables) can probably do it by itself these days.
Comment #40444097 not loaded
Comment #40444065 not loaded
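To illustrate chasd00's point that this is plain HTTP request filtering rather than something that needs a dedicated gem, here is a minimal user-agent check written as a Python WSGI middleware. This is an assumption-laden stand-in, not the article's implementation (the mention of a "gem" suggests the author's stack is Ruby, where the equivalent would be a Rack middleware or a web-server rule), and the blocklist is hypothetical.

```python
# Illustrative only: a tiny WSGI middleware that returns 403 whenever the
# User-Agent contains a blocked token. The token list is a hypothetical
# stand-in, not the article's actual configuration.
BLOCKED_AGENTS = ("bytespider",)

def block_bad_bots(app):
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)  # pass every other request through
    return middleware
```

The whitelist variant chasd00 suggests would simply invert the check: allow only known-good agents and return 403 for everything else.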
braden_e 12 months ago
This is the worst-behaved bot I have ever seen; I suspect it is AI-related. I recently decided to block all the AI crawlers - unlike search engines, I get nothing from them.
Comment #40444049 not loaded
mmaunder 12 months ago
Is it just me or is that site a bit broken? Weirdly dark.

Edit: Nice try on the vote brigade guys. lol
Comment #40444035 not loaded
catoc 12 months ago
Can large companies not be faulted for ignoring robots.txt? Seems like something GDPR could enforce for personal(ly owned) sites?
Comment #40444082 not loaded
Comment #40444077 not loaded