How we blocked TikTok's Bytespider bot and cut our bandwidth by 80%

34 points by chptung 12 months ago

6 comments

gizmo686 12 months ago
I don't get what Bytedance is doing here. Clearly they are not actively trying to evade blocks, as they are identifying their bot with a user agent sites can block.

However, surely they have enough smart engineers there to realize that running a bot at full speed (and, based on other reports, completely ignoring robots.txt) will get them blocked by a lot of sites.

If they just had a well-behaved spider, almost no one would mind. Getting crawled is a fact of life on the internet, and most website owners recognize it as an essential cost of doing business. Once you get a reputation as a bad spider, though, that is very hard to shake.
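For concreteness, here is a minimal Python sketch (not from the article, purely illustrative) of the kind of well-behaved spider described above: it consults the site's robots.txt before fetching and paces its requests instead of crawling at full speed. The bot name and crawl delay are hypothetical placeholders.

```python
import time
import urllib.parse
import urllib.request
import urllib.robotparser

USER_AGENT = "ExampleBot/1.0"  # hypothetical crawler identity, for illustration only
CRAWL_DELAY = 5                # seconds between requests; an illustrative value

def polite_fetch(url):
    """Fetch a URL only if robots.txt allows it, then pause before the next request."""
    parts = urllib.parse.urlsplit(url)
    robots = urllib.robotparser.RobotFileParser(
        f"{parts.scheme}://{parts.netloc}/robots.txt"
    )
    robots.read()
    if not robots.can_fetch(USER_AGENT, url):
        return None  # respect the site's opt-out instead of ignoring it
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        body = response.read()
    time.sleep(CRAWL_DELAY)  # throttle so the crawl does not eat the site's bandwidth
    return body
```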
jd20 12 months ago
I didn't see it mentioned, but why not just use robots.txt? Does Bytespider ignore it?
Comment #40450412 not loaded
Comment #40444042 not loaded
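For reference, the opt-out jd20 is asking about is just a declarative rule in the site's robots.txt, sketched below ("Bytespider" is the user-agent token named in the article's title). The catch, as gizmo686 notes above, is that other reports say the bot ignores robots.txt entirely, which is why the article resorts to blocking at the server instead.

```
# Illustrative robots.txt rule asking Bytespider not to crawl anything.
User-agent: Bytespider
Disallow: /
```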
chasd00 12 months ago
Is returning a 403 based on the user agent worth a blog post? Also, can't Bytespider just change their user agent to Byte-Spider? Or just make their user agent a random string? It will be a forever arms race and require constant code updates to keep chasing that bot by user agent. You're probably better off whitelisting the known user agents and blocking everything else.

Also, does it really require a specific "gem"? This is HTTP request filtering; the router (as in the real router, like the metal box with network cables) can probably do it by itself these days.
Comment #40444097 not loaded
Comment #40444065 not loaded
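To illustrate chasd00's point that this is plain HTTP request filtering rather than something that needs a dedicated gem, here is a minimal user-agent check written as a Python WSGI middleware. This is an assumption-laden stand-in, not the article's implementation (the mention of a "gem" suggests the author's stack is Ruby, where the equivalent would be a Rack middleware or a web-server rule), and the blocklist is hypothetical.

```python
# Illustrative only: a tiny WSGI middleware that returns 403 whenever the
# User-Agent contains a blocked token. The token list is a hypothetical
# stand-in, not the article's actual configuration.
BLOCKED_AGENTS = ("bytespider",)

def block_bad_bots(app):
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)  # pass every other request through
    return middleware
```

The whitelist variant chasd00 suggests would simply invert the check: allow only known-good agents and return 403 for everything else.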
braden_e 12 months ago
This is the worst-behaved bot I have ever seen; I suspect it is AI-related. I recently decided to block all the AI crawlers - unlike search engines, I get nothing from them.
Comment #40444049 not loaded
mmaunder 12 months ago
Is it just me or is that site a bit broken? Weirdly dark.

Edit: Nice try on the vote brigade guys. lol
Comment #40444035 not loaded
catoc 12 months ago
Can large companies not be faulted for ignoring robots.txt? Seems like something GDPR could enforce for personal(ly owned) sites?
Comment #40444082 not loaded
Comment #40444077 not loaded