Also, an interesting video, albeit with low content density, pointing out that those bots cause huge amounts of traffic, enough to max out most VPS bandwidth plans, and are not currently blocked by Cloudflare.<p>To me, it looks like ByteDance is aggressively scraping the web for ML purposes.
The article is correct: Bytespider does not observe robots.txt directives.<p>I set Apache to return a 404 to Bytespider for every request. Not even AhrefsBot is this poorly behaved.
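<p>For anyone wanting to do the same, here is a minimal sketch of one way to return a 404 to Bytespider on every request. It is not necessarily the exact setup described above. It assumes mod_rewrite is enabled, that the rules go in the server or virtual host config, and that the bot identifies itself with the substring "Bytespider" in its User-Agent header:

```apache
# Assumption: mod_rewrite is loaded (e.g. via "a2enmod rewrite" on Debian-style installs).
RewriteEngine On

# Match any request whose User-Agent contains "Bytespider", case-insensitively.
RewriteCond %{HTTP_USER_AGENT} Bytespider [NC]

# Leave the URL untouched ("-") and answer with 404 Not Found.
# A non-3xx status in the R flag stops rewriting and drops the substitution.
RewriteRule ^ - [R=404,L]
```

This only catches requests that announce themselves as Bytespider; a crawler that spoofs a browser User-Agent would need IP-based blocking instead.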