科技回声

A tech news platform built with Next.js, offering technology news and discussion from around the world.

© 2025 科技回声. All rights reserved.

Are you blocking OpenAI access to your website?

8 points by rrmdp, over 1 year ago
I read on Twitter that a lot of founders are blocking GPTBot (OpenAI's web crawler) from accessing their websites or startups. Are you? Why?
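The usual way to opt out is a robots.txt group addressed to OpenAI's crawler user-agent token, `GPTBot`. A minimal sketch, assuming a site that wants to block only GPTBot and leave every other crawler alone, checked with Python's standard-library `urllib.robotparser` (the example.com URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# robots.txt that disallows OpenAI's GPTBot on every path. No other group is
# declared, so crawlers not named here are treated as allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/blog/post")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running it prints `GPTBot: blocked` and `Googlebot: allowed`. Note that robots.txt only states a preference; a crawler that chooses to ignore it is not stopped.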

5 comments

sysadm1n, over 1 year ago
You can block OpenAI, but you should really be blocking all bots and bad actors that may be scraping your site, scanning for vulns, or, the new kid in town, using your site as training data. Robots.txt is not enough: while the major players (Google, Bing, etc.) honor robots.txt, other actors can ignore it completely.
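Enforcing a block on the server, instead of relying on robots.txt, can start with refusing requests by User-Agent. A minimal sketch as Python WSGI middleware; the `BlockCrawlers` class, the `call` helper, and the token list are hypothetical illustrations (GPTBot is OpenAI's crawler, CCBot is Common Crawl's). Because the User-Agent header is trivially spoofed, this only stops crawlers that identify themselves honestly; scrapers that lie need rate limiting or IP-based filtering on top.

```python
from typing import Callable, Iterable

# Hypothetical User-Agent substrings to refuse, matched case-insensitively.
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot")

StartResponse = Callable[[str, list], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


class BlockCrawlers:
    """WSGI middleware that answers 403 to requests from listed crawlers."""

    def __init__(self, app: WSGIApp, tokens: Iterable[str] = BLOCKED_AGENT_TOKENS):
        self.app = app
        self.tokens = tuple(t.lower() for t in tokens)

    def __call__(self, environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in self.tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Every other request is passed through to the wrapped application.
        return self.app(environ, start_response)


def hello_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = BlockCrawlers(hello_app)


def call(wsgi_app: WSGIApp, user_agent: str) -> tuple:
    """Invoke a WSGI app in-process and return (status, body)."""
    captured = {}

    def start_response(status: str, headers: list) -> None:
        captured["status"] = status

    body = b"".join(wsgi_app({"HTTP_USER_AGENT": user_agent}, start_response))
    return captured["status"], body


print(call(app, "Mozilla/5.0 (compatible; GPTBot/1.0)"))  # ('403 Forbidden', b'Forbidden\n')
print(call(app, "Mozilla/5.0"))  # ('200 OK', b'Hello\n')
```

To try it against real requests, the standard library's reference server can host it: `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`.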
version_five, over 1 year ago
I encourage it. I'd be happy for anything I do to make it into the repertoire of a language model. What is the downside? I'm sharing it on the internet anyway. The people who get worried about crawling are either trying to protect dying business models (ads, thankfully) or are upset that someone else found a use for their data and want to rent-seek retroactively.
johneth, over 1 year ago
I'd block it on all of my sites, except in the limited cases where it's advantageous for me to let them scrape it.

So, blog posts and things like that: no.

Things like technical documentation, which ChatGPT users might find useful and which would benefit me if those users could access it there: sure.
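This kind of selective policy can also be expressed in robots.txt. A minimal sketch, assuming a hypothetical site layout where technical documentation lives under `/docs/` and everything else (blog posts included) should stay off-limits to GPTBot. The `Allow` line is listed first because Python's `urllib.robotparser` applies the first matching rule; crawlers that follow RFC 9309's longest-match rule reach the same result, since `/docs/` is more specific than `/`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: /docs/ is open to GPTBot, every other path is closed.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /docs/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/docs/getting-started", "/blog/my-post", "/"):
    allowed = parser.can_fetch("GPTBot", "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

This prints `allowed` for `/docs/getting-started` and `blocked` for `/blog/my-post` and `/`. Other crawlers are unaffected because no other group is declared.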
Lariscus, over 1 year ago
Training LLMs on data that you don't own the rights to is copyright infringement. Why should I continue to feed a machine that already violated my rights?
KomoD, over 1 year ago
No, not intentionally at least. It's very possible they get stopped by Cloudflare, though.