Are you blocking OpenAI access to your website?

8 points by rrmdp over 1 year ago
I read on Twitter that a lot of founders are blocking GPTBot (OpenAI's web crawler) from accessing their websites or startups. Are you? Why?

5 comments

sysadm1n over 1 year ago
You can block OpenAI, but you should really be blocking all bots and bad actors that may be scraping your site, scanning for vulnerabilities, or doing what the new kid in town does: using your site as training data. Robots.txt is not enough: while the major players (Google, Bing, etc.) honor it, other actors can ignore it completely.
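Because robots.txt is only a request, one way to actually enforce a block is to refuse matching requests in the application itself. The following is a minimal sketch, assuming a Python WSGI application; the `BLOCKED_AGENT_TOKENS` list and the `is_blocked` / `block_crawlers` names are hypothetical. `GPTBot` and `CCBot` are the user-agent tokens used by OpenAI's and Common Crawl's crawlers respectively.

```python
# Hypothetical, illustrative list of crawler user-agent substrings (lowercase).
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def block_crawlers(app):
    """WSGI middleware: answer 403 to blocked user agents, pass everything else through."""

    def wrapper(environ, start_response):
        # A missing User-Agent header is treated as an empty string, i.e. not blocked.
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return wrapper
```

Wrap an existing app as `application = block_crawlers(application)`. Note that a User-Agent header is trivially spoofed, so this only stops crawlers that identify themselves honestly; scanners posing as browsers need other measures such as rate limiting or a service like Cloudflare.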
version_five over 1 year ago
I encourage it. I'd be happy for anything I do to make it into the repertoire of a language model. What is the downside? I'm sharing it on the internet anyway. The people who get worried about crawling are those trying to protect dying business models (ads, thankfully) or those upset that someone else found a use for their data and who want to rent-seek retrospectively.
johneth over 1 year ago
I'd block it on all of my sites, *except* in the limited cases where it's advantageous *for me* to let them scrape it.

So, blog posts, things like that: no.

Things like technical documentation, which users of ChatGPT might find useful, and which would benefit me if those users could access it there: sure.
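A selective policy like this can be expressed in robots.txt by naming GPTBot and allowing only a documentation directory. OpenAI documents `GPTBot` as its crawler's user-agent token; the `/docs/` path, the example URLs, and the `build_parser` helper are hypothetical. The sketch checks the rules with Python's standard `urllib.robotparser`, without fetching anything over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may crawl /docs/ but nothing else.
# Crawlers not named here have no matching rules, so they stay unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /docs/
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("GPTBot", "https://example.com/docs/getting-started"),
        ("GPTBot", "https://example.com/blog/my-post"),
        ("Googlebot", "https://example.com/blog/my-post"),
    ]
    for agent, url in checks:
        verdict = "allowed" if rp.can_fetch(agent, url) else "blocked"
        print(f"{agent:10} {url}: {verdict}")
```

The `Allow` line is deliberately placed before `Disallow: /`. Python's parser applies the first rule whose path matches, while crawlers following RFC 9309 pick the most specific (longest) match; with this ordering both interpretations agree. As noted earlier in the thread, this only binds crawlers that choose to honor robots.txt.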
Lariscus over 1 year ago
Training LLMs on data that you don't own the rights to is copyright infringement. Why should I continue to feed a machine that already violated my rights?
KomoD over 1 year ago
No, not intentionally at least. It's very possible they get stopped by Cloudflare, though.