
How to prevent scraping?

1 point by metaprinter, over 13 years ago
I've gone and built an extensive website for nursing students (that took forever to populate with data), but I'm wary of launching it until I learn how to prevent or minimize automated scraping of the content.

I thought about showing a teaser and requiring login to see everything, but then I lose out on Google juice, no?

It's a LAMP environment. Any thoughts?

3 comments

georgemcbay, over 13 years ago
Any time you spend thinking about this is a waste. You can't stop scraping on the web, period. And any halfass attempt you make to try it is going to kill you on SEO as you already suspect.
jnbiche, over 13 years ago
Your site will likely not be scraped unless/until it takes off. And once that happens, you'll have your foothold and no me-too site is going to surpass you unless they add more/better content. I wouldn't worry about it at this stage.
Comment #3281728 not loaded
stray, over 13 years ago
You can't prevent scraping, but you can poison it. I can think of two approaches:

1. Replace bits of text on output with Unicode look-alikes. Humans will still read what you want them to read, but non-humans get crap.

2. The Mountweazel approach: put fake entries in that humans would never find. Then you can google these fake entries - any site other than your own with Mt. Weasel is the result of scraping your site.

But honestly, most of our efforts to protect "our" work is just misguided busy-work...
Comment #3281729 not loaded
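The first poisoning idea above can be sketched in a few lines of Python. This is a minimal illustration, not anything from the thread: the `HOMOGLYPHS` table and `poison` helper are hypothetical names, and the mapping shown (a handful of Cyrillic look-alikes) is deliberately incomplete.

```python
# Approach 1 sketch: swap a few ASCII letters for Unicode look-alike
# characters (homoglyphs) before rendering text for suspected bots.
# Humans see the same glyphs; exact-match scrapers get garbage.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
}

def poison(text: str) -> str:
    """Replace mapped characters with visually identical Unicode ones."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

sample = "scrape"
poisoned = poison(sample)
print(poisoned)            # renders like "scrape" on screen...
print(poisoned == sample)  # ...but no longer compares equal: False
```

Note the trade-off the thread already hints at: search engines are "non-humans" too, so serving poisoned text indiscriminately would hurt indexing just like a login wall would.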