
How to prevent scraping?

1 point by metaprinter over 13 years ago
I've gone and built an extensive website for nursing students (that took forever to populate with data), but I'm wary of launching it until I learn how to prevent or minimize automated scraping of the content.

I thought about showing a teaser and requiring login to see everything, but then I lose out on Google juice, no?

It's a LAMP environment. Any thoughts?

3 comments

georgemcbay over 13 years ago
Any time you spend thinking about this is a waste. You can't stop scraping on the web, period. And any halfass attempt you make to try it is going to kill you on SEO as you already suspect.
jnbiche over 13 years ago
Your site will likely not be scraped unless/until it takes off. And once that happens, you'll have your foothold, and no me-too site is going to surpass you unless they add more/better content. I wouldn't worry about it at this stage.
Comment #3281728 not loaded
stray over 13 years ago
You can't prevent scraping, but you can poison it. I can think of two approaches:

1. Replace bits of text on output with Unicode look-alikes. Humans will still read what you want them to read, but non-humans get crap.

2. The Mountweazel approach: put fake entries in that humans would never find. Then you can google those fake entries; any site other than your own with Mt. Weasel is the result of scraping your site.

But honestly, most of our efforts to protect "our" work are just misguided busy-work...
Comment #3281729 not loaded
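The Unicode look-alike idea in stray's first point can be sketched in a few lines. This is an illustrative Python sketch only (the thread gives no code, and the poster's LAMP stack would implement it in PHP); the HOMOGLYPHS table, the substitution rate, and the poison_text name are assumptions for the example, not anything from the comment.

```python
import random

# Latin letters mapped to visually similar Cyrillic code points (illustrative subset).
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}

def poison_text(text, rate=0.15, seed=None):
    """Swap a fraction of substitutable letters for Unicode look-alikes.

    Rendered text looks the same to a human reader, but the underlying
    bytes no longer match the original, so a naive scrape carries the
    poisoned characters with it.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    original = "care plans for nursing students"
    poisoned = poison_text(original, seed=42)
    print(poisoned)              # looks identical on screen
    print(original == poisoned)  # False: the bytes differ
```

One caveat that ties back to the poster's SEO worry: search engines index the poisoned bytes too, so substituted words may stop matching the queries people actually type, which is part of why the other commenters call this kind of effort a losing trade.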