I've gone and built an extensive website for nursing students (it took forever to populate with data), but I'm wary of launching it until I learn how to prevent or minimize automated scraping of the content.<p>I thought about showing a teaser and requiring login to see everything, but then I lose out on Google juice, no?<p>It's a LAMP environment. Any thoughts?
Any time you spend thinking about this is a waste. You can't stop scraping on the web, period. And any half-assed attempt to try is going to kill your SEO, as you already suspect.
Your site will likely not be scraped unless/until it takes off. And once that happens, you'll have your foothold, and no me-too site is going to surpass you unless it adds more or better content. I wouldn't worry about it at this stage.
You can't prevent scraping, but you can poison it. I can think of two approaches:<p>1. Replace bits of text on output with Unicode look-alikes. Humans will still read what you want them to read, but non-humans get crap.<p>2. The Mountweazel approach: put fake entries in that humans would never find, then google for them. Any site other than your own that contains your Mountweazels is the result of scraping yours.<p>But honestly, most of our efforts to protect "our" work are just misguided busy-work...
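To make approach 1 concrete, here's a minimal sketch of homoglyph substitution. It's in Python rather than the poster's LAMP/PHP stack, just to show the mechanics; the particular Cyrillic mapping and the "every Nth character" rate are illustrative choices of mine, not a recommendation:

```python
# A few ASCII letters and their Cyrillic look-alikes.
# Rendered text looks identical to a human reader, but the
# byte sequence differs, which marks scraped copies.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
}

def poison(text, every=7):
    """Swap every Nth substitutable character for its look-alike."""
    out = []
    seen = 0
    for ch in text:
        if ch in HOMOGLYPHS:
            seen += 1
            if seen % every == 0:
                out.append(HOMOGLYPHS[ch])
                continue
        out.append(ch)
    return "".join(out)
```

Note that searching the web for one of your own poisoned sentences (or a Mountweazel entry, per approach 2) is then an easy way to spot copies.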