I run a niche social news site part-time, and it has a very small community. It's growing slowly, but I'm fine with that. What I'm worried about is how to stop spammers from submitting irrelevant stories to my site.<p>Adding a CAPTCHA is one option, but YC News doesn't have any CAPTCHA protection either. So how come spammers don't submit advertisement-based, irrelevant stories to YC News?<p>Does the YC News algorithm detect these kinds of links? Is there manual intervention? Or is the community just so good that nobody attacks it?<p>Either way, your input on how I can tackle this situation would be very helpful. Currently I manually delete all the irrelevant submissions (there are at least 5-10 of them every day).<p>-Aditya
They are. We currently get about 30-40 spam submissions a day. Turn on showdead in your profile and you'll see it all. The reason we don't get more is that we're very aggressive about killing spam. Most spammers give up eventually when they realize that submitting here generates near zero traffic.
Pretty much every site with user-generated content is overwhelmed with people trying to post spam. For my site, I use a combination of JavaScript human detection, Bayesian filtering, and aggressive human intervention (including single-click "spam this" links on every piece of content when logged in as an admin).<p>It's worth noting that since late 2007, a significant portion of comment spam is human-powered. CAPTCHA-style bot filtering doesn't work against it, since it's not bots doing the posting. Bayesian filtering and good moderation tools are essential these days.
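To make the Bayesian filtering idea concrete, here is a minimal sketch of a naive Bayes classifier with add-one smoothing. It is not the filter used on the site described above; the class name, the tokenizer, and the training strings are all made up for illustration, and a real filter would train on the submissions that moderators flag with their "spam this" links.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy naive Bayes text classifier (hypothetical, for illustration only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter, digit or apostrophe.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # label must be "spam" or "ham".
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Class prior, smoothed so an untrained class never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalise the two log scores into a probability of "spam".
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("buy cheap viagra online now", "spam")
spam_filter.train("cheap pills discount casino", "spam")
spam_filter.train("new lisp compiler released", "ham")
spam_filter.train("startup funding advice for founders", "ham")

print(spam_filter.spam_probability("cheap viagra discount") > 0.5)
print(spam_filter.spam_probability("lisp compiler advice") < 0.5)
```

Anything scoring above some threshold could be held for review rather than deleted outright, so a moderator can still correct the filter when it gets one wrong.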
The sites most vulnerable to spam are ones that a) have a critical mass of readership, especially dumb readership that will click on ridiculous spam links, and b) are not run by people who are active contributors to the field of spam filtering.
I think a lot of it also has to do with the community itself. A site of this size would probably get 300-400 spam messages a day if it weren't for the fact that its audience would see right through it. Tech people are so conscious of spam that they ignore it on principle, which makes spamming a tech site pointless.<p>As for suggestions...<p>1. Obviously CAPTCHA. It just makes sense.
2. I find keyword blocking very effective. So, for example, if I were running Hacker News I'd block any news item containing the word Viagra that was submitted by a user below a certain feedback level (no feedback at all, for example). The one caveat is to give them a way to manually verify it (say, an e-mail sent to them that lets them confirm they are an actual person and have the item approved).
3. Use E-Mail Spam Block Lists. Lists like SBL, CBL and XBL give IP addresses that generate massive amounts of spam. Many of those same IP addresses generate web spam.
4. I've never been a fan of this particular method, because I think it's discriminatory to an extent I'm uncomfortable with, but many places have special requirements for countries that are famous for spam generation (Russia, China, etc.), like making users from those IPs jump through special registration hoops.<p>Hope it Helps!
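Suggestions 2 and 3 above could be sketched roughly like this. Everything named here is an assumption for illustration: the keyword set, the karma threshold, and the function names are hypothetical, and `zen.spamhaus.org` is used only as an example zone (check a list's usage terms before querying it from a production site). The code only builds the DNS name to look up; by the usual DNSBL convention, an A-record answer in 127.0.0.0/8 means the address is listed and NXDOMAIN means it is not.

```python
import ipaddress
import re

# Hypothetical values; tune them to whatever your own spam looks like.
BLOCKED_KEYWORDS = {"viagra", "casino"}
MIN_TRUSTED_KARMA = 1


def needs_verification(title, submitter_karma):
    """Hold a submission for e-mail verification instead of rejecting it.

    Only users below the karma threshold are affected, so established
    members can still post a story that happens to mention a blocked word.
    """
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return submitter_karma < MIN_TRUSTED_KARMA and bool(words & BLOCKED_KEYWORDS)


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the hostname to resolve when checking an IPv4 address against a DNSBL.

    The octets are reversed and the list's zone is appended,
    e.g. 192.0.2.99 -> 99.2.0.192.<zone>.
    """
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


print(needs_verification("Buy Viagra cheap", submitter_karma=0))
print(needs_verification("Buy Viagra cheap", submitter_karma=50))
print(dnsbl_query_name("192.0.2.99"))
```

Holding a flagged item for verification rather than dropping it silently keeps the caveat from suggestion 2: a real person who trips the keyword list still has a way to get the story approved.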
We have a problem with comment spam on our site (a news and prediction market site, using Drupal). We introduced captchas, activated nofollow, all to no avail -- there are some very persistent spammers who will still go through the trouble of entering captchas just to have their stupid links show up at the bottoms of comment threads. It's not a huge issue, but it's definitely an irritant and added cost, in terms of the staff time required to clear it out.