I've just launched No-NSFW (an NSFW content warning system), which relies on user feedback to determine site ratings.<p>I'm now thinking of introducing a Bayesian filter to classify site content. Does this make sense?<p>Also, where should I look for seed data? I'm using nsfw.reddit for NSFW data (thanks, kirubakaran); what should I use for SFW data?
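For anyone wondering what a Bayesian filter for this might look like, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not No-NSFW's actual code: the class name `NaiveBayesFilter`, the `nsfw`/`sfw` labels, the tokenizer, and the toy training strings are all assumptions made up for illustration. Real seed data would be page text pulled from sources like the ones mentioned in this thread.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Hypothetical two-class naive Bayes text filter (illustrative only)."""

    LABELS = ("nsfw", "sfw")

    def __init__(self):
        # Per-label word frequencies and per-label document counts.
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        if label not in self.LABELS:
            raise ValueError(f"unknown label: {label}")
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def log_scores(self, text):
        if any(self.doc_counts[label] == 0 for label in self.LABELS):
            raise ValueError("train at least one document per label first")
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in self.tokenize(text) if w in vocab]
        scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            # Log prior: fraction of training documents with this label.
            score = math.log(self.doc_counts[label] / total_docs)
            # Add-one (Laplace) smoothing over the shared vocabulary.
            denom = sum(counts.values()) + len(vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy seed data, invented for this example.
nb = NaiveBayesFilter()
nb.train("explicit adult nude photos", "nsfw")
nb.train("nude explicit video adult", "nsfw")
nb.train("python programming tutorial code", "sfw")
nb.train("startup funding news code", "sfw")

print(nb.classify("adult explicit photos"))
print(nb.classify("python code tutorial"))
```

Working in log space avoids underflow on long pages, and the smoothing keeps a word seen in only one class from zeroing out the other class's score. The classifier's guess could be blended with the existing user feedback rather than replacing it.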
Also have a look at DansGuardian: <a href="http://dansguardian.org/" rel="nofollow">http://dansguardian.org/</a>. Blacklist files are available here: <a href="http://urlblacklist.com/" rel="nofollow">http://urlblacklist.com/</a><p>I'm not sure what you're looking for in terms of safe-for-work data; maybe Technorati tags?