When I was in the advertising business, one of the core products I created was a brand-safety product, which basically prevented advertisers from advertising on dodgy sites.<p>I messed around with algorithms that detected nudity (because big-brand advertisers don't want their ads showing up on porn sites). One of the more interesting and simple-to-use ones is actually a simple averaging of the images across multiple samples. It was easy to implement and gave relatively good results.<p>In the end, though, I didn't use it, because text-clustering algorithms worked better at classifying content.
"The training set for the skin filter consisted of 1,182,608 manually labeled skin pixels and 10,471,553 manually labeled non-skin pixels while the testing set consisted of 2,303,824 manually labeled skin pixels and 24,285,952 manually labeled non-skin pixels."<p>That's a lot of pixels to manually label.
This could only be a very rough first pass at detection. Bathing suits can be very skimpy without the wearer being fully nude.<p>And social context plays a large role. For instance, distinguishing between a fat male's nipples and a small-chested female's nipples would be impossible without analyzing a lot more than skin color.<p><a href="http://i.imgur.com/sb6Iw.jpg" rel="nofollow">http://i.imgur.com/sb6Iw.jpg</a>
This doesn't seem like a very scalable approach to the problem. I would think that if you wanted to capture <i>all</i> nudity (including monochromatic or illustrated), you would instead go at the problem from the angle of <i>titillation</i>. You could even round up images that are not necessarily human-based (fruit arranged provocatively, for instance).
direct link: <a href="http://onebit.us/x/i/814381733331796005.pdf" rel="nofollow">http://onebit.us/x/i/814381733331796005.pdf</a>
How do these nudity detection APIs work? Is there crowdsourcing going on under the hood? Or are they using some clustering algorithm to detect a range of skin colors, so that if, say, 90% of the image is skin-colored, it's flagged as nude?
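To make the "range of skin color, then a percentage cutoff" idea concrete, here is a minimal sketch in Python. It is not how any particular API works; that isn't known from this thread. The function names (`is_skin`, `skin_ratio`, `looks_nude`) are made up for illustration, the RGB rule is a commonly cited rule-of-thumb skin test rather than anything taken from the paper, and the 0.9 cutoff is just the 90% figure floated in the question.

```python
def is_skin(r, g, b):
    """Rule-of-thumb RGB skin test (an assumed heuristic, not a trained model)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15  # enough spread between channels
        and abs(r - g) > 15                   # red clearly separated from green
        and r > g and r > b                   # red is the dominant channel
    )


def skin_ratio(pixels):
    """Fraction of (r, g, b) pixels that pass the skin test."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_skin(r, g, b) for r, g, b in pixels) / len(pixels)


def looks_nude(pixels, threshold=0.9):
    """Flag the image when the skin fraction reaches the (assumed) threshold."""
    return skin_ratio(pixels) >= threshold


# Toy "image": 19 skin-toned pixels and 1 blue pixel.
toy_image = [(220, 170, 140)] * 19 + [(30, 60, 200)]
print(skin_ratio(toy_image), looks_nude(toy_image))
```

A filter this simple would trip on exactly the cases raised elsewhere in the thread (skimpy bathing suits, or a close-up of a face), which is presumably why it could only serve as a first pass before something smarter.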