What is the best way to programmatically detect porn images? (2009)

142 points by romain_g over 11 years ago

36 comments

nailer over 11 years ago
I used pifilter (now WeSEE:Filter, http://corp.wesee.com/products/filter/) for a production, realtime, anonymous confession site (imeveryone.com) in 2010.

It cost, IIRC, a tenth of a cent per image URL. Rather than being based on skin tone, it was built on algorithms that specifically identify labia, anuses, penises, etc. REST API: send a URL, get back a yes/no/maybe. You decided what to do with the maybes.

My experience:

- Before launch, I tested it with 4chan /b/ as a feed, and was able to produce a mostly clean version of /b/ with the exception of cartoon imagery.

- It caught most of the stuff people tried to post to the site. Small-breasted women (breasts being considered 'adult' in the US) were the only thing that would get through, and that wasn't a huge concern. Completely unmaintained pubic hair (as revealing as a black bikini) would also get through.

- Since people didn't know what I was testing with, they didn't work around it (so nobody tried posting drawings or cartoons), but I imagine e.g. a photo of a prolapse might not trigger the anus detection, as the shape would be too different.

- pifilter erred on the side of false negatives, but there was one notable false positive: a pastrami sandwich.
Theodores over 11 years ago
In the olden, pre-digital days, porn was either in print or on a television screen. Back then (we are talking two whole decades ago) experienced broadcast engineers could instantly spot porn just by catching a look at an oscilloscope (of which there were usually many in a machine room).

Notionally the oscilloscope was there to show that the luminance and chroma in the signal were okay (i.e. it could be broadcast over the airwaves to look as intended at the other end - PAL/NTSC). However, porn and anything likely to be porn had a distinctive pattern on the oscilloscope screen. Should porn be suspected, the source material would obviously be patched through to a monitor 'just in case'.

Note that the oscilloscope was analog and that the image would be changing 25/30 times a second. Also, back then there were not so many false positives on broadcast TV, e.g. pop videos and the like that today's audience deems artful rather than pornographic.

If I had to solve the problem programmatically, I would find a retired broadcast engineer and start from there, with what can be learned from a 'scope.
adorable over 11 years ago
I have developed an algorithm to detect such images, based on several articles published by research teams all over the world (it's incredible to see how many teams have tried to solve this problem!).

I found out that no single technique works great. If you want an efficient algorithm, you probably have to blend different ideas and compute a "nudity score" for each image. That's at least what I do.

I'd be happy to discuss how it works. Here are a few of the techniques used:

- color recognition (as discussed in other comments)

- Haar wavelets to detect specific shapes (that's what Facebook and others use to detect faces, for example)

- texture recognition (skin and wood may have the same colors but not the same texture)

- shape/contour recognition (machine learning, of course)

- matching against a growing database of NSFW images

The algorithm is open for testing here: http://sightengine.com. It works OK right now, but once version 2 is out it should really be great.
asolove over 11 years ago
Amazon Mechanical Turk has an adult-content marker specifically for this purpose. Lots of people have done the paperwork to qualify for adult-content jobs, and the cost of having humans do it at scale is very low: https://requester.mturk.com/help/faq#can_explicit_offensive

Source: I helped implement an MT job to filter adult content for a large hosting company.
ma2rten over 11 years ago
I did this for my bachelor thesis for a company that shall remain unnamed. I am pretty confident that my approach works better than any of the answers posted on Stack Overflow.

I used the so-called Bag of Visual Words approach, at that time the state of the art in image recognition (now it's neural networks). You can read about it on Wikipedia. The only main change from the standard approach (SIFT + k-means + histograms + SVM + chi2 kernel) was that I used a version of SIFT that uses color features. In addition to this I used a second machine-learning classifier based on the context of the picture: Who posted it? Is it a new user? What are the words in the title? How many views does the picture have?

In combination the two classifiers worked nearly flawlessly.

Shortly after that, Chatroulette was having its porn problem and it was in the media that the founder was working on a porn filter. I sent an email to offer my help, but didn't get a reaction.
VLM over 11 years ago
This is probably going to get downvoted, but if lots of people are not overzealous puritans and want some skin, the best overall system design that maximizes happiness and profit is probably sharding into

puritanweirdos.example.com, with no skin showing between toes and top of turtleneck (edited to add: no pokies either)

and

normalpeople.example.com, with 99% of the human race.

The best solution to a problem involving computers is sometimes computer related, but sometimes it is social. The puritans are never going to get along with the normal people anyway, so it's not like sharding them is going to hurt.

Another way to hack the system is not to hire or accept holier-than-thou puritans. Personality doesn't mesh with the team, doesn't fit the culture, etc. You have to draw the line somewhere, and weirdos on either end should get cut, so no CP or animals at one extreme, and no holy rollers at the other extreme.

The final social hack is that it's kind of like dealing with bullies via appeasement. They're blocking reasonable stuff today; tomorrow they want to block all women not wearing burkas or depictions of women damaging their ovaries by driving. Appeasing bullies never really works in the long run, so why bother starting. "If you claim not to like it, or at least enjoy telling everyone else repeatedly how you claim not to like it, stop looking at it so much, case closed."
_mulder_ over 11 years ago
Here's an idea...

Develop a bot to trawl NSFW sites and hash each image (combined with the 'skin detecting' algorithms detailed previously). Then compare the user-uploaded image hash with those in the NSFW database.

This technique relies on the assumption that NSFW images spammed onto social media sites already exist on NSFW sites (or are very similar to images that do). Then it simply becomes a case of pattern recognition, much like SoundHound for audio, or Google Image search.

It wouldn't reliably detect 'original' NSFW material, but given enough cock shots as source material, it could probably find a common pattern over time.

edit: I've just noticed rfusca in the OP suggests a similar method
mixmax over 11 years ago
Detecting all porn seems to be an almost impossible problem. Many kinds of advanced porn (BDSM, etc.) don't show much skin - often the actors are in latex, tied up, or whatever. It's obviously porn when you see it, but detecting it seems incredibly hard.

Detecting smurf porn (yes, that's a thing...) is even harder, since all the actors are blue.

http://pinporngifs.blogspot.dk/2012/09/smurfs-porn.html?zx=7ac31f5871d5a788 - obviously very NSFW, but quite funny.
eksith over 11 years ago
To this day, I believe the best method for picking out these images is a human censor (with appropriate, company-provided counseling afterward).

Edit: No shortage of stock image reviewer jobs: https://google.com/search?hl=en&q=%22image%20reviewer%22

I'm trying to find an interview with one of these people describing what it's like on the other end. It wasn't a pleasant story. These folks are employed by the likes of Facebook, Photobucket, etc. Most are outsourced, obviously, and they all have very high turnover.
VLM over 11 years ago
Nobody has discussed i18n and l10n issues? What passes for pr0n in SF is a bit different from tx.us, and that's different from .eu and from .sa (sa is Saudi Arabia, not South Africa, although they've probably got some interesting cultural norms too).

If you're trying for "must not offend any human being on the planet" then you've got an AI problem that exceeds even my own human intelligence to figure out. Especially when it extends past pr0n and into stuff like satire: is that just some dude's weird self-portrait, or a satire of the prophet, and are you qualified to figure it out?
betterunix over 11 years ago
How about a picture of a woman's breasts? What about an erect penis? Sounds like porn, but you might also see these things in the context of health-related pictures or other educational material.

The classic problem of trying to filter pornography is separating it from information about human bodies. I suspect that doing this with images will be even harder than doing it with text.
quarterto over 11 years ago
Google reverse image search can come up with a search query likely to return the given image. Perhaps this could be used for porn classification.
nathanb over 11 years ago
It seems like we had this same problem with email spam, and Bayesian learning filters revolutionized the spam-filtering landscape. Has anyone tried throwing machine learning at this problem?

We as humans can readily classify images into three vague categories: clean, questionable, and pornographic. The problem of classification is not only one of determining which bucket an image falls into but also one of determining where the boundaries between buckets are. Is a topless woman pornographic? A topless man? A painting of a topless woman created centuries ago by a well-recognized artist? A painting of a topless woman done yesterday by a relatively unknown artist? An infant being bathed? A woman breastfeeding her baby? Reasonable people may disagree on which bucket these examples fall into.

So what if I create three filter sets: restrictive, moderate, and permissive, and then categorize 1,000 sample images into one of those three categories for each filter set (restrictive could be equal to moderate but filter questionable images as well as pornographic ones)?

Assuming that the learning algorithm was programmed to look at a sufficiently large number of image attributes, this approach should easily be capable of creating the most robust (and learning!) filter to date.

Has anyone done this?
Houshalter over 11 years ago
Everyone is focusing on the machine vision problem, but the OP had a good idea:

> There are already a few image based search engines as well as face recognition stuff available so I am assuming it wouldn't be rocket science and it could be done.

Just do a reverse image search for the image and see if it comes up on any porn sites or is associated with porn words.
lectrick over 11 years ago
Relevant:

http://en.wikipedia.org/wiki/I_know_it_when_I_see_it

Basically, it's impossible to completely accurately identify pornography without a human actor in the mix, due to the subjectivity... and especially considering that not all nudity is pornographic.
primaryobjects over 11 years ago
This is a classic categorization problem for machine learning. I'm surprised so many suggestions have involved formulating some sort of clever algorithm like skin detection, colors, etc. You could certainly use one of those as a baseline, but I'd bet machine learning would out-score most human-derived algorithms.

Take a look at the scores for classifying dogs vs. cats with 97% accuracy: http://www.kaggle.com/c/dogs-vs-cats/leaderboard. You could use a technique of digitizing the image pixels and feeding them to a learning algorithm, similar to http://www.primaryobjects.com/CMS/Article154.aspx.
denzil_correa over 11 years ago
I am aware of some nice scholarly work in this space. You may find the approach of Shih et al. of particular interest [0]. Their approach is very straightforward and based on image retrieval. They also report an accuracy of 99.54% for adult image detection on their dataset.

[0] Shih, J. L., Lee, C. H., & Yang, C. S. (2007). An adult image identification system employing image retrieval technique. Pattern Recognition Letters, 28(16), 2367-2374.

http://sjl.csie.chu.edu.tw/sjl/albums/userpics/10001/An_adult_image_0identification_system_employing_image_retrieval_technique.pdf
jmngomes over 11 years ago
I came across nude.js (http://www.patrick-wied.at/static/nudejs/) when researching for a social network project; it seems quite nice and is JavaScript-based.
racbart over 11 years ago
Wouldn't testing for skin colors produce far too many false positives to be useful? All those beach photos, fashion lingerie photos, even close portraits. And how about the half of music stars these days who seem to try to never get caught more clothed than half naked?

Nudity != porn, and certainly half-nudity != porn.

I'd rather go for pattern recognition. There's a lot of image recognition software these days that can distinguish the Eiffel Tower from the Statue of Liberty, and it might be useful for detecting certain body parts and certain body configurations (for those shots that don't contain any private body part but have two bodies in an unambiguous configuration).
hugofirth over 11 years ago
Whilst I agree that programmatically eliminating porn images is a very hard problem, programmatically filtering porn websites might be easier, going beyond just a simple keyword search and whitelist.

If you assume that porn tends to cluster rather than exist in isolation, then a crawl of the other images on the source page, applying computer vision techniques, should allow you to block pages that score above a threshold number of positive results (thus accounting for inaccuracy and false positives).
ismaelc over 11 years ago
You can use APIs like these to do nude detection: https://www.mashape.com/search?query=nude
unoti over 11 years ago
If you're interested in machine learning, the outstanding Coursera course on machine learning just started a couple of days ago. It covers a variety of machine learning topics, including image recognition. The first assignment isn't due for a couple of weeks, so it's a perfect time to jump in and take the course!

https://www.coursera.org/course/ml
beat over 11 years ago
Algorithmic solutions will always be hard. "I know it when I see it" is hard to program.

Depending on the site, I'd go for a trust-based solution. New users get their images approved by a human censor (pr0n == spambot in most cases). Established users can add images without approval.

If you're going to try software, try something that errs on the side of caution, and send everything borderline to a human for final decision-making, just like spam filters.
npatten over 11 years ago
"You can programmatically detect skin tones - and porn images tend to have a lot of skin. This will create false positives, but if this is a problem you can pass images so detected through actual moderation. This not only greatly reduces the work for moderators but also gives you lots of free porn. It's win-win."

hilarious!
hcarvalhoalves over 11 years ago
Pornography is so creative that I find it hard to believe one algorithm could detect it all. Looking for features certainly wouldn't catch the weirder stuff.

Maybe a good approach is an image lookup: trying to find the image on the web and seeing if it appears on a porn site, or in a pornographic context.
jcfiala over 11 years ago
It seems to me that if you could somehow solicit comments on the picture, you could then do text analysis on the comments to see whether people thought it was porn or not. (Well, I'm being a little silly, but there's a germ of an idea there.)
nate510 over 11 years ago
A corollary of Rule 34 is that an algorithm to classify porn is NP-Hard.<p>Um, so to speak.
singlow over 11 years ago
So, who's going to write the ROT13 algorithm for images? Just call it ROT128: rotate the color value of the bits, and use a ROT128 image viewer to view the original image.
wehadfun over 11 years ago
Probably the easiest way is with motion and sound. Checking for skin would be hard depending on the type of content, as mixmax pointed out.
djent over 11 years ago
It wouldn&#x27;t solve the entire problem, but you could look for the watermarks that major porn networks stamp on their images.
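A minimal sketch of watermark spotting via OpenCV template matching, assuming a folder of cropped watermark logos collected from the major networks; the match threshold is a guess, and plain template matching won't handle rescaled or semi-transparent watermarks well.

```python
import cv2
from pathlib import Path

MATCH_THRESHOLD = 0.8  # normalised cross-correlation score to call it a hit

def has_known_watermark(image_path: str, watermark_dir: str) -> bool:
    """Slide each collected watermark crop over the image; report a strong match."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    for wm_path in Path(watermark_dir).glob("*.png"):
        wm = cv2.imread(str(wm_path), cv2.IMREAD_GRAYSCALE)
        if wm.shape[0] > img.shape[0] or wm.shape[1] > img.shape[1]:
            continue  # template must be smaller than the image
        scores = cv2.matchTemplate(img, wm, cv2.TM_CCOEFF_NORMED)
        if scores.max() >= MATCH_THRESHOLD:
            return True
    return False
```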
dschiptsov over 11 years ago
by filename ,)
bedhead over 11 years ago
Maybe we can channel Potter Stewart into an algorithm somehow?
level09 over 11 years ago
Would anyone be interested in purchasing an API subscription for this kind of service? IMO, a pipeline of AI filters can be efficient to some extent.
digitalsushi over 11 years ago
Invent an algorithm that can calculate humanity's creative thoughts.
bicknergseng over 11 years ago
Use CrowdFlower.
bachback over 11 years ago
Machine learning? Because you also want to filter cats.