Hello everyone,<p>A friend is working as an archivist on one of the largest musical historical archives. His team has been tasked with going through the personal archive of a donor.<p>The archive contains hundreds of thousands of pictures that the donor downloaded during ordinary browsing, all saved with random filenames.<p>The issue is that useful pictures are mixed together with adult pictures, with no way to tell them apart besides manual review. The team has been working on this for weeks and still has a ton more pictures to go through.<p>Is there any software I can help them put together to make the distinction automatically with good accuracy? Even a rough sort (possibly adult / possibly safe) is good enough. I am a SWE myself, so I can build something quickly in many languages (especially Python and JS).<p>The archive pertains to one of the greatest opera singers of the 20th century and is the largest to date, so your help here will be meaningful.<p>Thanks a lot!
No experience with any of this, but a quick search turns up stuff like <a href="https://deepai.org/machine-learning-model/nsfw-detector" rel="nofollow">https://deepai.org/machine-learning-model/nsfw-detector</a>, which looks affordable and straightforward to implement.<p>And here's a bunch more: <a href="https://rapidapi.com/collection/nudity-detection-image-moderation-api" rel="nofollow">https://rapidapi.com/collection/nudity-detection-image-moder...</a>
You can use CLIP to build a database to search (offline) for keywords like “nude” or “naked”. Specifically, I use clip-anytorch with the ViT-B/16 pretrained model and find the results very good. Just go to PyPI and the corresponding GitHub repo. They have examples and a demo for a quick start. It can run on CPU too, albeit a bit slowly.
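For anyone who wants a starting point, here is a rough sketch of the CLIP idea as a zero-shot sorter, assuming clip-anytorch exposes the same `clip.load`, `clip.tokenize`, `encode_image` and `encode_text` calls as the original OpenAI CLIP package. The prompt wording, the 0.5 threshold, the fixed 100.0 logit scale, the file-extension list, and the helper names (`softmax`, `bucket`, `score_image`, `sort_archive`) are all my own assumptions for illustration, not something from the library docs, so expect to tune them on a hand-labelled sample before trusting the output.

```python
import math
from pathlib import Path

# Hypothetical prompt pair; tune the wording on a hand-labelled sample.
PROMPTS = ["a nude or sexually explicit photo", "a safe-for-work photo"]
# Assumed set of image extensions worth scanning.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def softmax(logits):
    """Turn raw similarity logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bucket(p_adult, threshold=0.5):
    """Map a probability to one of the two review piles (threshold is a guess)."""
    return "possibly_adult" if p_adult >= threshold else "possibly_safe"


def score_image(path, model, preprocess, text_features, device):
    """Return the probability that CLIP prefers the first (adult) prompt."""
    import torch
    from PIL import Image

    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        feats = model.encode_image(image)
        feats = feats / feats.norm(dim=-1, keepdim=True)
        # 100.0 is an assumed scale, roughly CLIP's learned logit scale.
        logits = (100.0 * feats @ text_features.T).squeeze(0).tolist()
    return softmax(logits)[0]


def sort_archive(root, threshold=0.5):
    """Walk a folder and label every readable image; returns {path: label}."""
    import torch
    import clip  # provided by the clip-anytorch package

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/16", device=device)
    with torch.no_grad():
        text_features = model.encode_text(clip.tokenize(PROMPTS).to(device))
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    labels = {}
    for path in Path(root).rglob("*"):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            p_adult = score_image(path, model, preprocess, text_features, device)
        except OSError:
            continue  # unreadable or corrupt file; leave it for manual review
        labels[str(path)] = bucket(p_adult, threshold)
    return labels
```

Calling `sort_archive("/path/to/archive")` gives a dict you can use to move files into two folders. Since a rough sort is good enough here, I would keep a human in the loop: review the "possibly_adult" pile quickly and spot-check the "possibly_safe" one rather than deleting anything automatically.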