My co-founder and I started a static site hosting platform. From our past experience with user-generated content, we knew we needed moderation. So, we wrote a very simple script that checks for common phishing attempts and alerts us.<p>However, this simple script does not catch some of the more advanced phishing sites, and it doesn't catch other types of content that we can't support on our platform, like porn.<p>Curious if anyone has tips and tricks for content moderation? We're still going to be manually reviewing sites because we haven't reached a scale that makes that impossible. But automation is nice.
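For illustration, a minimal sketch of the kind of heuristic the OP describes might look like this; the brand and credential patterns below are hypothetical placeholders, not the actual script:

```typescript
// Naive phishing heuristic: flag pages where a commonly impersonated brand
// name co-occurs with credential-harvesting language, then alert a human.
// Patterns and logic are illustrative only.
const BRAND_PATTERNS = [/paypa1|paypal/i, /app1e|apple id/i, /micr0soft|microsoft/i];
const CREDENTIAL_PATTERNS = [/verify your account/i, /password/i, /log ?in/i, /social security/i];

export function looksLikePhishing(html: string): boolean {
  const text = html.replace(/<[^>]+>/g, " "); // crude tag stripping
  const brandHit = BRAND_PATTERNS.some((re) => re.test(text));
  const credentialHit = CREDENTIAL_PATTERNS.some((re) => re.test(text));
  // Flag only when both signals co-occur; send the result to a review alert
  // rather than blocking the site automatically.
  return brandHit && credentialHit;
}
```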
Follow-up question: what work has been done on client-side moderation? I know this gets dangerously close to the kind of content scanning that eg Apple has tried (with very detrimental results), but hear me out: I really think this is a prerequisite for end-to-end encryption on a social network (there has to be some level of protection; even if 100% of users report 100% of bad content, imagine scrolling a feed and stumbling upon CSAM simply because you were the first person to see it). I also think it's possible to strike a balance that preserves user agency while still protecting them, by simply inserting a manual reporting step. So, for example, potentially problematic content gets put behind an interstitial with a content warning and options to view, hide, report, etc. But again, this requires client-side content classification.<p>I'm aware of eg NSFWJS, which is a TensorFlow.js model [1]. Is there anything else that, say, can also do violence/gore detection?<p>[1] <a href="https://github.com/infinitered/nsfwjs">https://github.com/infinitered/nsfwjs</a>
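For what it's worth, a minimal sketch of that interstitial flow with NSFWJS might look like the following; the threshold and category choices are illustrative, not recommendations:

```typescript
// Client-side gating: classify an image with NSFWJS and decide whether to
// show it directly or put it behind a content-warning interstitial with
// view / hide / report options, preserving user agency.
import * as nsfwjs from "nsfwjs";

type Verdict = "show" | "interstitial";

export async function gateImage(img: HTMLImageElement): Promise<Verdict> {
  const model = await nsfwjs.load(); // in practice, load once and reuse
  const predictions = await model.classify(img);
  // NSFWJS returns classes such as "Porn", "Hentai", "Sexy", "Neutral", "Drawing".
  const risky = predictions.some(
    (p) => ["Porn", "Hentai"].includes(p.className) && p.probability > 0.85
  );
  return risky ? "interstitial" : "show";
}
```

As far as I know, violence/gore detection would need a separate model; NSFWJS only covers the classes above.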
Is a static site hosting platform required to proactively monitor what content paying users host in your jurisdiction?<p>Wouldn't a solid set of processes for handling content complaints, plus knowing who your customers are in case the hosting country's law enforcement has a case, suffice?<p>Or do you have some free tier where users can anonymously upload stuff?<p>In the latter case — a free place to stash megabytes — you'll need to detect password-protected archives in addition to unencrypted content. Get ready for a perpetual game of whack-a-mole, though.
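To illustrate that last point, here is a hedged sketch of one such check: detecting encrypted entries in a ZIP upload by reading the general-purpose bit flag (bit 0 = encrypted) in each local file header. It only covers ZIP; RAR/7z, split, and nested archives would need their own handling.

```typescript
// Scan a ZIP file for local file headers (signature 0x04034b50) and check
// bit 0 of the general-purpose bit flag, which marks an encrypted entry.
// A linear signature scan is slow but keeps the sketch dependency-free.
import { readFile } from "node:fs/promises";

const LOCAL_FILE_HEADER = 0x04034b50;

export async function zipHasEncryptedEntries(path: string): Promise<boolean> {
  const buf = await readFile(path);
  for (let i = 0; i + 8 <= buf.length; i++) {
    if (buf.readUInt32LE(i) === LOCAL_FILE_HEADER) {
      const flags = buf.readUInt16LE(i + 6); // general-purpose bit flag
      if (flags & 0x1) return true; // bit 0 set => entry is password-protected
    }
  }
  return false;
}
```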
If you care about moderation, a lot of it has to be done manually. Manual moderation requires placing a high level of trust in the moderators. That means that either you pay them well enough to care, or you build a community which user-moderators will protect. Or both.<p>That makes <i>approximately</i> all <i>business</i> ideas to host user-generated content non-viable. The conflict is dynamic and you are the Maginot Line...except that any breach of law creates a potential attack by state enforcement agencies too.<p>To put it another way, ASCII files and a teletype were enough to see pictures of naked ladies. Good luck.
Full disclosure, I work for the company that owns Cleanspeak[0].<p>We have many happy clients that moderate UGC with Cleanspeak, including gaming and customer service applications. You can read more about the approach in the docs[1] or the blog[2]. Here's a blog post[3] that talks about root word list extrapolation, which is one of the approaches.<p>Cleanspeak is not the cheapest option and there's some API integration work required, but if you are looking for performant, scalable, flexible moderation, it's worth an eval.<p>0: <a href="https://cleanspeak.com/" rel="nofollow">https://cleanspeak.com/</a><p>1: <a href="https://cleanspeak.com/docs/3.x/tech/" rel="nofollow">https://cleanspeak.com/docs/3.x/tech/</a><p>2: <a href="https://webflow.cleanspeak.com/blog" rel="nofollow">https://webflow.cleanspeak.com/blog</a><p>3: <a href="https://cleanspeak.com/blog/root-word-list-advantage-content-moderation-via-intelligent-filtering" rel="nofollow">https://cleanspeak.com/blog/root-word-list-advantage-content...</a>
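For readers unfamiliar with the term, here is a rough, generic illustration of the root-word-list idea (not Cleanspeak's actual implementation): each root is expanded into a tolerant pattern that also catches leetspeak substitutions, repeated letters, and separator padding.

```typescript
// Expand a root word into a regex that tolerates common obfuscations.
// Substitution map and separator set are illustrative.
const SUBSTITUTIONS: Record<string, string> = {
  a: "[a@4]", e: "[e3]", i: "[i1!]", o: "[o0]", s: "[s$5]", t: "[t7]",
};

export function rootToPattern(root: string): RegExp {
  const body = root
    .split("")
    .map((ch) => `${SUBSTITUTIONS[ch] ?? ch}+`) // "+" tolerates repeated letters
    .join("[\\s._-]*"); // tolerate separator padding like "s.c.a.m"
  return new RegExp(`\\b${body}\\b`, "i");
}

// One root now covers many obfuscated variants of itself:
const pattern = rootToPattern("scam");
console.log(pattern.test("s c a m"), pattern.test("SC4Mmm")); // true true
```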
How serendipitous! I did an Ask HN last week [1] trying to get platform creators to talk about this without much success. In any case, I've built a solution for links and emails, with an API [2], in case that helps. No subscription and I'm happy to provide some credits for free, for you to test it. Reach out if you're interested!<p>[1]: <a href="https://news.ycombinator.com/item?id=42780265">https://news.ycombinator.com/item?id=42780265</a>
[2]: <a href="https://oxcheck.com/safe-api" rel="nofollow">https://oxcheck.com/safe-api</a>
I don’t have an answer for everything you are looking for, but I wrote bonk[1] to solve a similar issue, as I needed to ensure users weren’t uploading porn[2]. Maybe you can find a use for it too.<p>[1]: <a href="https://git.sr.ht/~jamesponddotco/bonk" rel="nofollow">https://git.sr.ht/~jamesponddotco/bonk</a><p>[2]: Because I host in Germany.
Gumroad open-sourced Iffy yesterday, and you can check it out: <a href="https://github.com/anti-work/iffy">https://github.com/anti-work/iffy</a>
I hate to be that guy, but this seems like the perfect use case for an LLM. First put content through your script, then through a decently prompted LLM. Anything it catches goes into a queue for manual review.
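A minimal sketch of that two-stage pipeline, assuming the OpenAI Node SDK; the model name, prompt, and queue are placeholders:

```typescript
// Stage 1 (the existing heuristic script) runs before this; stage 2 asks a
// prompted LLM, and anything flagged goes to a human review queue rather
// than being removed automatically.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function llmFlagsContent(pageText: string): Promise<boolean> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini", // any capable model; pick for cost and latency
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          'You review static-site content for a hosting platform. ' +
          'Respond with JSON: {"flag": boolean, "reason": string}. ' +
          'Flag phishing, adult content, and other policy violations.',
      },
      { role: "user", content: pageText.slice(0, 8000) }, // keep prompts bounded
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? "{}").flag === true;
}

export async function moderate(pageText: string, reviewQueue: string[]): Promise<void> {
  if (await llmFlagsContent(pageText)) {
    reviewQueue.push(pageText); // humans make the final call
  }
}
```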
Other comments already mentioned multiple services (from OpenAI to Cleanspeak). I want to provide a high-level clarification from experience.<p>Moderation is a vast topic - there are different services that focus on different areas, such as text, images, CSAM, etc. Traditionally you treat each problem area differently.<p>Within each area, you, as an operator, need to define the level of sensitivity for each category of offense (policies).<p>Some policies seem more clear-cut (eg image: porn) while others are more difficult to define precisely (eg text: bullying or child grooming).<p>In my experience, text moderation is more complex and presents a lot of risk.<p>There are different approaches to text moderation.<p>Keyword-based matching services like Cleanspeak, TwoHat, etc. are useful as a baseline but limited, because assessing a keyword requires context. A word can be miscategorized, resulting in false positives or false negatives, which may impact your operation at scale, or UX if a platform requires more of a real-time experience.<p>LLMs are theoretically well suited to taking context into account for text moderation; however, they are also pricier and may require further fine-tuning or self-hosting for cost savings.<p>CSAM as a problem area presents the highest risk, though it may be more clear-cut. There are dedicated image services and regulatory bodies that focus on this area (for automating reporting to local law enforcement).<p>Finally, the EU (DSA) also requires social media companies to self-report on moderation actions. The EU also requires companies to provide pathways for users to own and delete their data (GDPR).<p>Edit: FIXED typos; ADDED a note on CSAM and DSA & GDPR
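To make the context point concrete, here is a tiny illustration of why bare keyword matching misfires in both directions (the example words are arbitrary):

```typescript
// Naive substring blocklist: flags innocent text containing a blocked word
// (the classic Scunthorpe problem) while missing obfuscated variants.
const blocklist = ["sex"];

function naiveFlag(text: string): boolean {
  const lower = text.toLowerCase();
  return blocklist.some((word) => lower.includes(word));
}

console.log(naiveFlag("Our office is in Essex"));  // true  (false positive)
console.log(naiveFlag("s3x content, obfuscated")); // false (false negative)
```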