A friend of mine told me about how they once had to dig through their code to figure out why their site was classified as adult by some filter. After days of searching, they found this comment at the bottom of a JavaScript file:<p>// Slut.<p>Which is Danish for "the end".
I don't find this very useful. It's <i>too</i> naïve for a real-world use case.<p>I didn't look at the implementation, but judging by the "classy party" example, it looks like it simply matches a sequence of 'a', 's', and 's' bytes anywhere in a string.<p>It would be better if it tokenized the sentence using punctuation and white-space as terminators. That way it would detect `big-ass sandwich` and `smart-ass person` but not `classy party` or `bass instrument`.<p>Furthermore, it would be cool if you created a configuration format for this kind of thing, so one could do something like this (excuse the config format, I realise it's probably shit and problematic):<p><pre><code> [smart][big][fat]ass
!sex[ual]+education
</code></pre>
which would detect all of the following: smartass, bigass, fatass, <i>and</i> ass itself. The second rule would <i>not</i> filter a `sex(?:ual)?` token followed by an `education` token. You get the idea.<p>These are just some heat-of-the-moment ideas, because I think this is exciting and could be useful. :-)
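To make the tokenizing idea concrete, here is a minimal Python sketch. It is not the library's implementation (I haven't read it). The names `tokenize`, `compile_rule`, and `is_flagged` are made up for illustration, and the rule semantics are my own assumption: consecutive bracketed groups are optional alternatives attached to the literal text next to them. The `!` and `+` parts of the second rule are not handled here.

```python
import re

# Any run of characters that is not a letter or apostrophe ends a token,
# so whitespace, hyphens, and other punctuation all act as terminators.
TOKEN_SPLIT = re.compile(r"[^a-z']+")

# Matches either one bracketed group ("[big]") or a run of literal text ("ass").
RULE_PART = re.compile(r"\[([^\]]+)\]|([^\[\]]+)")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return [tok for tok in TOKEN_SPLIT.split(text.lower()) if tok]


def compile_rule(rule):
    """Compile a rule such as '[smart][big][fat]ass' into a regex.

    Assumed semantics (hypothetical): consecutive bracketed groups form
    one optional alternation; everything else is literal text.
    """
    pattern = []
    options = []  # bracketed groups collected since the last literal
    for optional, literal in RULE_PART.findall(rule):
        if optional:
            options.append(re.escape(optional))
            continue
        if options:
            pattern.append("(?:" + "|".join(options) + ")?")
            options = []
        pattern.append(re.escape(literal))
    if options:  # trailing groups, as in 'sex[ual]'
        pattern.append("(?:" + "|".join(options) + ")?")
    return re.compile("".join(pattern))


def is_flagged(text, rules):
    """Return True if any whole token matches any compiled rule."""
    return any(rule.fullmatch(tok) for tok in tokenize(text) for rule in rules)


if __name__ == "__main__":
    rules = [compile_rule("[smart][big][fat]ass")]
    for sample in ("big-ass sandwich", "smartass", "classy party", "bass instrument"):
        print(sample, "->", is_flagged(sample, rules))
```

Because each token has to match a rule in full, `classy` and `bass` pass through, while `big-ass` is split at the hyphen and its `ass` token is caught.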
With little effort you can run your dirty words through Google Translate (copy and paste the whole list) and get a filter for Spanish. Add synonyms for stronger filtering.<p>Perhaps "gay" is not a dirty word? It is included in your list, but gay people would probably think otherwise.
I inherited a (dreadful) application which had a hilariously lame 'rude words filter'. It checked for words on a banned list.<p>The full list is here:
<a href="http://pastebin.com/raw.php?i=1Pv4v8j7" rel="nofollow">http://pastebin.com/raw.php?i=1Pv4v8j7</a><p>It contains such gems as "cockburger", "penispuffer", and -- the pièce de résistance -- "twatwaffleunclefucker".