Imagine being called John Graham-Cumming. Long, long ago Google didn't understand that "Cumming" was a name. I'd Google myself and get served ads for adult web sites.<p>And Eudora's Mood Watch feature would flag every single email I sent as offensive.
Tom Scott did a video on this:<p>Why Web Filters Don't Work: Penistone and the Scunthorpe Problem - <a href="https://www.youtube.com/watch?v=CcZdwX4noCE" rel="nofollow">https://www.youtube.com/watch?v=CcZdwX4noCE</a><p>It's well done, like the rest of his content.
Try keeping up with the journals as a chemist while on holiday behind an overeager web proxy. You are told that the subdiscipline of analytical chemistry is out of bounds.<p>But that's a feature. The voting public sees that you are trying hard and failing, and that's somehow considered better than shaking your head at the intractable problem.
That's like the British joke about which three football teams have swear words in their names: Arsenal, Scunthorpe, and MANCHESTER FUCKING UNITED. :-D
I've run into this problem myself when parsing recipes for food allergies. "Doughnuts" has the word "nuts" in it, but doughnuts don't always contain nuts as an ingredient.
Back in the late nineties, I attended the Norwegian University of Science and Technology.<p>Someone in the IT department figured it was an excellent idea to host all student accounts on the stud.ntnu.no subdomain.<p>We got a few odd bounces.
Came across an instance of this recently, I think on the FT's website... It took me a while to figure out what was going on with "smar * * * * ches".
TVTropes has a good list of amusing examples as well: <a href="http://tvtropes.org/pmwiki/pmwiki.php/Main/ScunthorpeProblem" rel="nofollow">http://tvtropes.org/pmwiki/pmwiki.php/Main/ScunthorpeProblem</a>
Many amusing examples in the source page, but this one really stood out.<p>> It also blocked e-mails sent in Welsh because it did not recognize the language.<p>With my (very) limited exposure to Welsh, I kinda get that it would give spam filters fits.
This problem could be solved by defining a logical rule (most probably through a regular expression) that would only filter the bad word when it appears as a standalone word.<p>I'm amazed how rarely this simple system is used. Instead you end up with monstrosities such as the power stars chat that mangles most words into an unreadable mess of *****.<p>Could be a fun game though. Guess the words!<p>***ertion<p>Weight and m***
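For what it's worth, the whole-word rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BLOCKED_WORDS` list and the `censor` helper are made up for the example, not taken from any real filter.

```python
import re

# Hypothetical blocklist for illustration only; a real filter would load its own.
BLOCKED_WORDS = ["ass", "nuts"]

# \b matches only at a word boundary, so a blocked word embedded in a longer
# word (e.g. "nuts" inside "Doughnuts") is not treated as a match.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in BLOCKED_WORDS) + r")\b",
    re.IGNORECASE,
)


def censor(text: str) -> str:
    """Replace each blocked whole word with asterisks of the same length."""
    return PATTERN.sub(lambda m: "*" * len(m.group(0)), text)


if __name__ == "__main__":
    print(censor("Doughnuts and an assertion about mass"))
    print(censor("This recipe contains nuts"))
```

Whole-word matching leaves longer words that merely contain a listed string alone, but it would still flag a surname or place name that happens to be a listed word in its entirety, so it narrows the problem rather than eliminating it.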
When I worked for a company that made label printers we had a potential customer who wanted us to print labels with human readable and barcode fields with 4 random letters and 4 random digits but did not want the letters to spell any obscene words. We asked for a list of words to ban but they declined to provide such a list. We did not get the contract.
Note that the problem of words being misunderstood when lacking context is not limited to computers. My father - a chemistry professor - was at a conference a few years ago about Free Radicals when he was approached by a member of the public who wanted to know if he could participate...
I'm not sure if it's still the case, but it used to not be possible to trade certain Pokémon over the global trade system with their default name due to a filter like this.<p>I believe Nosepass and Cofagrigus were two of the affected.