TechEcho
© 2025 TechEcho. All rights reserved.

Cuss: Map of profane words to a rating of sureness

64 points by tosh 7 days ago

25 comments

donatj 4 days ago
Something we have had to deal with in managing educational software with a writing aspect is that what is offensive to whom, in what context, and where is not universal at all.

One of the prime examples: at one point a number of terms related to homosexuality had made it onto the list at the request of a larger district. These are also terms that are being reclaimed, and it was... a difficult problem to try to satisfy everyone, and it did upset other districts. I believe their patterns were all but removed eventually.

We have fought over the list of definitions, and every change provoked controversy. Our current solution is just that we mark items for teacher review but don't tell them why. We don't say they are offensive, we don't say what the problematic words are. We just say it might need review. That's worked pretty well so far.

All this is to say, policing speech is a problem best avoided.
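The "mark for review, but don't say why" approach described above can be sketched roughly as follows. This is only an illustration: `WATCH_LIST`, `ReviewResult`, and `check_submission` are hypothetical names, and the real list and matching rules are not given in the comment.

```python
from dataclasses import dataclass

# Hypothetical watch list; the actual terms are not disclosed in the comment.
WATCH_LIST = {"darn", "heck"}


@dataclass
class ReviewResult:
    # The only thing exposed to the teacher: no reason, no matched words.
    needs_review: bool


def check_submission(text: str) -> ReviewResult:
    """Flag a submission for teacher review without reporting what matched."""
    words = {w.strip(".,!?;:\"'()").lower() for w in text.split()}
    return ReviewResult(needs_review=not words.isdisjoint(WATCH_LIST))
```

Because the result carries a single boolean, the judgment about whether something is actually offensive stays with the reviewing teacher rather than with the list.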
blueflow 4 days ago
Typical cuss filter UX:

*types something in live chat*

*some random word from the sentence gets censored out*

"Why did this just get censored out?"

*checks Urban Dictionary*

"Why?????"

Bonus points if it's regular ethnonyms that are classified as profanities, so people from that place have big trouble saying where they are from.
PaulHoule 4 days ago
Was really amused to see that a paper had English's most prominent profane word in its abstract on arXiv last month for the first time:

https://arxiv.org/search/?query=fuck&searchtype=all&source=header

though somebody did slip in a use in a comment earlier.
CSMastermind 4 days ago
It's certainly an interesting data set, though it has no concept of severity. As far as I can tell, "doodoo" is the same as some racial slurs: we're 100% certain they're bad words.
mdaniel 4 days ago
I legit thought this said "... rating of *success*", meaning how likely the project was to be successful on some metric based on the profane words therein. I recall there was a study(?) akin to that for the Linux kernel, as a frame of reference.
Blackarea 4 days ago
Could have been in a language-agnostic format (e.g. CSV).
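As a rough illustration of the suggestion above, a word-to-rating map can be flattened to CSV in a few lines. The sample data and the `to_csv` helper are assumptions for this sketch; the ratings shown are illustrative and not copied from the package, which ships as a JavaScript module.

```python
import csv
import io

# Illustrative sample in the shape "word -> rating"; values are assumptions.
SAMPLE = {"beaver": 0, "chug": 2, "yank": 2}


def to_csv(ratings: dict) -> str:
    """Serialize a word -> rating map as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "rating"])
    for word in sorted(ratings):  # sorted for stable, diff-friendly output
        writer.writerow([word, ratings[word]])
    return buf.getvalue()


print(to_csv(SAMPLE), end="")
```

A two-column file like this can be read from any language without a JavaScript runtime, which is the point of the comment.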
carlos-menezes 4 days ago
Nit: why is Portuguese named "European Portuguese"? If anything, the language spoken in Brazil should be called "American Portuguese".
jollyllama 4 days ago
"Beaver" unlikely to be used in profanity, eh?
weinzierl 4 days ago
Good to know that *"This package is safe."*

When it comes to security, the only thing that beats warm fuzzy words are shiny security seals.
SamBam 4 days ago
I'm confused as to the purpose of all the zeros. Since this is far, far from a complete list of all English words, what's the difference between a word not being on the list vs. a word being a zero?

I can kind of see "was this a word they considered and scored, vs. not considered?" when trying to assess whether the project is comprehensive, but from a programming standpoint, it just seems like it's going to have a lot of useless overhead, since by the time I'm looking up the word I don't care whether it's a zero or a miss.

(I also find the scoring of "2" for many of the words to be weird, like "yank," "chug," "looser," etc., as they can all have perfectly normal meanings.)
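The zero-versus-miss distinction raised above can be made concrete with a small sketch. The `RATINGS` sample and both helper functions are hypothetical, and the ratings are illustrative values rather than the package's actual data.

```python
from typing import Optional

# Illustrative sample; assumes 0 means "rated, unlikely profane".
RATINGS = {"beaver": 0, "chug": 2}


def rating(word: str) -> Optional[int]:
    """Return the stored rating, or None when the word was never rated."""
    return RATINGS.get(word.lower())


def is_flagged(word: str, threshold: int = 1) -> bool:
    """Filtering view: a zero and a miss are treated identically."""
    return RATINGS.get(word.lower(), 0) >= threshold
```

For filtering, `is_flagged` shows why the zeros carry no extra information: a zero and a missing key give the same answer. They only matter to something like `rating`, where `0` ("considered and scored") can be told apart from `None` ("never considered"), for example when auditing coverage.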
Fnoord 4 days ago
The Dutch word 'kunt' (je kunt = you can) gets censored in WoW because of 'cunt'. That is, if you have the mature language filter on. I have this on because I have no interest in raging kids in said game, but I do want to read simple, common Dutch words. Annoys me to this day. CS gave the obvious answer (WONTFIX, with the obvious workaround of disabling the mature language filter altogether). It could be solved easily by looking at context instead of simple blacklisting. I connect from a Dutch IPv4. I sometimes talk Dutch. The same would be true for the other endpoint.
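One way to read "looking at context instead of simple blacklisting" is a per-locale allowlist that overrides the block list. The sketch below assumes the filter lists the colliding spelling as a variant, which the comment does not confirm; `BLOCKED`, `ALLOWED_BY_LOCALE`, and `censor` are hypothetical names, and how the locale is determined (IP, client language, chat history) is left open.

```python
# Assumption: the filter treats this spelling as a blocked variant.
BLOCKED = {"kunt"}

# Hypothetical per-locale exceptions for ordinary words that collide.
ALLOWED_BY_LOCALE = {"nl": {"kunt"}}


def censor(message: str, locale: str) -> str:
    """Mask blocked tokens unless the sender's locale allowlists them."""
    allowed = ALLOWED_BY_LOCALE.get(locale, set())
    out = []
    for token in message.split():
        key = token.strip(".,!?").lower()
        if key in BLOCKED and key not in allowed:
            out.append("*" * len(token))
        else:
            out.append(token)
    return " ".join(out)
```

With this, `censor("je kunt dat doen", "nl")` leaves the sentence untouched, while the same message under an `"en"` locale has the second word masked.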
thuanao 4 days ago
Somewhat related: What is with the rampant cursing nowadays? In the US people are openly saying the f-word in professional settings, in public to strangers or acquaintances, in writing and video... seemingly everywhere, even in calm, normal conversations.

I don't remember it being like this decades ago. Is it just me? I remember people used to curse only in private conversation, when angry, and never at the office in meetings and professional contexts.
GuB-42 4 days ago
Looked at French words: most have a rating of 2 (mostly profane), even words that are not profane at all (ex: envoyer => send) and words that have a profane second meaning but whose non-profane meaning is also in common use (ex: morue => cod). Also "retard" just means "delay"; I have never heard it used as profanity, maybe in Quebec? ("retard" in English would translate to "attardé" in French.)
thih9 4 days ago
Do we know how exactly these certainty ratings are determined?

Edit: seems like it’s all arbitrary? E.g. in a PR[1] I saw random new words get added with no explanation of why a certain rating gets assigned.

[1]: https://github.com/words/cuss/pull/43/files (nsfw too).
pimlottc 4 days ago
“Sureness” is not really a word; I had to read through to understand what they meant. “Certainty” or “confidence” would be clearer.
BrandoElFollito 4 days ago
These articles always remind me of some code for particle physics simulations. It was full of variables called anal_this and anal_that (because analysis).

Someone put a comment "stop calling variables ANAL!!! This is physics not an orgy!!!!"

I may have a copy on a diskette somewhere :)
BrandoElFollito 4 days ago
I had a look at the French ones.

1/4 is normal everyday cussing

1/4 is cussing when the team is losing, but there are children around

1/4 are from the 17th century and I had a good laugh

1/4 are useful when driving

3 are actually bad (just an estimation :)).

The thing with French is that the cussing is quickly funny.
kps 4 days ago
I am reminded of the late great Eudora, a Mac mail program. Late versions would flag ‘offensive’ terms in both outgoing and incoming messages. A hidden option setting would cause it to *read aloud* all flagged text.
AriedK 4 days ago
Helpful tool for car makers.

Would have probably saved them from the Mitsubishi Pajero, Ford Pinto, Mazda Laputa.

Downside is, it doesn’t analyze phonetics afaict. The Hebrew Volkswagen Beetle (Hipushit) would have passed as fine.
Aachen 4 days ago
It seems to require specifying all spelling variants of a word: https://github.com/words/cuss/blob/6bab3fef250481e34ba55bc400fac5c6d25f1429/index.js#L21-L22

And then fails to do that for words that are not uncommonly written with a space: https://github.com/words/cuss/blob/6bab3fef250481e34ba55bc400fac5c6d25f1429/index.js#L309

Making this a complete list will probably be a challenge when it needs to be a byte-for-byte match.
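A common way around the byte-for-byte problem described above is to normalize both list entries and input before lookup, so that one entry covers several spellings. This is a sketch of that alternative, not how the package works; the `RATINGS` entry, `normalize`, and `lookup` are hypothetical.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical entry, stored once in normalized form.
RATINGS = {"dingbat": 1}


def normalize(term: str) -> str:
    """Fold case and Unicode compatibility forms, then drop separators."""
    folded = unicodedata.normalize("NFKC", term).casefold()
    return re.sub(r"[\s\-_]+", "", folded)


def lookup(term: str) -> Optional[int]:
    """Look up a term after normalization; None means no entry."""
    return RATINGS.get(normalize(term))
```

Here `lookup("ding bat")`, `lookup("Ding-Bat")`, and `lookup("dingbat")` all resolve to the same entry. The trade-off is that a tokenizer splitting on spaces would need to test adjacent word pairs too, and collapsing separators can introduce new false positives.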
tgv 4 days ago
Where does the rating come from? Do you understand what all those words mean? It looks like you copied someone's rather subjective opinion. Because e.g. "bollo" and "caliente" aren't inherently profane in Spanish. Or do people think the hot water tap is leering at them? "Oye, tia, que caliente qu'et-ta!"
quectophoton 3 days ago
Do not forget words like "Dragon" or "Figma".
bitcurious 4 days ago
Based on a list of (in part) profane words which includes:

addict africa amateur american angry arab
GuinansEyebrows 3 days ago
there may need to be additional localization of english. i am in awe of the english and scots' ability to turn simple words into curses ("spanner" might be one of my favorite insults, when spoken by UK folks, but it's insufferable coming from non-UK/GB folks).
smitelli 4 days ago
Never realized the "chunky" in my chunky peanut butter was so profane. /s