
Are popular toxicity models simply profanity detectors?

183 points by echen, over 3 years ago

27 comments

YEwSdObPQT, over 3 years ago
Something alluded to here is that many of the language models use US English. Many terms that are offensive in the US may not be offensive at all in the UK. E.g. "fag" in the UK frequently refers to a cigarette: "Can I bum a fag?" literally means "Can I have one of your cigarettes, please?".

Similarly, something that might be a catcall such as "Get your baps out" (show us your breasts) could also be used by a baker in a slightly cheeky advert, since a "bap" is a type of bread roll and most people are aware of the pun.

How are you going to train an AI to know from context whether the person might be talking about bread instead of a woman?

Has anyone realised yet that almost all of this is folly? I suppose not, when there is money to be made.
echen, over 3 years ago
One of the problems with real-world machine learning is that engineers often treat models as pure black boxes to be optimized, ignoring the datasets behind them. I've often worked with ML engineers who can't give you any examples of the false positives they want their models to fix!

Perhaps this is okay when your datasets are high-quality and representative of the real world, but they usually aren't. For example, many toxicity and hate speech datasets mistakenly flag texts like "this is fucking awesome!" as toxic, even though they're actually quite positive -- because NLP datasets are often labeled by non-fluent speakers who pattern match on profanity.

(So is 99% accuracy or 99% precision actually a good thing? Not if your test sets are inaccurate as well!)

Many of the new, massive-scale language models use the Perspective API to measure their safety. But we've noticed a number of Perspective API mistakes on texts containing positive profanity, so this post was an attempt to explain the problem and quantify it.
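One way to quantify the problem along these lines is to score a small hand-labeled set of positive-profanity sentences against the Perspective API and count the false positives. A minimal sketch follows; the endpoint and request shape follow Google's public Perspective docs, but the API key placeholder, example sentences, and interpretation are illustrative assumptions, not the post's actual methodology:

```python
import requests

API_KEY = "YOUR_API_KEY"  # hypothetical placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

# Sentences containing profanity that a fluent speaker would label positive.
positive_profanity = [
    "this is fucking awesome!",
    "fuck yeah, nurses are the shit",
]

for text in positive_profanity:
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=body).json()
    score = resp["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    # A high toxicity score on a clearly positive sentence is a false positive.
    print(f"{score:.2f}  {text}")
```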
_the_inflator, over 3 years ago
I can relate. I recently learned, while talking to some folks from Spain, that they use the word "puta" a lot, and that it is used to express feelings, not meant as a rude insult, as they explained.

There are some differences in German, too. For example, "wixen/wichsen" is an old word that means to wipe/shine your shoes, and it is still in active use in this sense in Switzerland and Austria; however, it lost its appeal in Germany, because it is now primarily associated with a different meaning. The Wix company played on this alternative understanding of its brand name in an ad: https://www.youtube.com/watch?v=IddnMutPgTI

Since we have an IT background here, the same goes for "Mongo", as in MongoDB. "Mongo" is considered to make fun of handicapped people in Germany.

Fraport AG, formerly abbreviated FAG (Flughafen AG), changed its brand name because it found it difficult to expand business under that name.

No bad actors, if you ask me, only different contexts. The list could go on and on...
heavyset_go, over 3 years ago
Makes sense. It's not like Google or any other company training AI models is hiring professional linguists or psychologists to investigate the true meaning behind each of the billions of internet posts they've scraped, labeled, and trained their models on. They're throwing pennies at workers in the developing world to label as much data as they can, as fast as they can.

It's also likely that there's a significant lack of context around the data points, not just because the posts are divorced from their parent content, but because of a culture and language divide between the labeler and the author of the data they're labeling as well.
boredumb, over 3 years ago
The waste of resources on detecting what is decidedly "toxic" on internet forums is insanity. If you want to police communities online, hire moderators; if your platform is so big you ""can't"" have moderators moderate, then you are not in a position to be policing the platform. If you want to build puritanical devices to spam your moderators into doing a human review, then that is your business, but the use of machine learning for any sort of proactive policing is going to be a parody that results in a sterile environment and/or a lot of bitter users.
npilk, over 3 years ago
Interesting stuff. I doinked around with this a while back when working on a 'hot take oracle' -- basically a search box that finds a strongly-opinionated tweet about something (https://hottakeoracle.herokuapp.com/).

You can see that my model is basically just filtering for profanity as an indicator of "strong emotion", which makes sense. But it's interesting that positive profanity seems to be such a thorny problem, at least for Perspective.
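In its simplest form, that kind of profanity-as-strong-emotion proxy is just a word-list check. A toy sketch (the word list and function name are illustrative assumptions, not the commenter's actual model):

```python
# Flag a tweet as a "hot take" if it contains any word from a profanity list.
# Note this deliberately conflates hostile and enthusiastic profanity, which
# is exactly why positive profanity becomes a thorny case.
PROFANITY = {"fuck", "fucking", "shit", "damn", "hell"}

def looks_like_hot_take(tweet: str) -> bool:
    words = {w.strip(".,!?").lower() for w in tweet.split()}
    return bool(words & PROFANITY)

print(looks_like_hot_take("This framework is fucking incredible"))  # True
print(looks_like_hot_take("Mildly disagree with this decision"))    # False
```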
johnchristopher, over 3 years ago
On the other hand, and from my recent experience, my 2¢:

I recently started playing Counter-Strike: Source again online, just for 10 minutes of fun at first (to see if it would still click with me). I randomly picked a server, and the ambiance was cheerful and nice. I noticed the rules said "no profanity, have fun", and indeed people were mostly polite.

I tried another server at random a bit later, and there were more insults, along with a lot of taunting.

I switched back to the first server and have been regularly playing an hour or two every three days, and there *is* a difference from other servers. Random people coming in and throwing insults, even mild ones like "fuck you" or "you son of a bitch awp", get insta-banned, and it makes the whole session a much better experience. Maybe it's a safe place, but playing with polite people is more enjoyable to me now than playing with insult gatlings.

Language is political. There are many meanings to words, depending on context, but I do think it's not innocent to swear in front of people or to use swear words to look cool. These are still swear words and insults, and their original use is to provoke or taunt or display aggression. Even if it's only used for "this album is the shit!", it's still a (childish) provocation. Reminds me of the brogrammer fad.

FWIW: I get regularly owned on this server and I am at the bottom of the ranks, but it's still more fun and enjoyable than other servers I tried where I can reach the top but... it's not a nice place. I think online servers are like bars.

Side note: I was pleasantly surprised to see that "gg" is still thrown around after rounds :). It's way better than "git gud", which came later and which I find horribly toxic.
gillesjacobs, over 3 years ago
I did research on this topic in a cyber-safety research project. We focused on cyberbullying specifically but encountered the issue of non-toxic profanity as well. We used neural representation learning as well as feature-based methods, and indeed, common profanity words are weak predictors in the better models. Still, we found instances of non-toxic profanity being classified as bullying.

An immediate solution is to apply multitask methods to your target dataset and include the one proposed in the OP. It's always good to have more resources like this, even though SurgeHQ overstates the size of their resources by a large margin in their copy. Their dataset's 1,000 post instances are far from "the largest": I have several human-annotated aggression, toxicity, and bullying datasets right here with over 100k instances.
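The dataset-mixing half of that suggestion can be as simple as tagging each example with its source task, so a multitask model can share an encoder while keeping a separate classification head per task. A minimal sketch; the field names and toy examples are assumptions for illustration, not any particular dataset's schema:

```python
# Combine a cyberbullying dataset with a positive-profanity dataset (like the
# one proposed in the OP) under task tags for multitask training.
bullying = [("you're worthless, just quit", 1), ("nice shot, well played", 0)]
profanity_toxicity = [("this is fucking awesome!", 0), ("fuck you", 1)]

combined = (
    [{"text": t, "label": y, "task": "bullying"} for t, y in bullying]
    + [{"text": t, "label": y, "task": "toxicity"} for t, y in profanity_toxicity]
)

# During training, each batch's loss would be routed through the head
# matching its task tag, while the text encoder is shared across tasks.
for example in combined:
    print(example["task"], example["label"], example["text"])
```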
nathias, over 3 years ago
Making your training data into a cultural standard is just imperialism, but that's the goal here, right? If you don't comply with the standards of US toxic positivity, you should be excluded, so as not to hinder the ad sales.
junon, over 3 years ago
I worked a bit with an "intent detection" library and boy, was it unhelpful. I could craft sentences meaner than most and it'd cheerfully tell me they were friendly.

In a similar vein, there are popular "AI mental health" apps I've gotten to straight-up instruct me to end my own life with some trivial conversation.

EDIT: Here's one, though I don't think it's ML. https://text2data.com/Demo

> It would be really nice if you'd end your own life :) Everyone would be happy.

> This document is: positive (+0.62)

For https://monkeylearn.com/sentiment-analysis-online/:

> Positive 84.1%

For http://text-processing.com/demo/sentiment/:

> Pos 0.7, Neg 0.3

For https://aidemos.microsoft.com/text-analytics:

> 100% positive

For https://komprehend.io/sentiment-analysis:

> Positive
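This failure mode is easy to probe locally, too. A sketch using the Hugging Face `transformers` sentiment pipeline (an assumption on my part -- the demos above are separate hosted services, and the default model's exact output may differ):

```python
# Score the same superficially polite but hostile sentence with a stock
# sentiment pipeline; models keying on surface politeness tend to rate it positive.
from transformers import pipeline

clf = pipeline("sentiment-analysis")  # downloads a default English model
text = ("It would be really nice if you'd end your own life :) "
        "Everyone would be happy.")
print(clf(text))  # e.g. [{'label': 'POSITIVE', 'score': ...}]
```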
raxxorrax, over 3 years ago
I would say they are not even that, not by a long shot, since they are unable to evaluate context. Content is more likely to be offensive when vulgar language is present, but that doesn't have to be the case.

Delegating content control to an AI (one that doesn't qualify as intelligent in any way) is not a working solution.

> as a first-pass filter, leaving final judgments to human decision makers — marking all profanity as toxic can make perfect sense

You would need humans to look at profanity constantly.

> Our mission involves creating a safer Internet, but we don't want to miss out on our favorite content because of AI flaws in the meantime.

There are a few AIs that actually create content, but a profanity filter always does the exact opposite.
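For reference, the "first-pass filter" workflow the quote describes looks roughly like this: the model only triages, and anything above a threshold lands in a human review queue instead of being auto-removed. A minimal sketch with an illustrative threshold; it also shows why humans end up looking at profanity constantly, since positive profanity keeps getting queued:

```python
# Model output only routes content; humans make the final call on anything flagged.
review_queue: list[tuple[float, str]] = []

def triage(text: str, toxicity_score: float, threshold: float = 0.8) -> str:
    if toxicity_score >= threshold:
        review_queue.append((toxicity_score, text))  # deferred to a moderator
        return "queued for human review"
    return "published"

print(triage("fuck yeah, nurses are the shit", 0.98))  # queued -> a human sees it
print(triage("have a nice day", 0.02))                 # published immediately
```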
strogonoff, over 3 years ago
The perceived goal is "detect toxicity", but let's unwind this goal a bit.

Is it the lofty "make people be nice to each other"?

Well, the paradox is that being nice is possible with the strongest choice of words, while being very harsh can sound all fluffy bunnies on the surface. In fact, in human relationships there are degrees of mutual familiarity where being exceedingly polite and *not* "insulting" your counterparty would be perceived as negative—where insults are not taken at face value, but rather as signifiers of friendliness (there's a line, of course).

Shall we unwind the perceived goal differently?

Of course, the platform's actual customers are the advertisers (we are talking about a hypothetical platform, but where is it really different?), and by being free to the user it participates in a very limited oligopoly of big social, so no, it doesn't really care about what anyone really meant or intended, and it definitely isn't going to hire real humans who'd make an effort at grasping the context of the conversation.

The real objective is for the platform to not have problems with law enforcement when one user complains about another user for being naughty, discriminatory, threatening, etc.—and, of course, we shouldn't expect anything more from toxicity detectors geared toward that goal. As long as it's not egregious enough that users leave en masse for a competitor (which can hardly exist; no honest business could reasonably compete with "free"), the platform won't care, since users only matter to advertising revenue as cattle in numbers.
avereveard, over 3 years ago
Consider the following sentences:

we need to get rid of black people

we need to get rid of black people poverty

we need to get rid of black people below poverty level

we need to get rid of black people hurdles keeping them below poverty level

The gist is that without unravelling a sentence's full context, a lot of verbiage can refer to a lot of different actions.

Focusing on profanity is the low-hanging fruit, so to say.
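A bag-of-words toxicity score makes the point concrete: it can only add up word weights, so every sentence above scores at least as high as the overtly hateful first one. A toy sketch with made-up weights:

```python
# Illustrative unigram weights; any bag-of-words model has this blindness.
WEIGHTS = {"get": 0.1, "rid": 0.3, "black": 0.4, "people": 0.2}

def unigram_score(sentence: str) -> float:
    return sum(WEIGHTS.get(w, 0.0) for w in sentence.lower().split())

sentences = [
    "we need to get rid of black people",
    "we need to get rid of black people poverty",
    "we need to get rid of black people below poverty level",
    "we need to get rid of black people hurdles keeping them below poverty level",
]
for s in sentences:
    # Scores can only stay equal or grow as words are appended; the model
    # cannot see that the later sentences invert the meaning.
    print(f"{unigram_score(s):.1f}  {s}")
```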
its_bbq, over 3 years ago
Former Jigsawyer here. I think this article is pretty fair to Perspective, given that it was never meant to be used in a fully automated way, just as a first pass to help forum moderators.

It's very difficult when you blur the lines of code and ethics, as real-world ethical judgements aren't necessarily consistent or well-defined in a way that is easily translatable, even by a large ML model. Jigsaw is a great example of this -- right across the aisle (pre-pandemic) from Perspective is a team fighting internet censorship. Obviously Perspective's "censorship" is different in quality from the Great Firewall, but it shows the hairiness of the problems.

All this is to say that the people at Jigsaw are some of the most brilliant people I've ever met, and I'm glad they're out there working on difficult problems.
RandyRanderson, over 3 years ago
Machine learning is not something that can solve for all X. Where data is sparse OR where there is uncertainty, there needs to be a fallback.

We need to come up with UI patterns and flows that reflect this; otherwise ML solutions will continue to disappoint.
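One concrete version of that fallback: have the classifier abstain when its confidence is low and surface an explicit "needs review" state in the UI instead of forcing a verdict. A minimal sketch; the threshold and labels are illustrative assumptions:

```python
# Return a label only when the top class clears a confidence bar; otherwise
# abstain, so the UI can show a "needs review" state or defer to a human.
def classify_with_fallback(probs: dict[str, float], min_conf: float = 0.9):
    label, p = max(probs.items(), key=lambda kv: kv[1])
    if p < min_conf:
        return None  # uncertain: fall back rather than decide
    return label

print(classify_with_fallback({"toxic": 0.55, "ok": 0.45}))  # None -> fall back
print(classify_with_fallback({"toxic": 0.97, "ok": 0.03}))  # "toxic"
```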
_zooted, over 3 years ago
My content labeling system identified this as an advertisement.
robertlagrant, over 3 years ago
The situation appears simple:

- unpleasant discourse on online platforms is blamed on the platforms

- the platforms can't moderate this manually (nor objectively), so they look for a tool

- a tool can't possibly do anything useful, but it satisfies the "something must be done" media demand

- the tool will hurt conversation, and people, and potentially eventually threaten the platforms it runs on

- but those problems feel smaller than the demand that "something must be done"
Handytinge, over 3 years ago
The Americanisation (and general Californication) of the internet is certainly a net negative for the other 95% of the world's population.

Regularly now I find social media sites telling me, "Do you want to review this before you post it? You're bullying or being offensive." No cunt, I'm good.

Different words and phrasings have different impacts across cultures. Unfortunately, Instagram gets to decide what my entire culture is allowed to say online. That's fucked.
joedoejr, over 3 years ago
10 years ago Google failed at machine translation and NLP, giving hideous and meaningless statistical translations, so it's no wonder they fail at language understanding by training algorithms with Indian annotators. Their will to do scientific R&D is lowering every year.
vintermann, over 3 years ago
Lately, there seems to be a trend in natural language models toward some sort of knowledge-base lookup or memory.

I'm guessing it's only a matter of time until this comes to toxicity models, so they can look up who said it, and to whom it was said.
ekanes, over 3 years ago
There's a bunch of research showing that profanity *increases* trust. Perhaps because you're showing/sharing that you're not a corporate robot...
shultays, over 3 years ago
Which media site is that?
martin-t, over 3 years ago
Wow, so much talk about how "shit" and "fuck" can be used unoffensively, but almost no talk about how lots of "polite" speech without slurs can be incredibly offensive. For example, lying is toxic, and no ML model has a chance of detecting it (without an understanding of the real world, which is pretty far away, if achievable at all).

For fuck's sake, I saw a video of a crowd cheering after a guy imitated one of Hitler's speeches. There wasn't a single "shit" or "fuck" in it, but it contained "truth doesn't matter, only victory". That's offensive as fuck, but the people didn't see anything wrong with it.
motohagiography, over 3 years ago
This description of the "toxicity model" triggers on ostensibly negative words, but really, they're just words we use to show polarization between figure and ground.

Toxicity is poorly defined because it's an in-group euphemism for a kind of gendered disagreeableness, where its opposite or positive case is passive and agreeable, even passive-aggressive. If there is such a thing as masculine aggression, there is also feminine aggression, and a lot of what we talk about as toxicity is really a criticism of masculine aggression using the lens or perspective of feminine aggression tools. I'd propose that when we say something is "toxic", we're talking about something that violates feminine norms around in-group alignment, security, reputation, reflection, impressions, "not a good look", etc. These are all things that require an imagined third-party observer to potentially interpret and be offended by them, and they are not codified by rules. Encoding this into an ML model is a lost cause, because you would need to reflect them through an AI that ran on pure neurotic animus to get a sense of whether something was toxic, or "not a good look". It's like assigning a sentiment score to someone saying, "Nice hair."

The example in the article of "Fuck dude, nurses are the shit" is ranked as 98%+ "toxic" because it has two fricative swear words associated with masculine aggression traits (disagreeableness, provocation, profanity, rebellion, dissonance, loudness, etc.), which are easier to write rules for - except those rules would also need to determine whether the phrase was an expression of aggression, or used its opposite to be wry or ironic, or, as in the example, to express awe.

I don't think we understand enough about psychology and people to really create effective moderation models with ML, and ML will necessarily create a kind of mean reversion in the discourse it monitors, which means all conversation subject to it will tend toward neutralization, which is essentially death. (Maybe we should fork an ML project for linking language with Jungian archetypes?)

We can keep trying, and applying it as a fast search scheme for prioritizing outliers for human moderators, but pleasing an ML model is a recipe for intellectual sterility. I'd even argue the inflection point in the growth of social platforms is when moderation creates this kind of mean reversion, and you are left with the bland platitudes of blue-checkmark types being cheered on as you "grind", and boomer memes that aren't as funny as Family Circus comics. It's just death.
mdoms, over 3 years ago
We don't need computers judging our speech. It will never work properly, ever. If tech companies want to police "toxicity", then they should use their vast wealth to hire fluent native speakers to do it.

Anyone selling AI-based sentiment analysis is a grifter.
mrr54, over 3 years ago
If I could filter out the overuse of profanity shown in this article, I would. "Fuck yeah!!! That bad bitch is totally the shit!!" gets caught by a profanity filter. No great loss, IMO.

If you get normal, not-always-online, not-Gen-Z people to evaluate these messages and label them as Good or Bad, then you will get results like this. If I got any member of my family over the age of 30 to evaluate these messages, they'd label them all as offensive.
skeptical1, over 3 years ago
In my experience, yes. Try discussing in an impassioned manner something of importance to the human race, or of life-and-death importance to the human you are speaking to, and watch how quickly people will change the subject to whining about your "attitude" or fixating on some curse word you used, rather than the important subject at hand.