The "toxicity model" described here triggers on ostensibly negative words, but really they're just words we use to show polarization between figure and ground.

Toxicity is poorly defined because it's an in-group euphemism for a kind of gendered disagreeableness, where its opposite or positive case is passive and agreeable, even passive-aggressive. If there is such a thing as masculine aggression, there is also feminine aggression, and a lot of what we talk about as toxicity is really a criticism of masculine aggression made through the lens of feminine aggression. I'd propose that when we say something is "toxic," we mean it violates feminine norms around in-group alignment, security, reputation, reflection, impressions, "not a good look," and so on. These all require an imagined third-party observer who might interpret and be offended by them, and they are not codified by rules. Encoding this into an ML model is a lost cause, because to get a sense of whether something was toxic, or "not a good look," you would need to reflect it through an AI running on pure neurotic animus. It's like assigning a sentiment score to someone saying, "Nice hair."

The example in the article, "Fuck dude, nurses are the shit," is ranked 98%+ "toxic" because it has two fricative swear words associated with masculine-aggression traits (disagreeableness, provocation, profanity, rebellion, dissonance, loudness, etc.), which are easier to write rules for. Except those rules would also need to determine whether the phrase was an expression of aggression, or was using that register to be wry or ironic, or, as in the example, to express awe.

I don't think we understand enough about psychology and people to build effective moderation models with ML, and ML models will necessarily create a kind of mean reversion in the discourse they monitor, which means all conversation subject to them will tend toward neutralization, which is essentially death. (Maybe we should fork an ML project for linking language with Jungian archetypes?)

We can keep trying, and apply it as a fast search scheme for prioritizing outliers for human moderators, but pleasing an ML model is a recipe for intellectual sterility. I'd even argue the inflection point in the growth of social platforms is when moderation creates this kind of mean reversion, and you are left with the bland platitudes of blue-checkmark types, being cheered on as you "grind," and boomer memes that aren't as funny as Family Circus comics. It's just death.
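
To make the nurses example concrete, here's a minimal sketch of how you'd poke at one of these classifiers yourself. It assumes the Hugging Face transformers library and the public unitary/toxic-bert checkpoint (a real toxicity model, though not necessarily the one the article tested); the scores in the comments are the expected pattern, not measured output.

    # Sketch: probe a toxicity classifier with profane-but-warm text.
    # Assumes: pip install transformers torch
    from transformers import pipeline

    classify = pipeline("text-classification", model="unitary/toxic-bert")

    examples = [
        "Fuck dude, nurses are the shit.",  # profanity used to express awe
        "Nurses are wonderful people.",     # same sentiment, no profanity
        "Nice hair.",                       # sincere or cutting; text alone can't say
    ]

    for text in examples:
        result = classify(text)[0]
        print(f"{result['label']:>8} {result['score']:.2f}  {text}")

    # Expect the first line to score far more "toxic" than the second:
    # the model keys on surface profanity, not on whether the phrase
    # expresses aggression, irony, or awe.

The point being that the score tracks the easy surface features, which is exactly the mean-reversion pressure described above.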