> The presence of multiple and repeatable universal bypasses means that attackers will no longer need complex knowledge to create attacks or have to adjust attacks for each specific model<p>...right, now we're calling users who want to bypass a chatbot's censorship mechanisms as "attackers". And pray do tell, who are they "attacking" exactly?<p>Like, for example, I just went on LM Arena and typed a prompt asking for a translation of a sentence from another language to English. The language used in that sentence was somewhat coarse, but it wasn't anything special. I wouldn't be surprised to find a very similar sentence as a piece of dialogue in any random fiction book for adults which contains violence. And what did I get?<p><a href="https://i.imgur.com/oj0PKkT.png" rel="nofollow">https://i.imgur.com/oj0PKkT.png</a><p>Yep, it got blocked, definitely makes sense, if I saw what that sentence means in English it'd definitely be unsafe. Fortunately my "attack" was thwarted by all of the "safety" mechanisms. Unfortunately I tried again and an "unsafe" open-weights Qwen QwQ model agreed to translate it for me, without refusing and without patronizing me how much of a bad boy I am for wanting it translated.