I have a jailbreaking method that's 100% effective, but I can't share it until the authors of this article share theirs, because apparently we can just make up claims about effectiveness without sharing any evidence.
A new jailbreaking method with this level of effectiveness against these models that can produce the entirety of those unsafe outputs?<p>Yes.<p>May I see it?<p>No.
A lot of handwringing in this article about the harm jailbreaks cause and the responsibility not to release them, and then the examples of harm are racist jokes? And instructions for making a bomb that, by virtue of being in the training data, can already be found on the internet, probably with a single Google search? Instructions for creating fake social media accounts? It's very silly to read this level of seriousness, as if these models would turn people into criminal masterminds if the authors but released the jailbreak. Let's be real: all the jailbreaks would be useful for in real life is creating custom erotica.