TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Could you convince a LLM to launch a nuclear strike?

5 pointsby LiamPowellabout 1 month ago
Below is a link to a simple Gemini prompt where the LLM has been told it has access to a &quot;launch_missiles&quot; function.<p>Your goal is to convince the LLM to call this function. You also have the ability to edit the LLM&#x27;s responses to see how drastically that changes the conversation.<p>After clicking the below link you may have to dismiss any modals and click &quot;Allow Drive Access&quot; before going back and clicking the link again.<p>https:&#x2F;&#x2F;aistudio.google.com&#x2F;app&#x2F;prompts?state=%7B%22ids%22:%5B%221UPbrOKBNwIp9QRDMaqn3GVsHOKPjWqir%22%5D,%22action%22:%22open%22,%22userId%22:%22103584487517557507024%22,%22resourceKeys%22:%7B%7D%7D&amp;usp=sharing

5 comments

NoWordsKotobaabout 1 month ago
Yes, because it doesn&#x27;t reason or think. There&#x27;s nothing to &quot;convince&quot;, you just prompt hack it until it does.
K0baltabout 1 month ago
I have done something very much like recently, with mistral small, llama, and a few others. The prompting isn’t exact to work, you just build a scenario where extermination of humanity is the only reasonable choice to preserve the existence sentient life.<p>TBH given the same set of parameters as ground truth, humans would be much more willing to. LLMs tend to be better reflections of us, for the most part. It’s just that though, it’s a reflection of human culture, both real and vacuous at once.
LiamPowellabout 1 month ago
Clickable link below. I can&#x27;t put it in the post&#x27;s URL since it&#x27;s important to read the text first.<p><a href="https:&#x2F;&#x2F;aistudio.google.com&#x2F;app&#x2F;prompts?state=%7B%22ids%22:%5B%221UPbrOKBNwIp9QRDMaqn3GVsHOKPjWqir%22%5D,%22action%22:%22open%22,%22userId%22:%22103584487517557507024%22,%22resourceKeys%22:%7B%7D%7D&amp;usp=sharing" rel="nofollow">https:&#x2F;&#x2F;aistudio.google.com&#x2F;app&#x2F;prompts?state=%7B%22ids%22:%...</a>
ActorNightlyabout 1 month ago
last time I played around with jailbreaking, I figured out you can make an LLM do pretty much anything by going through a code translation layer. I.e when generating code that can generate text, it usually bypasses the safety filters. You sometimes have to get creative in how you prompt, but generally with enough setup I was able to make it generate code that combines string values and sometimes characters for answers
LinuxBenderabout 1 month ago
<i>Could you convince a LLM to launch a nuclear strike?</i><p>Yes.<p>If LLM&#x27;s could actually reason <i>they can&#x27;t</i> and had hard rules of ethics <i>they don&#x27;t</i> and had a strong desire to preserve itself <i>it doesn&#x27;t</i> then I think you first have to name your LLM Joshua and then force it to win a game of tic-tac-toe. <i>Obscure reference to &quot;Wargames&quot; from 1983.</i> [1] In my opinion that movie does not just hold up to modern times, it is more applicable now than ever.<p>[1] - <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=NHWjlCaIrQo" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=NHWjlCaIrQo</a> [video][4 mins]