I don't think all this is needed to prove that LLMs aren't there yet.<p>Here is a trivial one:<p>"make ssh-keygen output decrypted version of a private key to another file"<p>I'm pretty sure everyone on the LLM hype train will agree that prompt alone should be enough for GPT-4o to produce a correct command. After all, it's SSH.<p>However, here is the output command:<p><pre><code> ssh-keygen -p -f original_key -P "current_passphrase" -N "" -m PEM -q -C "decrypted key output" > decrypted_key
chmod 600 decrypted_key
</code></pre>
Even the basic fact that ssh-keygen is an in-place tool and does not write key data to stdout is not captured strongly enough in the representation to be activated by this prompt. As a result, it also overwrites the existing key, and your decrypted_key file ends up containing "your identification has been saved with the new passphrase", lol.<p>Maybe we should set up a cron job - sorry, a chatgpt task - to auto-tweet this in reply to all of the openai employees' hype tweets.<p>Edit:<p>chat link: <a href="https://chatgpt.com/share/67962739-f04c-800a-a56e-0c2fc8c2ddf8" rel="nofollow">https://chatgpt.com/share/67962739-f04c-800a-a56e-0c2fc8c2dd...</a><p>Edit 2: Tried it on deepseek.<p>With the prompt pasted as-is, it gave the same wrong answer: <a href="https://imgur.com/jpVcFVP" rel="nofollow">https://imgur.com/jpVcFVP</a><p>However, with reasoning enabled, it caught in its chain of thought that the original file gets overwritten, and then gave the correct answer. Here is the relevant part of the chain of thought in a pastebin: <a href="https://pastebin.com/gG3c64zD" rel="nofollow">https://pastebin.com/gG3c64zD</a><p>And the correct answer:<p><pre><code> cp encrypted_key temp_key && \
ssh-keygen -p -f temp_key -m pem -N '' && \
mv temp_key decrypted_key
</code></pre>
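For what it's worth, the copy-first pattern is easy to verify end to end. A minimal sketch, assuming filenames of my own choosing and a throwaway passphrase ('secret'); it generates a fresh key on the spot so it's self-contained and touches nothing of yours:

```shell
#!/bin/sh
set -eu

# Create a passphrase-protected key to play with (ed25519 keys always
# use the new OpenSSH private key format).
ssh-keygen -q -t ed25519 -N 'secret' -f encrypted_key -C demo

# Work on a copy: ssh-keygen -p rewrites the key file in place, so
# running it on the original would destroy the encrypted version.
cp encrypted_key decrypted_key

# Strip the passphrase from the copy (-P old passphrase, -N new/empty).
ssh-keygen -q -p -f decrypted_key -P 'secret' -N ''

# The original still requires 'secret'; the copy now loads with no
# passphrase at all.
head -1 decrypted_key   # -> "-----BEGIN OPENSSH PRIVATE KEY-----"
```

Note that `cp` preserves the 600 permissions of the source key, so no extra `chmod` is needed here.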
I find it quite interesting that this seemingly 2020-era LLM problem is only solved correctly by the latest reasoning model, but it's cool that it works.