科技回声

As a programmer, I find it fascinating to build things from the ground up, with the inner workings either in full display or readily accessible for editing. With AI, the need to beg it to please behave, with a long list of things to do and not to do and a resounding order not to disclose such a list, is becoming commonplace.<p>Obviously, finding jailbreaks in LLMs is extremely important and consequential. However, there are meta questions around modern AI that remain valid, and this article is a reminder: is a continuous and <i>direct</i> feedback loop between code and coder a thing of the past? To what extent should we accept that LLMs are trained one-way, that we can only truly edit them with expensive trial-and-error retraining runs, hence, all we are left with is asking kindly? Are the current implementations all, or are we dealing with just one possible paradigm? Do we want AI, which relies upon computers, algorithms, and numbers written on memory, to be fundamentally programmable?

They could probably just visit Reddit, there are ample prompts there.<p>Prompt injection or prompt attacks are well known and likely impossible to guard against. Can you really get a human to be invulnerable to manipulation? Why would we expect the machines to be any better?

White House is working with hackers to ‘jailbreak’ ChatGPT’s safeguards

2 条评论

White House is working with hackers to ‘jailbreak’ ChatGPT’s safeguards

2 条评论