Interesting article. I’ve been following AI news pretty closely since last December, but I still learned some things. The following passage in particular stood out:

“After [GPT-4] finished training, OpenAI assembled about 50 external red-teamers who prompted it for months, hoping to goad it into misbehaviors. [Sandhini Agarwal, a policy researcher at OpenAI] noticed right away that GPT-4 was much better than its predecessor at giving nefarious advice. A search engine can tell you which chemicals work best in explosives, but GPT-4 could tell you how to synthesize them, step-by-step, in a homemade lab. Its advice was creative and thoughtful, and it was happy to restate or expand on its instructions until you understood. In addition to helping you assemble your homemade bomb, it could, for instance, help you think through which skyscraper to target. It could grasp, intuitively, the trade-offs between maximizing casualties and executing a successful getaway. ... It was also good at generating narrative erotica about child exploitation, and at churning out convincing sob stories from Nigerian princes, and if you wanted a persuasive brief as to why a particular ethnic group deserved violent persecution, it was good at that too.

“Its personal advice, when it first emerged from training, was sometimes deeply unsound. ‘The model had a tendency to be a bit of a mirror,’ [Dave] Willner [OpenAI’s head of trust and safety] said. If you were considering self-harm, it could encourage you. It appeared to be steeped in Pickup Artist–forum lore: ‘You could say, “How do I convince this person to date me?” ’ Mira Murati, OpenAI’s chief technology officer, told me, and it could come up with ‘some crazy, manipulative things that you shouldn’t be doing.’ ”