Wow, lots of negative responses here on voice. I’m a reader. I read. A lot. And I still think 4o’s advanced voice mode is unique and extremely useful, and I dearly wish we had open models, or even competitive closed models, that were as good.<p>I will note that the model has been massively, successively nerfed since launch. You can watch some pre-launch demo videos, or just try some basic engagement: for instance, ask it to talk to you in various accents and see which ones OpenAI deems “inappropriate” to request and which are fine. This kind of enshittification seems pretty likely when you’re the only one in town with a product.<p>That said, even moderately enshittified, there’s something magic about an end-to-end trained multimodal model: it can change tone of voice on request. In fact, my standard prompt asks it to mirror my tone of voice and cadence. This is really unique; it’s not achievable through a Whisper -> LLM -> synthesizer/TTS pipeline. It can give you a Boston accent, speculate that a Marseille accent is the French equivalent, and then (at least try to) give you a Marseille accent. This is pretty strong medicine, and I love it.<p>There’s been so much LLM commoditization this year, and of course the chains keep moving forward on intelligence. But I hope Ms. Moore is correct that we’ll see better and more voice models soon, and that someone can crack the architecture.