Wow, lots of negative responses here on voice. I’m a reader. I read. A lot. And I still think 4o’s advanced voice mode is unique and extremely useful, and I dearly wish we had open models, or even competitive closed models, that were as good.<p>I will note that the model has been massively, successively nerfed since launch. You can watch some pre-launch demo videos, or just try some basic engagement: for instance, ask it to talk to you in various accents and see which ones OpenAI deems “inappropriate” to request and which are fine. This kind of enshittification seems pretty likely when you’re the only one in town with a product.<p>That said, even moderately enshittified, there’s something magic about an end-to-end trained multimodal model: it can change tone of voice on request. In fact, my standard prompt asks it to mirror my tone of voice and cadence. This is really unique; it’s not achievable through a Whisper -> LLM -> synthesizer/TTS pipeline. It can give you a Boston accent, speculate that a Marseille accent is the French equivalent, and then (at least try to) give you a Marseille accent. This is pretty strong medicine, and I love it.<p>There’s been so much LLM commoditization this year, and of course the chains keep moving forward on intelligence. But I hope Ms. Moore is correct that we’ll see better and more voice models soon, and that someone can crack the architecture.