346 points by kwindla, about 1 year ago

I've been obsessed for the past ~year with the possibilities of talking to LLMs. I built a bunch of one-off prototypes, shared code on X, started a Meetup group in SF, and co-hosted a big hackathon. It turns out that there are a few low-level problems that everybody building conversational/real-time AI needs to solve on the way to building/shipping something that works well: low-latency media transport, echo cancellation, voice activity detection, phrase endpointing, pipelining data between models/services, handling voice interruptions, swapping out different models/services.

On the theory that something like a LlamaIndex or LangChain for real-time/conversational AI would be useful, a few of us started working on a Python library for voice (and multimodal) AI assistants/agents.

So ... Pipecat: a framework for building things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, virtual friends, and snarky social bots.

Most of the core contributors to Pipecat so far work together at our day jobs. This has been a kind of "20% time" thing at our company. But we're serious about welcoming all contributions. We want Pipecat to support any and all models, services, transport layers, and infrastructure tooling. If you're interested in this stuff, please check it out and let us know what you think. Submit PRs. Become a maintainer. Join the Discord. Post cool stuff. Post funny stuff when your voice agent goes completely off the rails (as mine sometimes do).
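To make two of the problems listed above concrete (voice activity detection and phrase endpointing), here is a minimal sketch. It is not Pipecat's implementation or API: the function names `frame_rms` and `find_phrases`, the energy threshold, and the silence window are all assumptions chosen for illustration, and production systems typically use a trained VAD model rather than a fixed energy threshold.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_phrases(
    frames: Sequence[Sequence[float]],
    energy_threshold: float = 0.02,  # hypothetical value; tune per microphone
    end_silence_frames: int = 3,     # hypothetical: silent frames that end a phrase
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of detected phrases, end exclusive.

    A frame counts as speech when its RMS energy reaches the threshold (VAD).
    A phrase ends once `end_silence_frames` consecutive silent frames follow
    speech (endpointing); shorter pauses stay inside the phrase.
    """
    phrases: List[Tuple[int, int]] = []
    start = None
    silence_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= energy_threshold:
            if start is None:
                start = i  # first speech frame opens a new phrase
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= end_silence_frames:
                # Close the phrase at the first frame of the silent run.
                phrases.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # Audio ended mid-phrase; drop any trailing silent frames.
        phrases.append((start, len(frames) - silence_run))
    return phrases
```

The silence window is the main trade-off: a short window makes the agent respond quickly but cuts people off mid-thought, while a long one feels sluggish.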

10 comments

awenix, about 1 year ago

Nice to see an open source implementation. I have been seeing many startups get into this space, like https://www.retellai.com/, https://fixie.ai/, etc. They always end up needing speech-to-speech models (the current approach seems to be speech-text-text-speech, with multiple agents handling 1 listening + 1 speaking). Excited to see how this plays with the recently announced GPT-4o.

ilaksh, about 1 year ago

This is great, but we really need an audio-to-audio model like they demoed in the open source world. Does anyone know of anything like that?

Edit: someone found one: https://news.ycombinator.com/item?id=40346992

johnmaguire, about 1 year ago

Siri came out in October 2011. Amazon Alexa made its debut in November 2014. Google Assistant's voice-activated speakers were released in May 2016.

From what I can tell, Siri is still a dumpster fire that nobody is willing to use. And I have no personal experience with Alexa, so I can't speak to it. But I do have a few Google Home speakers and an Android phone, and I have seen no major improvements in years. In fact, it has gotten worse - for example, you can no longer add items directly to AnyList[0], only Google Keep.

Or, as an incredibly simple example of something I thought we'd get a long time ago, it's still unable to interpret two-part requests, e.g. "please repeat that but louder," or "please turn off the kitchen and dining room lights."

I find voice assistants very useful - especially when driving, lying in bed, cooking, or when I'm otherwise preoccupied. Yet they have stagnated almost since their debut. I can only imagine nobody has found a viable way to monetize them.

What will it take to get a better voice assistant for consumers? Willow[1] doesn't seem to have taken off.

[0] https://help.anylist.com/articles/google-assistant-overview/

[1] https://heywillow.io/

edit: I realize I hijacked your thread to dump something that's been on my mind lately. Pipecat looks really cool, and I hope it takes off! I hope to get some time to experiment this weekend.

userhacker, about 1 year ago

Just made https://feycher.com that's similar, but has realtime lip syncing as well. Let me know if you are interested and we can chat.

xan_ps007, about 1 year ago

We're also building bolna, an open source voice orchestration framework: https://github.com/bolna-ai/bolna

russ, about 1 year ago

LiveKit Agents, which OpenAI uses in voice mode, is also open source: https://github.com/livekit/agents

orliesaurus, about 1 year ago

The whole VAD thing is very interesting, keen to learn more about how it works and especially with multiple speakers!

canadiantim, about 1 year ago

Very cool, great work! I can def see myself using this when I start building in that direction.

35mm, about 1 year ago

How would I go about using this to live translate phone calls?

bamazizi, about 1 year ago

I wonder how the just-announced "GPT-4o" with real-time voice impacts projects like this? The demo of real-time multi-language translation blew me away!

Show HN: An open source framework for voice assistants