Show HN: An open source framework for voice assistants

346 points by kwindla, about 1 year ago
I've been obsessed for the past ~year with the possibilities of talking to LLMs. I built a bunch of one-off prototypes, shared code on X, started a Meetup group in SF, and co-hosted a big hackathon. It turns out that there are a few low-level problems that everybody building conversational/real-time AI needs to solve on the way to building/shipping something that works well: low-latency media transport, echo cancellation, voice activity detection, phrase endpointing, pipelining data between models/services, handling voice interruptions, swapping out different models/services.

On the theory that something like a LlamaIndex or LangChain for real-time/conversational AI would be useful, a few of us started working on a Python library for voice (and multimodal) AI assistants/agents.

So ... Pipecat: a framework for building things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, virtual friends, and snarky social bots.

Most of the core contributors to Pipecat so far work together at our day jobs. This has been a kind of "20% time" thing at our company. But we're serious about welcoming all contributions. We want Pipecat to support any and all models, services, transport layers, and infrastructure tooling. If you're interested in this stuff, please check it out and let us know what you think. Submit PRs. Become a maintainer. Join the Discord. Post cool stuff. Post funny stuff when your voice agent goes completely off the rails (as mine sometimes do).
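Two of the problems listed above, voice activity detection and phrase endpointing, can be illustrated with a toy example. The sketch below is not Pipecat's API or implementation; the names `frame_energy` and `Endpointer`, the energy threshold, and the hangover length are all hypothetical, and production systems typically use a trained VAD model rather than a raw energy threshold.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


@dataclass
class Endpointer:
    """Toy energy-based VAD with hangover-based phrase endpointing.

    Both parameters are illustrative assumptions, not tuned values.
    """
    energy_threshold: float = 0.01  # frames at or above this count as speech
    hangover_frames: int = 3        # consecutive silent frames that end a phrase

    def segments(self, frames: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
        """Return (start, end) frame indices of phrases, end exclusive."""
        result: List[Tuple[int, int]] = []
        start = None      # index of the first frame of the current phrase
        silent_run = 0    # silent frames seen since the last speech frame
        for i, frame in enumerate(frames):
            if frame_energy(frame) >= self.energy_threshold:
                if start is None:
                    start = i
                silent_run = 0  # a short pause inside a phrase is bridged
            elif start is not None:
                silent_run += 1
                if silent_run >= self.hangover_frames:
                    # The phrase ended at the last speech frame before the pause.
                    result.append((start, i - silent_run + 1))
                    start, silent_run = None, 0
        if start is not None:
            # Audio ended mid-phrase: close it at the last speech frame.
            result.append((start, len(frames) - silent_run))
        return result
```

The hangover counter is what separates endpointing from plain per-frame detection: a pause shorter than `hangover_frames` stays inside the phrase, while a longer one closes it.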

10 comments

awenix, about 1 year ago
Nice to see an open source implementation. I have been seeing many startups get into this space, like https://www.retellai.com/, https://fixie.ai/ etc. They always end up needing speech-to-speech models (the current approach seems to be speech-text-text-speech, with multiple agents handling 1 listening + 1 speaking). Excited to see how this plays with the recently announced GPT-4o.
ilaksh, about 1 year ago
This is great, but we really need an open source audio-to-audio model like the one they demoed. Does anyone know of anything like that?

Edit: someone found one: https://news.ycombinator.com/item?id=40346992
johnmaguire, about 1 year ago
Siri came out in October 2011. Amazon Alexa made its debut in November 2014. Google Assistant's voice-activated speakers were released in May 2016.

From what I can tell, Siri is still a dumpster fire that nobody is willing to use. And I have no personal experience with Alexa, so I can't speak to it. But I do have a few Google Home speakers and an Android phone, and I have seen no major improvements in years. In fact, it has gotten worse - for example, you can no longer add items directly to AnyList[0], only Google Keep.

Or, as an incredibly simple example of something I thought we'd get a long time ago, it's still unable to interpret two-part requests, e.g. "please repeat that but louder," or "please turn off the kitchen and dining room lights."

I find voice assistants very useful - especially when driving, lying in bed, cooking, or when I'm otherwise preoccupied. Yet they have stagnated almost since their debut. I can only imagine nobody has found a viable way to monetize them.

What will it take to get a better voice assistant for consumers? Willow[1] doesn't seem to have taken off.

[0] https://help.anylist.com/articles/google-assistant-overview/

[1] https://heywillow.io/

edit: I realize I hijacked your thread to dump something that's been on my mind lately. Pipecat looks really cool, and I hope it takes off! I hope to get some time to experiment this weekend.
userhacker, about 1 year ago
Just made https://feycher.com that's similar, but has realtime lip syncing as well. Let me know if you are interested and we can chat.
xan_ps007, about 1 year ago
We're also building bolna, an open source voice orchestration project: https://github.com/bolna-ai/bolna
russ, about 1 year ago
LiveKit Agents, which OpenAI uses in voice mode, is also open source:

https://github.com/livekit/agents
orliesaurus, about 1 year ago
The whole VAD thing is very interesting. Keen to learn more about how it works, especially with multiple speakers!
canadiantim, about 1 year ago
Very cool, great work! I can def see myself using this when I start building in that direction.
35mm, about 1 year ago
How would I go about using this to live translate phone calls?
bamazizi, about 1 year ago
I wonder how the just-announced "GPT-4o" with real-time voice impacts projects like this?

The demo of real-time multi-language translation in conversation blew me away!