TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Show HN: Built a Real Time Visual Intelligence

2 点作者 Aeroi3 个月前
I built a realtime visual intelligence that connects a users phone camera to a multimodal llm. I use the pipecat open source framework, webrtc, and a few other services to connect it all together.<p>It&#x27;s similar to chatgpt advanced voice and grounded with google_search for asynch internet searches based on transcripts or frames from the video that run at 1fps to the LLM.<p>Let me know what you think and if you want to work on some fun scaling problems with me on this project.<p>www.withsen.com

1 comment

Aeroi3 个月前
One interesting note with voice AI is that you can shove static datasets into the long context windows of these newer models like 2.0-flash-lite. It creates a Model Assisted Generation(MAG) and returns super low latency and 99% relevant information to the bot. Theres a good example in the foundational example of the pipecat github.