科技回声

Hi guys,

After some research and no luck finding anyone who seems to be working on this, I thought I'd try a Hail Mary and post on here.

I'm looking to speak to anyone who is working on speech-to-video (real-time speech rendering). We already have software which can take audio (speech) input and render a video which resembles a person or avatar speaking, but it takes a long time to render.

How long will it be before the video of the person/avatar speaking will be renderable in near real-time, with similar latency to existing speech-to-text models?

What would the prototype look like to reduce the latency? Is anyone working on anything like this?

For context, I run a language learning app where you can practice speaking with AI. It would be far more engaging if the user had an avatar/person to speak to, rather than staring at the chat history whilst talking to the AI conversation partner.

Thanks, Chris

For context, here's the original post: https://news.ycombinator.com/item?id=36973400

1 comment

billconan, over 1 year ago

This?

https://www.heygen.com/article/unleashing-the-power-of-realtime-avatars

https://docs.trypromptly.com/guides/realtime-avatar-with-rag


Speech-to-video synthesis: Real-time rendering of speech
