StreamVC: Real-Time Low-Latency Voice Conversion

99 points by trevett 10 months ago

8 comments

coldblues 10 months ago
https://github.com/hrnoh24/stream-vc

https://github.com/yuval-reshef/StreamVC

Unofficial implementations of StreamVC
huac 10 months ago
The samples were released a while back: https://google-research.github.io/seanet/stream_vc/
judiisis 10 months ago
What is the current best FOSS (or otherwise) implementation of a voice changer/anonymiser?
udev4096 10 months ago
Actual paper: https://arxiv.org/pdf/2401.03078
manishsharan 10 months ago
Are there any use cases driving this? Is there a huge burning need for this technology?

Are kidnappers and con-men a huge under-served market that Google is hoping to serve? Are deepfake videos not convincing enough to serve the needs of fraudsters?

I am totally against regulating AI, but shit like this gives fodder to the other side.
gnat 10 months ago
From the poster:

In this work, we propose a light-weight (~20M param.) causal voice conversion solution that can run in real-time with low latency on a commercially available mobile device. The key design elements are: (1) using a causal encoder to learn soft speech units; (2) injecting whitened f0 to improve pitch stability without leaking source speaker info.

In our later V2 version, we found that f0 rescaling followed by a NSF-style harmonic-plus-noise conditioning (as is done in RVC) results in better quality.
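
For readers unfamiliar with the term, here is a minimal sketch of one common reading of "whitened f0": normalizing the pitch track to zero mean and unit variance so that the contour shape survives while the speaker's absolute pitch range is removed. This is an illustration under stated assumptions, not the paper's implementation. The function name `whiten_f0`, the use of per-utterance statistics, the linear-Hz scale, and the convention that unvoiced frames are marked with 0 are all assumptions made for this example.

```python
import math
from typing import List


def whiten_f0(f0_hz: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize voiced f0 values to zero mean and unit variance.

    Assumption: unvoiced frames are encoded as 0 Hz. They are excluded
    from the statistics and left at 0.0 in the output.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        # Nothing to normalize: the whole segment is unvoiced.
        return [0.0] * len(f0_hz)

    mean = sum(voiced) / len(voiced)
    var = sum((f - mean) ** 2 for f in voiced) / len(voiced)
    std = math.sqrt(var)

    # eps guards against division by zero for a perfectly flat contour.
    return [(f - mean) / (std + eps) if f > 0 else 0.0 for f in f0_hz]


# Two hypothetical speakers producing the same contour in different ranges.
low_voice = [100.0, 110.0, 0.0, 120.0]
high_voice = [200.0, 220.0, 0.0, 240.0]

low_white = whiten_f0(low_voice)
high_white = whiten_f0(high_voice)
```

In this toy case both whitened contours come out essentially identical, which is the intuition behind "without leaking source speaker info": the conditioning signal carries the intonation pattern but not how high or low the source speaker's voice is. A streaming system would need running statistics rather than whole-utterance ones; that detail is left out here.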
froglus 10 months ago
is it like discord or just voice chat, because i like to have things twice!!
neilk 10 months ago
What are the anticipated use cases?

I know of one: transgender people often would like to alter the timbre of their voice and spend a lot of time training their voice. At least for online scenarios, this can just do it.

But other than that, AI voice-altering research seems like it benefits mostly scammers? I'm just wondering what they tell themselves they're doing. I didn't see this in the paper.