Lyrebird – An API to copy the voice of anyone

1401 points by adbrebs about 8 years ago

75 comments

eadz about 8 years ago
Combined with Face2Face[1] live video impersonation, it is truly time to be very careful verifying videos or even live streams.

[1] https://www.youtube.com/watch?v=ohmajJTcpNk
pbhjpbhj about 8 years ago
Last week on BBC Radio 4 I heard of a woman who was losing her voice through disease (MND maybe?). A similar system was being anticipated and she was saving voice samples to seed it with.

She had been a singer and strongly identified herself with her voice; she wanted to be able to use a speech synthesis system that had her own voice pattern.

Apologies if this was already mentioned, but it seems to be a use others here hadn't considered.
qeternity about 8 years ago
While all of these vec2speech type models are impressive, I get the feeling that most of the comments didn't listen to any of the samples. It's still distinctly robotic sounding, probably has quite a bit of garbage output that needs to be filtered manually (as many of these nets often have) and is a far cry from fooling a human.
paraschopra about 8 years ago
I appreciate the ethics link up there in the menu. Not sure if I noticed it on any other AI startup (or for that matter, any startup). Given how complex the world is becoming due to ever increasing co-dependence with tech, I can see how such pages could become as important as 'pricing' or 'sign up' pages. (The privacy issues with Unroll.me, Uber and a thousand other such services will only accelerate this trend).

Good job, team Lyrebird. My feedback is that while the inclusion of an ethics page is great, it could do with more content on your vision and what you will not let your tech be used for. I know others can develop similar tech, but it will be good to read about YOUR ethics.

[Edited for clarity]
keithwhor about 8 years ago
I love this. The business model is too good to be true.

1. Open source voice-copying software
2. At worst, create entire market of voice-fraudsters, at best, very few voice-fraudsters but very high and very real perception of fear of such
3. Become leading security experts in voice fraud detection
4. Sell software / time / services to intelligence agencies, governments, law enforcement, news networks

Ethically I'm a bit concerned with (2), but realistically the team is right --- this technology exists, it will certainly be used for good and for bad, and they're positioning themselves as the leading experts.

I'm interested to see which VCs and acquirers line up here. Applying a voice to any phrase seems useful for voice assistants (Amazon Alexa, Google Home) but I don't think that's the $B model.
pinpeliponni about 8 years ago
Funny thing is, this is approximately where the CIA was with similar technology around 2000. They did some demos for politicians showing how they could fake anyone's messages. That stuff is golden for propaganda purposes, and for confusing things like military chains of command. Today the CIA has probably worked out all the robotic artifacts already, and their output is really indistinguishable.
yladiz about 8 years ago
This is pretty cool (although, I have no idea what other technologies exist for this kind of thing), but it's definitely not convincing enough to a human listener. This sounds like it might be convincing enough for some programs like "Hey, Siri" but it's not gonna convince your mom. You can listen to the samples on the page linked here and you can immediately tell that Obama and Trump don't sound quite human.
LegendaryPatMan about 8 years ago
This is pretty basic at the moment and it's terrifying. Yeah, it has an MS Sam feel to it, but as the tech improves and we know it will, you could use a service like this to put words in someone's mouth. Think about how you could trip up a CEO or a Politician by playing some random clip that they never said. When that gets into the Zeitgeist judgments will be made in the court of public opinion devoid of facts or real evidence. You could destroy democracy or people's lives with technology like this.
got2surf about 8 years ago
This is exciting! If you look at historic speeches (e.g. from American Rhetoric http://www.americanrhetoric.com/top100speechesall.html), there are large variations in average characteristics between various styles/contexts (on average, pitch/volume/speed are different for inspirational vs somber speeches, for example). But there are also really large differences in the variation: an inspirational speech may be marked by large swings from quiet, reflective pieces to booming, rousing calls-to-action while a somber speech has fewer swings in delivery.

For the examples given for various intonations from Obama/Trump, some intonations are much more natural than others. It would be interesting to decide how to parametrize a sentence for the intended intonation (based on word2vec analysis of the words in the sentence, punctuation cues in the sentence, and perhaps a specified category of "emotional delivery").

It would be interesting at the sentence-level, but also at the macro speech-level to include the right "mix" of intonations for a specific context. On a related note, it would be interesting to study the patterns of intonations in successful vs unsuccessful outbound sales calls, for example, to learn how to best simulate a good human sales voice.
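A rough sketch of the distinction got2surf draws between a speech's average delivery and how much that delivery swings. It is only an illustration: the feature names (`pitch_hz`, `volume_db`, `words_per_min`), the function `delivery_profile`, and all the numbers are made up for this example and are not measurements of any real speech or part of Lyrebird's system.

```python
from statistics import mean, pstdev

FEATURES = ("pitch_hz", "volume_db", "words_per_min")  # hypothetical feature names


def delivery_profile(frames):
    """Summarise a speech by the average and the spread of each feature.

    `frames` is a list of dicts, one per time window, keyed by FEATURES.
    """
    profile = {}
    for key in FEATURES:
        values = [frame[key] for frame in frames]
        # "mean" captures the typical delivery, "spread" captures the swings.
        profile[key] = {"mean": mean(values), "spread": pstdev(values)}
    return profile


# Invented numbers, purely for illustration.
rousing = [
    {"pitch_hz": 110, "volume_db": 55, "words_per_min": 120},
    {"pitch_hz": 190, "volume_db": 80, "words_per_min": 170},
    {"pitch_hz": 120, "volume_db": 58, "words_per_min": 125},
    {"pitch_hz": 200, "volume_db": 82, "words_per_min": 175},
]
somber = [
    {"pitch_hz": 115, "volume_db": 60, "words_per_min": 110},
    {"pitch_hz": 120, "volume_db": 62, "words_per_min": 112},
    {"pitch_hz": 118, "volume_db": 61, "words_per_min": 108},
    {"pitch_hz": 122, "volume_db": 63, "words_per_min": 114},
]

for name, frames in (("rousing", rousing), ("somber", somber)):
    print(name, delivery_profile(frames)["pitch_hz"])
```

In a setup like this, two speeches could have similar means yet very different spreads, which is why both numbers would probably be needed to parametrize an intended intonation.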
amarant about 8 years ago
Are there any copyright protections for a person's voice? If not, David Attenborough and Morgan Freeman will be lead voice actors in my next game project.
eps about 8 years ago
Impressive.

But also enabling the next gen of "Mom, I'm in Mexican jail. Quickly wire me $2,000 so I can get out." scams.
celticninja about 8 years ago
Is this enough to beat voice recognition software?

If you thought fake news was bad before, wait until these 'secret' recordings start getting released and reported on.
JustFinishedBSG about 8 years ago
Cooler: http://www.dtic.upf.edu/~mblaauw/IS2017_NPSS/

https://arxiv.org/abs/1704.03809
joshmarlow about 8 years ago
Finally, I can have Morgan Freeman narrate my major life events.

Update: Reading changelogs before deployment never sounded better!
cjlars about 8 years ago
I was wondering when CG Sir David Attenborough would get here and start narrating my day to day.
sna1l about 8 years ago
Charles Schwab uses a voice phrase to authenticate you for access to your account, which is already pretty brittle, but I hope this makes them reconsider more urgently.
ksec about 8 years ago
1. Is this company new?
2. Is this better than what Google or Baidu are doing?
3. I remember reading Adobe has something similar.
4. Why (what happened) that all of a sudden we have 4 companies making voice breakthrough tech like this?
5. What happens to voice acting? In places like Japan they highly value voice actors. Is a voice even patentable?
Nadya about 8 years ago
I see a lot of people claiming that certain things will now be untrustworthy.

As if *human* voice imitators have not existed and could not be paid for prior to this. For $5 you can get Stewie Griffin [0] or Barack Obama [1] to say whatever you want them to say. Any audio-only messages of well known figures should already be considered "compromised" and untrustworthy. Even without the technology to impersonate them.

This should be more concerning for "normal people". It isn't that you can no longer trust an audio-only recording of Obama, but that you may no longer be certain an audio recording is from your best friend. (E: Once the technology improves a bit more of course.)

[0] https://www.fiverr.com/joe_stevens/talk-like-stewie-griffin-for-you

[1] https://www.fiverr.com/celebimpression/do-a-custom-barack-obama-impersonation
drusepth about 8 years ago
This is awesome. As someone exploring the fictional storytelling space, this seems like it'd have a lot of fun applications in that space as well.

How difficult is it to create/tune voices from parameters rather than training from an audio clip? I build software where people create fictional characters for writing, and having an author "create" voices for each character would be an amazing way to autogenerate audiobooks with their voices, or interact with those characters by voice, or just hear things written from their point of view in their voice for that extra immersion. Having an author upload voice clips of themselves mimicking what they think that character should sound like would work, but it would probably keep traces of their original voice (and feel "fake" to them because they can recognize their own voice), no?

Can't wait to see how this pans out. Signed up for the beta and will definitely be pushing it to its limits when it's ready. :)
carlob about 8 years ago
I wonder how dependent this is on language: can we make Trump speak Chinese using a one minute audio track of him speaking English?
echelon about 8 years ago
It sounds like they're training a parametric speech synthesis platform on samples in order to learn the parameters. I wonder if there are approaches for generating n-phones for concatenative models, or using a hybrid approach.

I built a toy concatenative Donald Trump speech system [1], but I don't have an ML background. I've been taking Andrew Ng's online course in addition to Udacity's deep learning program in an attempt to learn the basics. I'm hoping I can use my dataset to build something backed by ML that sounds better.

Is anyone in the Atlanta area interested in ML? I'd love to chat over coffee or join local ML interest groups.

[1] http://jungle.horse
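For readers unfamiliar with the term, here is a minimal sketch of what "concatenative" synthesis means in general. It is not echelon's system or Lyrebird's method. It assumes a hypothetical `unit_bank` that maps whole words to pre-recorded sample lists (real systems typically use smaller units such as the n-phones mentioned above), and it joins the units with a short linear crossfade to soften the seams.

```python
def crossfade_concat(units, overlap):
    """Join audio units, linearly crossfading up to `overlap` samples per seam."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n  # first sample of the overlapping region
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit, rising over the seam
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])  # append the part of the unit after the overlap
    return out


def synthesize(text, unit_bank, overlap=2):
    """Look up one recorded unit per word and stitch the units together."""
    units = [unit_bank[word] for word in text.lower().split()]
    return crossfade_concat(units, overlap)


# Toy "recordings": constant-valued sample lists standing in for real audio.
unit_bank = {"hello": [1.0, 1.0, 1.0, 1.0], "world": [0.0, 0.0, 0.0, 0.0]}
samples = synthesize("Hello world", unit_bank)
print(len(samples), samples)
```

The robotic quality several commenters describe is one reason parametric or learned approaches are attractive: a lookup-and-stitch scheme like this can only say words it has recordings for (an unknown word raises `KeyError` here) and has no control over intonation.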
Tloewald about 8 years ago
This is very exciting to me because it lets RPGs provide spoken dialog for everything (I'm waiting to see if they can do emotions at all convincingly). Even big budget games suffer from "you can call your character anything as long as it's 'Shepherd'" simply because you can't mention the character's name or any other user content safely.
retox about 8 years ago
Through the tinny speaker of my mobile phone the Obama in the first sample is almost spot on. Some speed issues with Trump but really impressive.
joeblau about 8 years ago
I wonder how accurately this would reproduce dead musicians' voices. I've had this idea for about 8 years called the Notorious BIG project. I have about 20 acapellas that I was originally going to manually chop into a song. Neural nets can pretty much solve this now.
jtbayly about 8 years ago
Can we get these speeches in audio form now?

https://medium.com/@samim/obama-rnn-machine-generated-political-speeches-c8abd18a2ea0
kristaps about 8 years ago
As noted in other comments, all the samples still sound very robotic, so this is probably "just" a method to tune the parameters of an existing voice synthesizer to mimic a real person's voice as much as it allows.
Ensorceled about 8 years ago
The samples all sound a little like Rich Little and Stephen Hawking's love child doing impressions: they won't fool very many people.

But, you can certainly see where this is going and that's the worrisome part.
ageofwant about 8 years ago
Oh yea. The Troll embedded deep in my soul giggles in glee.

However, the day some shill tries to sell me travel insurance in my departed nana's voice would be the day I start signing my voice convos with a PGP key.
felipemesquita about 8 years ago
This site has a "demo" section featuring only Soundcloud clips. It uses the present tense too much ("In a world first, Montreal-based startup Lyrebird today unveiled" and "Record 1 minute [...] and Lyrebird can [..]Use this key to generate anything") but has no actual product or beta version. Adobe had a much more impressive sneak peek of a similar product called VoCo: https://www.youtube.com/watch?v=I3l4XLZ59iw
backpropaganda about 8 years ago
Relevant discussion from 17 hours ago: https://news.ycombinator.com/item?id=14177589
return0 about 8 years ago
We need a new markup language for intonation and emotion.
scibolt about 8 years ago
Voice Actors out of business! :D
anigbrowl about 8 years ago
Excellent work. This will find widespread application in the film/tv/music industry and beyond (and we're not that far away from being able to do the same thing for video). Unfortunately it will also be widely abused, but given the near-inevitability of such technological development I'm already reconciled to that :-/
jpsim about 8 years ago
Curious choice to name a company & product with a name that sounds like "Liar Bird" when spoken. To me, that looks like they're fully embracing the concept that this can be used for nefarious purposes. If one of their goals is to bring attention that this technology exists and can be misused, the name reinforces that.
LordKano about 8 years ago
This is impressive. There is now a way for Morgan Freeman and James Earl Jones to be able to narrate movies forever.
mericsson about 8 years ago
Related Economist article: http://www.economist.com/news/science-and-technology/21721128-you-took-words-right-out-my-mouth-imitating-peoples-speech-patterns
sehugg about 8 years ago
Sounds great, I was trying something like this in Keras but didn't get very far: https://github.com/sehugg/kerasspeechcodec
augustt about 8 years ago
Any ideas on what the underlying technology looks like? Maybe some kind of GAN for audio...
ParadisoShlee about 8 years ago
The audio feed sounds like they're real and drunk.. so that's impressive
bisRepetita about 8 years ago
1. Buy the rights for "Car Talk" re-broadcast.
2. Record new, current ads using Click and Clack's voices.
3. If the voices sound a little too "mechanic", pretend it's a joke.
dyu- about 8 years ago
This trump version [1] is quite believable.

[1] https://soundcloud.com/user-535691776/trump-6
hayd about 8 years ago
And just as my bank offers a "login via speaking" option. Lovely.
cocoa19 about 8 years ago
This technology reminded me of 24 (TV series).

The plot of season 2 has Jack Bauer prove a Cyprus recording between a terrorist and high-ranking Middle East officials was forged so the US president would start a war.
koolba about 8 years ago
The President Obama voice sounds decent. But the President Trump and Senator Clinton voices sound like robots. Reminds me of the crappy text to speech program that came with Windows.
vermontdevil about 8 years ago
Coming soon - fake videos of future political candidates saying outrageous things that will derail their campaigns.

Maybe from now on - just learn ASL. Hard to fake a distinctive signing style.
Markoff about 8 years ago
It's an interesting development but it sounds too robotic: there is zero intonation/punctuation, zero variation in the voice depending on the mood of the speaker, etc. In the end it's extremely robotic, and if someone really needs to fake someone else's voice convincingly it would still be easier to hire a professional voice imitator.
inetknght about 8 years ago
Site doesn't load at all on my machine without some JavaScript from Cloudflare for Ajax.

I guess this product isn't for me then.
gwbas1c about 8 years ago
Now we can't trust the news anymore. In a year or two we'll never know if recordings are real or not.
Sunset about 8 years ago
Now make it say the Navy Seal copypasta with Trump's voice, but make him speak slowly and with emphasis.
abetusk about 8 years ago
Does anyone know of any free/open source alternatives to this? Is it too new to expect a FOSS library?
mzzter about 8 years ago
Trump 6 speaking "... my intonation is always different" sounds very convincingly human.
w8rbt about 8 years ago
*“Believe only half of what you see and nothing that you hear.”* -- Edgar Allan Poe
olleromam91 about 8 years ago
So all my voice commands can be recorded and my voice can be replicated. Cool... I guess
leke about 8 years ago
OMG I want to play with this so bad.
wirddin about 8 years ago
If they can pull this off with the API, this is worth millions of dollars on the table.
nerfhammer about 8 years ago
Hello. My name is Werner Brandes. My voice is my passport. Verify me.
rajacombinator about 8 years ago
Wow, had no idea something like this was possible. Very impressive.
theemathas about 8 years ago
It's a matter of time before this can compete with Vocaloid.
mod about 8 years ago
Does the API get better results with more training data?
weenkus about 8 years ago
A bit scary thinking someone could do this with ease.
hoodoof about 8 years ago
It feels like the future has arrived.
simlevesque about 8 years ago
Great stuff! Respect from the 514.
gator-io about 8 years ago
So much potential for mischief!!
xumx about 8 years ago
Be right back (Black Mirror)

let's do it.
rglover about 8 years ago
This is fucking terrifying.
selbekk about 8 years ago
Scary.
kkotak about 8 years ago
RIP Dan Castellaneta.
backpropaganda about 8 years ago
[deleted]
redsummer about 8 years ago
I wonder if you could do this with singing? Feed it a cappella Bowie, Sinatra, Elvis songs, then give it new text, and out comes a similar voice and melody.
redsummer about 8 years ago
I can't wait for Richard Burton to read me the news.
ChairmanPao about 8 years ago
Now people can deny saying things caught on tape. Just show this technology to a jury considering taped evidence, and bring in some experts to testify on how it works.

The samples weren't that convincing to me, but could probably be used to switch a word here and there. That may be enough.
lucidrains about 8 years ago
Lol, I totally called this.
amarant about 8 years ago
They lost me at "... Consumers are still not lining up to buy EV's"

What the fuck are they talking about?
afinlayson about 8 years ago
This is how a lot of tech companies make proper text2speech; this was just done using the vast amount of audio that's out there for these people.

Soon Trump will use this to state that things he's said are fake news. God help us all.
stefek99 about 8 years ago
I have two domains:

- legalscreenshot.com
- legalprintscreen.com

I also developed a concept of "Reality Check" similar to the Turing Test (when VR and AI become so convincing that >50% of people won't distinguish them from base reality)... Too bad I'm on the corporate network and my personal website is blocked: https://genesis.re/wiki

Aside: do you believe psychedelics should be part of obligatory astronaut training?