Hi HN!<p>We are the co-founders of Lyrebird (<a href="https://lyrebird.ai/" rel="nofollow">https://lyrebird.ai/</a>) and PhD students in AI at the University of Montreal. We are building speech synthesis technologies to improve the way we communicate with computers. Right now, our key innovation is that we can copy the voice of someone else and make it say anything. The tech is still at an early stage, but we believe that it is eventually going to make possible a wide range of new applications such as:<p>- reading text messages aloud in the voice of the sender,<p>- reading audiobooks with the voice of your choice,<p>- giving a personalized digital voice to people who lost their voice due to a disease,<p>- allowing video game makers to have more customized dialogs generated on the fly, or avatars of their players,<p>- allowing movie makers to freeze the voice of their actors so that they can still use it if the actor ages or dies.<p>Yesterday we launched a beta version of our voice-cloning software: anyone can record one minute of audio and get a digital voice that sounds like them.<p>We know that many on HN are concerned about potential misuses surrounding these technologies and we share your concern. We explain our ethical stance further on this page: <a href="https://lyrebird.ai/ethics/" rel="nofollow">https://lyrebird.ai/ethics/</a>.<p>Our blog post about the launch (<a href="https://lyrebird.ai/blog/create-your-voice-avatar" rel="nofollow">https://lyrebird.ai/blog/create-your-voice-avatar</a>) features the first video combining generated audio and generated elements of the video.<p>There was a thread about us on HN when we launched our website four months ago (<a href="https://news.ycombinator.com/item?id=14182262" rel="nofollow">https://news.ycombinator.com/item?id=14182262</a>), but at that time no one could test our software yet and we did not really answer any of the community's questions. So this time we are ready for questions and would love some feedback!
This looks really great, congrats! Forgive me if I missed something, but I was wondering if you could clear up some confusion. From the terms: "Subject to the Biometric Data Agreement, you hereby grant to us a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive and fully sublicensable right (including any moral rights) and license to use, license, distribute, reproduce, modify, adapt, publicly perform, and publicly display Your Voice, Digital Voice..."<p>Just to be clear, the license of the voice/digital voice is revoked upon deletion of the recordings? I understand it is subject to the biometric agreement, but the words perpetual and irrevocable still worried me. Thanks!
Sounds amazing! Just to add a use case: for many people, creating a decent voiceover is one of the big sticking points for producing YouTube videos or educational courses. If I could write a script and have software generate a decent enough voiceover, it would be amazing.<p>It's not even necessary to copy anyone's voice, as long as there's a selection of the most comprehensible and human-sounding ones.<p>Then, you could even automatically generate a slideshow presentation from a few illustrations and headlines, and that would make "rendering" articles into videos very fast and easy. I'm sure a lot of people would pay for such a service.<p>----<p>By the way, I recently encountered Deep Voice 2, a similar research project by Baidu:<p><a href="http://research.baidu.com/deep-voice-2-multi-speaker-neural-text-speech/" rel="nofollow">http://research.baidu.com/deep-voice-2-multi-speaker-neural-...</a><p>Results are very impressive.
While it's good that you have an ethics page: <a href="https://lyrebird.ai/ethics/" rel="nofollow">https://lyrebird.ai/ethics/</a>, it only has two ethical guidelines:<p>* Spread awareness of this technology<p>* Your digital voice remains yours<p>I would feel a lot better about this if you also had explicit ethical boundaries, for example disallowing users impersonating someone else, e.g. Donald Trump, Barack Obama. "Your digital voice remains yours" sort of sounds like you won't use/share my digital voice with others, but doesn't directly address whether bad actors can maliciously impersonate someone who hasn't registered with Lyrebird.
Their ethics don't seem to be something they take seriously, as the video they use to promote their own site is an impersonation itself.<p>It seems that right out of the gate, they are breaking their own ethical guidelines as a cheap promotional tactic. If they care that little about themselves and a former president of the United States, what do they care about your likeness?<p>It also doesn't help that you give them a universal perpetual license to do whatever they want (including selling your likeness for someone else's use) by uploading.<p>This just seems like a slimy team that put up an ethics page as a CYA.<p>I'm willing to eat my words if they had Barack Obama's consent to use his digitized voice for this, but it's highly doubtful, since there's also the coat and seal of the President of the United States on the flag in the background, which would be a massive ethical breach of a former President just to promote a silly little startup.
"I'm using my voice as my password".<p>Vanguard allows voice authentication (<a href="https://investor.vanguard.com/account-conveniences/voice-verification?lang=en" rel="nofollow">https://investor.vanguard.com/account-conveniences/voice-ver...</a>) - and who knows who else will roll something similar out in the future. Yeah, its really really dumb, but it's happening in production now. I wouldn't use this product if I were you, but honestly you should also not use voice verification/authentication for anything.
Seems like a really useful piece of technology. As you said, it's got quite a few applications in the gaming, film, medical and messaging industries.<p>That said, am I the only one imagining this getting abused by people in those fields as well? Seems like a good way to avoid paying voice actors for future work. Just record the minimum 30 recordings, then use this software to create all their future dialogue.<p>This could lead to some interesting lawsuits over who a character's voice belongs to and whether a company has the right to use someone's voice recordings to get free work done on future projects. Like how during the production of Trail of the Pink Panther, Peter Sellers' widow sued the film's producers and studio over them using clips of him from deleted scenes in earlier films in the movie.
I don't buy the "raising awareness" argument, ethically speaking. To do that, you could release demo files that show the capability without weaponizing it through easy access. It'd be great to increase awareness around our vulnerability to EMP attacks, but we don't need to publish specs and/or sell a working prototype to make that case.<p>This is just one of those areas where the negative implications, I believe, far outweigh the positive ones. Aside from the noble cause of helping the disabled, most of the use cases center around entertainment. As great as that may be, the likely application to fraud and the potential for a catastrophic misuse in matters of war and peace just dwarf any upside.
Will this technology be licensed for redistribution or only for online API use? I ask because in the video game scenario it would be great to have this in a library I could distribute instead of relying on the API to be available at all times.
Really fun stuff. I noticed that it seems to have problems starting sentences. Especially if I try to start a sentence with "hi,". Interesting nonetheless. This passage seems to be rendered fairly well: <a href="https://lyrebird.ai/g/LYoVuaZm" rel="nofollow">https://lyrebird.ai/g/LYoVuaZm</a><p>Also, <a href="https://lyrebird.ai/g/D3Fw328D" rel="nofollow">https://lyrebird.ai/g/D3Fw328D</a>
I guess I see a ton of upside here, but I also see that this could easily be abused and possibly become a tool to completely destroy someone's life. Imagine getting a phone call from your "partner" saying they cheated on you. I don't know how it would be usable (API?), and I do still detect a bit of artificialness to the voice, but as this gets better I worry about the downsides and the potential for harm from copying someone's voice.
I just tried to sign up with a Hotmail email address and I got this error message: <i>This email cannot be used to create an account. It might be due to your email domain name.</i><p>I realize Hotmail isn't the sexiest email provider these days, but it's one of the more commonly used. Do you have a list of email domains you allow?
I assume you guys know about VocalID, which got an NSF SBIR grant for giving mute people a voice (through similar means): <a href="https://www.vocalid.co/" rel="nofollow">https://www.vocalid.co/</a>
This is incredible - recorded my voice and I'm blown away with the results.<p>One thing: I found that I was in such a hurry to record that I probably spoke faster than normal. It'd be nice if there was a way to tune a few parameters manually (tempo, pitch, etc).<p>If I ever lose my voice and have to have a TTS appliance speak for me, I'll be contacting you all to get my voice profile!<p>EDIT: For those interested, pretty impressive that it figured out the appropriate cadence for this: <a href="https://lyrebird.ai/g/v7MpYaUA" rel="nofollow">https://lyrebird.ai/g/v7MpYaUA</a>
This looks awesome. I commented on the original post about how exciting this is for worldbuilding (and creating realistic voices for fictional characters, with all the uses that come there).<p>Random question: it's said that people think their own voices sound weird when they hear recordings of themselves played back. Do you have a way to measure that phenomenon? Have you seen people complaining about the accuracy when in fact it was just that effect making people sound "weird" (to themselves)?
This is only tangentially on topic, but is there an API or some engine that I can feed short sentences into and get high-quality generation back?<p>I have an RC controller radio that supports voice prompts, and I would like to add some short phrases that are missing, such as "air mode on", "throttle warning", etc.<p>Is there anything on par in quality with Google's/Siri's voice? Not the Google TTS, but the voice they use in Now.
Amazing - I can't wait to integrate this with our VR product. We previously used Amazon Polly attached to a chatbot:
<a href="https://twitter.com/Alientrap/status/829032930626383873" rel="nofollow">https://twitter.com/Alientrap/status/829032930626383873</a><p>First uses that come to mind are players adding themselves to a VR world - or maybe celebrities / public figures.
Congrats on the launch! The tech is amazing<p>Quick q's (purely out of curiosity):<p>1) > We are [...] PhD students in AI at University of Montreal<p>Are you doing the startup on the side/planning on going back to school?<p>2) I don't recall reading about you guys in articles about YC S17 demo days. What are reasons why some companies might not participate in demo day or remain off-the-record? In your case, you seem to have had a working product long before demo day
This is probably going to be great, but I just tested out voice generation with the bare minimum of 30 recordings, and it really fell flat. When I tried playback with an input, all it could produce was a high-pitched buzzing sound and then maybe 1/4 of the words I typed in, which sounded nothing like me.<p>Perhaps you should increase the minimum from 30 recordings to 100?
When I try to test my digital voice, after clicking "Generate," I get this error after about 10 seconds:<p>Something went wrong. Please try again!<p>I've tried about 5 times.<p>EDIT: I went back to the page a few minutes later, and the recordings were all there. So it looks like it works, but is giving a false error message.
I have a YouTube channel (vimgirl), and before recording I have to write scripts for what I plan to say in the video. The digital voice doesn't seem to be working right now, but when it does, it would cut down my screencast production time by at least half.
Cool stuff! Question from your FAQ:<p>> Q: Will I be able to copy another person's voice?<p>> A: Yes but only if you have the authorization of the person whose voice is being copied.<p>Perhaps you can unpack that answer a bit? What's the authorization process?
Hey, how does Lyrebird handle accents? I work in the education space, and due to the accent of people in my country, the content doesn't work well with a global audience.<p>Are you open for beta? I would like to try out your API on education content.
...make possible a wide range of new applications such as<p>- hacking voice-controlled interfaces<p>- generating fake news<p>FTFY<p>don't @ me saying "sure, any technology can be used for good and bad, stop being a Luddite". Yeah, I know that, just messing with you