Combined with Face2Face[1] live video impersonation, it is truly time to be very careful verifying videos or even live streams.<p>[1] <a href="https://www.youtube.com/watch?v=ohmajJTcpNk" rel="nofollow">https://www.youtube.com/watch?v=ohmajJTcpNk</a>
Last week on BBC Radio 4 I heard of a woman who was losing her voice through disease (MND maybe?). A similar system was being anticipated, and she was saving voice samples to seed it with.<p>She had been a singer and strongly identified herself with her voice, so she wanted to be able to use a speech synthesis system that had her own voice pattern.<p>Apologies if this was already mentioned, but it seems to be a use others here hadn't considered.
While all of these vec2speech type models are impressive, I get the feeling that most of the commenters didn't listen to any of the samples. It's still distinctly robotic sounding, probably has quite a bit of garbage output that needs to be filtered manually (as many of these nets often have) and is a far cry from fooling a human.
I appreciate the ethics link up there in the menu. Not sure if I've noticed one on any other AI startup (or for that matter, any startup). Given how complex the world is becoming due to ever increasing co-dependence with tech, I can see how such pages could become as important as 'pricing' or 'sign up' pages. (The privacy issues with Unroll.me, Uber and a thousand other such services will only accelerate this trend).<p>Good job, team Lyrebird. My feedback is that while the inclusion of an ethics page is great, it could do with more content on your vision and what you will not let your tech be used for. I know others can develop similar tech, but it will be good to read about YOUR ethics.<p>[Edited for clarity]
I love this. The business model is too good to be true.<p>1. Open source voice-copying software<p>2. At worst, create an entire market of voice-fraudsters; at best, very few voice-fraudsters but a very high and very real fear of them<p>3. Become leading security experts in voice fraud detection<p>4. Sell software / time / services to intelligence agencies, governments, law enforcement, news networks<p>Ethically I'm a bit concerned with (2), but realistically the team is right --- this technology exists, it will certainly be used for good and for bad, and they're positioning themselves as the leading experts.<p>I'm interested to see which VCs and acquirers line up here. Applying a voice to any phrase seems useful for voice assistants (Amazon Alexa, Google Home) but I don't think that's the $B model.
Funny thing is, this is approximately where the CIA was with similar technology closer to 2000. They did some demos for politicians about how they could fake anyone's messages. That stuff is golden for propaganda, and for confusing things like military chains of command. Today the CIA has probably worked out all the robotic artifacts already, and their output is really indistinguishable.
This is pretty cool (although, I have no idea what other technologies exist for this kind of thing), but it's definitely not convincing enough to a human listener. This sounds like it might be convincing enough for some programs like "Hey, Siri" but it's not gonna convince your mom. You can listen to the samples on the page linked here and you can immediately tell that Obama and Trump don't sound quite human.
This is pretty basic at the moment and it's terrifying. Yeah, it has an MS Sam feel to it, but as the tech improves, and we know it will, you could use a service like this to put words in someone's mouth. Think about how you could trip up a CEO or a politician by playing some random clip of something they never said. When that gets into the Zeitgeist, judgments will be made in the court of public opinion devoid of facts or real evidence. You could destroy democracy or people's lives with technology like this.
This is exciting! If you look at historic speeches (e.g., from American Rhetoric <a href="http://www.americanrhetoric.com/top100speechesall.html" rel="nofollow">http://www.americanrhetoric.com/top100speechesall.html</a>), there are large variations in average characteristics between various styles/contexts (on average, pitch/volume/speed are different for inspirational vs somber speeches, for example). But there are also really large differences in the variation: an inspirational speech may be marked by large swings from quiet, reflective pieces to booming, rousing calls-to-action, while a somber speech has fewer swings in delivery.<p>For the examples given for various intonations from Obama/Trump, some intonations are much more natural than others. It would be interesting to decide how to parametrize a sentence for the intended intonation (based on word2vec analysis of the words in the sentence, punctuation cues in the sentence, and perhaps a specified category of "emotional delivery").<p>It would be interesting at the sentence level, but also at the macro speech level, to include the right "mix" of intonations for a specific context. On a related note, it would be interesting to study the patterns of intonations in successful vs unsuccessful outbound sales calls, for example, to learn how to best simulate a good human sales voice.
Are there any copyright protections for a person's voice? If not, David Attenborough and Morgan Freeman will be the lead voice actors in my next game project.
Is this enough to beat voice recognition software?<p>If you thought fake news was bad before, wait until these 'secret' recordings start getting released and reported on.
Charles Schwab uses a voice phrase to authenticate you for access to your account, which is already pretty brittle, but I hope this makes them reconsider more urgently.
1. Is this company new?<p>2. Is this better than what Google or Baidu are doing?<p>3. I remember reading Adobe has something similar.<p>4. Why (what happened) that all of a sudden we have 4 companies making voice breakthrough tech like this?<p>5. What happens to voice acting, in places like Japan where they highly value voice actors? Is a voice even patentable?
I see a lot of people claiming that certain things will now be untrustworthy.<p>As if <i>human</i> voice imitators have not existed and could not be paid for prior to this. For $5 you can get Stewie Griffin [0] or Barack Obama [1] to say whatever you want them to say. Any audio-only messages of well known figures should already be considered "compromised" and untrustworthy, even without the technology to impersonate them.<p>This should be more concerning for "normal people". It isn't that you can no longer trust an audio-only recording of Obama, but that you may no longer be certain an audio recording is from your best friend. (E: Once the technology improves a bit more of course.)<p>[0] <a href="https://www.fiverr.com/joe_stevens/talk-like-stewie-griffin-for-you" rel="nofollow">https://www.fiverr.com/joe_stevens/talk-like-stewie-griffin-...</a><p>[1] <a href="https://www.fiverr.com/celebimpression/do-a-custom-barack-obama-impersonation" rel="nofollow">https://www.fiverr.com/celebimpression/do-a-custom-barack-ob...</a>
This is awesome. As someone exploring the fictional storytelling space, this seems like it'd have a lot of fun applications there as well.<p>How difficult is it to create/tune voices from parameters rather than training from an audio clip? I build software where people create fictional characters for writing, and having an author "create" voices for each character would be an amazing way to autogenerate audiobooks with their voices, or interact with those characters by voice, or just hear things written from their point of view in their voice for that extra immersion. Having an author upload voice clips of themselves mimicking what they think that character should sound like could work, but it would probably keep traces of their original voice (and feel "fake" to them because they can recognize their own voice), no?<p>Can't wait to see how this pans out. Signed up for the beta and will definitely be pushing it to its limits when it's ready. :)
It sounds like they're training a parametric speech synthesis platform on samples in order to learn the parameters. I wonder if there are approaches to generating n-phones for concatenative models, or using a hybrid approach.<p>I built a toy concatenative Donald Trump speech system [1], but I don't have an ML background. I've been taking Andrew Ng's online course in addition to Udacity's deep learning program in an attempt to learn the basics. I'm hoping I can use my dataset to build something backed by ML that sounds better.<p>Is anyone in the Atlanta area interested in ML? I'd love to chat over coffee or join local ML interest groups.<p>[1] <a href="http://jungle.horse" rel="nofollow">http://jungle.horse</a>
This is very exciting to me because it lets RPGs provide spoken dialog for everything (I'm waiting to see if they can do emotions at all convincingly). Even big budget games suffer from "you can call your character anything as long as it's 'Shepherd'" simply because you can't safely voice the character's name or any other user content.
I wonder how accurately this would reproduce dead musicians' voices. I've had this idea for about 8 years called the Notorious BIG project. I have about 20 acapellas that I was originally going to manually chop into a song. Neural nets can pretty much solve this now.
Can we get these speeches in audio form now?<p><a href="https://medium.com/@samim/obama-rnn-machine-generated-political-speeches-c8abd18a2ea0" rel="nofollow">https://medium.com/@samim/obama-rnn-machine-generated-politi...</a>
As noted in other comments, all the samples still sound very robotic, so this is probably "just" a method to tune the parameters of an existing voice synthesizer to mimic a real person's voice as closely as it allows.
The samples all sound a little like Rich Little and Stephen Hawking's love child doing impressions: they won't fool very many people.<p>But, you can certainly see where this is going and that's the worrisome part.
Oh yeah. The Troll embedded deep in my soul giggles in glee.<p>However, the day some shill tries to sell me travel insurance in my departed nana's voice would be the day I start signing my voice convos with a PGP key.
This site has a "demo" section featuring only Soundcloud clips. It leans too heavily on the present tense ("In a world first, Montreal-based startup Lyrebird today unveiled" and "Record 1 minute [...] and Lyrebird can [..]Use this key to generate anything") but has no actual product or beta version. Adobe had a much more impressive sneak peek of a similar product called VoCo: <a href="https://www.youtube.com/watch?v=I3l4XLZ59iw" rel="nofollow">https://www.youtube.com/watch?v=I3l4XLZ59iw</a>
Excellent work. This will find widespread application in the film/tv/music industry and beyond (and we're not that far away from being able to do the same thing for video). Unfortunately it will also be widely abused, but given the near-inevitability of such technological development I'm already reconciled to that :-/
Curious choice to name a company & product with a name that sounds like "Liar Bird" when spoken. To me, that looks like they're fully embracing the concept that this can be used for nefarious purposes. If one of their goals is to bring attention to the fact that this technology exists and can be misused, the name reinforces that.
Sounds great, I was trying something like this in Keras but didn't get very far: <a href="https://github.com/sehugg/kerasspeechcodec" rel="nofollow">https://github.com/sehugg/kerasspeechcodec</a>
1. Buy the rights for "Car Talk" re-broadcast.
2. Record new, current ads using Click and Clack's voices.
3. If the voices sound a little too "mechanic", pretend it's a joke.
This Trump version [1] is quite believable.
[1] <a href="https://soundcloud.com/user-535691776/trump-6" rel="nofollow">https://soundcloud.com/user-535691776/trump-6</a>
This technology reminded me of 24 (TV series).<p>The plot of season 2 has Jack Bauer prove a Cyprus recording between a terrorist and high-ranking Middle East officials was forged so the US president would start a war.
The President Obama voice sounds decent. But the President Trump and Senator Clinton voices sound like robots. Reminds me of the crappy text to speech program that came with Windows.
Coming soon - fake videos of future political candidates saying outrageous things that will derail their campaigns.<p>Maybe from now on - just learn ASL. Hard to fake a distinctive signing style.
It's an interesting development, but it sounds too robotic: there is zero intonation/punctuation, zero variation in the voice depending on the mood of the speaker, etc. In the end it's extremely robotic, and if someone really needs to fake someone else's voice convincingly, it would still be easier to hire a professional voice imitator.
I wonder if you could do this with singing? Feed it a cappella Bowie, Sinatra, Elvis songs, then give it new text, and out comes a similar voice and melody.
Now people can deny saying things caught on tape. Just show this technology to a jury considering taped evidence, and bring in some experts to testify on how it works.<p>The samples weren't that convincing to me, but could probably be used to switch a word here and there. That may be enough.
This is how a lot of tech companies make proper text2speech; this was just done using the vast amount of audio that's out there for these people.<p>Soon Trump will use this to state that things he's said are fake news. God help us all.
I have two domains:<p>- legalscreenshot.com<p>- legalprintscreen.com<p>I also developed a concept of "Reality Check" similar to the Turing Test (when VR and AI become so convincing that >50% of people won't distinguish it from base reality)... Too bad I'm on the corporate network and my personal website is blocked: <a href="https://genesis.re/wiki" rel="nofollow">https://genesis.re/wiki</a><p>Aside: do you believe psychedelics should be part of obligatory astronaut training?