My all-time favorite audio deepfake is Nobel prize winner Milton Friedman reading the lyrics to the 50 Cent track "PIMP". It really captures Friedman's tell-tale cadence and idiosyncratic lilt: <a href="https://www.youtube.com/watch?v=4mUYMvuNIas" rel="nofollow">https://www.youtube.com/watch?v=4mUYMvuNIas</a>
There is already a large problem with political ads cherry-picking and splicing audio and video to mislead viewers. I really worry that deepfakes will take it to another level entirely. I fully expect the current administration to adopt them eagerly if they become available.
This might be paranoid, but I've established a protocol with some people in my life: should someone with my voice ever contact them and ask for money (because of some emergency), nothing is to be done until a passphrase is spoken. It's only a matter of time until someone collects enough voice data and the related contact numbers, trains a model on those voices, and then uses it to fake the original voice in real time in a scam attempt.
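If you wanted to formalize that into something stronger than a memorized passphrase, a minimal challenge-response sketch with a pre-shared secret might look like this (purely illustrative; the secret value, the choice of HMAC-SHA-256, and the helper names below are my own assumptions, not part of anyone's actual family protocol):

    import hashlib
    import hmac
    import os

    # Both parties agree on a secret out of band (in person, never over the phone).
    SHARED_SECRET = b"correct horse battery staple"   # placeholder secret

    def make_challenge() -> bytes:
        # The callee invents a fresh random challenge for every call.
        return os.urandom(16)

    def respond(challenge: bytes, secret: bytes = SHARED_SECRET) -> str:
        # The caller proves knowledge of the secret without ever saying it aloud.
        return hmac.new(secret, challenge, hashlib.sha256).hexdigest()[:8]

    def verify(challenge: bytes, response: str, secret: bytes = SHARED_SECRET) -> bool:
        expected = respond(challenge, secret)
        return hmac.compare_digest(expected, response)

    # Example: the callee reads the challenge out as hex, the caller reads back
    # the 8-character response computed on their own device.
    c = make_challenge()
    print(verify(c, respond(c)))   # True only if both sides hold the secret

In practice a spoken passphrase is simpler, but the idea is the same: the caller has to demonstrate something that an attacker holding only recordings of your voice could not know.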
Recently a friend changed her number and told me via text. Before adding the new number, I asked her a question that only she and I would know the answer to, like who sat next to her at the old office.<p>I think I'm going to keep doing this type of verification. It may annoy friends and family, but I'm not sure how an attacker could ever know such small details shared between two people.
There is an annual challenge for synthetic voice detection, ASVspoof, that evaluates submissions on different types of attacks against speaker verification systems: text-to-speech, voice conversion, and replay attacks.<p>The conclusion from the 2019 evaluation [1]: <i>known</i> synthetic deepfakes are fairly easy to detect using simple models, with very low error rates (even for high-fidelity techniques using WaveNet vocoders).<p>[1]: ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection (<a href="https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2249.pdf" rel="nofollow">https://www.isca-speech.org/archive/Interspeech_2019/pdfs/22...</a>)
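To give a feel for what "simple models" means here, a toy sketch in the spirit of the ASVspoof GMM baselines: two Gaussian mixture models, one fit on bona fide speech and one on spoofed speech, scored by log-likelihood ratio. (The official baselines use LFCC/CQCC features; the MFCC features, directory layout, and hyperparameters below are my own simplifications.)

    import glob
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def mfcc_frames(path, sr=16000, n_mfcc=20):
        # Frame-level cepstral features; the real baselines use LFCC/CQCC instead.
        y, _ = librosa.load(path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

    def fit_gmm(paths, n_components=64):
        feats = np.vstack([mfcc_frames(p) for p in paths])
        return GaussianMixture(n_components=n_components, covariance_type="diag").fit(feats)

    # One model per class, trained on labeled utterances (placeholder paths).
    gmm_real = fit_gmm(glob.glob("train/bonafide/*.wav"))
    gmm_fake = fit_gmm(glob.glob("train/spoof/*.wav"))

    def score(path):
        # Log-likelihood ratio: positive -> more likely bona fide, negative -> spoof.
        f = mfcc_frames(path)
        return gmm_real.score(f) - gmm_fake.score(f)

    print(score("test/unknown.wav"))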
> Deepfake technology is not sophisticated enough to mimic an entire phone call with someone.<p>With modern voice conversion technology, it is in fact perfectly possible.
Audio "deepfakes" have been worked on much longer than ones for video, although video deepfakes have the added issue of deep-faking synchronized audio. Today's consumers don't seem to be bothered by video deepfakes if they play to the beliefs of the audience.
A useful example is the Joe Biden Burisma phone call that bubbled up through Russian media, which was fabricated. I pulled it apart with ffmpeg and there were a number of artifacts that showed editing and splicing.<p>If you're handy with ffmpeg and Python, you can assess the veracity of these clips pretty easily. Of course, if I were on a political ratf'ing team, I'd use the same tools to add those artifacts to a copy of an offending (real but off-message) stream and amplify the distribution of that fake-faked version with a debunking press release handy, so YMMV. While the Biden thing wasn't a deepfake directly (shallow fake?), we're going to see tons of actual deepfakes around the election.<p>IMO, elections are no longer between candidates; they are a war on truth for domination of the narrative, and holding office is just the effect. A campaign that focuses on what happens once the war is over is daydreaming about the future, distracted from the present, and that will lose it key battles. For this reason, I think deepfakes are going to be the biggest weapon in campaign arsenals for the near future. Interesting times.
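As a rough illustration of that kind of workflow (not the exact commands anyone used on the Biden clip; the input file name and thresholds are placeholders), you can render a spectrogram and flag suspicious near-silent gaps with stock ffmpeg filters:

    import subprocess

    AUDIO = "suspect_call.wav"   # placeholder input file

    # Render a spectrogram image; hard spectral edges at splice points often stand out visually.
    subprocess.run([
        "ffmpeg", "-y", "-i", AUDIO,
        "-lavfi", "showspectrumpic=s=1920x1080:legend=1",
        "spectrogram.png",
    ], check=True)

    # Flag abrupt near-silent gaps, which can indicate cuts that were papered over.
    result = subprocess.run([
        "ffmpeg", "-i", AUDIO,
        "-af", "silencedetect=noise=-45dB:d=0.05",
        "-f", "null", "-",
    ], capture_output=True, text=True)

    # ffmpeg writes silencedetect results to stderr.
    for line in result.stderr.splitlines():
        if "silence_start" in line or "silence_end" in line:
            print(line.strip())

None of this proves anything on its own, but clusters of abrupt gaps and spectral discontinuities are the sort of artifacts that crude splicing leaves behind.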
The spectrograms of the real and fake voices shown in the analysis seemed distinguishable by the human eye. Can an image model be trained to detect fake-voice spectrograms based on pitch and tonal choppiness?
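In principle yes; treating log-mel spectrograms as images and training a small CNN on real-vs-synthetic labels is a common approach. A minimal sketch (the architecture, input size, and labels below are illustrative assumptions, not something from the article):

    import torch
    import torch.nn as nn

    # Binary classifier over fixed-size spectrogram "images": real vs. synthetic speech.
    class SpectrogramCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(64, 2)   # logits for [real, fake]

        def forward(self, x):                    # x: (batch, 1, n_mels, time_frames)
            h = self.features(x).flatten(1)
            return self.classifier(h)

    model = SpectrogramCNN()
    dummy = torch.randn(8, 1, 128, 256)          # e.g. 128 mel bands x 256 frames
    print(model(dummy).shape)                    # torch.Size([8, 2])

The open question is generalization: such a model tends to do well on synthesis methods it has seen in training and much worse on new ones.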
This makes me wonder: how would one go about adding an authentication key to audio? We have seen encryption for text shared via email and watermarks embedded in images, but I haven't come across anything similar for audio. Happy to hear from anyone who has worked in this field.
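The naive version would be a detached signature distributed alongside the recording: the speaker signs a hash of the audio bytes with a private key, and anyone with the public key can check the file is untouched. A sketch of that idea (the library choice, file name, and key handling are my own assumptions), with the caveat that it breaks the moment the audio is re-encoded or clipped, which is why real audio authentication work leans on robust watermarks embedded in the signal itself:

    import hashlib
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    def file_digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).digest()

    # The publisher signs the recording once with their private key...
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()
    signature = private_key.sign(file_digest("statement.wav"))

    # ...and anyone holding the public key can verify the exact bytes later.
    def verify(path, signature, public_key):
        try:
            public_key.verify(signature, file_digest(path))
            return True
        except InvalidSignature:
            return False

    print(verify("statement.wav", signature, public_key))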