>After two hitmen were arrested for killing a man, the FBI wanted to prove that they'd been hired by a family going through a child custody dispute. The FBI arranged to trick the family into believing that they were being blackmailed for their involvement - and then sat back to see the reaction. While texts and phone calls were reasonably easy for the FBI to access, in-person meetings in two restaurants were a different matter. But the court authorised the use of Wave Sciences’ algorithm, meaning that the audio went from being inadmissible to a pivotal piece of evidence.<p>While the technology and the way it is marketed are cool and everything, the most interesting part of the article is this FBI investigation. Too bad it's only mentioned in these few sentences, with no further details.
Software that does this is already running on millions of Android phones. After you complete OK Google voice enrollment, the phone computes a speaker-representation embedding for you. When the voice activity detector detects speech, it passes the audio through VoiceFilter-Lite <a href="https://google.github.io/speaker-id/publications/VoiceFilter-Lite/index.html" rel="nofollow">https://google.github.io/speaker-id/publications/VoiceFilter...</a> along with your speaker embedding, and the model is trained to return only your speech. The original, more computationally expensive model is called VoiceFilter, and I was amazed when I first saw the examples on this page: <a href="https://google.github.io/speaker-id/publications/VoiceFilter/" rel="nofollow">https://google.github.io/speaker-id/publications/VoiceFilter...</a>
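The data flow described above (enroll a speaker embedding, then condition a mask on it to keep only that speaker) can be sketched in toy numpy. This is not Google's model: in the real VoiceFilter the embedding comes from a trained speaker-encoder network and the mask from a trained separation network; here both are hypothetical stand-ins just to show the plumbing.

```python
import numpy as np

rng = np.random.default_rng(0)

def dvector(enrollment_frames):
    """Toy speaker embedding ("d-vector"): mean of per-frame features,
    L2-normalised. A real system uses a trained speaker-encoder net."""
    emb = enrollment_frames.mean(axis=0)
    return emb / np.linalg.norm(emb)

def soft_mask(mixture_spec, speaker_emb):
    """Stand-in for the separation network: score each time frame by
    cosine similarity to the target embedding, squash into (0, 1)."""
    norms = np.linalg.norm(mixture_spec, axis=1, keepdims=True) + 1e-8
    scores = (mixture_spec / norms) @ speaker_emb          # shape (T,)
    mask = 1.0 / (1.0 + np.exp(-scores))                   # sigmoid
    return mask[:, None]                                   # broadcast over freq bins

def voicefilter(mixture_spec, speaker_emb):
    """Apply the per-frame soft mask to the mixture magnitude spectrogram,
    attenuating frames that look unlike the enrolled speaker."""
    return soft_mask(mixture_spec, speaker_emb) * mixture_spec

# Usage with random placeholder "audio features" (T frames x F bins):
enroll = rng.normal(size=(50, 64))       # enrollment frames
mix = rng.normal(size=(100, 64)) ** 2    # non-negative mixture spectrogram
emb = dvector(enroll)
out = voicefilter(mix, emb)
```

The key design point the papers describe is that the speaker embedding is an extra input to the masking model, so the same network can isolate any enrolled voice without retraining.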
There is more information on how the technology works in this demo / paper discussion: <a href="https://news.ycombinator.com/item?id=41455176">https://news.ycombinator.com/item?id=41455176</a>