For those interested, Signal doesn't seem to use ZRTP anymore:<p>> <i>The new Signal voice and video beta functionality eliminates the need for ZRTP. The "signaling" messages used to set up the voice/video beta calls (offer/answer SDPs, ICE candidates, etc) are transmitted over the normal Signal Protocol messaging channel, which binds the security of the call to that existing secure channel. It is no longer necessary to verify an additional SAS, which simplifies the calling experience.</i><p><a href="https://whispersystems.org/blog/signal-video-calls-beta/" rel="nofollow">https://whispersystems.org/blog/signal-video-calls-beta/</a><p>And it's not in beta anymore:<p><a href="https://whispersystems.org/blog/signal-video-calls/" rel="nofollow">https://whispersystems.org/blog/signal-video-calls/</a>
It would be more interesting to see how feasible it is to defeat the in-band SAS (short authentication string) check, where callers verify the string verbally.<p>Using deep learning to train on a specific caller's voice [1] and then mimic it might be an interesting attack vector. In practice, Silent Circle's implementation does something interesting: instead of SAS numbers it uses dictionary words, so you end up with something like "Pink Elephant Salad". An attacker could probably MitM that by synthesizing those words in the caller's voice. However, callers are then supposed to make a pun or discuss it a bit and say something like "Ha-ha! Wonder how tasty an elephant salad would be". If, after the MitM, the string shown to the other side was "Plastic Blue Llamas", the attack becomes more obvious.<p>[1] <a href="http://research.baidu.com/deep-voice-production-quality-text-speech-system-constructed-entirely-deep-neural-networks/" rel="nofollow">http://research.baidu.com/deep-voice-production-quality-text...</a>
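To make the word-based SAS idea concrete, here is a rough Python sketch of how a phrase could be derived from a call's shared secret. This is not Silent Circle's or ZRTP's actual derivation: the 16-entry `WORDS` list, the `sas-demo` label, the `sas_words` helper, and the secrets are all made up for illustration, and a real system would use a much larger word list.

```python
import hashlib

# Hypothetical 16-entry word list, for illustration only.
WORDS = [
    "pink", "elephant", "salad", "plastic", "blue", "llama", "quiet", "river",
    "amber", "falcon", "marble", "orchid", "copper", "walnut", "silver", "meadow",
]


def sas_words(shared_secret: bytes, count: int = 3) -> str:
    """Map a hash of the shared secret to `count` words from WORDS."""
    # "sas-demo" is an assumed domain-separation label, not a real protocol constant.
    digest = hashlib.sha256(b"sas-demo" + shared_secret).digest()
    # One digest byte selects one word; 256 is a multiple of 16, so the
    # modulo reduction does not favor any word.
    return " ".join(WORDS[b % len(WORDS)] for b in digest[:count])


# Honest call: both ends hold the same secret, so they read out the same phrase.
direct_secret = b"secret negotiated directly between the callers"
print("caller A:", sas_words(direct_secret))
print("caller B:", sas_words(direct_secret))

# MitM: each leg is negotiated with the attacker, so each end derives its
# phrase from a different secret.
print("caller A (MitM leg):", sas_words(b"secret between A and the attacker"))
print("caller B (MitM leg):", sas_words(b"secret between the attacker and B"))
```

With three words from a 16-word list the phrase carries only 12 bits, so two unrelated secrets would still produce the same phrase about once in 4096 tries. That is why the check also depends on the callers actually talking about the words, not just reading them out.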