I'm sure there are copyright concerns and other barriers to doing this...but I've been playing around with the ChatGPT app's talk to text, and from my understanding it uses the whisper-large-v2 model.<p>It is absolutely outstanding.<p>I have a lot of friends who work in law and medicine, and from what I have heard, Dragon Speech Recognition is the king of the hill. From a quick Wikipedia search, it seems to be based on Hidden Markov Models. Is this noticeably different from what Whisper is doing? And is there anything stopping someone from training a model on a large audio dataset and releasing a text to speech app that immediately dethrones Dragon/Siri/Google dictate/Alexa/Windows text to speech?
There have actually been some papers on "better than SOTA" TTS models with shockingly good inflection, emotion, voice imitation and such.<p>But the orgs behind them say they are hesitant to release them due to obvious misuse concerns. And I think the <i>unspoken</i> concern is that the datasets are not clean.