This is better than any other speech-to-text setup I've ever encountered, for one simple reason: I followed the dead-simple install steps in the readme, started the program, <i>and it worked.</i> Bonus points for the install being a git clone and pip install away. I don't know why this is a hard bar to clear, but bravo. (I <i>suspect</i> that it's because a lot of FOSS speech recognition is from academia where "follow the following 13 steps, including hand-crafting recognition parameters" is more normal and acceptable because everyone involved is already a domain expert, whereas I, as a user, just want "plug in a mic, run this thing, and get text on stdout".)
You know... I have an idea. How about we integrate Vosk and this tech with ffmpeg somehow, so that PeerTube videos get subtitles while they are being transcoded? Once we have an English SRT, we could use LibreTranslate to translate it into multiple languages.<p>This would be similar to what YouTube does with its automatic subtitles.
What do you guys say?
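For what it's worth, the middle step of that pipeline (recognized words to SRT) is small. Here is a minimal sketch, not a tested integration. It assumes the recognizer gives you a list of per-word dicts with "word", "start", and "end" keys (times in seconds), which is the shape I'd expect from Vosk when word timestamps are enabled. The helper names `srt_timestamp` and `words_to_srt` and the `max_words` grouping rule are my own, hypothetical choices. Decoding audio with ffmpeg and translating with LibreTranslate are left out.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group timestamped words into numbered SRT cues.

    Assumption: each item looks like {"word": str, "start": float, "end": float}.
    Cues are cut every `max_words` words, a deliberately naive rule; a real
    tool would probably also split on long pauses or punctuation.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        text = " ".join(w["word"] for w in chunk)
        # An SRT cue is: index, time range, text, then a blank line.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        {"word": "hello", "start": 0.0, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 1.0},
    ]
    print(words_to_srt(demo))
```

Because each cue keeps its own index and time range, a translation step would only need to replace the text line of each cue and could leave the timing untouched.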
I've never even heard of VOSK-API [0], the underlying offline speech to text engine that this project uses.<p>Does anyone have experience using it? Is it any good?<p>[0] <a href="https://github.com/alphacep/vosk-api" rel="nofollow">https://github.com/alphacep/vosk-api</a>
Nice. Another notable mention in this space is Talon. Useful for automating all OS tasks with voice commands, as well as just dictation: <a href="https://talonvoice.com/" rel="nofollow">https://talonvoice.com/</a>
This is such an amazing technology for the many tech people who have to deal with hand/finger/elbow issues after years of extensive keyboard use.<p>I have been looking for this type of tech for at least 2 years and I am glad it now exists.<p>FOSS is amazing!
I was wondering how well it dealt with accents, then I saw that the Vosk API page specifically mentions "English, Indian English, German, French, ..." :D I don't know the story behind "Indian English" specifically being listed as a separate language, but I'm glad to see it's supported.
Vosk, is it "wax" in Russian ("воск")?<p>It makes me think of wax recording rolls, the CDs of the old days, a.k.a. phonograph cylinders:<p><a href="https://en.m.wikipedia.org/wiki/Phonograph_cylinder" rel="nofollow">https://en.m.wikipedia.org/wiki/Phonograph_cylinder</a>
I'm throwing another hat in the ring to say that this technology totally works most of the time. I used it to write this comment.<p>This should make my life a lot easier, because I've found myself going to my phone and using the dictation feature a lot recently. It's not as good as the one on my Android, but it's 95% of the way there.
Is there a good offline program for text to speech for German, French, Spanish, and English? And no, Festival and eSpeak are not what I would consider good.<p>The AT&T website that produced text to speech as an audio file, which was used in these anonymous publications, was good, but eSpeak is not. If I had something like that for European (and Russian and Arabic) languages as an open-source standalone program, I would be happy :(