Super exciting!
I'll be shipping Distil-Whisper to whisper-turbo tomorrow! https://github.com/FL33TW00D/whisper-turbo

Should make running in the browser feasible even for underpowered devices: https://whisper-turbo.com/
It’s a shame that the README doesn’t link to the original Whisper, or at least not prominently. There’s the etiquette question, but more practically, without that context I still don’t really know what this does.
I'm using this: https://github.com/guillaumekln/faster-whisper
Smaller, faster, works well with CPU, multiple languages, etc.
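For anyone who hasn't tried it, here's a minimal usage sketch (the model size, file name, and quantization setting are just placeholders, not recommendations):

    # Minimal faster-whisper sketch; model size, audio path, and
    # compute_type are illustrative only.
    from faster_whisper import WhisperModel

    # int8 quantization keeps memory low and runs well on CPU
    model = WhisperModel("small", device="cpu", compute_type="int8")

    segments, info = model.transcribe("audio.mp3", beam_size=5)
    print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")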
It seems they have only distilled on English data, so the distil-large-v2 model will probably perform badly in any other language. We'll see tomorrow when they release their models.
> performs within 1% WER

From the paper, for short-form audio:

> the distil-large-v2 model achieves the lowest overall average WER of 10.1%. It is one percentage point higher than the large-v2 baseline, with 5.8 times faster inference speed and fewer than half the parameters.

Long-form is similar, except Distil-Whisper does slightly better than Whisper (fewer hallucinations, apparently).

10% WER seems awfully high, and doesn't match my experience with Whisper. Maybe my audio is nice and clean relative to their test set?
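For reference, "within 1%" here means one percentage point absolute (10.1% vs. roughly 9.1% for the baseline), not 1% relative. WER itself is just word-level edit distance divided by reference length; a self-contained sketch, no libraries assumed:

    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate: (substitutions + deletions + insertions) / reference words."""
        ref, hyp = reference.split(), hypothesis.split()
        # Levenshtein distance over words via dynamic programming
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + cost) # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # 1 substitution out of 10 reference words -> 10% WER
    print(wer("the quick brown fox jumps over the lazy dog today",
              "the quick brown fox jumped over the lazy dog today"))  # 0.1

So 10% WER means roughly one word in ten is wrong, which is why it feels high for clean audio.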
Funnily enough, `-small`, `-base` and `-tiny` versions of this would be more exciting to me. `small.en` is the largest of the original Whisper models that will run at anywhere near usable speed on a Raspberry Pi Zero 2 with whisper.cpp, and it's still too slow to really bother with for streaming. Anything smaller is too inaccurate for day-to-day use. If there were a distilled version with a similar 6x speedup, that would be transformative.
On a partially related note, has anyone packaged any version of Whisper as an Android keyboard? It seems like a reasonably good fit, and I would be interested to see if it worked better than the deteriorating quality of Google's offering. I think it would work even with the existing versions, but a faster and smaller version would obviously be a better fit for running on phone hardware.
How much faster, in real wall-clock time, is this on batched data than https://github.com/m-bain/whisperX?
Is there a good project out there that pairs Whisper with something like llama.cpp to create a private local voice assistant?

Llama 2 isn't as good as GPT-4, but it's a hell of a lot smarter at Q&A than Siri or Alexa or any of those things.

PSA: I will pay for such a thing if it's really good, privacy-respecting, local-first, and preferably at least source-available.
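Not aware of a polished product, but wiring the two together is only a few lines if you accept non-streaming audio. A rough sketch assuming faster-whisper and llama-cpp-python; the model paths, prompt template, and parameters are placeholders, not a real product:

    # Hypothetical local voice-assistant loop: speech -> text -> local LLM.
    from faster_whisper import WhisperModel
    from llama_cpp import Llama

    stt = WhisperModel("small.en", device="cpu", compute_type="int8")
    llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

    def answer(audio_path: str) -> str:
        # Transcribe the question locally
        segments, _ = stt.transcribe(audio_path)
        question = " ".join(seg.text for seg in segments).strip()
        # Ask the local LLM; stop before it starts hallucinating a new question
        out = llm(f"Q: {question}\nA:", max_tokens=256, stop=["Q:"])
        return out["choices"][0]["text"].strip()

    print(answer("question.wav"))

The missing pieces for a real assistant are wake-word detection, streaming capture, and TTS for the reply, but the core loop really is this small.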
I've tried large-v2 on the translate task, but the results aren't great. I guess there needs to be another round of distillation with the translate task thrown in, too.
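For anyone unfamiliar, "translate" is Whisper's built-in X-to-English mode. A quick sketch with the reference openai-whisper package (the audio file is a placeholder); since Distil-Whisper was distilled on English transcription only, this mode presumably suffers:

    # Whisper's built-in any-language -> English translation mode
    import whisper

    model = whisper.load_model("large-v2")
    result = model.transcribe("audio_de.wav", task="translate")  # e.g. German speech in, English text out
    print(result["text"])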
Nice! But next time, do the press release when the product is released.
Really tired of sites like HN pushing these stories out without any code or files.
Feels like vaporware.