Super exciting!
I'll be shipping Distil-Whisper to whisper-turbo tomorrow! https://github.com/FL33TW00D/whisper-turbo

Should make running in the browser feasible even for underpowered devices: https://whisper-turbo.com/
It’s a shame that the README doesn’t link to the original Whisper, or at least not prominently. There’s the etiquette question, but more practically, without that context I still don’t really know what this does.
I'm using this: https://github.com/guillaumekln/faster-whisper
Smaller, faster, works well with CPU, multiple languages, etc.
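For anyone who hasn't tried it, here's a minimal usage sketch (the model size, file name, and quantization setting are just placeholders, not recommendations):

    # Minimal faster-whisper sketch; model size, audio path, and
    # compute_type are illustrative only.
    from faster_whisper import WhisperModel

    # int8 quantization keeps memory low and runs well on CPU
    model = WhisperModel("small", device="cpu", compute_type="int8")

    segments, info = model.transcribe("audio.mp3", beam_size=5)
    print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")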
It seems they have only distilled on English data, so the distil-large-v2 model will probably perform badly in any other language. We'll see tomorrow when they release their models.
> performs within 1% WER

From the paper, for short-form audio:

> the distil-large-v2 model achieves the lowest overall average WER of 10.1%. It is one percentage point higher than the large-v2 baseline, with 5.8 times faster inference speed and fewer than half the parameters.

Long-form is similar, except Distil-Whisper does slightly better than Whisper (fewer hallucinations, apparently).

10% WER seems awfully high, and doesn't match my experience with Whisper. Maybe my audio is nice and clean relative to their test set?
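For reference, "within 1%" here means one percentage point absolute (10.1% vs. roughly 9.1% for the baseline), not 1% relative. WER itself is just word-level edit distance divided by reference length; a self-contained sketch, no libraries assumed:

    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate: (substitutions + deletions + insertions) / reference words."""
        ref, hyp = reference.split(), hypothesis.split()
        # Levenshtein distance over words via dynamic programming
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + cost) # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # 1 substitution out of 10 reference words -> 10% WER
    print(wer("the quick brown fox jumps over the lazy dog today",
              "the quick brown fox jumped over the lazy dog today"))  # 0.1

So 10% WER means roughly one word in ten is wrong, which is why it feels high for clean audio.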
Funnily enough, `-small`, `-base` and `-tiny` versions of this would be more exciting to me. `small.en` is the largest of the original Whisper models that will run at anywhere near usable speed on a Raspberry Pi Zero 2 with whisper.cpp, and it's still too slow to really bother with for streaming. Anything smaller is too inaccurate for day-to-day use. If there were a distilled version with a similar 6x speedup, that would be transformative.
On a partially related note, has anyone packaged any version of Whisper as an Android keyboard? It seems like a reasonably good fit, and I would be interested to see if it worked better than the deteriorating quality of Google's offering. I think it would work even with the existing versions, but a faster and smaller version would obviously be a better fit for running on phone hardware.
How much faster, in real wall-clock time, is this on batched data than https://github.com/m-bain/whisperX?
Is there a good project out there that pairs Whisper with something like llama.cpp to create a private local voice assistant?

Llama 2 isn't as good as GPT-4, but it's a hell of a lot smarter at Q&A than Siri or Alexa or any of those things.

PSA: I will pay for such a thing if it's really good, privacy-respecting, local-first, and preferably at least source-available.
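Not aware of a polished product, but wiring the two together is only a few lines if you accept non-streaming audio. A rough sketch assuming faster-whisper and llama-cpp-python; the model paths, prompt template, and parameters are placeholders, not a real product:

    # Hypothetical local voice-assistant loop: speech -> text -> local LLM.
    from faster_whisper import WhisperModel
    from llama_cpp import Llama

    stt = WhisperModel("small.en", device="cpu", compute_type="int8")
    llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

    def answer(audio_path: str) -> str:
        # Transcribe the question locally
        segments, _ = stt.transcribe(audio_path)
        question = " ".join(seg.text for seg in segments).strip()
        # Ask the local LLM; stop before it starts hallucinating a new question
        out = llm(f"Q: {question}\nA:", max_tokens=256, stop=["Q:"])
        return out["choices"][0]["text"].strip()

    print(answer("question.wav"))

The missing pieces for a real assistant are wake-word detection, streaming capture, and TTS for the reply, but the core loop really is this small.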
I've tried large-v2 on the translate task, but the results aren't great. I guess there needs to be another round of distillation with the translate task thrown in, too.
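For anyone unfamiliar, "translate" is Whisper's built-in X-to-English mode. A quick sketch with the reference openai-whisper package (the audio file is a placeholder); since Distil-Whisper was distilled on English transcription only, this mode presumably suffers:

    # Whisper's built-in any-language -> English translation mode
    import whisper

    model = whisper.load_model("large-v2")
    result = model.transcribe("audio_de.wav", task="translate")  # e.g. German speech in, English text out
    print(result["text"])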
Nice! But next time, do the press release when the product is released.
Really tired of sites like HN pushing these stories out without any code or files.
Feels like vaporware.