One feature of Whisper I think people underuse is the ability to prompt the model to influence the output tokens. This can be used to correct spellings and other context-dependent words. Some examples from my terminal history:

    ./main -m models/ggml-small.bin -f alice.wav --prompt "Audiobook reading by a British woman:"
    ./main -m models/ggml-small.bin -f output.wav --prompt "Research talk by Junyao, Harbin Institute of Technology, Shenzhen, research engineer at MaiMemo"
It also works multilingually: you can use the prompt to steer transcription toward traditional or simplified Chinese characters, for instance (see the sketch below).

I do have trouble getting the context to persist across hundreds of tokens, though. Corrected tokens may revert to the model's underlying choices if they aren't repeated often enough.
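Here is a sketch of the Chinese example, assuming whisper.cpp's main binary with its -l language flag; speech.wav is a placeholder, and the prompt just says "The following are sentences in Mandarin" written in each script:

    # Prompt written in simplified characters nudges the output toward simplified:
    ./main -m models/ggml-small.bin -f speech.wav -l zh --prompt "以下是普通话的句子。"

    # The same prompt in traditional characters nudges it toward traditional:
    ./main -m models/ggml-small.bin -f speech.wav -l zh --prompt "以下是普通話的句子。"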