One feature of Whisper I think people underuse is the ability to prompt the model to influence the output tokens. This can be used to correct spellings and other context-dependent words. Some examples from my terminal history:

    ./main -m models/ggml-small.bin -f alice.wav --prompt "Audiobook reading by a British woman:"
    ./main -m models/ggml-small.bin -f output.wav --prompt "Research talk by Junyao, Harbin Institute of Technology, Shenzhen, research engineer at MaiMemo"
It also works multilingually: you can use the prompt to steer transcription toward traditional or simplified Chinese characters, for instance (see the sketch below).

I do have trouble getting the context to persist across hundreds of tokens, though. Corrected tokens may revert to the model's underlying choices if they aren't repeated often enough.
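Here is a sketch of the Chinese example, assuming whisper.cpp's main binary with its -l language flag; speech.wav is a placeholder, and the prompt just says "The following are sentences in Mandarin" written in each script:

    # Prompt written in simplified characters nudges the output toward simplified:
    ./main -m models/ggml-small.bin -f speech.wav -l zh --prompt "以下是普通话的句子。"

    # The same prompt in traditional characters nudges it toward traditional:
    ./main -m models/ggml-small.bin -f speech.wav -l zh --prompt "以下是普通話的句子。"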