One thing I've found challenging about the Whisper APIs is that it performs quite poorly when trying to do "realtime transcription" - I played around with some of the whisper.cpp stuff to get it running, and with the tiny model, I was almost able to get reliable transcriptions, but it seems like other than static mp3 files, it is a Hard Problem [tm] that will need further work to get really good.<p>My use case was to try to make an AI assistant that would transcribe my audio requests and then turn that into a payload for one of the GPT-X APIs