Can Whisper JAX also translate audio streams in real time from X -> Y, where both are non-English languages?<p>Additionally, I have tried Whisper JAX on JupyterHub, and for some reason it does not transcribe/translate in under 10 seconds for me. In fact, a 5-minute audio file still takes 3-4 minutes to transcribe, even though I followed steps similar to those in the Kaggle notebook posted by the author. Any ideas or suggestions as to why this is happening would be really helpful.<p>Thank you!
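One way to narrow down a slowdown like this is to time the first call separately from later ones, since a JIT-compiled pipeline pays a one-time compilation cost on its first call. The sketch below is only an illustration: `time_calls` and `make_fake_pipeline` are hypothetical helpers, and the fake pipeline simulates the compile-then-run pattern with `time.sleep` rather than calling Whisper JAX itself.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 3) -> List[float]:
    """Call fn `repeats` times; return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def make_fake_pipeline(compile_seconds: float, run_seconds: float) -> Callable[[], str]:
    """Hypothetical stand-in for a JIT-compiled call: slow the first time, fast afterwards."""
    state = {"compiled": False}

    def transcribe() -> str:
        if not state["compiled"]:
            time.sleep(compile_seconds)  # simulated one-time compilation cost
            state["compiled"] = True
        time.sleep(run_seconds)  # simulated steady-state work
        return "transcript"

    return transcribe


if __name__ == "__main__":
    durations = time_calls(make_fake_pipeline(0.2, 0.02))
    print("first call: %.3fs" % durations[0])
    print("later calls: %s" % ", ".join("%.3fs" % d for d in durations[1:]))
```

To try this on a real setup, pass a zero-argument function that runs your own transcription call on the same audio file. If every call is slow, not just the first, it is also worth printing `jax.devices()` to check whether JAX sees an accelerator or has fallen back to CPU.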
I've been looking for faster implementations of Whisper. The main drawback of Whisper JAX is that its performance comes from running on Google TPUs, which are much more expensive than GPUs.<p>On "normal" GPUs, the fastest implementation I've found is <a href="https://github.com/guillaumekln/faster-whisper">https://github.com/guillaumekln/faster-whisper</a>. Whisper.cpp works faster on a CPU, especially on Apple Silicon, but it is still nowhere near the performance you could get on a GPU (understandably).<p>How does Whisper JAX compare to faster-whisper on a GPU?
Whisper JAX is an optimised implementation of the Whisper model by OpenAI. It runs on JAX with a TPU v4-8 in the backend. Compared to PyTorch on an A100 GPU, it is over 70x faster, making it the fastest Whisper API available.