The title is very misleading. This is a thin 10-line Gradio GUI in front of the Huggingface Pipeline API; the latter downloads 1000+ Python files, a professionally pre-trained 1GB ASR model, and a 500MB language model. Gradio contributes none of that; it's merely the GUI framework.<p>"Gradio GUI Python Package is compatible with Huggingface Inference Python Package"<p>Yeah, duh.<p>Also, I'm surprised that they chose Mozilla DeepSpeech, which was last updated in 2020, instead of wav2vec2, which is actually competitive in recognition quality.<p>EDIT: BTW, if you're curious, you can try out many of the Huggingface pre-trained models here:<p><a href="https://huggingface.co/spaces/huggingface/hf-speech-bench" rel="nofollow">https://huggingface.co/spaces/huggingface/hf-speech-bench</a><p>and, for example, here's a Facebook pre-trained English model with good performance that you can easily embed into your own Python apps; click the [Use in Transformers] button at the top right of the page.<p><a href="https://huggingface.co/facebook/wav2vec2-base-960h" rel="nofollow">https://huggingface.co/facebook/wav2vec2-base-960h</a>
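<p>To make the "easily embed" point concrete, here's a minimal sketch of using that wav2vec2 model through the same Pipeline API (assumes you've done `pip install transformers torch`; the function name and the `speech.wav` path are my own illustrative choices, not from the model page):<p>

```python
from transformers import pipeline

def transcribe(audio_path: str) -> str:
    """Transcribe an audio file with a pre-trained wav2vec2 model.

    The first call downloads the model weights (~360MB) from the
    Hugging Face hub and caches them locally.
    """
    asr = pipeline("automatic-speech-recognition",
                   model="facebook/wav2vec2-base-960h")
    return asr(audio_path)["text"]

# Example (the path is hypothetical; point it at any 16kHz WAV file):
# print(transcribe("speech.wav"))
```

<p>That's the whole embedding story: one `pipeline(...)` call, which is exactly why a GUI wrapper around it isn't much of an achievement.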