Tech Echo

6 comments

fxtentacle · about 3 years ago

The title is very misleading. This is a thin 10-line Gradio GUI in front of the Hugging Face Pipeline API, which will download 1000+ Python files, a professionally pre-trained 1GB ASR model, and a 500MB language model. Gradio contributes none of that; it is merely the GUI framework.

"Gradio GUI Python Package is compatible with Hugging Face Inference Python Package"

Yeah, duh.

Also, I'm surprised that they chose Mozilla DeepSpeech, which was last updated in 2020, instead of wav2vec2, which is actually competitive in recognition quality.

EDIT: BTW, if you're curious, you can try out many of the Hugging Face pre-trained models here: <a href="https://huggingface.co/spaces/huggingface/hf-speech-bench" rel="nofollow">https://huggingface.co/spaces/huggingface/hf-speech-bench</a>

For example, here's a Facebook pre-trained English model with good performance that you can easily embed into your own Python apps (use the [Use in Transformers] button at the top right of the page): <a href="https://huggingface.co/facebook/wav2vec2-base-960h" rel="nofollow">https://huggingface.co/facebook/wav2vec2-base-960h</a>
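To illustrate how thin such a wrapper is, here is a minimal sketch, assuming recent `gradio` and `transformers` releases are installed and ffmpeg is available so the pipeline can decode audio files. It is not the code behind the linked demo; `make_transcriber` and `build_app` are hypothetical names, and the model is the facebook/wav2vec2-base-960h checkpoint linked above.

```python
# Minimal sketch of a Gradio front end over a Hugging Face ASR pipeline.
# Assumes `gradio` and `transformers` are installed and ffmpeg is available
# so the pipeline can decode audio files. Function names are hypothetical.


def make_transcriber(asr):
    """Wrap an ASR callable so Gradio gets a plain string back.

    `asr` is anything that takes an audio file path and returns a dict with
    a "text" key, which is what a transformers ASR pipeline returns.
    """

    def transcribe(audio_path):
        # Gradio passes None when the user submits without any audio.
        if audio_path is None:
            return ""
        return asr(audio_path)["text"]

    return transcribe


def build_app(model_id="facebook/wav2vec2-base-960h"):
    """Build (but do not launch) the Gradio interface."""
    # Heavy imports are deferred so the helper above works without them.
    import gradio as gr
    from transformers import pipeline

    # Downloads the checkpoint on first use; this is where the real work happens.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return gr.Interface(
        fn=make_transcriber(asr),
        inputs=gr.Audio(type="filepath"),  # hand the function a path to an audio file
        outputs="text",
    )
```

Calling `build_app().launch()` serves the UI locally. This sketch transcribes one whole clip per request rather than streaming, so it shows the size of the GUI layer, not real-time recognition.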


spullara · about 3 years ago

Wow, this is really, really bad. Try this one for comparison: <a href="https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/" rel="nofollow">https://azure.microsoft.com/en-us/services/cognitive-service...</a>

I don't work for MSFT.


renierbotha · about 3 years ago

Yeah, not really working as expected. I was interested in this because I'm looking to build a "swearing detector" to help me swear less in video calls, but it couldn't properly pick up a single sentence out of a couple, and then it started throwing errors. I think it needs some more time in the lab, tbh.

jamal-kumar · about 3 years ago

TUSTING TUSTING TESTTEST ONE TO TEST TEST ONE DO ONE TWO HREE FOUR OUD FIVE SIX SEVEN EIGHT

I wouldn't exactly call this a success.

thomasfromcdnjs · about 3 years ago

Really nice work on the GUI, keep it up!

monkeydust · about 3 years ago

ASR on my Pixel 6 has been a game changer: a combination of accuracy and speed.

Real Time Speech Recognition with Gradio
