Real Time Speech Recognition with Gradio

68 points by ak391, about 3 years ago

6 comments

fxtentacle, about 3 years ago
The title is very misleading. This is a thin 10-line Gradio GUI in front of the Hugging Face Pipeline API, which will download 1000+ Python files, a professionally pre-trained 1 GB ASR model, and a 500 MB language model. Gradio contributes none of that; it is merely the GUI framework.

"Gradio GUI Python Package is compatible with Huggingface Inference Python Package"

Yeah, duh.

Also, I'm surprised that they chose Mozilla DeepSpeech, which was last updated in 2020, instead of wav2vec2, which is actually competitive in recognition quality.

EDIT: BTW, if you're curious, you can try out many of the Hugging Face pre-trained models here:

https://huggingface.co/spaces/huggingface/hf-speech-bench

For example, here's a Facebook pre-trained English model with good performance that you can easily embed into your own Python apps (use the [Use in Transformers] button at the top right of the page):

https://huggingface.co/facebook/wav2vec2-base-960h
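For anyone wondering what a "thin GUI in front of the pipeline" looks like, here is a minimal sketch, assuming the `gradio` and `transformers` packages are installed and using the facebook/wav2vec2-base-960h checkpoint linked above rather than whatever model the demo actually used. It is not the demo's code, and it transcribes a whole recorded clip at once rather than a real-time stream. The names `MODEL_ID`, `make_transcriber`, and `main` are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical constant: the checkpoint linked in the comment above.
MODEL_ID = "facebook/wav2vec2-base-960h"


def make_transcriber(
    asr: Callable[[str], Dict[str, str]],
) -> Callable[[Optional[str]], str]:
    """Wrap an ASR callable (such as a transformers pipeline) so it returns plain text."""

    def transcribe(audio_path: Optional[str]) -> str:
        # Gradio passes None when the user submits without any audio.
        if audio_path is None:
            return ""
        # The ASR pipeline returns a dict; the transcript is under "text".
        return asr(audio_path)["text"]

    return transcribe


def main() -> None:
    # Imported here so make_transcriber can be used without these packages.
    import gradio as gr
    from transformers import pipeline

    # Downloads the model weights on first use.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    demo = gr.Interface(
        fn=make_transcriber(asr),
        inputs=gr.Audio(type="filepath"),  # the function receives a path to the audio file
        outputs="text",
    )
    demo.launch()
```

Calling `main()` downloads the model and starts a local Gradio server; decoding audio files through the pipeline may also require ffmpeg on your system. Keeping the model behind `make_transcriber` means the wrapper can be exercised with a stand-in function without downloading anything. Audio component options (microphone vs. upload) differ between Gradio releases, so check the docs for your installed version.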
spullara, about 3 years ago
Wow, this is really, really bad. Try this one to compare:

https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/

I don't work for MSFT.
renierbotha, about 3 years ago
Yeah, not really working as expected.

I was interested in this because I'm looking to build a "swearing detector" to help me swear less in video calls, but this couldn't properly pick up a single sentence out of a couple, and then it started throwing errors.

I think it needs some time back in the lab, tbh.
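Once a recognizer produces text, the "swearing detector" idea mostly comes down to scanning the transcript for flagged words. A minimal sketch, assuming transcripts arrive as plain strings from some ASR model; the word list and the helper names `count_swears` and `swear_alert` are hypothetical:

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical word list; swap in the words you actually want to flag.
SWEAR_WORDS = frozenset({"damn", "hell", "crap"})


def count_swears(transcript: str, words: Iterable[str] = SWEAR_WORDS) -> Counter:
    """Count flagged words in an ASR transcript, ignoring case and punctuation."""
    targets = {w.lower() for w in words}
    # Split the lowercased transcript into word tokens (letters and apostrophes).
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return Counter(tok for tok in tokens if tok in targets)


def swear_alert(transcript: str, limit: int = 0) -> bool:
    """Return True when the transcript contains more than `limit` flagged words."""
    return sum(count_swears(transcript).values()) > limit
```

A word-list match like this is only as good as the transcript feeding it: if the recognizer garbles or drops the word, nothing gets flagged, which is exactly the accuracy problem described in this comment.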
jamal-kumar, about 3 years ago
TUSTING TUSTING TESTTEST ONE TO TEST TEST ONE DO ONE TWO HREE FOUR OUD FIVE SIX SEVEN EIGHT

I wouldn't exactly call this a success.
thomasfromcdnjs, about 3 years ago
Really nice work on the GUI, keep it up!
monkeydust, about 3 years ago
ASR on my Pixel 6 has been a game changer, combo of accuracy and speed.