科技回声 (Tech Echo)

A technology news platform built with Next.js, offering global tech news and discussion.


© 2025 科技回声 (Tech Echo). All rights reserved.

Online speech recognition with wav2letter anywhere

234 points by moneil971, over 5 years ago

17 comments

tingletech, over 5 years ago
All I see is """Sorry, this content isn't available right now The link you followed may have expired, or the page may only be visible to an audience you're not in. Go back to the previous page · Go to News Feed · Visit our Help Center"""

Edit: found a link that works: https://github.com/facebookresearch/wav2letter
dvduval, over 5 years ago
So by "open sourced" I assume this means there are absolutely no Facebook dependencies, i.e., the voice never passes through a Facebook server? Sorry, I have to ask, as my trust level is low. Otherwise, awesome!
gliese1337, over 5 years ago
Online speech recognition *for English*.

The framework should be generalizable, but the models they are making available are only for English. Actually adapting this for any other language would be a huge amount of additional work.
Jnr, over 5 years ago
How does this compare to Mozilla's DeepSpeech?

And does anyone know when Mozilla will release the updated Common Voice dataset from https://voice.mozilla.org ?
jwineinger, over 5 years ago
I'd love a tutorial that shows a normal guy like me how to use this tool with the pre-trained models to transcribe my audio files. I'm not finding anything of that kind included there.
gok, over 5 years ago
The preprint: https://research.fb.com/wp-content/uploads/2020/01/Scaling-up-online-speech-recognition-using-ConvNets.pdf

Interestingly, the baselines are all systems that model graphemes directly instead of acoustic units (phonemes).
faitswulff, over 5 years ago
Speaking as a Facebook user, I'm a bit confused: where do they use speech recognition? Or is this purely research-oriented?
isoos, over 5 years ago
I'd be really interested in how accurately this tool can solve Google audio CAPTCHAs. I'm assuming the price of solving CAPTCHAs will go down even further.
jonathanleroux, over 5 years ago
If I may insert a relevant plug: we (MERL) just put out a paper last week with SOTA 7.0% WER on LibriSpeech test-other (vs. wav2letter@anywhere's 7.5%) with 590 ms theoretical latency, using a joint CTC-Transformer with parallel time-delayed LSTM and triggered attention. Check it out: https://arxiv.org/abs/2001.02674
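For context on the numbers above: WER (word error rate) is the word-level edit distance (substitutions + deletions + insertions) between a system's hypothesis and the reference transcript, divided by the number of reference words. A minimal Python sketch, assuming plain whitespace tokenization and no text normalization (official LibriSpeech scoring may normalize text differently); the function name is hypothetical:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref word r
                curr[j - 1] + 1,         # insertion of hyp word h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count against the reference length, WER defined this way can exceed 100%.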
cproctor, over 5 years ago
I'm about to start as a professor in CS education, and am hoping we're getting close to the point where I can easily transcribe interviews and high-quality dialogue audio using open-source models running on machines in my lab. I'm tired of paying $1/minute for human transcription that's not great anyway, and would love to undertake research that would require processing a lot more audio than is affordable on those terms.

I haven't kept up with developments over the last two years. Does anyone have a sense of whether this is close to being a reality?

(I've taken a bunch of Stanford's graduate AI courses on NLP and speech recognition; I can read documentation and deploy/configure models but don't have much appetite for getting into the weeds.)
ColanR, over 5 years ago
So what's the efficiency of this model? Can I use it instead of pocketsphinx on a Raspberry Pi?
rexreed, over 5 years ago
Given that this uses a beam search decoder to find the most likely word sequence, is it possible that small perturbations in the audio could cause it to decode certain word strings incorrectly? Sort of like the audio equivalent of adversarial attacks, but on ASR?
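To make the question concrete: a beam search decoder keeps only the `beam_width` highest-scoring partial hypotheses at each time step rather than scoring every possible sequence, so the final output depends on how scores compare at each step. The Python sketch below is a generic toy over per-frame token log-probabilities, with no language model, lexicon, or CTC blank handling; it is not wav2letter's decoder, which is more involved. The function name and the `frames[t][v]` input layout are hypothetical. The demo shows a small shift in one frame's probabilities flipping the best sequence.

```python
import math
from typing import List, Sequence, Tuple

Hypothesis = Tuple[Tuple[int, ...], float]  # (token sequence, total log-prob)


def beam_search(frames: Sequence[Sequence[float]], beam_width: int) -> List[Hypothesis]:
    """Toy beam search; frames[t][v] is the log-probability of token v at step t.

    Returns up to beam_width hypotheses, best first.
    """
    beams: List[Hypothesis] = [((), 0.0)]
    for frame in frames:
        # Extend every surviving hypothesis by every token at this step.
        candidates = [
            (seq + (token,), score + logp)
            for seq, score in beams
            for token, logp in enumerate(frame)
        ]
        # Prune: keep only the beam_width highest-scoring partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


if __name__ == "__main__":
    log = math.log
    clean = [[log(0.6), log(0.4)], [log(0.45), log(0.55)]]
    # Same input with the second frame nudged slightly.
    nudged = [[log(0.6), log(0.4)], [log(0.56), log(0.44)]]
    print(beam_search(clean, beam_width=2)[0][0])
    print(beam_search(nudged, beam_width=2)[0][0])
```

Real attacks on ASR have to work through the acoustic model rather than editing probabilities directly, so this only illustrates why decoding can be sensitive near score ties.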
yellow_lead, over 5 years ago
The name must be a nod to Word2Vec [1]. A cool naming scheme, IMO.

[1] https://en.m.wikipedia.org/wiki/Word2vec
starpilot, over 5 years ago
Do the pretrained models work decently on landline-phone-quality recordings? I can see massive value in this if it can transcribe corporate call-center audio.
z3t4, over 5 years ago
For any project like this, please post exactly the audio configuration used for the model, e.g. the sample rate (Hz), number of channels, and format.
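For WAV input, the three parameters mentioned above can be read directly from the file header with Python's standard-library `wave` module. A minimal sketch; the `EXPECTED` target (16 kHz, mono, 16-bit PCM) is an assumption chosen for illustration, not a documented wav2letter requirement, and the helper names are hypothetical. The in-memory silent WAV exists only so the example runs without an audio file.

```python
import io
import wave
from typing import NamedTuple


class AudioFormat(NamedTuple):
    sample_rate_hz: int
    channels: int
    sample_width_bytes: int  # 2 bytes per sample = 16-bit PCM


# Assumed target format for illustration only (not taken from wav2letter docs).
EXPECTED = AudioFormat(sample_rate_hz=16000, channels=1, sample_width_bytes=2)


def read_format(source) -> AudioFormat:
    """Read sample rate, channel count and sample width from a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        return AudioFormat(wav.getframerate(), wav.getnchannels(), wav.getsampwidth())


def make_silent_wav(fmt: AudioFormat, seconds: float = 0.1) -> io.BytesIO:
    """Build an in-memory WAV file of silence in the given format."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(fmt.channels)
        wav.setsampwidth(fmt.sample_width_bytes)
        wav.setframerate(fmt.sample_rate_hz)
        n_frames = int(fmt.sample_rate_hz * seconds)
        wav.writeframes(b"\x00" * n_frames * fmt.channels * fmt.sample_width_bytes)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    # e.g. 8 kHz mono, a common telephone-audio rate
    fmt = read_format(make_silent_wav(AudioFormat(8000, 1, 2)))
    if fmt != EXPECTED:
        print(f"conversion needed: got {fmt}, expected {EXPECTED}")
```

In practice you would pass a file path to `read_format` and resample or remix anything that does not match what the model was trained on.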
amluto, over 5 years ago
I wonder if this would be a good engine to plug into rhasspy.
phkahler, over 5 years ago
Which OSS license?