TechEcho

Online speech recognition with wav2letter anywhere

234 points by moneil971 over 5 years ago

17 comments

tingletech over 5 years ago
All I see is """Sorry, this content isn't available right now The link you followed may have expired, or the page may only be visible to an audience you're not in. Go back to the previous page · Go to News Feed · Visit our Help Center"""

Edit: found a link that works: https://github.com/facebookresearch/wav2letter
dvduval over 5 years ago
So by open sourced I assume this means there are absolutely no Facebook dependencies where the voice is passing through a Facebook server? Sorry, have to ask, as my trust level is low. Otherwise, awesome!
gliese1337 over 5 years ago
Online speech recognition *for English*.

The framework should be generalizable, but the models they are making available are only for English. Actually adapting this for any other language would be a huge amount of additional work.
Jnr over 5 years ago
How does this compare to Mozilla's DeepSpeech?

And does anyone know when Mozilla will release the updated Common Voice dataset from https://voice.mozilla.org ?
jwineinger over 5 years ago
I'd love a tutorial that shows a normal guy like me how to use this tool with the pre-trained models to transcribe my audio files. I'm not finding anything of that kind included there.
gok over 5 years ago
The preprint: https://research.fb.com/wp-content/uploads/2020/01/Scaling-up-online-speech-recognition-using-ConvNets.pdf

Interestingly, the baselines are all systems that model grapheme forms instead of acoustic (phonemes) directly.
faitswulff over 5 years ago
Speaking as a Facebook user, I'm a bit confused: where do they use speech recognition? Or is this just purely research oriented?
isoos over 5 years ago
I'd be really interested in the accuracy of this tool at solving Google audio captchas. I'm assuming the price of solving captchas will go further down.
jonathanleroux over 5 years ago
If I may insert a relevant plug: we (MERL) just put out a paper last week with SOTA 7.0% WER on LibriSpeech test-other (vs wav2letter@anywhere's 7.5%) with 590 ms theoretical latency using joint CTC-Transformer with parallel time-delayed LSTM and triggered attention. Check it out: https://arxiv.org/abs/2001.02674
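
For readers comparing the WER figures quoted above: word error rate is usually computed as the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that definition, not the scoring script used by either paper; the function name `word_error_rate` is made up for illustration, and real evaluations typically also normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only (hypothetical name); splits on whitespace
    and does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

On this definition, a 7.0% vs 7.5% WER difference corresponds to roughly one fewer word error per 200 reference words.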
cproctor over 5 years ago
I'm about to start as a professor in CS education, and am hoping we're getting close to the point where I can easily transcribe interviews and high-quality dialogue audio using open-sourced models running on machines in my lab. I'm tired of paying $1/minute for human transcription that's not great anyway, and would love to undertake research that would require processing a lot more audio than is affordable on those terms.

I haven't kept up with developments over the last two years--anyone have a sense of whether this is close to being a reality?

(I've taken a bunch of Stanford's graduate AI courses on NLP and speech recognition; I can read documentation and deploy/configure models but don't have much appetite for getting into the weeds.)
ColanR over 5 years ago
So what's the efficiency of this model? Can I use it instead of pocketsphinx on a raspberry pi?
rexreed over 5 years ago
Given that this uses a beam search decoder to find the most likely word pattern, is it possible small perturbations in audio could cause it to improperly decode certain word strings? Sort of like the audio equivalent of adversarial attacks, but on ASR?
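
To make the question above concrete, here is a toy beam search sketch. It is not wav2letter's decoder: the per-step token scores, the token vocabulary, and the optional `lm_score` hook are all invented for illustration. It only shows the general idea that the decoder keeps the best-scoring partial hypotheses at each step, so a small shift in the acoustic scores can change which word sequence wins.

```python
import math


def beam_search(step_log_probs, lm_score=None, beam_width=2):
    """Toy beam search over per-step token log-probabilities.

    step_log_probs: list of dicts mapping token -> acoustic log-prob
                    (hypothetical input format, one dict per step).
    lm_score:       optional function (prefix_tuple, token) -> log-prob
                    adjustment, standing in for a language model.
    Returns (best_token_tuple, total_score).
    """
    if lm_score is None:
        lm_score = lambda prefix, token: 0.0  # no language model

    beams = [((), 0.0)]
    for dist in step_log_probs:
        candidates = []
        for prefix, score in beams:
            for token, log_prob in dist.items():
                total = score + log_prob + lm_score(prefix, token)
                candidates.append((prefix + (token,), total))
        # Keep only the highest-scoring partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


if __name__ == "__main__":
    clean = [
        {"recognize": math.log(0.6), "wreck": math.log(0.4)},
        {"speech": math.log(0.55), "beach": math.log(0.45)},
    ]
    # Same audio with a small (made-up) perturbation of the second step.
    perturbed = [
        clean[0],
        {"speech": math.log(0.45), "beach": math.log(0.55)},
    ]
    print(beam_search(clean)[0])      # ('recognize', 'speech')
    print(beam_search(perturbed)[0])  # ('recognize', 'beach')
```

In this toy case a 0.1 shift in probability mass flips the decoded word, which is the kind of sensitivity the comment is asking about; whether a real model can be pushed that way by inaudible perturbations is a separate empirical question.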
yellow_lead over 5 years ago
The name must be a nod to Word2Vec[1]. A cool naming scheme IMO.

[1] https://en.m.wikipedia.org/wiki/Word2vec
starpilot over 5 years ago
Do the pretrained models work decently on landline phone quality recordings? I can see massive value for this if it can transcribe corporate call center audio.
z3t4 over 5 years ago
For any project like this, please post exactly the sound configuration used for the model, e.g. the rate (Hz), channels, and format.
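
Until a project documents those settings, one way to at least check what your own recordings contain is Python's standard `wave` module. This is a small sketch for uncompressed WAV files only; the helper name `describe_wav` is made up, and the 16 kHz mono 16-bit values in the usage comment are placeholder assumptions, not a confirmed requirement of any model discussed here.

```python
import wave


def describe_wav(path):
    """Return the basic audio configuration of a PCM WAV file.

    Illustrative helper (hypothetical name) built on the stdlib
    `wave` module, which only reads uncompressed WAV.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


# Usage (assumed target config, for illustration only):
#   info = describe_wav("recording.wav")
#   if (info["sample_rate_hz"], info["channels"]) != (16000, 1):
#       print("resample / downmix before feeding the model:", info)
```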
amluto over 5 years ago
I wonder if this would be a good engine to plug in to rhasspy.
phkahler over 5 years ago
Which OSS license?