TechEcho

17 comments

tingletech over 5 years ago

All I see is "Sorry, this content isn't available right now. The link you followed may have expired, or the page may only be visible to an audience you're not in. Go back to the previous page · Go to News Feed · Visit our Help Center"

Edit: found a link that works: https://github.com/facebookresearch/wav2letter

dvduval over 5 years ago

So by open sourced I assume this means there are absolutely no Facebook dependencies where the voice is passing through a Facebook server? Sorry, have to ask, as my trust level is low. Otherwise, awesome!

gliese1337 over 5 years ago

Online speech recognition for English.

The framework should be generalizable, but the models they are making available are only for English. Actually adapting this for any other language would be a huge amount of additional work.

Jnr over 5 years ago

How does this compare to Mozilla's DeepSpeech?

And does anyone know when Mozilla will release the updated Common Voice dataset from https://voice.mozilla.org ?

jwineinger over 5 years ago

I'd love a tutorial that shows a normal guy like me how to use this tool with the pre-trained models to transcribe my audio files. Not finding anything of that kind included there.

gok over 5 years ago

The preprint: https://research.fb.com/wp-content/uploads/2020/01/Scaling-up-online-speech-recognition-using-ConvNets.pdf

Interestingly, the baselines are all systems that model grapheme forms instead of acoustic units (phonemes) directly.

faitswulff over 5 years ago

Speaking as a Facebook user, I'm a bit confused - where do they use speech recognition? Or is this just purely research oriented?

isoos over 5 years ago

I'd be really interested in the accuracy of this tool to solve Google audio captchas. I'm assuming the price of solving captchas will go further down.

jonathanleroux over 5 years ago

If I may insert a relevant plug: we (MERL) just put out a paper last week with SOTA 7.0% WER on LibriSpeech test-other (vs. wav2letter@anywhere's 7.5%) with 590 ms theoretical latency using joint CTC-Transformer with parallel time-delayed LSTM and triggered attention. Check it out: https://arxiv.org/abs/2001.02674
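
For readers unfamiliar with the metric quoted above: word error rate (WER) is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The following is a minimal sketch of that definition only, not the scoring script used for either system's reported numbers; the function name `word_error_rate` is illustrative, and real evaluations usually normalize text (case, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a 7.0% WER means roughly 7 word-level errors per 100 reference words.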

cproctor over 5 years ago

I'm about to start as a professor in CS education, and am hoping we're getting close to the point where I can easily transcribe interviews and high-quality dialogue audio using open-sourced models running on machines in my lab. I'm tired of paying $1/minute for human transcription that's not great anyway, and would love to undertake research that would require processing a lot more audio than is affordable on those terms.

I haven't kept up with developments over the last two years--anyone have a sense of whether this is close to being a reality?

(I've taken a bunch of Stanford's graduate AI courses on NLP and speech recognition; I can read documentation and deploy/configure models but don't have much appetite for getting into the weeds.)

ColanR over 5 years ago

So what's the efficiency of this model? Can I use it instead of pocketsphinx on a raspberry pi?

rexreed over 5 years ago

Given that this uses a beam search decoder to find the most likely word pattern, is it possible small perturbations in audio could cause it to improperly decode certain word strings? Sort of like the audio equivalent of adversarial attacks, but on ASR?

yellow_lead over 5 years ago

The name must be a nod to Word2Vec [1]. A cool naming scheme IMO.

[1] https://en.m.wikipedia.org/wiki/Word2vec

starpilot over 5 years ago

Do the pretrained models work decently on landline phone quality recordings? I can see massive value for this if it can transcribe corporate call center audio.

z3t4 over 5 years ago

For any project like this, please post exactly the sound configuration used for the model, e.g. the rate (Hz), channels, and format.
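
Until that information is posted, the three properties can at least be checked on your own recordings. Below is a small sketch using Python's standard `wave` module (uncompressed WAV files only). The function name `describe_wav` and the dictionary keys are illustrative, and the values the wav2letter models actually expect are not stated in this thread, so none are assumed here.

```python
import wave


def describe_wav(path):
    """Return the sample rate, channel count, and sample width of a WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "sample_rate_hz": sample_rate,
            "channels": wav.getnchannels(),
            # getsampwidth() reports bytes per sample; convert to bits.
            "sample_width_bits": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / sample_rate,
        }
```

Calling `describe_wav("recording.wav")` (a hypothetical file name) returns a dictionary that can be compared against whatever configuration a model's documentation specifies.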

amluto over 5 years ago

I wonder if this would be a good engine to plug in to rhasspy.

phkahler over 5 years ago

Which OSS license?

Online speech recognition with wav2letter anywhere
