
Mozilla Common Voice Dataset: More data, more languages

379 points · by dabinat · almost 5 years ago

11 comments

echelon · almost 5 years ago

Data in ML is critical, and this release from Mozilla is absolute gold for voice research.

This dataset will help the many independent deep learning practitioners such as myself who aren't working at FAANG and have only had access to datasets such as LJS [1] or self-constructed datasets that have been cobbled together and manually transcribed.

Despite the limited materials available, there's already some truly amazing stuff being created. We've seen a lot of visually creative work being produced in the past few years, but the artistic community is only getting started with voice and sound.

https://www.youtube.com/watch?v=3qR8I5zlMHs

https://www.youtube.com/watch?v=L69gMxdvpUM

Another really cool thing popping up is TTS systems trained from non-English speakers reading English corpuses. I've heard Angela Merkel reciting copypastas, and it's quite amazing.

I've personally been dabbling in TTS as one of my "pandemic side projects" and found it to be quite fun and rewarding:

https://trumped.com

https://vo.codes

Besides TTS, one of the areas I think this dataset will really help with is the domain of Voice Conversion (VC). It'll be awesome to join Discord or TeamSpeak and talk in the voice of Gollum or Rick Sanchez. The VC field needs more data to perfect non-aligned training (where source and target speakers aren't reciting the same training text that is temporally aligned), and this will be extremely helpful.

I think the future possibilities for ML techniques in art and media are nearly limitless. It's truly an exciting frontier to watch rapidly evolve and to participate in.

[1] https://keithito.com/LJ-Speech-Dataset/

lunixbochs · almost 5 years ago

This is great! I'm always excited to see new Common Voice releases.

As someone actively using the data, I wish I could more easily see (and download lists for?) the older releases, as there have been 3-4 dataset updates for English now. If we don't have access to versioned datasets, there's no way to reproduce old whitepapers or models that use Common Voice. And at this point I don't remember the statistics (hours, accent/gender breakdown) for each release. It would be neat to see that over time on the website.

I'm glad they're working on single word recognition! This is something I've put significant effort into. It's the biggest gap I've found in the existing public datasets - listening to someone read an audiobook or recite a sentence doesn't seem to prepare the model very well for recognizing single words in isolation.

My model and training process have adapted for that, though I'm still not sure of the best way to balance training of that sort of thing. I have maybe 5 examples of each English word in isolation but 5000 examples of each number (Speech Commands), and it seems like the model will prefer e.g. "eight" over "ace", I guess due to training balance.

Maybe I should be randomly sampling 50/5000 of the imbalanced words each epoch so the model still has a chance to learn from them without overtraining?
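The per-epoch subsampling idea at the end of the comment above can be sketched as follows. This is a minimal illustration, not from any real training pipeline; the `cap` value and the toy word lists are made up.

```python
import random

def balanced_epoch(examples_by_word, cap=50, seed=None):
    """Build one epoch's training list, drawing at most `cap` random
    examples per word so over-represented words can't dominate."""
    rng = random.Random(seed)
    epoch = []
    for word, examples in examples_by_word.items():
        if len(examples) > cap:
            epoch.extend(rng.sample(examples, cap))
        else:
            epoch.extend(examples)  # rare words keep all their examples
    rng.shuffle(epoch)
    return epoch

# Toy imbalance: 5000 clips of "eight" vs. 5 of "ace"
data = {"eight": [("eight", i) for i in range(5000)],
        "ace":   [("ace", i) for i in range(5)]}
epoch = balanced_epoch(data, cap=50, seed=0)
```

Because a fresh random subset is drawn each epoch, the model still sees all 5000 "eight" clips over the course of training, but no single epoch is swamped by them.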

jointpdf · almost 5 years ago

Does this dataset include people with voice or speech disorders (or other disabilities)? I don't see any mention of it in this announcement or the forums, though I haven't looked thoroughly (yet).

Examples: dysphonias of various kinds, dysarthria (e.g. from ALS / cerebral palsy), vocal fold atrophy, stuttering, people with laryngectomies / voice prostheses, and many more.

Altogether, this represents millions of people for whom current speech recognition systems do not work well. This is an especially tragic situation, since people with disabilities depend more heavily on assistive technologies like ASR. Data/ML bias is rightfully a hot topic lately, so I feel that the voices of people w/ disabilities need to be amplified as well (npi).

intopieces · almost 5 years ago

I would love to work for Mozilla on this effort full time. I have experience in voice data collection / annotation / processing at 2 FAANG companies. Anyone have an in? Thinking of reaching out directly to the person who wrote this post.

Polylactic_acid · almost 5 years ago

Why on earth are they using mp3 for the dataset? It's absolutely ancient and probably the worst choice possible. Opus is widely used for voice because it gets flawless results at minuscule bitrates. And don't tell me it's because users find mp3 simpler, because if you are doing machine learning I expect you know how to use an audio file.
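For reference, the kind of re-encode the comment argues for is a one-liner with ffmpeg. The sketch below assumes an ffmpeg build with libopus is on PATH; the filenames and the 24 kbit/s speech bitrate are illustrative.

```python
import shutil
import subprocess

def opus_cmd(src, dst, bitrate="24k"):
    """ffmpeg argument list to re-encode a clip to Opus,
    tuned for speech via libopus's voip application mode."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:a", "libopus", "-b:a", bitrate,
            "-application", "voip", dst]

def mp3_to_opus(src, dst, bitrate="24k"):
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(opus_cmd(src, dst, bitrate), check=True)

# mp3_to_opus("clip.mp3", "clip.opus")
```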

dang · almost 5 years ago

If curious, see also

2019: https://news.ycombinator.com/item?id=19270646

tumetab1 · almost 5 years ago

To contribute: https://voice.mozilla.org/en/speak

j45 · almost 5 years ago
This is really encouraging to see. So nice to see languages that have more speakers than the most commonly translated languages.

stergro · almost 5 years ago

The complete project is very exciting. I hope this is really a game changer that enables private persons and startups to create new neural networks without a big investment in data collection.

I worked on the Esperanto dataset of Common Voice in the last year, and we have now collected over 80 hours in Esperanto. I hope that in a year or two we'll have collected enough data to create the first usable neural network for a constructed language, and maybe the first voice assistant in Esperanto too. I will train a first experimental model with this release soon.

user764743 · almost 5 years ago

This is interesting. As someone who always has tons of interview data to transcribe for academic research, what speech-to-text systems should I be looking into to help me save some time? Is DeepSpeech suited to this use?
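For what it's worth, Mozilla's DeepSpeech does expose a small Python API for exactly this kind of batch transcription. A minimal sketch is below; it assumes the `deepspeech` package, numpy, and a separately downloaded release model (the `.pbmm` filename here is illustrative), and the released English models expect 16 kHz mono 16-bit audio.

```python
import array
import wave

def load_pcm16(path):
    """Read a mono 16-bit WAV into int16 samples, the raw PCM
    format DeepSpeech's stt() expects."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        return array.array("h", w.readframes(w.getnframes()))

def transcribe(wav_path, model_path="deepspeech-0.7.4-models.pbmm"):
    import numpy as np          # stt() takes an int16 numpy array
    from deepspeech import Model
    model = Model(model_path)
    return model.stt(np.array(load_pcm16(wav_path), dtype=np.int16))
```

Interview recordings would still need to be converted to 16 kHz mono WAV first (e.g. with ffmpeg or sox) before being fed to `transcribe`.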

villgax · almost 5 years ago

Nice, now we need the CTC-based models to run offline on low-powered devices, and then pretty much all speech-to-text APIs are done for.