The terminology is a bit confusing. They say they want to build voice recognition, but it seems like they actually want to build a speech recognition engine. Speech recognition is about recognizing the speech, i.e. the spoken words. Voice recognition is about recognizing the speaker's voice, i.e. identifying the speaker. Maybe they also want to build a text-to-speech (TTS) system, but I'm not sure.

No matter what, the collected data might be useful for all of that, except maybe for voice recognition, because I guess the data will be collected anonymously?

Note that there are already some other big open speech corpora, such as LibriSpeech (http://www.openslr.org/12/), which could be used right now to build a quite good speech recognition system.
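If anyone wants to play with LibriSpeech before this new data is released, here is a minimal sketch of loading it via torchaudio (the split name and download path are just illustrative, and you need torchaudio installed):

    import torchaudio

    # Download the 100-hour "clean" training split of LibriSpeech
    # (one of the standard OpenSLR split names); adjust root as needed.
    dataset = torchaudio.datasets.LIBRISPEECH(
        root="./data",
        url="train-clean-100",
        download=True,
    )

    # Each item is (waveform, sample_rate, transcript,
    #               speaker_id, chapter_id, utterance_id).
    waveform, sample_rate, transcript, speaker_id, chapter_id, utt_id = dataset[0]
    print(sample_rate, transcript)

From there you can feed the (waveform, transcript) pairs into whatever acoustic model you like.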
This looks great!
I use voice control to program on occasion due to RSI. The standard stack for this is a mess due to closed-source systems that aren't designed for voice programmers.
A good open solution could really save me from a lot of headaches.
If they're planning to make a voice recognition system, why are they using example statements that are clearly taken from novels? [0] That's not how real people talk. Real speech has a lot more slang, a lot more stopping and starting, filler words, etc. Instead you have people saying things like "irresolute", "rumbling", and other complex words. It would be useful for training a novel-dictation system, but it's not how people would speak to their browser, for example.

[0]: An example sentence is "a thin circle of bright metal showed between the top and the bottom of the body of the cylinder", which is from H. G. Wells' *War of the Worlds*.
And... 503'd. I didn't catch what the intended use case was before it died, but I'm guessing computer-generated voice?

Most of the computer-generated stuff I've seen uses trained actors, which neatly avoids having to reconcile a myriad of accents and dialects, something that was immediately apparent from the first two samples I tried.

edit: it's back up, and it seems to be about voice recognition, which this could help with no problem.
It would be useful to collect data from non-native speakers of a language. More and more such speakers live in every country, and devices that accept spoken words should not break because of someone's level of command of the spoken language. For example, a Swiss person speaking German (Hochdeutsch), or, more obviously, a Brit speaking French, etc. Some children who grow up in multilingual families also mix words from multiple languages into their sentences, and we can still understand them.
I wonder if it would make sense to implement a new type of reCAPTCHA with these kinds of projects in mind. The data wouldn't be going to some data center in Google land, but instead to some open repository that anyone could get their hands on. A free and open source reCAPTCHA alternative would also be nice. The trick is keeping it complex enough that bots cannot just reuse the existing public data set. Maybe withhold some of the data from the public for a few years until it's deemed 'retired'.
I hope this data will be used purely for voice recognition purposes and not for voice generation, or we'll be stuck with robots talking in a horrible gurgling and clicking accent due to the poor recording conditions of most participants!
Cool project, really aligned with Mozilla's mission, and with a pleasant UX. And if you're a non-English speaker like me, validating sentences is a nice way of improving your comprehension.
This is an important development; voice control has good potential. It would be cool if they used it as an alternative way to control Firefox and/or Servo.
Any idea why the duplicate detection did not work for this link:
https://news.ycombinator.com/item?id=14786881

Anyhow: these should be merged (even though there is no discussion on the other submission).