TechEcho

Audio Datasets for Machine Learning

55 points by TakakiTohno about 5 years ago

3 comments

beatle_sauce about 5 years ago
IMHO the speech dataset list is missing other interesting free corpora, e.g. the TEDlium dataset, Voxforge, Common Voice. A more comprehensive (but not complete) list can be found here: https://github.com/kaldi-asr/kaldi/tree/master/egs (download links can be found in the scripts)
sschmitt about 5 years ago
Also see the "Heidelberg Spiking Datasets": https://ieee-dataport.org/open-access/heidelberg-spiking-datasets
MintChocoisEw about 5 years ago
Spoken Wikipedia corpus is especially impressive