Audio pre-processing for Machine Learning: Getting things right

58 points by scarecrow112, over 5 years ago

5 comments

jacquesm, over 5 years ago
This is a good starting point, but it ends just when things get interesting. If you are going to process audio for ML, make sure you experiment with normalizing the input volume; this can make a huge difference. If your inputs are in stereo, try processing them as mono, as a single channel, and as stereo to see which performs better.

Finally, if you pre-process the audio using an FFT, try different FFT sizes.
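
A minimal sketch of the experiments suggested above, assuming NumPy and a synthetic two-tone stereo signal in place of real recordings. The helper names (`peak_normalize`, `to_mono`, `magnitude_spectrogram`), the 0.99 peak target, the Hann window, and the hop of a quarter of the FFT size are illustrative choices, not something the comment prescribes.

```python
import numpy as np

def peak_normalize(x, peak=0.99):
    """Scale x so that its largest absolute sample equals `peak`."""
    m = np.max(np.abs(x))
    return x if m == 0 else x * (peak / m)

def to_mono(stereo):
    """Downmix an array shaped (n_samples, n_channels) by averaging channels."""
    return stereo.mean(axis=1)

def magnitude_spectrogram(x, n_fft, hop=None):
    """Magnitude STFT with a Hann window; returns (n_frames, n_fft // 2 + 1)."""
    hop = hop or n_fft // 4
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stereo input (assumption): a quiet 440 Hz tone on the left,
# an even quieter 880 Hz tone on the right, one second at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
left = 0.2 * np.sin(2 * np.pi * 440 * t)
right = 0.05 * np.sin(2 * np.pi * 880 * t)
stereo = np.stack([left, right], axis=1)

# Three input variants to compare: downmixed mono, one channel, both channels.
mono = peak_normalize(to_mono(stereo))
single = peak_normalize(stereo[:, 0])
both = peak_normalize(stereo)

# Larger FFT sizes give finer frequency resolution but fewer time frames.
for n_fft in (256, 512, 1024):
    spec = magnitude_spectrogram(mono, n_fft)
    print(n_fft, spec.shape)
```

Which variant and FFT size works best depends on the task, so each combination would be fed to the same model and compared on a held-out set.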
Hydraulix989, over 5 years ago
Careful! FFmpeg has an infectious license, and the authors WILL publicly humiliate you on their Hall of Shame if you get caught misusing it by not open sourcing your whole application:

https://ffmpeg.org/shame.html
jkadlec, over 5 years ago
Some good basic info, but there are also some inaccuracies. WAV is not a lossless format; it's a container that can hold any compressed audio format, even MP3. You can have PCM inside WAV, which is indeed lossless, but you're not going to see that in the wild too often. Going with 16 kHz is also questionable, since most readily available pre-existing datasets were recorded at 8 kHz (which is what telephony codecs mostly use).
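
One way to check what a given WAV file actually contains is to read the format tag in its `fmt ` chunk. The sketch below uses only the Python standard library; the helper name `wav_format` and the small tag table are illustrative, the table lists just a few common tags, and the in-memory test file is created with the `wave` module, which writes PCM.

```python
import io
import struct
import wave

# A few common WAVE format tags (not an exhaustive list).
FORMAT_TAGS = {
    0x0001: "PCM",
    0x0003: "IEEE float",
    0x0006: "A-law",
    0x0007: "mu-law",
    0x0055: "MP3",
    0xFFFE: "extensible",
}

def wav_format(stream):
    """Return (codec name, channels, sample rate) from a RIFF/WAVE stream."""
    riff, _, wave_id = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no fmt chunk found")
        chunk_id, size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            tag, channels, rate = struct.unpack("<HHI", stream.read(8))
            return FORMAT_TAGS.get(tag, hex(tag)), channels, rate
        # Skip other chunks; chunk data is padded to an even byte count.
        stream.seek(size + (size & 1), io.SEEK_CUR)

# Build a tiny 8 kHz mono 16-bit file in memory and inspect it.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 80)
buf.seek(0)
info = wav_format(buf)
print(info)
```

Running a check like this over a dataset before training shows whether the files are really PCM and what sample rates they use.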
tsomctl, over 5 years ago
I spent some time using the synchrosqueeze transform to preprocess audio files before feeding them into the network. Basically, you do a CWT (continuous wavelet transform) and then feed its output into the synchrosqueeze. It sharpens up the CWT so that a signal isn't spread out over so many bins. The output of the CWT is complex, and normally you throw the imaginary data away. The synchrosqueeze transform uses the imaginary data to work its magic. Figures 5 and 6 of the PDF below are good examples.

I believe I based my code off this MATLAB code: https://github.com/ebrevdo/synchrosqueezing/tree/master/synchrosqueezing

The above MATLAB code is ridiculously slow; I rewrote it using SSE intrinsics and got it several orders of magnitude faster.

I hope this helps someone out. I never really produced anything with it, but I still feel it is promising.

https://services.math.duke.edu/~jianfeng/paper/synsquez.pdf
a-dub, over 5 years ago
Use this: https://github.com/librosa/librosa