科技回声

A tech news platform built with Next.js, providing global tech news and discussion content.


Applying machine learning and deep learning methods to audio analysis

93 points, by gidim, over 5 years ago

4 comments

jononor, over 5 years ago
As an introduction I guess this is OK. However, there are two major limitations:

1: The feature extraction ends with mean-summarizing across the entire audio clip, leaving no temporal information. This only works well for simple tasks. At least mentioning analysis windows and temporal modelling as the natural next step would be good, be it an LSTM/GRU on the MFCCs or a CNN on the mel-spectrogram.

2: The folds of the UrbanSound8K dataset are not respected in the evaluation. In UrbanSound8K, different folds contain clips extracted from the same original audio files, usually very close in time. Mixing the folds in the test set means it is no longer entirely "unseen data". The model very likely exploits this data leakage, as the reported accuracy is above SOTA (for no data augmentation), which is unreasonable given the low-fidelity feature representation. At least mentioning this limitation, and that the performance number they give cannot be compared with other methods, would be prudent.

When I commented similarly on r/MachineLearning, the authors acknowledged these weaknesses but did not update the article to reflect them.
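The fold-respecting evaluation the comment calls for amounts to a leave-one-fold-out split: each of UrbanSound8K's ten predefined folds is held out whole, never mixed. A minimal sketch in plain Python; the `(clip_id, fold)` pair format here is an illustrative stand-in for however you load the dataset's metadata CSV:

```python
from collections import defaultdict

def leave_one_fold_out(metadata):
    """Yield (train_clips, test_clips) splits that respect predefined folds.

    metadata: iterable of (clip_id, fold) pairs. Clips sliced from the same
    source recording share a fold, so holding out whole folds avoids leaking
    near-duplicate audio between train and test.
    """
    by_fold = defaultdict(list)
    for clip_id, fold in metadata:
        by_fold[fold].append(clip_id)
    for held_out in sorted(by_fold):
        # Train on every other fold; test only on the held-out fold.
        train = [c for f, clips in by_fold.items() if f != held_out
                 for c in clips]
        yield train, by_fold[held_out]
```

Averaging accuracy over the ten resulting splits gives a number that is actually comparable to published UrbanSound8K results, unlike a random shuffle across folds.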
jononor, over 5 years ago
Warning: shameless self-promotion. For those who wish to go a bit beyond this article, I gave a presentation on the topic at EuroPython: https://www.youtube.com/watch?v=uCGROOUO_wY It explains how to build models that can make use of temporal variations and learn the feature representations from the (mel) spectrogram. Especially suited if you are already familiar with image classification using Convolutional Neural Networks.
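The mel-spectrogram front end these comments refer to keeps the time axis intact (frames × mel bands), which is what lets a CNN or RNN exploit temporal structure. A minimal numpy sketch, assuming the HTK mel formula and arbitrary illustrative parameters (n_fft=1024, 40 bands); a real pipeline would typically use librosa instead:

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels=40, fmin=0.0, fmax=None):
    """Triangular mel filterbank (HTK mel scale, unnormalized)."""
    fmax = fmax or sr / 2
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # n_mels + 2 points: each filter spans [left, center, right] neighbours.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=40):
    """Log-mel spectrogram, shape (frames, n_mels): time axis preserved."""
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    fb = mel_filterbank(sr, n_fft, n_mels)
    return np.log(power @ fb.T + 1e-10)
```

Feeding the resulting 2-D array to a CNN (treating it like a one-channel image) is the image-classification analogy the talk draws on, in contrast to mean-summarizing it into a single feature vector.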
m0zg, over 5 years ago
As one of the long-suffering Comet.ml customers, I wish they'd spend more time working on their site's performance and less on writing blog posts. It takes multiple seconds for graphs to render, and leaving any part of the Comet.ml UI open in the browser leads to spinning fans and quick battery drain when working from a laptop. The logging component will sometimes hang without warning and hang your training session as well. Bizarrely, there's no way to show min/max metric values for ongoing and completed runs (AKA the only thing a researcher actually cares about): you have to log them separately in order to display them.

This is a weird field: these are not difficult problems to solve, yet as far as I can tell, all of the popular choices available so far each suck in their own unique way, and there's no option that I know of that actually offers convenience and high performance. FOSS options are barely existent as well, and they also suck.

For the things where Comet.ml would be too onerous to deal with, I still use pen and paper.
syntaxing, over 5 years ago
Is there an easy method to detect a specific word throughout an audio sample and report its timestamps? I've been trying to implement something like this but wasn't sure how to approach it.