TechEcho

Applying machine learning and deep learning methods to audio analysis

93 points by gidim over 5 years ago

4 comments

jononor over 5 years ago
As an introduction I guess this is OK. However, there are two major limitations:

1: The feature extraction ends with mean-summarizing across the entire audio clip, leaving no temporal information. This only works well for simple tasks. At least mentioning analysis windows and temporal modelling as the natural next step would be good, be it an LSTM/GRU on the MFCCs or a CNN on the mel-spectrogram.

2: The folds of the UrbanSound8K dataset are not respected in the evaluation. In UrbanSound8K, different folds contain clips extracted from the same original audio files, usually very close in time. Mixing the folds for the test set means it is no longer entirely "unseen data". The model very likely exploits this data leakage, as the reported accuracy is above SOTA (for no data augmentation), which is unreasonable given the low-fidelity feature representation. At least mentioning this limitation, and that the performance number they give cannot be compared with other methods, would be prudent.

When I commented similarly on r/machinelearning the authors acknowledged these weaknesses, but did not update the article to reflect them.
Comment #21647094 not loaded
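The fold-respecting evaluation described above can be sketched with scikit-learn's `LeaveOneGroupOut`; the features, labels, and classifier below are illustrative stand-ins, not the article's actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut

# Illustrative stand-ins for real UrbanSound8K features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 40))            # e.g. mean-summarized MFCC vectors
y = rng.integers(0, 10, size=80)         # the 10 UrbanSound8K classes
folds = np.repeat(np.arange(1, 11), 8)   # the dataset's predefined folds 1..10

# Hold out one official fold at a time: clips cut from the same source
# recording share a fold, so the held-out fold stays genuinely unseen.
scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=folds):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(f"mean accuracy over {len(scores)} folds: {np.mean(scores):.3f}")
```

On real UrbanSound8K data the `fold` column of the bundled metadata CSV would supply the `groups` argument; shuffling all clips into random train/test splits instead is exactly the leakage described above.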
jononor over 5 years ago
Warning: shameless self-promotion. For those that wish to go a bit beyond this article, I gave a presentation on the topic at EuroPython: https://www.youtube.com/watch?v=uCGROOUO_wY It explains how to build models that can make use of temporal variations and learn the feature representations based on the (mel) spectrogram. Especially suited if you are already familiar with image classification using convolutional neural networks.
m0zg over 5 years ago
As one of the long-suffering Comet.ml customers, I wish they'd spend more time working on their site's performance and less on writing blog posts. It takes multiple seconds for graphs to render, and leaving any part of the Comet.ml UI open in the browser leads to spinning fans and quick battery drain when working from a laptop. The logging component will sometimes hang without warning and hang your training session as well. Bizarrely, there's no way to show min/max metric values for ongoing and completed runs (AKA the only thing a researcher actually cares about): you have to log them separately in order to display them.

This is a weird field: these are not difficult problems to solve, yet as far as I can tell, all of the popular choices available so far each suck in their own unique way, and there's no option that I know of that actually offers convenience and high performance. FOSS options are barely existent as well, and they also suck.

For the things where Comet.ml would be too onerous to deal with, I still use pen and paper.
Comment #21647066 not loaded

Comment #21648437 not loaded

Comment #21647083 not loaded

Comment #21649697 not loaded
syntaxing over 5 years ago
Is there an easy way to detect a specific word and report its timestamps throughout an audio sample? I've been trying to implement something like this but wasn't sure how to approach it.
Comment #21647024 not loaded

Comment #21647101 not loaded

Comment #21646956 not loaded
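One rough answer to the keyword question above is template matching: slide the feature frames of a recorded example of the word over the feature frames of the target audio and report the best-matching timestamp. The sketch below uses synthetic feature matrices; in practice the frames would be MFCCs (e.g. via librosa), and a speech-recognition engine that emits word-level timestamps is usually the more robust route. All names and numbers here are illustrative:

```python
import numpy as np

def best_match(features, template, hop_s):
    """Slide `template` (frames x dims) over `features` and return the
    (start_time_seconds, cosine_similarity) of the best-matching window.
    `hop_s` is the duration in seconds of one frame hop."""
    k = len(template)
    t_flat = template.ravel()
    t_norm = np.linalg.norm(t_flat)
    best = (0.0, -1.0)
    for i in range(len(features) - k + 1):
        w = features[i:i + k].ravel()
        sim = float(w @ t_flat / (np.linalg.norm(w) * t_norm + 1e-9))
        if sim > best[1]:
            best = (i * hop_s, sim)
    return best

# Synthetic demo: plant a distinctive "word" pattern at frame 50.
rng = np.random.default_rng(1)
audio_feats = 0.1 * rng.normal(size=(200, 13))  # stand-in for MFCC frames
word = rng.normal(size=(10, 13))                # stand-in for the keyword's MFCCs
audio_feats[50:60] = word

t_hit, sim_hit = best_match(audio_feats, word, hop_s=512 / 44100)
print(f"best match at {t_hit:.2f}s (similarity {sim_hit:.2f})")
```

Finding every occurrence is the same loop with a similarity threshold instead of an argmax; a real keyword-spotting system would also need to tolerate speaking-rate variation (e.g. via dynamic time warping) or use a trained detector.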