Ask HN: Machine learning resources for audio processing

252 points by samrohn about 6 years ago

What are some good learning resources on audio processing, event detection and anomaly detection using machine learning or deep learning? I am interested in predictive maintenance of machines using audio anomaly detection.

19 comments

citilife about 6 years ago

There's a good class at UIUC on signal processing: <a href="https://courses.engr.illinois.edu/cs598ps/fa2018/material.html" rel="nofollow">https://courses.engr.illinois.edu/cs598ps/fa2018/material.ht...</a> The course is led by Paris Smaragdis, one of the top researchers in the field of audio processing.

sdenton4 about 6 years ago

The folks behind AudioSet have been working on general audio event detection for some years now, I believe. <a href="https://research.google.com/audioset/" rel="nofollow">https://research.google.com/audioset/</a> There's a huge amount to discuss in the audio domain, but a good place to start is using a ResNet on spectrograms to build a binary classifier.
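
To make the "spectrograms" part concrete, here is a minimal sketch of turning raw samples into the log-magnitude spectrogram that such a classifier would take as its input image. It is only an illustration: the function name `log_spectrogram`, the frame and hop sizes, and the synthetic test tone are all assumptions, and the naive DFT loop is there so the example needs nothing beyond the standard library. In practice you would use an FFT from a numerical library, and the ResNet itself is not shown.

```python
import cmath
import math


def log_spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each a list of log-magnitudes per frequency bin."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep only the non-negative frequencies
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT for bin k (slow, but dependency-free).
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            # Small offset avoids log(0) on silent bins.
            spectrum.append(math.log(abs(acc) + 1e-10))
        frames.append(spectrum)
    return frames


# Hypothetical input: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = log_spectrogram(signal)
# Strongest bin in the first frame; each bin spans sample_rate / frame_len Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)
```

The resulting 2-D array (time by frequency) is what gets fed to the image model, usually after rescaling to a fixed size.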

enisberk about 6 years ago

I am taking a course called "Speech and Audio Understanding" from Prof. Michael I. Mandel. You can check the course website [1]; he has a good collection of resources. His GitHub stars are also a good collection of related projects [2]. In class we are using a book called "Human and Machine Hearing: Extracting Meaning from Sound" by Richard F. Lyon, which the author shares for free [3]. As an example, one of the resources you will see on the course website is the presentations from Interspeech 2018; you can check all the tutorials from there [4].
[1] <a href="http://mr-pc.org/t/csc83060/" rel="nofollow">http://mr-pc.org/t/csc83060/</a>
[2] <a href="https://github.com/mim?tab=stars" rel="nofollow">https://github.com/mim?tab=stars</a>
[3] <a href="http://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018.pdf" rel="nofollow">http://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018.pdf</a>
[4] <a href="http://interspeech2018.org/program-tutorials.html" rel="nofollow">http://interspeech2018.org/program-tutorials.html</a>

am807 about 6 years ago

Just found this thread on the fast.ai forum yesterday that may help: <a href="https://forums.fast.ai/t/deep-learning-with-audio-thread/38123" rel="nofollow">https://forums.fast.ai/t/deep-learning-with-audio-thread/381...</a>

Tangokat about 6 years ago

I don't know if this is off topic, but would it be possible to remove the sound of mechanical keyboards from a VoIP stream in real time with ML? Sell the technology to Discord and profit.

dest about 6 years ago

You may be able to reuse some concepts I have described for an audio adblock: <a href="https://www.adblockradio.com/blog/2018/11/15/designing-audio-ad-block-radio-podcast/" rel="nofollow">https://www.adblockradio.com/blog/2018/11/15/designing-audio...</a> More precisely, audio spectral preprocessing followed by a neural network such as an LSTM.

williamsmj about 6 years ago

I think the slides/recording of this excellent Spotify talk will be posted shortly: <a href="https://qcon.ai/qconai2019/presentation/deep-learning-audio-signals-prepare-process-design-expect" rel="nofollow">https://qcon.ai/qconai2019/presentation/deep-learning-audio-...</a>.

telesilla about 6 years ago

aubio and librosa are two excellent MIR (music information retrieval) tools I can recommend from personal use. Both can be used on real-time audio with pyaudio or similar. <a href="https://aubio.org/doc/latest/" rel="nofollow">https://aubio.org/doc/latest/</a> <a href="https://librosa.github.io/librosa/" rel="nofollow">https://librosa.github.io/librosa/</a>

konsoleXD about 6 years ago

I am also curious about this topic! I have picked up a Jetson Nano and fully intend to put it to use by projecting comic-book panel-style speech bubbles (plus, who knows... random panels?) on the wall, leveraging PyTorch + deepspeech. That's at least the idea kicking around in my head at the moment. <a href="https://github.com/SeanNaren/deepspeech.pytorch" rel="nofollow">https://github.com/SeanNaren/deepspeech.pytorch</a> I'm no expert. Haven't done it. Don't really want to send every convo into the cloud or my tinfoil hat will start burning. You do not need a Jetson to get started investigating. Maybe just NVIDIA hardware for that particular library. If you find something, maybe you can let me know somehow. Peace

devin about 6 years ago

<a href="https://github.com/ybayle/awesome-deep-learning-music" rel="nofollow">https://github.com/ybayle/awesome-deep-learning-music</a> a "Non-exhaustive list of scientific articles on deep learning for music"

tixocloud about 6 years ago

Here's a resource that breaks down the various audio processing tasks and provides case studies: <a href="https://www.analyticsvidhya.com/blog/2018/01/10-audio-processing-projects-applications/" rel="nofollow">https://www.analyticsvidhya.com/blog/2018/01/10-audio-proces...</a> It's slightly academic so here's a more practical resource: <a href="https://towardsdatascience.com/audio-classification-using-fastai-and-on-the-fly-frequency-transforms-4dbe1b540f89" rel="nofollow">https://towardsdatascience.com/audio-classification-using-fa...</a>

ransom1538 about 6 years ago

I would get lunch with these guys: <a href="https://www.audiblemagic.com/" rel="nofollow">https://www.audiblemagic.com/</a> These sketch balls can use your phone's mic to detect what is streaming in a living room.

contingencies about 6 years ago

Recently I started looking into this as a backup method of anomaly detection while performing automated testing of our robotics. I concluded that it's actually pretty easy. Depending on how simple your requirements are, you can even achieve this cheaply and effectively on a very tiny microprocessor with an attached surface-mount MEMS microphone. Additional features like anomalous audio recording, timestamping and alert transmission are not that hard either. No need for a fully-fledged general purpose operating system, or complex algorithms.
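
As a rough illustration of how little is needed for the simple case, here is a sketch of one possible approach: learn the typical loudness (RMS) of short frames from known-good audio, then flag frames that deviate strongly from it. This is not necessarily the commenter's method. The function names, the frame size, the threshold of 6 standard deviations and the synthetic noise standing in for a healthy machine are all assumptions made for the example.

```python
import math
import random


def frame_rms(samples, frame_len):
    """RMS level of each non-overlapping frame."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def fit_baseline(levels):
    """Mean and standard deviation of frame levels from known-good audio."""
    mean = sum(levels) / len(levels)
    variance = sum((v - mean) ** 2 for v in levels) / len(levels)
    return mean, math.sqrt(variance)


def find_anomalies(levels, mean, std, threshold=6.0):
    """Indices of frames more than `threshold` standard deviations from the mean."""
    return [i for i, v in enumerate(levels) if abs(v - mean) > threshold * std]


FRAME = 256
rng = random.Random(0)  # fixed seed so the synthetic data is repeatable

# Synthetic "healthy machine" noise used to learn the baseline.
healthy = [rng.gauss(0.0, 0.1) for _ in range(100 * FRAME)]
mean, std = fit_baseline(frame_rms(healthy, FRAME))

# New recording with a loud tone injected into frame 10.
recording = [rng.gauss(0.0, 0.1) for _ in range(20 * FRAME)]
for n in range(10 * FRAME, 11 * FRAME):
    recording[n] += 0.8 * math.sin(2 * math.pi * n / 16)

anomalies = find_anomalies(frame_rms(recording, FRAME), mean, std)
print(anomalies)  # frame 10 should be among the flagged indices
```

Everything here is a running sum and a comparison per frame, which is why this kind of detector fits on a small microcontroller. Per-band energies instead of a single RMS value would be the natural next step if loudness alone is not discriminative enough.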

bjourne about 6 years ago

See this book and the sources it links to: <a href="https://musicinformationretrieval.com/" rel="nofollow">https://musicinformationretrieval.com/</a> Also google for pitch and onset detection. If you want more specific help, you have to ask a more specific question.
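
For a feel of what pitch detection involves, here is a minimal sketch using plain autocorrelation: the lag at which the signal best matches a shifted copy of itself gives the period. The function name `estimate_pitch`, the search range and the synthetic 200 Hz test tone are assumptions for the example. Real recordings generally need a normalised or otherwise more robust variant, which the linked book covers.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency in Hz with plain autocorrelation."""
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical input: a clean 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(2000)]

pitch = estimate_pitch(tone, sample_rate)
print(round(pitch, 1))
```

Note that the resolution is limited to whole-sample lags, so frequencies whose period is not an integer number of samples come out slightly off unless you interpolate around the peak.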

ml-engineer about 6 years ago

There are many great resources to reference here: <a href="https://www.science.wiki/search?keyword=audio+processing" rel="nofollow">https://www.science.wiki/search?keyword=audio+processing</a>

iagooar about 6 years ago

Contact the founder / maker of Auphonic.com - he's a super nice and clever guy who does this kind of stuff for a living. He'll definitely point you in the right direction.

jamesb93 about 6 years ago

This depends on whether you're interested in creative applications or analytical (MIR) ones. The two fields share a lot of techniques, but the way they are used is wildly different.

preetiagarwal about 6 years ago

thanks for sharing article <a href="https://www.exltech.in/mechanical-design-training.html" rel="nofollow">https://www.exltech.in/mechanical-design-training.html</a>

xylophone about 6 years ago

piston aircraft?