What are some good learning resources on audio processing, detection and anomaly detection using machine learning or deep learning? I am interested in predictive maintenance of machines using audio anomaly detection.
There's a good class at UIUC on signal processing:<p><a href="https://courses.engr.illinois.edu/cs598ps/fa2018/material.html" rel="nofollow">https://courses.engr.illinois.edu/cs598ps/fa2018/material.ht...</a><p>The course is led by Paris Smaragdis, one of the top researchers in the field of audio processing.
The folks behind AudioSet have been working on general audio event detection for some years now, I believe.<p><a href="https://research.google.com/audioset/" rel="nofollow">https://research.google.com/audioset/</a><p>There's a huge amount to discuss in the audio domain, but using a ResNet on spectrograms to build a binary classifier is a good place to start.
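To make the spectrogram suggestion above a bit more concrete, here is a minimal sketch of the front-end only: turning raw audio into a log-magnitude spectrogram that could then be fed to an image classifier such as a ResNet. The function name `log_spectrogram`, the frame and hop sizes, and the synthetic 1 kHz test tone are all assumptions chosen for illustration; they do not come from the comment above or from AudioSet.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=256, eps=1e-10):
    """Return a (frames, frame_len // 2 + 1) array of log-magnitude spectra in dB.

    frame_len, hop and eps are illustrative defaults, not recommended values.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the one-sided FFT of each frame, converted to decibels.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + eps)

# Synthetic example: one second of a 1 kHz tone at a 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
# The frequency bin with the highest average level should sit at the tone frequency.
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_hz = peak_bin * sample_rate / 512
print(spec.shape, peak_hz)
```

In practice a mel-scaled spectrogram (for example from librosa) is a common alternative; the plain STFT version is used here only to keep the sketch dependency-light.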
I am taking a course called "Speech and Audio Understanding" from Prof. Michael I. Mandel; you can check the course website [1], where he has a good collection of resources. His GitHub stars are also a good collection of related projects [2]. In class we are using a book called "Human and Machine Hearing: Extracting Meaning from Sound" by Richard F. Lyon, which the author shares for free [3].
For example, one of the resources you will see on the course website is the set of presentations from Interspeech 2018; you can check all the tutorials there [4].<p>[1] <a href="http://mr-pc.org/t/csc83060/" rel="nofollow">http://mr-pc.org/t/csc83060/</a><p>[2] <a href="https://github.com/mim?tab=stars" rel="nofollow">https://github.com/mim?tab=stars</a><p>[3] <a href="http://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018.pdf" rel="nofollow">http://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018.pdf</a><p>[4] <a href="http://interspeech2018.org/program-tutorials.html" rel="nofollow">http://interspeech2018.org/program-tutorials.html</a>
Just found this thread on the fast.ai forum yesterday that may help: <a href="https://forums.fast.ai/t/deep-learning-with-audio-thread/38123" rel="nofollow">https://forums.fast.ai/t/deep-learning-with-audio-thread/381...</a>
I don't know if this is off topic but would it be possible to remove the sound of mechanical keyboards with ML in realtime from a VOIP stream? Sell the technology to Discord and profit.
You may reuse some concepts I have described for an audio adblock: <a href="https://www.adblockradio.com/blog/2018/11/15/designing-audio-ad-block-radio-podcast/" rel="nofollow">https://www.adblockradio.com/blog/2018/11/15/designing-audio...</a><p>More precisely, audio spectral preprocessing then neural network such as LSTM.
I think the slides/recording of this excellent Spotify talk will be posted shortly: <a href="https://qcon.ai/qconai2019/presentation/deep-learning-audio-signals-prepare-process-design-expect" rel="nofollow">https://qcon.ai/qconai2019/presentation/deep-learning-audio-...</a>.
aubio and librosa are two excellent MIR (music information retrieval) tools I can recommend from personal use. Both can be used on real-time audio with pyaudio or similar.<p><a href="https://aubio.org/doc/latest/" rel="nofollow">https://aubio.org/doc/latest/</a><p><a href="https://librosa.github.io/librosa/" rel="nofollow">https://librosa.github.io/librosa/</a>
I am also curious about this topic!
I have picked up a jetson nano and fully intend to put this device to use by projecting comic-book panel-style speech bubbles (plus, who knows... random panels?) on the wall leveraging pytorch + deepspeech.<p>That's at least the idea kicking around in my head at the moment.
<a href="https://github.com/SeanNaren/deepspeech.pytorch" rel="nofollow">https://github.com/SeanNaren/deepspeech.pytorch</a><p>I'm no expert. Haven't done it. Don't really want to send every convo into the cloud or my tinfoil hat will start burning.<p>You do not need a jetson to get started investigating. Maybe just nvidia for that particular library.
If you find something, maybe you can let me know somehow.<p>Peace
<a href="https://github.com/ybayle/awesome-deep-learning-music" rel="nofollow">https://github.com/ybayle/awesome-deep-learning-music</a> a "Non-exhaustive list of scientific articles on deep learning for music"
Here's a resource that breaks down the various audio processing tasks and provides case studies:
<a href="https://www.analyticsvidhya.com/blog/2018/01/10-audio-processing-projects-applications/" rel="nofollow">https://www.analyticsvidhya.com/blog/2018/01/10-audio-proces...</a><p>It's slightly academic so here's a more practical resource:
<a href="https://towardsdatascience.com/audio-classification-using-fastai-and-on-the-fly-frequency-transforms-4dbe1b540f89" rel="nofollow">https://towardsdatascience.com/audio-classification-using-fa...</a>
I would get lunch with these guys:<p><a href="https://www.audiblemagic.com/" rel="nofollow">https://www.audiblemagic.com/</a><p>These sketch balls can use your phone's mic to detect what is streaming in a living room.
Recently I started looking into this as a backup method of anomaly detection while performing automated testing of our robotics. I concluded that it's actually pretty easy. Depending on how simple your requirements are, you can even achieve this cheaply and effectively on a very tiny microprocessor with an attached surface-mount MEMS microphone. Additional features like anomalous audio recording, timestamping and alert transmission are not that hard either. There is no need for a fully fledged general-purpose operating system or complex algorithms.
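The comment above does not say which method was used, so the following is only a guess at what a "no complex algorithms" detector might look like: track the running mean and spread of each frame's RMS level and flag frames that deviate strongly from that baseline. The class name `RmsAnomalyDetector`, the threshold of 4 standard deviations, and the warm-up length are assumptions for illustration. It is written in plain Python for readability; on a microcontroller the same arithmetic would typically be ported to C.

```python
import math
import random

def frame_rms(samples):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

class RmsAnomalyDetector:
    """Flags frames whose RMS level is far from a running baseline.

    Hypothetical sketch: threshold and warmup are illustrative values.
    """

    def __init__(self, threshold=4.0, warmup=20):
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # frames used to learn the baseline first
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, samples):
        level = frame_rms(samples)
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            is_anomaly = abs(level - self.mean) > self.threshold * max(std, 1e-9)
        if not is_anomaly:
            # Welford's online update; anomalous frames are kept out of the baseline.
            self.count += 1
            delta = level - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (level - self.mean)
        return is_anomaly

# Synthetic example: 50 frames of quiet noise, then one frame ten times louder.
rng = random.Random(0)
detector = RmsAnomalyDetector()
for _ in range(50):
    detector.update([rng.gauss(0.0, 0.1) for _ in range(256)])
loud = detector.update([rng.gauss(0.0, 1.0) for _ in range(256)])
print("loud frame flagged:", loud)
```

A level-only detector will miss faults that change the sound's character without changing its loudness; comparing energy in a few frequency bands against a baseline is the usual next step.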
See this book and the sources it links to: <a href="https://musicinformationretrieval.com/" rel="nofollow">https://musicinformationretrieval.com/</a> Also google for pitch and onset detection. If you want more specific help, you have to ask a more specific question.
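For anyone wondering what pitch detection looks like in its simplest form, here is a rough sketch using autocorrelation: the lag at which a frame best matches a shifted copy of itself gives the period. The function name `estimate_pitch`, the 50 to 1000 Hz search range, and the 220 Hz test tone are assumptions for illustration and are not taken from the book linked above. Onset detection is not covered here.

```python
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame via autocorrelation.

    fmin and fmax bound the search range; both are illustrative defaults.
    """
    frame = np.asarray(frame, dtype=np.float64)
    frame = frame - frame.mean()
    # Keep only the non-negative lags of the autocorrelation.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(corr) - 1)
    # The strongest peak inside the allowed lag range is taken as the period.
    best_lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Synthetic example: a 220 Hz sine wave sampled at 16 kHz.
sample_rate = 16000
t = np.arange(2048) / sample_rate
pitch = estimate_pitch(np.sin(2 * np.pi * 220 * t), sample_rate)
print(round(pitch, 1))
```

Because the lag is an integer number of samples, the estimate is quantized; real implementations usually interpolate around the peak or use a more robust method such as YIN.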
There are many great resources to reference here:<p><a href="https://www.science.wiki/search?keyword=audio+processing" rel="nofollow">https://www.science.wiki/search?keyword=audio+processing</a>
Contact the founder / maker of Auphonic.com - he's a super nice and clever guy who does this kind of stuff for a living. He'll definitely point you in the right direction.
This depends on whether you're interested in creative applications or analytical (MIR) ones. The two fields share a lot of techniques, but the ways they are used are wildly different.