I'm a big fan of Lex Fridman. I've listened to most of his podcasts. But recently he's been talking too much without thinking first, either adding no value to the conversation or pushing ideas of his that are, at the very least, controversial. Nothing against him personally. But I already know his opinions on things. I don't want to listen to him preaching them over and over.<p>I've noticed that his comments add so little to the conversation that trimming his voice out of the podcast would actually improve its quality.<p>I thought there might be some automated way of doing that with ML. I have some experience with CNNs on images, but I've never dealt with audio before. Any recommendations?
What about classifying who is speaking and just muting the audio when Lex is speaking? First, extract a lot of samples (e.g., 2-5 seconds of audio each) from a lot of his podcasts and label them 1/0 for Lex/other person speaking.<p>Take those samples and convert them to a frequency spectrum. For each sample, average the values over the duration of the sample (or use max, min, whatever). Group the values into frequency bins (e.g., 100 Hz, 120 Hz, 140 Hz) and drop every bin outside the range of human speech.<p>What you then have is a training set whose features are the amplitudes of each frequency bin and whose target is 1 (Lex is speaking) or 0 (somebody else is speaking).<p>Use your ML or deep learning algorithm of choice and see if you can get useful results out of it.
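To make that concrete, here is a rough sketch of the pipeline in Python using only NumPy. Everything in it is an assumption for illustration rather than a tested recipe: the function names (`band_features`, `train_logreg`, `predict`, `mute_target`) are made up, the audio is assumed to be already decoded into a mono float array, the 80-4000 Hz range and 20 Hz bin width are guesses at a reasonable speech band, and plain logistic regression just stands in for "your algo of choice".

```python
import numpy as np

def band_features(samples, sample_rate, frame_len=2048, hop=1024,
                  f_lo=80.0, f_hi=4000.0, bin_hz=20.0):
    """Time-averaged magnitude spectrum of one clip, pooled into bin_hz-wide bands.

    The clip must contain at least frame_len samples.
    """
    samples = np.asarray(samples, dtype=np.float64)
    window = np.hanning(frame_len)
    # Cut the clip into overlapping, windowed frames.
    starts = range(0, len(samples) - frame_len + 1, hop)
    frames = np.stack([samples[s:s + frame_len] * window for s in starts])
    # Magnitude spectrum per frame, then average over time.
    mean_spec = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Pool FFT bins into fixed-width bands; anything outside [f_lo, f_hi) is dropped.
    edges = np.arange(f_lo, f_hi + bin_hz, bin_hz)
    feats = np.zeros(len(edges) - 1)
    for i in range(len(edges) - 1):
        mask = (freqs >= edges[i]) & (freqs < edges[i + 1])
        if mask.any():
            feats[i] = mean_spec[mask].mean()
    return feats

def train_logreg(X, y, lr=0.1, epochs=500):
    """Logistic regression by batch gradient descent on standardized features."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-9
    Xn = (X - mu) / sigma
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        z = np.clip(Xn @ w + b, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-z))      # predicted P(target speaker)
        grad = p - y
        w -= lr * (Xn.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, mu, sigma

def predict(model, X):
    """Return 1 where the model thinks the target speaker is talking, else 0."""
    w, b, mu, sigma = model
    z = ((np.asarray(X, dtype=np.float64) - mu) / sigma) @ w + b
    return (z > 0).astype(int)

def mute_target(samples, sample_rate, model, clip_seconds=2.0):
    """Zero out every fixed-length chunk classified as the target speaker.

    A trailing chunk shorter than clip_seconds is left untouched.
    """
    out = np.array(samples, dtype=np.float64)
    clip_len = int(clip_seconds * sample_rate)
    for start in range(0, len(out) - clip_len + 1, clip_len):
        feats = band_features(out[start:start + clip_len], sample_rate)
        if predict(model, feats[None, :])[0] == 1:
            out[start:start + clip_len] = 0.0
    return out
```

You'd call `band_features` on every labeled clip to build `X`, pass `X` and the 1/0 labels to `train_logreg`, and then run `mute_target` over a full episode. Cutting in fixed-length chunks is the crude part: a chunk that straddles a speaker change gets muted or kept as a whole, so shorter chunks trade cleaner cuts for noisier predictions.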