This is an awesome project, but it seems it was done without reference to the academic literature on source separation. People have in fact been doing audio source separation with neural networks for years.

For instance, Eric Humphrey of Spotify's music understanding group describes using a U-Net architecture here: https://medium.com/this-week-in-machine-learning-ai/separating-vocals-in-recorded-music-at-spotify-with-eric-humphrey-51c2f85d1451 - paper at http://openaccess.city.ac.uk/19289/1/7bb8d1600fba70dd79408775cd0c37a4ff62.pdf (rough sketch of the idea at the bottom of this comment).

They compare their performance against the widely cited state-of-the-art Chimera model (Luo et al., 2017): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5791533/#R24 - with examples at http://naplab.ee.columbia.edu/ivs.html - judging from those examples, there's significantly less distortion than in OP's results.

Not to discourage OP from doing first-principles research at all! But it's often useful to engage with the larger community and learn what has succeeded and failed in the past. This is a problem domain where progress could change the entire creative landscape around derivative works ("mashups" and the like), and interested researchers would do well to look toward collaboration rather than reinventing each other's wheels.

EDIT: The SANE conference has talks by Humphrey and many others available online: https://www.youtube.com/channel/UCsdxfneC1EdPorDUq9_XUJA/videos
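
For anyone curious what the U-Net approach actually looks like in code: the core idea in the paper is to predict a soft mask over the mixture's magnitude spectrogram and train with an L1 loss against the isolated vocal stem. Below is a deliberately scaled-down PyTorch sketch of that idea, not the paper's network - their model is much deeper, and the layer sizes and hyperparameters here are my own placeholder choices.

    # Scaled-down sketch of U-Net spectrogram masking (assumed sizes, not
    # the paper's exact architecture).
    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        """Two-level U-Net: magnitude spectrogram in, soft-masked estimate out."""
        def __init__(self, ch: int = 16):
            super().__init__()
            # Encoder: strided 5x5 convs halve time/frequency resolution.
            self.enc1 = nn.Sequential(nn.Conv2d(1, ch, 5, stride=2, padding=2),
                                      nn.BatchNorm2d(ch), nn.LeakyReLU(0.2))
            self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 5, stride=2, padding=2),
                                      nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2))
            # Decoder: transposed convs upsample; the skip connection below
            # (concatenating encoder features) is what makes this a U-Net.
            self.dec1 = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 5, stride=2,
                                                         padding=2, output_padding=1),
                                      nn.BatchNorm2d(ch), nn.ReLU())
            self.dec2 = nn.ConvTranspose2d(ch * 2, 1, 5, stride=2,
                                           padding=2, output_padding=1)

        def forward(self, mix_mag: torch.Tensor) -> torch.Tensor:
            e1 = self.enc1(mix_mag)
            e2 = self.enc2(e1)
            d1 = self.dec1(e2)
            d1 = torch.cat([d1, e1], dim=1)      # skip connection
            mask = torch.sigmoid(self.dec2(d1))  # soft mask in [0, 1]
            return mask * mix_mag                # estimated vocal magnitude

    # Toy training step on random data standing in for real spectrograms
    # (batch of 4, one channel, 512 frequency bins x 128 frames).
    model = TinyUNet()
    mix = torch.rand(4, 1, 512, 128)
    vocals = torch.rand(4, 1, 512, 128)         # ground-truth vocal stems
    loss = nn.functional.l1_loss(model(mix), vocals)
    loss.backward()

To get audio back out you'd apply the predicted mask to the mixture's complex STFT (reusing the mixture phase) and invert it - that phase reuse is a common simplification in this family of models and a big source of the residual artifacts you hear in the demos.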