I am working on a side project that combines the vocals from one music video with the background music (BGM) of another.<p>E.g. Taylor Swift's vocals from Last Christmas over the BGM of Van Halen's Jump.<p>It just isn't working. Are there any techniques for programmatically mixing in the BGM appropriately?<p>Right now, I am thinking of lowering the BGM volume while the vocals are playing.<p>What is the right direction to take?
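Lowering the BGM while the vocals play is usually called "ducking" (sidechain gain reduction). Below is a minimal NumPy sketch of that idea. It assumes both tracks are already separated, mono, float arrays at the same sample rate. The function name `duck_bgm` and every default value (50 ms frames, a -40 dB vocal threshold, -12 dB of ducking, 200 ms smoothing) are illustrative assumptions, not tuned settings.

```python
import numpy as np

def duck_bgm(vocals, bgm, sr, frame_ms=50.0, threshold_db=-40.0,
             duck_db=-12.0, smooth_ms=200.0):
    """Hypothetical helper: lower bgm gain wherever the vocal track is active.

    vocals, bgm: mono float arrays (assumed in [-1, 1]) at sample rate sr.
    Returns (mix, gain), where gain is the per-sample BGM gain that was applied.
    """
    n = min(len(vocals), len(bgm))
    vocals = np.asarray(vocals[:n], dtype=float)
    bgm = np.asarray(bgm[:n], dtype=float)

    # RMS level of the vocals per frame, in dB.
    frame = max(1, int(sr * frame_ms / 1000))
    n_frames = -(-n // frame)  # ceiling division
    padded = np.zeros(n_frames * frame)
    padded[:n] = vocals
    rms = np.sqrt(np.mean(padded.reshape(n_frames, frame) ** 2, axis=1))
    rms_db = 20 * np.log10(rms + 1e-12)

    # Duck the BGM by duck_db in frames where the vocals exceed the threshold.
    gain_db = np.where(rms_db > threshold_db, duck_db, 0.0)
    gain = np.repeat(10 ** (gain_db / 20), frame)[:n]

    # Moving-average smoothing (edge-padded) so the gain change doesn't click.
    win = max(1, min(n, int(sr * smooth_ms / 1000)))
    padded_gain = np.pad(gain, (win // 2, win - 1 - win // 2), mode="edge")
    gain = np.convolve(padded_gain, np.ones(win) / win, mode="valid")

    return vocals + bgm * gain, gain
```

Ducking only controls loudness. It won't fix a mix that sounds wrong because the two parts disagree in key or tempo.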
If you use Spleeter, you can isolate the vocals and the BGM. Maybe you are at that stage already. Assuming you want the two to blend aesthetically, the problems you still need to solve are key and tempo. You will need to detect the key and tempo of each part, then adjust the key and tempo of one or both parts so they match. Even then, you will likely still have a clash of structure: the verses and choruses won't line up, or worse, the two songs are structured completely differently. For that you would need another neural net like Spleeter, but trained to combine different musical structures. Or, of course, you could do this part by hand.
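Once each part's key and tempo have been detected (by whatever tool you use; detection isn't shown here), matching them comes down to two numbers: a pitch shift in semitones and a time-stretch rate. The sketch below computes both for moving the vocals onto the BGM. These are assumptions for illustration:

- Keys are given as sharp-only names such as `"C#"` or `"Am"`.
- Minor keys are mapped to their relative major, so `"Am"` and `"C"` count as already matching.
- The smallest shift is chosen so the vocals move at most half an octave.
- Half-time and double-time readings of the BPM are treated as equivalent.

The helper names are hypothetical.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def _relative_major(key: str) -> int:
    """Pitch class (0-11) of the major key with the same signature; 'Am' -> C (0)."""
    if key.endswith("m"):
        return (PITCH_CLASSES.index(key[:-1]) + 3) % 12
    return PITCH_CLASSES.index(key)

def semitone_shift(src_key: str, dst_key: str) -> int:
    """Smallest shift (-5..+6 semitones) moving src_key onto dst_key."""
    diff = (_relative_major(dst_key) - _relative_major(src_key)) % 12
    return diff - 12 if diff > 6 else diff

def stretch_rate(src_bpm: float, dst_bpm: float) -> float:
    """Speed factor for the source (>1 = faster), allowing half/double time."""
    candidates = [dst_bpm / src_bpm * f for f in (0.5, 1.0, 2.0)]
    # Pick the candidate closest to 1.0 on a log scale (least audible stretch).
    return min(candidates, key=lambda r: abs(math.log(r)))

# Example: vocals in E at 100 BPM, BGM in C at 130 BPM.
print(semitone_shift("E", "C"), round(stretch_rate(100, 130), 2))
```

If you use librosa, these values map onto `librosa.effects.pitch_shift(y, sr=sr, n_steps=shift)` and `librosa.effects.time_stretch(y, rate=rate)`. Large shifts on a voice tend to sound artificial, which is why the sketch keeps the shift small rather than always shifting upward.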