Spleeter – Music Source-Separation Engine

258 points by jph98 almost 5 years ago

26 comments

roddylindsay almost 5 years ago

Once this technology gets incorporated into DJ mixers / CDJs, this is going to make DJing much more creatively interesting. Historically, blending between mixed stereo tracks has been limited to mixing EQ bands, but now DJs will be able to layer and mix the underlying stems themselves -- like putting the vocal from one track onto an instrumental section on another (even if there were never a cappella / instrumental versions released). It also opens up a previously unreachable world for amateur remixing in general; for instance, creating surround sound mixes from stereo or even mono recordings for playback in 3D audio environments like Envelop (https://envelop.us) [disclaimer: I am one of the co-founders of Envelop].
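
The stem layering described above can be sketched in a few lines. This is a minimal illustration, assuming each stem is already a list of mono float samples in the range -1.0 to 1.0; the function name `mix_stems`, its gain parameters, and the sample values are hypothetical and not taken from Spleeter or any DJ software.

```python
def mix_stems(vocal, instrumental, vocal_gain=1.0, instrumental_gain=1.0):
    """Sum two mono stems sample by sample, padding the shorter one with silence."""
    length = max(len(vocal), len(instrumental))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0.0
        b = instrumental[i] if i < len(instrumental) else 0.0
        sample = vocal_gain * v + instrumental_gain * b
        # Hard-clip to the valid range so the sum cannot overflow on export.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


# Hypothetical sample data standing in for separated stems.
vocal_from_track_a = [0.5, -0.25, 0.75]
instrumental_from_track_b = [0.25, 0.25, 0.5, -0.5]

print(mix_stems(vocal_from_track_a, instrumental_from_track_b, vocal_gain=0.5))
# prints [0.5, 0.125, 0.875, -0.5]
```

The sketch ignores tempo and key matching, which a real mix would need before the two stems are summed.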

SyneRyder almost 5 years ago

For anyone who wants to try Spleeter in a version that "just works" without having to install TensorFlow and mess with offline processing, Spleeter has been built into a wave editor called Acoustica from Acon Digital. It's been working really well for me, and the whole package is solid competition to editors like iZotope RX: https://acondigital.com/products/acoustica-audio-editor/

mwcampbell almost 5 years ago

Previous discussion, where I posted a demo using a full song (legally under Creative Commons): https://news.ycombinator.com/item?id=21431071
Note: I'm not affiliated with this project; I just think it's cool.

svat almost 5 years ago

I often have voice recordings with a lot of background noise (e.g. a public lecture in a room with poor acoustics, recorded from a phone in the audience; there are usually sounds of paper rustling, noises from the street, etc.). Is this "source-separation" the sort of thing that could help, or does anyone have other tips? The best thing I have so far is based on this: https://wiki.audacityteam.org/wiki/Sanitizing_speech_recordings_made_with_portable_audio_recorders#A_simple_two-step_process_taking_a_minute
(1) Open the file in Audacity and switch to Spectrogram view,
(2) set a high-pass filter at ~150 Hz, i.e. filter out frequencies lower than that (which tend to be loud anyway),
(3) don’t remove the higher frequencies (which aren’t loud), because they are what make the consonants understandable (apparently),
(4) look for specific noises, select the rectangle, and use “Spectral Edit Multi Tool”.
But if machine learning can help that would be really interesting! This Spleeter page does mention “active listening, educational purposes, […] transcription” so I'm excited.
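
Step (2) above can be illustrated with a minimal sketch of a high-pass filter. It uses a simple first-order RC filter, which is a stand-in chosen for brevity and not Audacity's own implementation (Audacity's filter can roll off more steeply). The helper names `high_pass`, `rms`, and `sine`, the 16 kHz sample rate, and the test tones are assumptions made for the illustration.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=150.0):
    """First-order RC high-pass filter: attenuates content below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Each output follows changes in the input and decays towards zero.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, sample_rate, seconds=1.0):
    """Generate a test tone."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


rate = 16000
rumble = sine(20, rate)         # low-frequency noise, well below the cutoff
speech_band = sine(2000, rate)  # a frequency that carries consonant detail

# Ratio of output level to input level for each tone.
print(rms(high_pass(rumble, rate)) / rms(rumble))
print(rms(high_pass(speech_band, rate)) / rms(speech_band))
```

The first ratio comes out far below 1 and the second close to 1, which matches steps (2) and (3): the loud low end is removed while the higher frequencies pass through largely untouched.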

iseanstevens almost 5 years ago

There is a Max/Ableton Live plugin version here, which makes it much easier to experiment with Spleeter artistically: https://github.com/diracdeltas/spleeter4max/releases/

tomduncalf almost 5 years ago

Another recent open source contender for source separation is Open Unmix: https://github.com/sigsep/open-unmix-pytorch/
I’ve not had time to try it yet but have read good things.

voiper1 almost 5 years ago

Very cool! I was even able to run it on their notebook https://colab.research.google.com/github/deezer/spleeter/blob/master/spleeter.ipynb without setting anything up locally. The results of vocal separation were quite impressive.

leoncvlt almost 5 years ago

Here's the sample output, for those who are curious:
- Sample track: https://files.catbox.moe/56op27.mp3
- Spleeted vocals: https://files.catbox.moe/4d9aru.wav
- Spleeted accompaniment: https://files.catbox.moe/y67g23.wav

Myce almost 5 years ago

A local radio station has a broadcast of four hours. They are required by the station to play a set number of music tracks (about 6 per hour), but there has been demand to make the broadcast available as a podcast without the music. Could this make it possible to automatically remove the music from the MP3 file they have available? With 6 tracks per hour times 4 hours, manually removing the music is time-consuming. I doubt it, as it seems all vocals are output to a single file... Is there any other tool someone can recommend?

jph98 almost 5 years ago

Leveraging a state-of-the-art source separation algorithm for music information retrieval: https://www.youtube.com/watch?time_continue=42&v=JIR6HJISrtY&feature=emb_logo

TedDoesntTalk almost 5 years ago

Now we can create all-star bands that never existed. For example:
Neal Schon from Journey -- lead guitar
The Heart sisters -- lead vocals and lead/rhythm guitar
Flea from Chili Peppers -- bass guitar
Neil Peart from Rush -- drums
Tony Kay from Genesis -- keys
The only difficulty is they must all be playing the same song. Then we can extract, transpose if needed, and remix together.

grawprog almost 5 years ago

I couldn't find any examples, so I was wondering: for anyone who has tried this, are the results better than using a bandpass filter and an equalizer to isolate frequencies, or one of those auto karaoke things? Because the ability to separate any song into separate tracks would be amazing. The ability to remix any song or just play with any instrument or vocal track would be awesome. But does it have the same poor quality and limitations of most frequency-based source separation?
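
For context on the "auto karaoke" approach mentioned above: one common trick behind such tools (assumed here; the comment does not say which tool is meant) is center-channel cancellation, where one stereo channel is subtracted from the other. The sketch below uses made-up sample values and a hypothetical `cancel_center` helper to show both the effect and its limitation.

```python
def cancel_center(left, right):
    """Subtract the right channel from the left, removing anything panned dead center."""
    return [l - r for l, r in zip(left, right)]


# Hypothetical sources: the vocal is identical in both channels (panned center),
# while the guitar appears only in the left channel (panned hard left).
vocal = [0.5, -0.5, 0.25, 0.0]
guitar = [0.25, 0.25, -0.25, 0.5]

left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

print(cancel_center(left, right))
# prints [0.25, 0.25, -0.25, 0.5], i.e. the guitar alone
```

The limitation follows directly from the arithmetic: everything panned center disappears along with the vocal (often bass and kick drum too), the result is mono, and the vocal itself cannot be recovered this way.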

marksomnian almost 5 years ago

Had a play with the Colab and it's quite good indeed. The authors claim "100x real time speed", which is mighty impressive, but I'd be more interested in seeing a "Try Really Hard" mode, trading off quality and speed. Is that a thing that can be done in the current code, I wonder?

mehrdadn almost 5 years ago

If you're trying to run it on Windows with Python 3.8, add numpy and cython to the dependencies, and change TensorFlow's requirement to be >= rather than ==. Though then you'll run into compatibility errors like "No module named 'tensorflow.contrib'" which you'll have to fix.

mbushey almost 5 years ago

While this is awesome, it's trained on MUSDB18-HQ, which as far as I can tell is proprietary. zenodo.org claims it is available; however, I have filled out their "request access" page a half-dozen times. Does anyone know of a training data set that's possible to obtain? Here is the Zenodo response:
Your access request has been rejected by the record owner.
Message from owner: no justification given
Record: MUSDB18-HQ - an uncompressed version of MUSDB18 https://zenodo.org/record/3338373
The decision to reject the request is solely under the responsibility of the record owner. Hence, please note that Zenodo staff are not involved in this decision.

pabs3 almost 5 years ago

This reminds me of this open source project (and its predecessor manyears and open hardware projects 8/16soundsusb):
https://github.com/introlab/odas
https://github.com/introlab/manyears
https://github.com/introlab/16SoundsUSB
Website of the team behind these: https://introlab.3it.usherbrooke.ca/

TheOtherHobbes almost 5 years ago

Out of interest, and to put this in context - your brain can only do this for conversation, not music. You routinely suppress background noise and room acoustics when listening to someone speaking. But you don't do the same thing when listening to music. At best you can focus on individual elements in a track, and you can parse them musically (and maybe lyrically). But you don't suppress the rest to the point where you don't hear it.

fold_left almost 5 years ago

Once you have obtained just the guitar from a track, are there any tools out there which can work out the tablature (e.g. https://www.ultimate-guitar.com//top/tabs) so you can play along?

InstaHeads almost 5 years ago

Well, it seems neural networks have started to appear for vocal and instrumental track isolation ^^ Recently I discovered https://www.lalal.ai and it works quite well.

philipov almost 5 years ago

I tried using the 2 stem model to remove the music from an audio recording of two people talking. It kept sucking in some of the music whenever someone started talking, however. Is there a better model to use for that?

FraKtus almost 5 years ago

It says it can be 100 times faster than real time. So can it be run in real time? I am thinking about extracting features for music visualization, but it could make a DJ happy also.
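
The difference between "100 times faster than real time" and "runs in real time" can be made concrete with a little arithmetic. This is a sketch under stated assumptions: the 100x figure is the one quoted above, processing time is assumed to scale linearly with audio length, and the model is assumed to need a complete chunk of audio before it can produce output. The 10-second chunk size and both function names are hypothetical.

```python
def processing_seconds(audio_seconds, speedup=100.0):
    """Time needed to separate a clip if processing scales linearly with length."""
    return audio_seconds / speedup


def streaming_latency_seconds(chunk_seconds, speedup=100.0):
    """Delay before the first output when working chunk by chunk:
    wait for a whole chunk to arrive, then process it."""
    return chunk_seconds + processing_seconds(chunk_seconds, speedup)


print(processing_seconds(240))        # a 4-minute track, processed offline
print(streaming_latency_seconds(10))  # live input in hypothetical 10-second chunks
```

Under these assumptions a 4-minute track takes about 2.4 seconds offline, but a live feed is still delayed by roughly one chunk length, so high throughput alone does not give low latency.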

manceraio almost 5 years ago

You could try Spleeter in the cloud here: https://voxremover.com

philipov almost 5 years ago

The output appears to cut off after 10 minutes. How do you make it operate on longer files, like in the 100 minute range?
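
One generic workaround, if the cut-off is a per-run limit of about 10 minutes, is to process the file in consecutive windows and join the results afterwards. The sketch below only plans the windows. The 600-second limit is inferred from the observation above, and whether Spleeter accepts an offset and duration for each run is an assumption not confirmed in this thread. The function `chunk_plan` is hypothetical.

```python
def chunk_plan(total_seconds, max_chunk_seconds=600):
    """Return (offset, duration) pairs that cover total_seconds without gaps or overlap."""
    plan = []
    offset = 0
    while offset < total_seconds:
        duration = min(max_chunk_seconds, total_seconds - offset)
        plan.append((offset, duration))
        offset += duration
    return plan


plan = chunk_plan(100 * 60)  # a 100-minute recording
print(len(plan), plan[0], plan[-1])
# prints 10 (0, 600) (5400, 600)
```

Each pair would drive one separation run on that slice of the file. Hard cuts at window boundaries may leave audible seams, so overlapping windows with a crossfade could be worth trying.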

jbverschoor almost 5 years ago

Deezer is pretty useless if all supported hardware requires your phone to stream. They should spend dev time on something that matters.

peterhookgen almost 5 years ago

This is very cool. I have started using it to experiment with creating hardstyle dance remixes of popular songs.

fit2rule almost 5 years ago

This is ultra-cool .. I have a few terabytes of jam-session recordings that I'm going to throw at this. If it ends up being usable to the point that I can re-do vocals over some of the greatest moments in the archive, I'll be praising whatever Spleeter deity makes itself visible to me at the time, most highly ..
