Music Source Separation in the Waveform Domain

11/27/2019
by Alexandre Défossez, et al.

Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments. Contrary to many audio synthesis tasks, where the best performance is achieved by models that directly generate the waveform, the state of the art in music source separation is to compute masks on the magnitude spectrum. In this paper, we first show that an adaptation of Conv-Tasnet (Luo & Mesgarani, 2019), a waveform-to-waveform model for speech source separation, significantly beats the state of the art on the MusDB dataset, the standard benchmark for multi-instrument source separation. Second, we observe that Conv-Tasnet follows a masking approach on the input signal, which has the potential drawback of removing parts of the relevant source without the capacity to reconstruct them. We propose Demucs, a new waveform-to-waveform model with an architecture closer to models for audio generation and with more capacity in the decoder. Experiments on the MusDB dataset show that Demucs beats previously reported results in terms of signal-to-distortion ratio (SDR), although it remains below Conv-Tasnet. Human evaluations show that Demucs has significantly higher quality (as assessed by mean opinion score) than Conv-Tasnet, but slightly more contamination from other sources, which explains the difference in SDR. Additional experiments with a larger dataset suggest that the gap in SDR between Demucs and Conv-Tasnet shrinks, showing that our approach is promising.
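The spectrogram-masking baseline the abstract contrasts with, and the SDR metric it reports, can both be illustrated with a minimal, self-contained sketch. This is not the paper's models: it uses synthetic stand-in stems (a tone for "voice", noise for "drums"), an oracle ratio mask on the magnitude spectrum, and a simple energy-ratio SDR without the alignment used in the official MusDB evaluation.

```python
# Illustrative sketch only: oracle magnitude-spectrum masking on synthetic
# stems, plus a simple SDR computation. Not the paper's Conv-Tasnet or Demucs.
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 440 * t)          # stand-in "voice" stem
drums = 0.3 * rng.standard_normal(sr)        # stand-in "drums" stem
mix = voice + drums                          # the song is the sum of stems

# STFT of the mixture and of the (oracle) stems.
_, _, Mix = stft(mix, fs=sr, nperseg=512)
_, _, V = stft(voice, fs=sr, nperseg=512)
_, _, D = stft(drums, fs=sr, nperseg=512)

# Ideal ratio mask on the magnitude spectrum: a soft mask in [0, 1].
eps = 1e-8
mask = np.abs(V) / (np.abs(V) + np.abs(D) + eps)

# Apply the mask to the complex mixture spectrogram (the mixture phase is
# reused, a known limitation of magnitude masking), then invert.
_, voice_est = istft(mask * Mix, fs=sr, nperseg=512)
voice_est = voice_est[: len(voice)]

def sdr(ref, est):
    """Signal-to-distortion ratio in dB (simple definition, no alignment)."""
    return 10 * np.log10(np.sum(ref ** 2) / (np.sum((ref - est) ** 2) + 1e-12))

print(f"SDR of raw mixture as estimate: {sdr(voice, mix):.1f} dB")
print(f"SDR of masked estimate:         {sdr(voice, voice_est):.1f} dB")
```

The masked estimate scores a noticeably higher SDR than the untouched mixture, which is the improvement axis the paper reports on. The drawback the abstract points out also shows up here: wherever the mask attenuates a time-frequency bin, any part of the target source in that bin is removed and cannot be reconstructed, which is what motivates a waveform-to-waveform model with a generative-style decoder such as Demucs.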

Related research

- End-to-end music source separation: is it possible in the waveform domain? (10/29/2018)
  Most of the currently successful source separation techniques use the ma...
- Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed (09/03/2019)
  We study the problem of source separation for music using deep learning ...
- Danna-Sep: Unite to separate them all (12/07/2021)
  Deep learning-based music source separation has gained a lot of interest...
- Music Separation Enhancement with Generative Modeling (08/26/2022)
  Despite phenomenal progress in recent years, state-of-the-art music sepa...
- UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model (06/01/2023)
  This paper introduces UnDiff, a diffusion probabilistic model capable of...
- Hybrid Spectrogram and Waveform Source Separation (11/05/2021)
  Source separation models either work on the spectrogram or waveform doma...
- PodcastMix: A dataset for separating music and speech in podcasts (07/15/2022)
  We introduce PodcastMix, a dataset formalizing the task of separating ba...
