Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

06/08/2018
by Daniel Stoller, et al.

Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyper-parameters for the spectral front-end. Therefore, we investigate end-to-end source separation in the time domain, which allows modelling phase information and avoids fixed spectral transformations. Due to high sampling rates for audio, employing a long temporal input context on the sample level is difficult, but it is required for high-quality separation results because of long-range temporal correlations. In this context, we propose the Wave-U-Net, an adaptation of the U-Net to the one-dimensional time domain, which repeatedly resamples feature maps to compute and combine features at different time scales. We introduce further architectural improvements, including an output layer that enforces source additivity, an upsampling technique, and a context-aware prediction framework to reduce output artifacts. Experiments for singing voice separation indicate that our architecture yields a performance comparable to a state-of-the-art spectrogram-based U-Net architecture, given the same data. Finally, we reveal a problem with outliers in the currently used SDR evaluation metrics and suggest reporting rank-based statistics to alleviate this problem.
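To illustrate the last point of the abstract, the following minimal sketch compares mean-based and rank-based summaries of per-segment SDR values. It assumes the median and the median absolute deviation (MAD) as the rank-based statistics; the abstract itself does not name which statistics are meant. The function name `summarize_sdr` and the SDR values are hypothetical and chosen only for illustration.

```python
import statistics


def summarize_sdr(sdr_values):
    """Summarize per-segment SDR values (in dB) with mean-based and rank-based statistics.

    Assumption: median and MAD stand in for "rank-based statistics";
    the abstract does not specify which ones are intended.
    """
    mean = statistics.mean(sdr_values)
    std = statistics.pstdev(sdr_values)
    median = statistics.median(sdr_values)
    # Median absolute deviation: median of the absolute distances to the median.
    mad = statistics.median(abs(v - median) for v in sdr_values)
    return {"mean": mean, "std": std, "median": median, "mad": mad}


# Hypothetical SDR values in dB; the last entry is a single extreme outlier,
# such as might occur on a segment where the target source is nearly silent.
sdr = [4.0, 5.0, 5.5, 6.0, 6.5, -40.0]
stats = summarize_sdr(sdr)

print(f"mean   = {stats['mean']:.2f} dB, std = {stats['std']:.2f} dB")
print(f"median = {stats['median']:.2f} dB, MAD = {stats['mad']:.2f} dB")
```

In this toy example, the single outlier drags the mean below zero although five of the six segments score between 4 and 6.5 dB, whereas the median and MAD stay close to the typical values.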

research
10/29/2018

End-to-end music source separation: is it possible in the waveform domain?

Most of the currently successful source separation techniques use the ma...
research
01/28/2020

Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform

We propose a time-domain audio source separation method using down-sampl...
research
09/12/2019

TF-Attention-Net: An End To End Neural Network For Singing Voice Separation

In terms of source separation task, most of deep neural networks have tw...
research
04/08/2019

Audio Source Separation via Multi-Scale Learning with Dilated Dense U-Nets

Modern audio source separation techniques rely on optimizing sequence mo...
research
11/29/2019

J-Net: Randomly weighted U-Net for audio source separation

Several results in the computer vision literature have shown the potenti...
research
07/09/2021

Blind Source Separation in Polyphonic Music Recordings Using Deep Neural Networks Trained via Policy Gradients

We propose a method for the blind separation of sounds of musical instru...
research
03/04/2019

Improving singing voice separation using Deep U-Net and Wave-U-Net with data augmentation

State-of-the-art singing voice separation is based on deep learning maki...
