A Recurrent Encoder-Decoder Approach with Skip-filtering Connections for Monaural Singing Voice Separation

The objective of deep learning methods based on encoder-decoder architectures for music source separation is to approximate either ideal time-frequency masks or spectral representations of the target music source(s). The spectral representations are then used to derive time-frequency masks. In this work we introduce a method to directly learn time-frequency masks from an observed mixture magnitude spectrum. We employ recurrent neural networks and train them using prior knowledge only for the magnitude spectrum of the target source. To assess the performance of the proposed method, we focus on the task of singing voice separation. The results from an objective evaluation show that our proposed method provides comparable results to deep learning based methods which operate over complicated signal representations. Compared to previous methods that approximate time-frequency masks, our method has increased performance of signal to distortion ratio by an average of 3.8 dB.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/04/2017

Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask

Singing voice separation based on deep learning relies on the usage of t...
research
04/12/2019

Examining the Mapping Functions of Denoising Autoencoders in Music Source Separation

The goal of this work is to investigate what music source separation app...
research
11/01/2017

TasNet: time-domain audio separation network for real-time, single-channel speech separation

Robust speech processing in multi-talker environments requires effective...
research
05/06/2019

Investigating kernel shapes and skip connections for deep learning-based harmonic-percussive separation

In this paper we propose an efficient deep learning encoder-decoder netw...
research
03/03/2020

Unsupervised Interpretable Representation Learning for Singing Voice Separation

In this work, we present a method for learning interpretable music signa...
research
12/04/2018

Singing Voice Separation Using a Deep Convolutional Neural Network Trained by Ideal Binary Mask and Cross Entropy

Separating a singing voice from its music accompaniment remains an impor...
research
09/15/2023

Music Source Separation Based on a Lightweight Deep Learning Framework (DTTNET: DUAL-PATH TFC-TDF UNET)

Music source separation (MSS) aims to extract 'vocals', 'drums', 'bass' ...

Please sign up or login with your details

Forgot password? Click here to reset