Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music

by Haohe Liu, et al.

This paper presents a new input format, channel-wise subband input (CWS), for convolutional neural network (CNN) based music source separation (MSS) models in the frequency domain. We aim to address two major issues in CNN-based high-resolution MSS models: high computational cost and weight sharing between distinctly different bands. Specifically, we decompose the input mixture spectra into several bands and concatenate them channel-wise as the model input. The proposed approach enables effective weight sharing within each subband and introduces more flexibility between channels. For comparison, we perform voice and accompaniment separation (VAS) on models with different scales, architectures, and CWS settings. Experiments show that the CWS input is beneficial in many aspects. We evaluate our method on the musdb18hq test set, focusing on the SDR, SIR, and SAR metrics. Among all our experiments, CWS enables models to obtain a 6.9% performance gain on the average metrics. Even with fewer parameters, less training data, and shorter training time, our MDenseNet with 8-band CWS input still surpasses the original MMDenseNet by a large margin. Moreover, CWS also reduces computational cost and training time to a large extent.
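The core idea in the abstract, splitting the mixture spectra into bands and concatenating them channel-wise, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' implementation: it assumes a magnitude spectrogram of shape (channels, frequency bins, frames) and an equal-width split along the frequency axis, whereas the paper may obtain its subbands differently (for example with a filterbank). The function names `channelwise_subband` and `merge_subbands` are hypothetical.

```python
import numpy as np


def channelwise_subband(spec: np.ndarray, n_bands: int) -> np.ndarray:
    """Split a spectrogram into equal-width frequency bands and stack them as channels.

    spec: array of shape (channels, freq_bins, frames).
    Returns an array of shape (channels * n_bands, freq_bins // n_bands, frames).
    Assumption: equal-width bands; freq_bins must be divisible by n_bands.
    """
    channels, freq_bins, frames = spec.shape
    if freq_bins % n_bands != 0:
        raise ValueError("freq_bins must be divisible by n_bands")
    band_width = freq_bins // n_bands
    # Expose the band index as its own axis: (channels, n_bands, band_width, frames).
    bands = spec.reshape(channels, n_bands, band_width, frames)
    # Fold the band axis into the channel axis, so band b of channel c
    # becomes output channel c * n_bands + b.
    return bands.reshape(channels * n_bands, band_width, frames)


def merge_subbands(cws: np.ndarray, n_bands: int) -> np.ndarray:
    """Inverse of channelwise_subband: restore the full-band layout."""
    stacked, band_width, frames = cws.shape
    channels = stacked // n_bands
    bands = cws.reshape(channels, n_bands, band_width, frames)
    return bands.reshape(channels, n_bands * band_width, frames)


if __name__ == "__main__":
    # Stereo toy spectrogram: 2 channels, 1024 frequency bins, 100 frames.
    mixture = np.random.rand(2, 1024, 100)
    cws_input = channelwise_subband(mixture, n_bands=8)
    print(cws_input.shape)  # (16, 128, 100)
```

With this layout, a convolution kernel slides over a narrower frequency axis, and each subband arrives in its own input channel, which is one way to read the abstract's claims about reduced computational cost and more flexible weight sharing across bands.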


CWS-PResUNet: Music Source Separation with Channel-wise Subband Phase-aware ResUNet

Music source separation (MSS) shows active progress with deep learning m...

Multi-Band Multi-Resolution Fully Convolutional Neural Networks for Singing Voice Separation

Deep neural networks with convolutional layers usually process the entir...

Spectrogram-channels u-net: a source separation model viewing each channel as the spectrogram of each source

Nowadays, the task of sound source separation is an interesting task for...

Examining the Mapping Functions of Denoising Autoencoders in Music Source Separation

The goal of this work is to investigate what music source separation app...

Single channel voice separation for unknown number of speakers under reverberant and noisy settings

We present a unified network for voice separation of an unknown number o...

Music Source Separation with Deep Equilibrium Models

While deep neural network-based music source separation (MSS) is very ef...
