Audio Source Separation via Multi-Scale Learning with Dilated Dense U-Nets

Modern audio source separation techniques rely on optimizing sequence model architectures, such as 1D-CNNs, on mixture recordings so that they generalize well to unseen mixtures. Recent work has focused on time-domain architectures such as Wave-U-Net, which exploit temporal context by extracting multi-scale features. However, the optimality of the feature extraction process in these architectures has not been well investigated. In this paper, we examine and recommend critical architectural changes that improve the multi-scale feature extraction process. To this end, we replace regular 1-D convolutions with adaptive dilated convolutions, which capture increased context through their large temporal receptive fields. We also investigate the impact of dense connections, which encourage feature reuse and better gradient flow, on the extraction process. The dense connections between the downsampling and upsampling paths of a U-Net architecture capture multi-resolution information, leading to improved temporal modelling. We evaluate the proposed approaches on the MUSDB test dataset. In addition to improving performance over the state of the art, we provide insights into the impact of different architectural choices on complex data-driven solutions for source separation.
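
To illustrate why dilation enlarges the temporal receptive field, the following is a minimal sketch in plain Python, not the paper's implementation. The function names `dilated_conv1d` and `receptive_field`, the kernel values, and the dilation schedule `[1, 2, 4, 8]` are assumptions chosen for illustration; the adaptive dilation scheme described in the abstract is not reproduced here.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int = 1) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding).

    Kernel taps are spaced `dilation` samples apart, so a kernel with K taps
    covers (K - 1) * dilation + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation + 1
    if len(x) < span:
        return []
    return [
        sum(w * x[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field, in samples, of stacked stride-1 dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical toy input: a ramp signal and a 3-tap difference kernel.
signal = [float(i) for i in range(16)]
kernel = [1.0, 0.0, -1.0]

out_regular = dilated_conv1d(signal, kernel, dilation=1)  # taps 1 sample apart
out_dilated = dilated_conv1d(signal, kernel, dilation=4)  # taps 4 samples apart

# Four layers with kernel size 3: regular vs. an assumed doubling schedule.
rf_regular = receptive_field(3, [1, 1, 1, 1])
rf_dilated = receptive_field(3, [1, 2, 4, 8])
```

With the same number of layers and weights, the assumed doubling schedule covers 31 input samples compared with 9 for regular convolutions, which is the sense in which dilated convolutions capture increased context.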


research
06/08/2018

Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

Models for audio source separation usually operate on the magnitude spec...
research
06/29/2017

Multi-scale Multi-band DenseNets for Audio Source Separation

This paper deals with the problem of audio source separation. To handle ...
research
07/14/2020

Sudo rm -rf: Efficient Networks for Universal Audio Source Separation

In this paper, we present an efficient neural network for end-to-end gen...
research
11/11/2020

On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments

This paper introduces a new method for multi-channel time domain speech ...
research
03/11/2023

On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals

We study the single-channel source separation problem involving orthogon...
research
09/05/2023

A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

Cinematic audio source separation is a relatively new subtask of audio s...
research
05/25/2023

Martian time-series unraveled: A multi-scale nested approach with factorial variational autoencoders

Unsupervised source separation involves unraveling an unknown set of sou...
