Multi-Channel Masking with Learnable Filterbank for Sound Source Separation

03/14/2023
by Wang Dai, et al.

This work proposes a multi-channel masking framework with a learnable filterbank for multi-channel sound source separation. The learnable filterbank is a 1D Conv layer that transforms the raw waveform into a 2D representation. In contrast to the conventional single-channel masking method, we estimate a separate mask for each microphone channel. The estimated masks are then applied to the transformed waveform representations, as in the traditional filter-and-sum beamforming operation. Specifically, each mask multiplies the 2D representation of its corresponding channel, and the masked outputs of all channels are then summed. Finally, a 1D transposed Conv layer converts the summed masked signal back into the waveform domain. The experimental results show that our method outperforms single-channel masking with a learnable filterbank, and that it can outperform multi-channel complex masking on the STFT complex spectrum in the STGCSEN model if the learnable filterbank maps the waveform to a higher feature dimension. The spatial response analysis also verifies that multi-channel masking in the learnable filterbank domain has spatial selectivity.
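The pipeline described above (per-channel encoding, per-channel masking, summation across channels, decoding) can be sketched in a few lines. Writing the encoded representation of channel $c$ as $\mathbf{E}_c$ and its mask as $\mathbf{M}_c$, the filter-and-sum step is $\mathbf{Y} = \sum_c \mathbf{M}_c \odot \mathbf{E}_c$. This is a minimal pure-Python sketch, not the authors' code, under several assumptions: the encoder is a linear 1D convolution with no padding, bias, or activation; the decoder is a plain transposed convolution (overlap-add); and the masks are passed in directly rather than estimated by a network. The function names `encode`, `filter_and_sum`, `decode`, and `separate` are hypothetical, and the toy filter values stand in for learned weights.

```python
from typing import List

Matrix = List[List[float]]  # N filters x K frames


def encode(x: List[float], filters: Matrix, stride: int) -> Matrix:
    """Filterbank encoder as a 1D convolution (assumed: no padding, bias, or activation)."""
    kernel = len(filters[0])
    num_frames = (len(x) - kernel) // stride + 1
    return [
        [sum(w[l] * x[k * stride + l] for l in range(kernel)) for k in range(num_frames)]
        for w in filters
    ]


def filter_and_sum(reps: List[Matrix], masks: List[Matrix]) -> Matrix:
    """Multiply each channel's representation by its own mask, then sum over channels."""
    num_filters, num_frames = len(reps[0]), len(reps[0][0])
    return [
        [sum(m[n][k] * e[n][k] for e, m in zip(reps, masks)) for k in range(num_frames)]
        for n in range(num_filters)
    ]


def decode(rep: Matrix, basis: Matrix, stride: int) -> List[float]:
    """1D transposed convolution: overlap-add each frame's basis combination."""
    kernel = len(basis[0])
    num_frames = len(rep[0])
    out = [0.0] * ((num_frames - 1) * stride + kernel)
    for n, v in enumerate(basis):
        for k in range(num_frames):
            for l in range(kernel):
                out[k * stride + l] += rep[n][k] * v[l]
    return out


def separate(mixture: List[List[float]], masks: List[Matrix],
             filters: Matrix, basis: Matrix, stride: int) -> List[float]:
    """Encode every microphone channel, apply per-channel masks, sum, and decode."""
    reps = [encode(x, filters, stride) for x in mixture]
    return decode(filter_and_sum(reps, masks), basis, stride)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-in for learned filters
    mixture = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # two microphones
    half = [[0.5, 0.5], [0.5, 0.5]]
    print(separate(mixture, [half, half], identity, identity, stride=2))
    # [3.0, 4.0, 5.0, 6.0]
```

With an identity filterbank and equal masks of 0.5, the output is simply the channel average, which makes the filter-and-sum behaviour easy to check. In the method described above, the filters and the decoder basis would instead be learned, and each mask would differ per channel, which is what gives the combined output its spatial selectivity.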
