Implicit Filter-and-sum Network for Multi-channel Speech Separation

11/17/2020
by Yi Luo, et al.

Various neural network architectures have been proposed in recent years for the task of multi-channel speech separation. Among them, the filter-and-sum network (FaSNet) performs end-to-end time-domain filter-and-sum beamforming and has proven effective for both ad-hoc and fixed microphone array geometries. In this paper, we investigate multiple ways to improve the performance of FaSNet. From the problem formulation perspective, we change the explicit time-domain filter-and-sum operation, which involves all the microphones, into an implicit filter-and-sum operation in the latent space of only the reference microphone. The filter-and-sum operation is applied to a context around the frame to be separated. This allows the problem formulation to better match the objective of end-to-end separation. From the feature extraction perspective, we replace the sample-level normalized cross-correlation (NCC) features with feature-level NCC (fNCC) features. This allows the model to better match the implicit filter-and-sum formulation. Experimental results on both ad-hoc and fixed microphone array geometries show that the proposed modification to FaSNet, which we refer to as iFaSNet, significantly outperforms the benchmark FaSNet across all conditions with comparable model complexity.
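To make the operations mentioned above concrete, below is a minimal NumPy sketch of conventional time-domain filter-and-sum beamforming and of a sample-level NCC feature between a reference-microphone frame and a context segment from another channel. The frame length, filter length, and function names are illustrative assumptions, not the FaSNet or iFaSNet implementation; the proposed iFaSNet instead performs the filter-and-sum implicitly in the latent space of the reference microphone and computes the cross-correlation at the feature level (fNCC).

```python
import numpy as np

def filter_and_sum(frames, filters):
    """Explicit time-domain filter-and-sum beamforming (illustrative).

    frames:  (n_mics, frame_len) array, one time frame per microphone.
    filters: (n_mics, filt_len) array, one FIR filter per microphone
             (in FaSNet these filters are estimated by the network).
    Returns the beamformed frame: the sum of the per-channel filtered signals.
    """
    out = np.zeros(frames.shape[1])
    for x, h in zip(frames, filters):
        out += np.convolve(x, h, mode="same")  # filter one channel, then sum
    return out

def sample_level_ncc(ref_frame, other_context):
    """Sample-level NCC (illustrative): cosine similarity between the
    reference-microphone frame and every equal-length window inside a longer
    context segment from another microphone, one value per time lag."""
    L = len(ref_frame)
    ref = ref_frame / (np.linalg.norm(ref_frame) + 1e-8)
    sims = []
    for t in range(len(other_context) - L + 1):
        win = other_context[t:t + L]
        sims.append(float(np.dot(ref, win / (np.linalg.norm(win) + 1e-8))))
    return np.array(sims)

# Toy usage with two microphones and random signals (illustrative only).
rng = np.random.default_rng(0)
frames = rng.standard_normal((2, 256))         # 2 mics, 256-sample frames
filters = 0.1 * rng.standard_normal((2, 16))   # 16-tap filter per mic
beamformed = filter_and_sum(frames, filters)   # (256,) beamformed frame
ncc = sample_level_ncc(frames[0], rng.standard_normal(320))  # (65,) NCC vector
print(beamformed.shape, ncc.shape)
```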


Related research

10/30/2019
End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation
An important problem in ad-hoc microphone speech separation is how to gu...

12/01/2020
Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation
Recently, the research on ad-hoc microphone arrays with deep learning ha...

03/03/2021
Continuous Speech Separation with Ad Hoc Microphone Arrays
Speech separation has been shown effective for multi-talker speech recog...

11/11/2020
On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments
This paper introduces a new method for multi-channel time domain speech ...

02/26/2023
DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement
Invariance to microphone array configuration is a rare attribute in neur...

12/07/2021
A Time-domain Generalized Wiener Filter for Multi-channel Speech Separation
Frequency-domain neural beamformers are the mainstream methods for recen...

05/03/2019
Convolution is outer product
The inner product operation between tensors is the corner stone of deep ...
