End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

10/30/2019
by Yi Luo, et al.

An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and number of microphones. The former requires the system to be invariant to the indexing of microphones placed at the same locations, while the latter requires the system to process inputs with a varying number of channels. Conventional optimization-based beamforming techniques satisfy these requirements by definition, whereas deep learning-based end-to-end systems do not yet fully address them. In this paper, we propose transform-average-concatenate (TAC), a simple design paradigm for channel permutation and number invariant multi-channel speech separation. Building on the filter-and-sum network (FaSNet), a recently proposed end-to-end time-domain beamforming system, we show that TAC significantly improves separation performance across various numbers of microphones in noisy reverberant separation tasks with ad-hoc arrays. Moreover, we show that TAC also significantly improves separation performance with a fixed-geometry array configuration, further demonstrating the effectiveness of the proposed paradigm for the general problem of multi-microphone speech separation.
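
The abstract names only the three stages of TAC, so the following is a minimal sketch of how a transform-average-concatenate step could look, not the authors' implementation. The layer sizes `D` and `H`, the random weights, the ReLU nonlinearity, and the helper names `rand_matrix`, `layer`, and `tac` are all assumptions made for illustration. The property it aims to show is that every microphone shares the same weights and interacts with the others only through an average, so the step does not depend on microphone order or count.

```python
import random

random.seed(0)
D, H = 4, 6  # hypothetical per-channel feature size and hidden size


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters (assumption)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# One set of weights shared by every microphone channel.
W_TRANSFORM = rand_matrix(D, H)
W_AVERAGE = rand_matrix(H, H)
W_CONCAT = rand_matrix(2 * H, D)


def layer(vec, weights):
    """Linear map (vec @ weights) followed by a ReLU."""
    cols = len(weights[0])
    return [
        max(0.0, sum(v * weights[i][j] for i, v in enumerate(vec)))
        for j in range(cols)
    ]


def tac(channels):
    """Apply one TAC-style step to a list of per-microphone feature vectors.

    channels: list of length num_mics, each a list of D floats.
    Returns a list of the same length, each a list of D floats.
    """
    # Transform: the same layer is applied to each channel independently.
    transformed = [layer(ch, W_TRANSFORM) for ch in channels]
    # Average: a mean over channels is unaffected by their order or count.
    mean = [sum(col) / len(transformed) for col in zip(*transformed)]
    averaged = layer(mean, W_AVERAGE)
    # Concatenate: each channel sees its own features plus the shared summary.
    return [layer(t + averaged, W_CONCAT) for t in transformed]


mics = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(4)]
outputs = tac(mics)
```

In this sketch, reordering the entries of `mics` reorders `outputs` in the same way, and `tac` accepts any number of channels without changing its weights.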

research
11/17/2020

Implicit Filter-and-sum Network for Multi-channel Speech Separation

Various neural network architectures have been proposed in recent years ...
research
12/01/2020

Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation

Recently, the research on ad-hoc microphone arrays with deep learning ha...
research
03/29/2021

Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

Recently, speech recognition with ad-hoc microphone arrays has received ...
research
10/12/2021

Multi-channel Narrow-Band Deep Speech Separation with Full-band Permutation Invariant Training

This paper addresses the problem of multi-channel multi-speech separatio...
research
12/07/2021

A Time-domain Generalized Wiener Filter for Multi-channel Speech Separation

Frequency-domain neural beamformers are the mainstream methods for recen...
research
10/16/2022

End-to-end Two-dimensional Sound Source Localization With Ad-hoc Microphone Arrays

Conventional sound source localization methods are mostly based on a sin...
research
10/28/2021

Continuous Speech Separation with Recurrent Selective Attention Network

While permutation invariant training (PIT) based continuous speech separ...
