Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain

04/26/2021
by Rongzhi Gu, et al.

To date, mainstream target speech separation (TSS) approaches estimate the complex ratio mask (cRM) of the target speech in the time-frequency domain under a supervised deep learning framework. However, existing deep models for cRM estimation model the real and imaginary parts of the cRM separately, using real-valued training data pairs. This study is motivated by the goal of designing a deep model that fully exploits the temporal-spectral-spatial information of multi-channel signals to estimate the cRM directly and efficiently in the complex domain. To this end, a novel TSS network is designed consisting of two modules: a complex neural spatial filter (cNSF) and an MVDR module. Essentially, the cNSF is a cRM estimation model, and the MVDR module is cascaded after the cNSF module to reduce the nonlinear speech distortions introduced by the neural network. Specifically, to fit the cRM target, all input features of the cNSF are reformulated into complex-valued representations following the supervised learning paradigm. Then, to achieve good hierarchical feature abstraction, a complex deep neural network (cDNN) is carefully designed with a U-Net structure. Experiments conducted on simulated multi-channel speech data demonstrate that the proposed cNSF outperforms the baseline NSF by 12.1 scale-invariant signal-to-distortion ratio and 33.1
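The signal-processing steps named in the abstract (applying a cRM, cascading an MVDR stage, and scoring with SI-SDR) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, the cRM here is the ideal mask rather than a network output, and the MVDR solution uses the common reference-channel formulation, which is an assumption because the abstract does not say which variant is used.

```python
import numpy as np

def ideal_crm(target, mixture, eps=1e-8):
    """Ideal complex ratio mask, target / mixture, with a stabilised division."""
    return target * np.conj(mixture) / (np.abs(mixture) ** 2 + eps)

def apply_crm(crm, mixture):
    """Complex multiplication per time-frequency bin:
    (Mr + jMi)(Yr + jYi) = (Mr*Yr - Mi*Yi) + j(Mr*Yi + Mi*Yr)."""
    return crm * mixture

def mvdr_weights(speech, noise, ref=0, eps=1e-6):
    """Reference-channel MVDR weights (assumed variant).
    speech, noise: complex arrays of shape (channels, freqs, frames)."""
    n_ch = speech.shape[0]
    # Spatial covariance matrices per frequency, shape (freqs, ch, ch).
    phi_s = np.einsum("cft,dft->fcd", speech, speech.conj())
    phi_n = np.einsum("cft,dft->fcd", noise, noise.conj())
    phi_n = phi_n + eps * np.eye(n_ch)           # diagonal loading
    numerator = np.linalg.solve(phi_n, phi_s)    # phi_n^{-1} phi_s
    trace = np.trace(numerator, axis1=1, axis2=2)[:, None]
    return numerator[:, :, ref] / (trace + eps)  # shape (freqs, ch)

def beamform(weights, mixture):
    """Apply w(f)^H x(f, t) for every time-frequency bin."""
    return np.einsum("fc,cft->ft", weights.conj(), mixture)

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D real signals."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    projection = scale * reference
    residual = estimate - projection
    return 10 * np.log10((np.sum(projection ** 2) + eps)
                         / (np.sum(residual ** 2) + eps))

# Toy data: a rank-1 "speech" component plus random noise (purely synthetic).
rng = np.random.default_rng(0)
C, F, T = 4, 5, 50
steer = rng.standard_normal((C, F)) + 1j * rng.standard_normal((C, F))
source = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
speech = steer[:, :, None] * source[None]
noise = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
mixture = speech + noise

masked = apply_crm(ideal_crm(speech, mixture), mixture)  # cRM stage
weights = mvdr_weights(masked, mixture - masked)         # MVDR stage
enhanced = beamform(weights, mixture)                    # shape (F, T)
```

In this sketch the noise estimate is simply the mixture minus the masked speech, which is one plausible choice rather than something the abstract specifies. Because the MVDR output is a linear combination of the input channels, it avoids the nonlinear distortion that direct masking can introduce, which is the motivation the abstract gives for cascading the two modules.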

research
01/02/2020

Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation

Target speech separation refers to extracting the target speaker's speec...
research
09/04/2017

Using Optimal Ratio Mask as Training Target for Supervised Speech Separation

Supervised speech separation uses supervised learning algorithms to lear...
research
12/07/2021

A Time-domain Generalized Wiener Filter for Multi-channel Speech Separation

Frequency-domain neural beamformers are the mainstream methods for recen...
research
04/14/2022

Atmospheric Turbulence Removal with Complex-Valued Convolutional Neural Network

Atmospheric turbulence distorts visual imagery and is always problematic...
research
09/08/2022

TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation

We propose TF-GridNet, a novel multi-path deep neural network (DNN) oper...
research
02/07/2022

Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks

Impulse response estimation in high noise and in-the-wild settings, with...
research
11/01/2016

Enhanced Factored Three-Way Restricted Boltzmann Machines for Speech Detection

In this letter, we propose enhanced factored three way restricted Boltzm...
