Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation

01/02/2020
by   Rongzhi Gu, et al.

Target speech separation refers to extracting the target speaker's speech from mixed signals. Despite recent advances in deep-learning-based close-talk speech separation, real-world application remains an open issue. Two main challenges are the complex acoustic environment and the real-time processing requirement. To address these challenges, we propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture in reverberant environments, assisted by directional information about the speaker(s). First, to be robust against variations introduced by the complex environment, the key idea is to increase the completeness of the acoustic representation by jointly modeling the temporal, spectral, and spatial discriminability between the target and interfering sources. Specifically, temporal, spectral, and spatial features, along with the designed directional features, are integrated into a joint acoustic representation. Second, to reduce latency, we design a fully convolutional autoencoder framework that is purely end-to-end and single-pass. All feature computation is implemented with network layers and operations to speed up the separation procedure. Evaluation is conducted on the simulated reverberant datasets WSJ0-2mix and WSJ0-3mix under the speaker-independent scenario. Experimental results demonstrate that the proposed method outperforms state-of-the-art deep-learning-based multi-channel approaches with fewer parameters and faster processing. Furthermore, the proposed temporal-spatial neural filter can handle mixtures with a varying and unknown number of speakers and maintains its performance even in the presence of direction estimation errors. Code and models will be released soon.
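For a concrete picture of the feature-fusion idea, below is a minimal PyTorch sketch (not the authors' released code) of how a directional feature such as the inter-channel phase difference (IPD) can be fused with a reference-channel spectrogram and passed to a small fully convolutional network. The layer sizes, the cos/sin IPD encoding, and the mask-based output are illustrative assumptions; the paper's actual filter estimates the target waveform directly, end-to-end.

```python
import torch
import torch.nn as nn


class DirectionalFeatures(nn.Module):
    """Fuse a reference-channel log-magnitude spectrogram with
    inter-channel phase differences (IPDs). Illustrative stand-in for
    the paper's joint temporal-spectral-spatial representation."""

    def __init__(self, n_fft: int = 512, hop: int = 256):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, mix: torch.Tensor) -> torch.Tensor:
        # mix: (batch, channels, samples)
        b, c, t = mix.shape
        spec = torch.stft(mix.reshape(b * c, t), n_fft=self.n_fft,
                          hop_length=self.hop, window=self.window,
                          return_complex=True)
        spec = spec.reshape(b, c, spec.shape[-2], spec.shape[-1])
        log_mag = torch.log1p(spec[:, :1].abs())   # reference channel only
        phase = torch.angle(spec)
        ipd = phase[:, 1:] - phase[:, :1]          # IPD of each channel vs. channel 0
        # cos/sin encoding sidesteps 2*pi phase-wrapping discontinuities
        return torch.cat([log_mag, torch.cos(ipd), torch.sin(ipd)], dim=1)


class ConvMaskNet(nn.Module):
    """Tiny fully convolutional encoder/decoder that maps the fused
    features to a time-frequency mask (hypothetical dimensions)."""

    def __init__(self, in_ch: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(hidden, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)                     # (batch, 1, freq, frames)


# Example: a batch of two 4-channel, one-second mixtures at 16 kHz.
mix = torch.randn(2, 4, 16000)
feats = DirectionalFeatures()(mix)                 # (2, 7, 257, 63)
mask = ConvMaskNet(in_ch=feats.shape[1])(feats)    # mask for the target speaker
```

Because every step above is a tensor operation or a network layer, the whole pipeline runs in a single forward pass, which mirrors the single-pass, fully-convolutional design motivation described in the abstract.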

Related research

03/09/2020
Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning
Hand-crafted spatial features (e.g., inter-channel phase difference, IPD...

04/26/2021
Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain
To date, mainstream target speech separation (TSS) approaches are formul...

08/15/2022
LCSM: A Lightweight Complex Spectral Mapping Framework for Stereophonic Acoustic Echo Cancellation
The traditional adaptive algorithms will face the non-uniqueness problem...

10/27/2022
Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction
In conventional multichannel audio signal enhancement, spatial and spect...

11/04/2022
Spatially Selective Deep Non-linear Filters for Speaker Extraction
In a scenario with multiple persons talking simultaneously, the spatial ...

12/31/2020
Generalized Operating Procedure for Deep Learning: an Unconstrained Optimal Design Perspective
Deep learning (DL) has brought about remarkable breakthrough in processi...

04/24/2023
Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters
In a multi-channel separation task with multiple speakers, we aim to rec...
