TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

11/22/2022
by   Zhong-Qiu Wang, et al.
0

We propose TF-GridNet for speech separation. The model is a novel multi-path deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several multi-path blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI) components of input signals are stacked as features to predict target RI components. We first evaluate it on monaural anechoic speaker separation. Without using data augmentation and dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a standard dataset for two-speaker separation. To show its robustness to noise and reverberation, we evaluate it on monaural reverberant speaker separation using the SMS-WSJ dataset and on noisy-reverberant speaker separation using WHAMR!, and obtain state-of-the-art performance on both datasets. We then extend TF-GridNet to multi-microphone conditions through multi-microphone complex spectral mapping, and integrate it into a two-DNN system with a beamformer in between (named as MISO-BF-MISO in earlier studies), where the beamformer proposed in this paper is a novel multi-frame Wiener filter computed based on the outputs of the first DNN. State-of-the-art performance is obtained on the multi-channel tasks of SMS-WSJ and WHAMR!. Besides speaker separation, we apply the proposed algorithms to speech dereverberation and noisy-reverberant speech enhancement. State-of-the-art performance is obtained on a dereverberation dataset and on the dataset of the recent L3DAS22 multi-channel speech enhancement challenge.

READ FULL TEXT
research
09/08/2022

TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation

We propose TF-GridNet, a novel multi-path deep neural network (DNN) oper...
research
10/29/2020

FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

This paper proposes a full-band and sub-band fusion model, named as Full...
research
08/16/2021

Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation

A promising approach for speech dereverberation is based on supervised l...
research
08/16/2021

Convolutive Prediction for Reverberant Speech Separation

We investigate the effectiveness of convolutive prediction, a novel form...
research
10/04/2020

Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speaker Separation

We propose multi-microphone complex spectral mapping, a simple way of ap...
research
02/15/2023

Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

This paper describes our submission to the Second Clarity Enhancement Ch...
research
10/07/2021

Towards Universal Neural Vocoding with a Multi-band Excited WaveNet

This paper introduces the Multi-Band Excited WaveNet a neural vocoder fo...

Please sign up or login with your details

Forgot password? Click here to reset