NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer

12/05/2022
by Changsheng Quan, et al.

This work proposes a multichannel narrow-band speech separation network. In the short-time Fourier transform (STFT) domain, the proposed network processes each frequency independently, with one network shared across all frequencies. For each frequency, the network performs end-to-end speech separation: it takes the STFT coefficients of the microphone signals as input and predicts the separated STFT coefficients of multiple speakers. The proposed network learns to cluster the frame-wise spatial/steering vectors that belong to different speakers. It is mainly composed of three components. The first is a self-attention network: clustering of spatial vectors shares a similar principle with the self-attention mechanism, in that both compute the similarity of vectors and then aggregate similar vectors. The second is a convolutional feed-forward network, whose convolutional layers are employed for signal smoothing and reverberation processing. The third is a novel hidden-layer normalization method, group batch normalization (GBN), designed specifically for the proposed narrow-band network to maintain the distribution of hidden units over frequencies. The proposed network is named NBC2, as it is a revised version of our previous NBC (narrow-band conformer) network. Experiments show that 1) the proposed network outperforms other state-of-the-art methods by a large margin, 2) the proposed GBN improves the signal-to-distortion ratio by 3 dB relative to other normalization methods such as batch, layer, and group normalization, 3) the proposed narrow-band network is spectrum-agnostic, as it does not learn spectral patterns, and 4) the proposed network is indeed performing frame clustering, as demonstrated by the attention maps.
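
The narrow-band idea in the abstract (each frequency processed independently by one shared network, with self-attention over time frames) can be illustrated with a minimal NumPy sketch. This is not the NBC2 architecture: the function names, the layer sizes, the single attention head, and the random linear weights are all assumptions made for illustration, and the convolutional feed-forward network and GBN are omitted. The sketch only shows how folding the frequency axis into the batch axis makes one set of weights serve every frequency.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(n_mics, n_speakers, dim=16, seed=0):
    # Hypothetical random weights; a real model would learn these.
    rng = np.random.default_rng(seed)
    def w(rows, cols):
        return rng.standard_normal((rows, cols)) / np.sqrt(rows)
    return {
        "w_in": w(2 * n_mics, dim),
        "w_q": w(dim, dim), "w_k": w(dim, dim), "w_v": w(dim, dim),
        "w_out": w(dim, 2 * n_speakers),
    }

def narrow_band_separate(stft, params, n_speakers):
    """stft: complex array of shape (batch, freq, time, mics)."""
    B, F, T, C = stft.shape
    # Fold frequency into the batch axis: every frequency becomes an
    # independent sequence of T frames handled by the same weights.
    x = stft.reshape(B * F, T, C)
    x = np.concatenate([x.real, x.imag], axis=-1)        # (B*F, T, 2C)
    h = x @ params["w_in"]                               # (B*F, T, dim)
    # Single-head self-attention across time frames: frames with similar
    # hidden vectors receive large weights and are aggregated together.
    q, k, v = h @ params["w_q"], h @ params["w_k"], h @ params["w_v"]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)                      # (B*F, T, T)
    h = h + attn @ v                                     # residual connection
    y = h @ params["w_out"]                              # (B*F, T, 2*speakers)
    est = y[..., :n_speakers] + 1j * y[..., n_speakers:]
    return est.reshape(B, F, T, n_speakers), attn.reshape(B, F, T, T)

rng = np.random.default_rng(1)
stft = rng.standard_normal((2, 5, 7, 4)) + 1j * rng.standard_normal((2, 5, 7, 4))
params = init_params(n_mics=4, n_speakers=2)
est, attn = narrow_band_separate(stft, params, n_speakers=2)
print(est.shape, attn.shape)  # (2, 5, 7, 2) (2, 5, 7, 7)
```

Because no operation in this sketch mixes information across frequencies, permuting the frequency bins of the input simply permutes the output in the same way. That is one simple way to see what it means for a network with this structure not to depend on spectral patterns.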

research
04/09/2022

Multichannel Speech Separation with Narrow-band Conformer

This work proposes a multichannel speech separation method with narrow-b...
research
07/31/2023

SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation

This work proposes a neural network to extensively exploit spatial infor...
research
10/12/2021

Multi-channel Narrow-Band Deep Speech Separation with Full-band Permutation Invariant Training

This paper addresses the problem of multi-channel multi-speech separatio...
research
11/16/2022

McNet: Fuse Multiple Cues for Multichannel Speech Enhancement

In multichannel speech enhancement, both spectral and spatial informatio...
research
09/08/2022

TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation

We propose TF-GridNet, a novel multi-path deep neural network (DNN) oper...
research
04/30/2019

Estimating the Frequency of a Clustered Signal

We consider the problem of locating a signal whose frequencies are "off ...
research
11/22/2022

Deep Neural Mel-Subband Beamformer for In-car Speech Separation

While current deep learning (DL)-based beamforming techniques have been ...
