Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain

10/10/2021
by   Zengwei Yao, et al.

The crux of single-channel speech separation is how to encode the mixture of signals into a latent embedding space in which the signals from different speakers can be precisely separated. Existing methods either transform the speech signals into the frequency domain to perform separation, or seek to learn a separable embedding space by constructing a latent domain based on convolutional filters. While the latter type of method, which learns an embedding space, achieves substantial improvements in speech separation, we argue that an embedding space defined by only one latent domain does not suffice to provide a thoroughly separable encoding space. In this paper, we propose the Stepwise-Refining Speech Separation Network (SRSSN), which follows a coarse-to-fine separation framework. In the coarse phase, it learns a first-order latent domain to define an encoding space and performs a rough separation. In the refining phase, SRSSN learns a new latent domain along each basis function of the existing latent domain to obtain a high-order latent domain, which enables the model to perform a refining separation and achieve more precise results. We demonstrate the effectiveness of SRSSN through extensive experiments, including speech separation in a clean (noise-free) setting on the WSJ0-2mix/3mix datasets as well as in noisy/reverberant settings on the WHAM!/WHAMR! datasets. Furthermore, we evaluate separation performance indirectly by running speech recognition on the signals separated by our model.
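To make the coarse-to-fine encoding idea concrete, the snippet below is a minimal PyTorch sketch, not the authors' implementation: the coarse phase encodes the mixture with a learned 1-D convolutional filterbank (the first-order latent domain), and the refining phase learns a set of sub-basis functions along each coarse basis function, approximated here with a grouped convolution. All module names and hyperparameters (n_basis, n_sub_basis, kernel sizes) are illustrative assumptions.

```python
# Hedged sketch of a coarse-to-fine encoder; names and sizes are assumptions,
# and the grouped convolution only approximates the "new latent domain along
# each basis function" described in the abstract.
import torch
import torch.nn as nn


class CoarseToFineEncoder(nn.Module):
    def __init__(self, n_basis=256, kernel_size=16, stride=8, n_sub_basis=8):
        super().__init__()
        # Coarse phase: first-order latent domain as a learned 1-D conv filterbank
        # (in the spirit of time-domain encoders such as Conv-TasNet's).
        self.coarse_encoder = nn.Conv1d(1, n_basis, kernel_size,
                                        stride=stride, bias=False)
        # Refining phase: sub-basis functions learned per coarse basis function,
        # realized here as a grouped convolution (one group per coarse basis).
        self.fine_encoder = nn.Conv1d(n_basis, n_basis * n_sub_basis,
                                      kernel_size=3, padding=1,
                                      groups=n_basis, bias=False)

    def forward(self, mixture):                       # mixture: (batch, samples)
        x = mixture.unsqueeze(1)                      # (batch, 1, samples)
        coarse = torch.relu(self.coarse_encoder(x))   # (batch, n_basis, frames)
        fine = torch.relu(self.fine_encoder(coarse))  # (batch, n_basis*n_sub, frames)
        return coarse, fine                           # fed to coarse/refining separators


# Example usage with a 1-second, 8 kHz mixture:
# enc = CoarseToFineEncoder()
# coarse, fine = enc(torch.randn(2, 8000))
```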


research
07/12/2017

Speaker-independent Speech Separation with Deep Attractor Network

Despite the recent success of deep learning for many speech processing t...
research
12/18/2019

End-to-end training of time domain audio separation and recognition

The rising interest in single-channel multi-speaker speech separation sp...
research
09/15/2023

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

Many real-life applications of automatic speech recognition (ASR) requir...
research
04/09/2022

Multichannel Speech Separation with Narrow-band Conformer

This work proposes a multichannel speech separation method with narrow-b...
research
07/24/2018

Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures

Speaker-aware source separation methods are promising workarounds for ma...
research
11/22/2022

Deep Neural Mel-Subband Beamformer for In-car Speech Separation

While current deep learning (DL)-based beamforming techniques have been ...
