Time-Domain Mapping Based Single-Channel Speech Separation With Hierarchical Constraint Training

10/20/2021
by   Chenyang Gao, et al.

Single-channel speech separation is required for multi-speaker speech recognition. Recent deep learning-based approaches have focused on the time-domain audio separation network (TasNet) because it offers better performance and lower latency than conventional time-frequency-based (T-F-based) approaches. Most of these works rely on the masking-based method, which estimates a linear mapping function (mask) for each speaker. However, the other commonly used method, the mapping-based method, which is less sensitive to SNR variations, has been inadequately studied in the time domain. We explore the potential of the mapping-based method by introducing attention-augmented DPRNN (AttnAugDPRNN), which directly approximates the clean sources from the mixture for speech separation. Permutation Invariant Training (PIT) has been a paradigm for solving the label ambiguity problem in speech separation, but it usually leads to suboptimal performance. To address this, we propose an efficient training strategy called Hierarchical Constraint Training (HCT) that regularizes the training and effectively improves model performance. With PIT, our results show that mapping-based AttnAugDPRNN outperforms masking-based AttnAugDPRNN when the training corpus is large. Mapping-based AttnAugDPRNN with HCT significantly improved the SI-SDR by 10.1 compared with AttnAugDPRNN without HCT.
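
For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal sketch of SI-SDR and of utterance-level PIT, which scores every assignment of estimated sources to reference speakers and keeps the best one. It is an illustration under stated assumptions, not the authors' implementation: the function names si_sdr and pit_si_sdr are hypothetical, signals are plain Python lists, and neither AttnAugDPRNN nor HCT is modeled here.

```python
import math
from itertools import permutations


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length signals.

    Hypothetical helper for illustration; signals are lists of floats.
    """
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target; the residual counts as noise.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    projection = [scale * t for t in tgt]
    noise = [e - p for e, p in zip(est, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def pit_si_sdr(estimates, targets):
    """Return (best mean SI-SDR, best permutation) over speaker assignments.

    perm[i] is the index of the target matched to estimates[i].
    """
    best_score, best_perm = -math.inf, None
    for perm in permutations(range(len(targets))):
        score = sum(
            si_sdr(estimates[i], targets[p]) for i, p in enumerate(perm)
        ) / len(perm)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


# Two toy "speakers" and estimates that come out in swapped order.
speaker_a = [1.0, -1.0, 1.0, -1.0]
speaker_b = [1.0, 1.0, -1.0, -1.0]
estimates = [[0.5, 0.5, -0.5, -0.5], [2.0, -2.0, 2.0, -2.0]]
score, perm = pit_si_sdr(estimates, [speaker_a, speaker_b])
```

In this toy case the first estimate is a scaled copy of the second speaker and vice versa, so the search selects the swapped assignment. Enumerating all permutations is factorial in the number of speakers, which is acceptable for the two- or three-speaker setting but not beyond it. During training, the negative of the best score would typically serve as the loss.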


research
03/26/2021

Guided Training: A Simple Method for Single-channel Speaker Separation

Deep learning has shown a great potential for speech separation, especia...
research
12/18/2019

End-to-end training of time domain audio separation and recognition

The rising interest in single-channel multi-speaker speech separation sp...
research
02/07/2021

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

In this paper, we present a novel multi-channel speech extraction system...
research
12/17/2021

Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem

Deep learning based models have significantly improved the performance o...
research
11/16/2021

Single-channel speech separation using Soft-minimum Permutation Invariant Training

The goal of speech separation is to extract multiple speech sources from...
research
08/04/2019

Probabilistic Permutation Invariant Training for Speech Separation

Single-microphone, speaker-independent speech separation is normally per...
