Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

07/23/2019
by Cunhang Fan, et al.

Deep clustering (DC) and utterance-level permutation invariant training (uPIT) have proven promising for speaker-independent speech separation. DC is usually formulated as a two-step process, embedding learning followed by embedding clustering, which results in a complex separation pipeline and poses a major obstacle to directly optimizing the actual separation objective. uPIT, for its part, only minimizes the error of the permutation with the lowest mean square error and does not discriminate it from the other permutations. In this paper, we propose a discriminative learning method for speaker-independent speech separation using deep embedding features. First, a DC network is trained to extract deep embedding features, which contain each source's information and are well suited to discriminating between the target speakers. These features are then used as the input to uPIT to directly separate the different sources. Finally, uPIT and DC are jointly trained, which directly optimizes the actual separation objective. Moreover, to maximize the distance between permutations, discriminative learning is applied to fine-tune the whole model. Our experiments are conducted on the WSJ0-2mix dataset. Experimental results show that the proposed models achieve better performance than DC and uPIT for speaker-independent speech separation.
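To make the contrast between plain uPIT and the discriminative objective concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the form of the discriminative term (the best permutation's error minus a weighted sum of the other permutations' errors), the weight `alpha`, and all function names are assumptions made for illustration.

```python
import itertools
import numpy as np


def permutation_mse(estimates, targets):
    """Return {permutation: utterance-level MSE} for every speaker assignment.

    estimates, targets: arrays of shape (num_speakers, frames, freq_bins).
    """
    num_speakers = estimates.shape[0]
    errors = {}
    for perm in itertools.permutations(range(num_speakers)):
        # Pair estimate s with target perm[s]; average over the whole utterance.
        errors[perm] = float(np.mean((estimates - targets[list(perm)]) ** 2))
    return errors


def upit_loss(estimates, targets):
    """Plain uPIT: keep only the permutation with the lowest MSE."""
    errors = permutation_mse(estimates, targets)
    best = min(errors, key=errors.get)
    return errors[best], best


def discriminative_upit_loss(estimates, targets, alpha=0.1):
    """Hypothetical discriminative variant (assumed form, not from the paper).

    Minimizes the best permutation's error while pushing the errors of the
    remaining permutations up, weighted by the assumed hyperparameter alpha.
    """
    errors = permutation_mse(estimates, targets)
    best = min(errors, key=errors.get)
    others = [err for perm, err in errors.items() if perm != best]
    return errors[best] - alpha * sum(others), best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = rng.standard_normal((2, 5, 4))  # two speakers, toy spectrograms
    estimates = targets[[1, 0]]               # perfect estimates, swapped order
    print(upit_loss(estimates, targets))
    print(discriminative_upit_loss(estimates, targets))
```

In the toy case, the estimates are the targets in swapped order, so both functions select the swapped assignment. The discriminative variant additionally rewards a large error for the rejected assignment, which is the sense in which it "discriminates" the chosen permutation from the others.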

research
03/26/2021

Guided Training: A Simple Method for Single-channel Speaker Separation

Deep learning has shown a great potential for speech separation, especia...
research
04/06/2020

Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Monaural speech dereverberation is a very challenging task because no sp...
research
07/01/2016

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

We propose a novel deep learning model, which supports permutation invar...
research
09/18/2020

X-DC: Explainable Deep Clustering based on Learnable Spectrogram Templates

Deep neural networks (DNNs) have achieved substantial predictive perform...
research
04/25/2019

Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation

We address talker-independent monaural speaker separation from the persp...
research
02/05/2020

Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features

Multi-channel deep clustering (MDC) has acquired a good performance for ...
research
02/09/2021

On permutation invariant training for speech source separation

We study permutation invariant training (PIT), which targets at the perm...