Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

07/01/2016
by   Dong Yu, et al.

We propose a novel deep learning model that supports permutation invariant training (PIT) for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Unlike most prior work, which treats speech separation as a multi-class regression problem, and unlike the deep clustering technique, which treats it as a segmentation (or clustering) problem, our model optimizes the separation regression error while ignoring the order of the mixing sources. This strategy cleverly solves the long-standing label permutation problem that has prevented progress on deep learning based techniques for speech separation. Experiments on the equal-energy mixing setup of a Danish corpus confirm the effectiveness of PIT. We believe improvements built upon PIT can eventually solve the cocktail-party problem and enable real-world adoption of, e.g., automatic meeting transcription and multi-party human-computer interaction, where overlapping speech is common.
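To make "ignoring the order of mixing sources" concrete, the following is a minimal sketch of a permutation-invariant loss, not the authors' implementation. It assumes mean squared error as the regression error and exhaustive enumeration of output-to-source assignments; the function name `pit_mse` and the array layout `(sources, frames, bins)` are illustrative choices not given in the abstract.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, targets):
    """Return (lowest MSE, best assignment) over all output-to-source orderings.

    estimates, targets: arrays of shape (num_sources, frames, bins).
    This layout is an assumption made for the sketch.
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    num_sources = targets.shape[0]

    best_loss, best_perm = np.inf, None
    for perm in permutations(range(num_sources)):
        # perm[i] is the index of the reference source compared with output i.
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: the outputs match the references, but in swapped order.
rng = np.random.default_rng(0)
refs = rng.standard_normal((2, 5, 3))
outs = refs[::-1]
loss, perm = pit_mse(outs, refs)
```

Because the minimum is taken over assignments, a model whose outputs are correct but in a different order than the labels is not penalized. Enumeration grows factorially with the number of sources, so this sketch is only practical for a small number of talkers.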


research
03/26/2021

Guided Training: A Simple Method for Single-channel Speaker Separation

Deep learning has shown a great potential for speech separation, especia...
research
07/23/2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

Deep clustering (DC) and utterance-level permutation invariant training ...
research
08/04/2019

Probabilistic Permutation Invariant Training for Speech Separation

Single-microphone, speaker-independent speech separation is normally per...
research
10/27/2021

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Multi-talker conversational speech processing has drawn many interests f...
research
12/19/2019

Practical applicability of deep neural networks for overlapping speaker separation

This paper examines the applicability in realistic scenarios of two deep...
research
12/17/2021

Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem

Deep learning based models have significantly improved the performance o...
research
02/09/2021

On permutation invariant training for speech source separation

We study permutation invariant training (PIT), which targets at the perm...
