On permutation invariant training for speech source separation

02/09/2021
by Xiaoyu Liu, et al.

We study permutation invariant training (PIT), which targets the permutation ambiguity problem in speaker-independent source separation models. We extend two state-of-the-art PIT strategies. First, we look at the two-stage speaker separation and tracking algorithm based on frame-level PIT (tPIT) and clustering, which was originally proposed for the STFT domain, and we adapt it to work with waveforms and over a learned latent space. Further, we propose an efficient clustering loss that scales to waveform models. Second, we extend a recently proposed auxiliary speaker-ID loss with a deep feature loss based on "problem agnostic speech features", to reduce the local permutation errors made by utterance-level PIT (uPIT). Our results show that the proposed extensions help reduce permutation ambiguity. However, we also note that the studied STFT-based models are more effective at reducing permutation errors than waveform-based models, a perspective overlooked in recent studies.
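To make the distinction between uPIT and tPIT concrete, the following is a minimal sketch, not the authors' implementation. It assumes a mean-squared-error criterion and arrays of shape (sources, frames, features). The function names `upit_mse` and `tpit_mse` are hypothetical, and the abstract does not state which loss or feature domain the paper actually uses.

```python
from itertools import permutations

import numpy as np


def upit_mse(est, ref):
    """Utterance-level PIT: one permutation is chosen for the whole utterance.

    est, ref: arrays of shape (n_src, n_frames, n_feat).
    Returns (loss, perm), where est[perm[i]] is matched to ref[i].
    MSE is an assumed criterion, used here only for illustration.
    """
    n_src = ref.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(n_src)):
        loss = np.mean((est[list(perm)] - ref) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def tpit_mse(est, ref):
    """Frame-level PIT: the best permutation is chosen independently per frame.

    Returns (loss, perms), where perms[t] is the permutation picked at frame t.
    """
    n_src = ref.shape[0]
    perms = list(permutations(range(n_src)))
    # per_frame[p, t] = MSE at frame t under candidate permutation p
    per_frame = np.stack(
        [np.mean((est[list(p)] - ref) ** 2, axis=(0, 2)) for p in perms]
    )
    best_idx = per_frame.argmin(axis=0)
    loss = per_frame.min(axis=0).mean()
    return loss, [perms[i] for i in best_idx]


# Toy data: two sources, four frames, three features per frame.
rng = np.random.default_rng(0)
ref = rng.normal(size=(2, 4, 3))

# Estimates with the two speakers swapped over the whole utterance.
est_swapped = ref[::-1].copy()

# Estimates whose speaker assignment flips after the second frame.
est_mixed = ref.copy()
est_mixed[:, :2] = ref[::-1, :2]
```

In this toy setup, uPIT fully resolves `est_swapped` with a single global permutation, but it cannot reach zero loss on `est_mixed`, where the output-to-speaker assignment changes mid-utterance. tPIT reaches zero loss on `est_mixed`, yet its per-frame permutations differ across frames, which is why a frame-level approach needs a separate tracking or clustering stage to stitch frames into consistent speaker streams.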


research
10/21/2022

Adversarial Permutation Invariant Training for Universal Sound Separation

Universal sound separation consists of separating mixes with arbitrary s...
research
07/30/2021

Speeding Up Permutation Invariant Training for Source Separation

Permutation invariant training (PIT) is a widely used training criterion...
research
08/14/2017

Convolutive Audio Source Separation using Robust ICA and an intelligent evolving permutation ambiguity solution

Audio source separation is the task of isolating sound sources that are ...
research
10/08/2021

Location-based training for multi-channel talker-independent speaker separation

Permutation-invariant training (PIT) is a dominant approach for addressi...
research
07/23/2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

Deep clustering (DC) and utterance-level permutation invariant training ...
research
07/01/2016

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

We propose a novel deep learning model, which supports permutation invar...
