Towards Listening to 10 People Simultaneously: An Efficient Permutation Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm

10/22/2020
by Hideyuki Tachibana, et al.
In neural network-based monaural speech separation, it has recently become common to compute the training loss with permutation invariant training (PIT). However, ordinary PIT requires trying all N! permutations between the N ground truths and the N estimates. Since this factorial complexity explodes very rapidly as N increases, PIT-based training works only when the number of source signals is small, such as N = 2 or 3. To overcome this limitation, this paper proposes SinkPIT, a novel variant of the PIT loss that is much more efficient than the ordinary PIT loss when N is large. SinkPIT is based on Sinkhorn's matrix balancing algorithm, which efficiently finds a doubly stochastic matrix that approximates the best permutation in a differentiable manner. The author conducted an experiment in which a neural network model was trained with SinkPIT to decompose a single-channel mixture into 10 sources, and obtained promising results.
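
The contrast between exhaustive PIT and a Sinkhorn-style relaxation can be illustrated with a small NumPy sketch. It is not the paper's implementation: the mean-squared-error pairwise cost, the temperature `beta`, the iteration count `n_iters`, and all function names are assumptions chosen for illustration. The brute-force function enumerates all N! permutations, while the Sinkhorn version alternately normalizes the rows and columns of exp(-C / beta) to obtain an approximately doubly stochastic soft assignment.

```python
import itertools
import numpy as np


def pairwise_cost(est, ref):
    """C[i, j] = mean squared error between estimate i and reference j.

    est, ref: arrays of shape (N, T). MSE is an illustrative choice of cost.
    """
    diff = est[:, None, :] - ref[None, :, :]
    return np.mean(diff ** 2, axis=-1)


def brute_force_pit(cost):
    """Ordinary PIT: minimum total cost over all N! permutations."""
    n = cost.shape[0]
    return min(
        sum(cost[i, perm[i]] for i in range(n))
        for perm in itertools.permutations(range(n))
    )


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis, keeping dimensions.
    m = np.max(a, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))


def sinkhorn_soft_assignment(cost, beta=0.01, n_iters=200):
    """Approximately doubly stochastic matrix from Sinkhorn balancing.

    Works in the log domain for stability. beta and n_iters are
    hypothetical defaults, not values taken from the paper.
    """
    log_p = -cost / beta
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # normalize rows
        log_p = log_p - _logsumexp(log_p, axis=0)  # normalize columns
    return np.exp(log_p)


def sinkhorn_pit_loss(cost, beta=0.01, n_iters=200):
    """Soft-assignment loss: total cost weighted by the balanced matrix."""
    p = sinkhorn_soft_assignment(cost, beta, n_iters)
    return float(np.sum(p * cost))
```

Each Sinkhorn iteration costs O(N^2), so the total work grows polynomially in N rather than factorially. With a small `beta` the soft assignment concentrates on a single permutation, and the resulting loss approaches the brute-force PIT value whenever one permutation is clearly best.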

research
10/21/2022

Adversarial Permutation Invariant Training for Universal Sound Separation

Universal sound separation consists of separating mixes with arbitrary s...
research
07/30/2021

Speeding Up Permutation Invariant Training for Source Separation

Permutation invariant training (PIT) is a widely used training criterion...
research
08/14/2017

Convolutive Audio Source Separation using Robust ICA and an intelligent evolving permutation ambiguity solution

Audio source separation is the task of isolating sound sources that are ...
research
02/08/2022

Unsupervised Source Separation via Self-Supervised Training

We introduce two novel unsupervised (blind) source separation methods, w...
research
06/01/2021

Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation

Supervised neural network training has led to significant progress on si...
research
02/09/2021

On permutation invariant training for speech source separation

We study permutation invariant training (PIT), which targets at the perm...
research
08/14/2019

Interleaved Multitask Learning for Audio Source Separation with Independent Databases

Deep Neural Network-based source separation methods usually train indepe...