Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss

03/27/2020
by Yi Luo, et al.

Many recent source separation systems are designed to separate a fixed number of sources out of a mixture. In cases where the source activation patterns are unknown, such systems must either adjust the number of outputs or distinguish invalid outputs from valid ones. Iterative separation methods have gained much attention in the community because they can flexibly decide the number of outputs; however, (1) they typically rely on long-term information to determine the stopping time for the iterations, which makes them hard to operate in a causal setting, and (2) they lack a "fault tolerance" mechanism for when the estimated number of sources differs from the actual number. In this paper, we propose a simple training method, auxiliary autoencoding permutation invariant training (A2PIT), to alleviate these two issues. A2PIT assumes a fixed number of outputs and uses auxiliary autoencoding losses to force the invalid outputs to be copies of the input mixture, and it detects invalid outputs in a fully unsupervised way during the inference phase. Experimental results show that A2PIT improves separation performance across various numbers of speakers and effectively detects the number of speakers in a mixture.
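The idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes SI-SDR as the similarity measure, an exhaustive permutation search, and a hypothetical detection threshold (`threshold_db`); the function names `si_sdr`, `a2pit_style_loss`, and `detect_valid_outputs` are made up for this example. Targets for unused outputs are padded with copies of the mixture, and at inference an output that closely matches the mixture is flagged as invalid.

```python
import itertools
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals."""
    est = est - est.mean()
    ref = ref - ref.mean()
    scale = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = scale * ref
    noise = est - target
    return 10.0 * np.log10((np.dot(target, target) + eps) /
                           (np.dot(noise, noise) + eps))

def a2pit_style_loss(outputs, sources, mixture):
    """outputs: (N, T); sources: (C, T) with C <= N; mixture: (T,).

    Assumption: the N - C unused targets are filled with copies of the
    mixture (the auxiliary autoencoding targets), and the loss is the
    negative mean SI-SDR under the best output-to-target permutation.
    """
    n_out = outputs.shape[0]
    n_pad = n_out - sources.shape[0]
    targets = np.concatenate([sources, np.tile(mixture, (n_pad, 1))], axis=0)
    best = np.inf
    for perm in itertools.permutations(range(n_out)):
        loss = -np.mean([si_sdr(outputs[i], targets[p])
                         for i, p in enumerate(perm)])
        best = min(best, loss)
    return best

def detect_valid_outputs(outputs, mixture, threshold_db=20.0):
    """Return True for outputs that are NOT near-copies of the mixture.

    threshold_db is a hypothetical value chosen for illustration only.
    """
    return [si_sdr(out, mixture) < threshold_db for out in outputs]

# Toy usage: a 3-output system fed a 2-source mixture.
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(1000), rng.standard_normal(1000)
mixture = s1 + s2
outputs = np.stack([s1, mixture, s2])   # idealized network outputs
valid = detect_valid_outputs(outputs, mixture)
n_speakers = sum(valid)                 # estimated number of active sources
```

Because the detection step only compares each output with the input mixture, it needs no labels at inference time, which is what makes it unsupervised in the sense the abstract describes.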


