Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation

06/01/2021
by Scott Wisdom, et al.

Supervised neural network training has led to significant progress on single-channel sound separation. This approach relies on ground-truth isolated sources, which precludes scaling to widely available mixture data and limits progress on open-domain tasks. The recent mixture invariant training (MixIT) method enables training on in-the-wild data; however, it suffers from two outstanding problems. First, it produces models that tend to over-separate, producing more output sources than are present in the input. Second, the exponential computational complexity of the MixIT loss limits the number of feasible output sources. These problems interact: increasing the number of output sources exacerbates over-separation. In this paper we address both issues. To combat over-separation we introduce new losses: sparsity losses that favor fewer output sources and a covariance loss that discourages correlated outputs. We also experiment with a semantic classification loss by predicting weak class labels for each mixture. To extend MixIT to larger numbers of sources, we introduce an efficient approximation using a fast least-squares solution, projected onto the MixIT constraint set. Our experiments show that the proposed losses curtail over-separation and improve overall performance. The best performance is achieved using larger numbers of output sources, enabled by our efficient MixIT loss, combined with sparsity losses to prevent over-separation. On the FUSS test set, we achieve over 13 dB in multi-source SI-SNR improvement, while boosting single-source reconstruction SI-SNR by over 17 dB.
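The abstract's key computational idea, replacing the exhaustive MixIT assignment search with a fast least-squares mixing matrix that is then projected onto the MixIT constraint set, can be illustrated with a short sketch. The snippet below is a minimal NumPy illustration, assuming a negative-SNR mixture reconstruction loss and a binary constraint that assigns each output source to exactly one reference mixture; the function names and details are illustrative assumptions, not the authors' implementation.

    import itertools
    import numpy as np

    def neg_snr(ref, est, eps=1e-8):
        # Negative SNR (dB) between a reference mixture and its remixed estimate.
        err = ref - est
        return -10.0 * np.log10(np.sum(ref ** 2) / (np.sum(err ** 2) + eps) + eps)

    def exact_mixit_loss(refs, ests):
        # Exact MixIT: enumerate all 2^M assignments of the M output sources to
        # the 2 reference mixtures and keep the best one (exponential in M).
        n_refs, n_out = refs.shape[0], ests.shape[0]
        best = np.inf
        for assign in itertools.product(range(n_refs), repeat=n_out):
            A = np.zeros((n_refs, n_out))
            A[list(assign), np.arange(n_out)] = 1.0
            remix = A @ ests
            best = min(best, sum(neg_snr(refs[i], remix[i]) for i in range(n_refs)))
        return best

    def fast_mixit_loss(refs, ests):
        # Approximate MixIT: solve an unconstrained least-squares problem for the
        # mixing matrix, then project each column onto the constraint set by
        # assigning every output source to the reference mixture with the largest
        # coefficient, making larger numbers of output sources feasible.
        n_refs, n_out = refs.shape[0], ests.shape[0]
        A_ls, *_ = np.linalg.lstsq(ests.T, refs.T, rcond=None)  # shape (n_out, n_refs)
        A = np.zeros((n_refs, n_out))
        A[np.argmax(A_ls.T, axis=0), np.arange(n_out)] = 1.0
        remix = A @ ests
        return sum(neg_snr(refs[i], remix[i]) for i in range(n_refs))

    # Toy usage: two reference mixtures, a model that emitted 8 output sources.
    rng = np.random.default_rng(0)
    refs = rng.standard_normal((2, 16000))
    ests = rng.standard_normal((8, 16000))
    print(exact_mixit_loss(refs, ests), fast_mixit_loss(refs, ests))

The sparsity, covariance, and semantic classification losses described in the abstract would be added on top of this reconstruction term; they are not sketched here.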


