Unsupervised Sound Separation Using Mixtures of Mixtures

06/23/2020
by Scott Wisdom, et al.

In recent years, rapid progress has been made on the problem of single-channel sound separation using supervised training of deep neural networks. In such supervised approaches, the model is trained to predict the component sources from synthetic mixtures created by adding up isolated ground-truth sources. The reliance on this synthetic training data is problematic because good performance depends upon the degree of match between the training data and real-world audio, especially in terms of the acoustic conditions and distribution of sources. The acoustic properties can be challenging to accurately simulate, and the distribution of sound types may be hard to replicate. In this paper, we propose a completely unsupervised method, mixture invariant training (MixIT), that requires only single-channel acoustic mixtures. In MixIT, training examples are constructed by mixing together existing mixtures, and the model separates them into a variable number of latent sources, such that the separated sources can be remixed to approximate the original mixtures. We show that MixIT can achieve competitive performance compared to supervised methods on speech separation. Using MixIT in a semi-supervised learning setting enables unsupervised domain adaptation and learning from large amounts of real-world data without ground-truth source waveforms. In particular, we significantly improve reverberant speech separation performance by incorporating reverberant mixtures, train a speech enhancement system from noisy mixtures, and improve universal sound separation by incorporating a large amount of in-the-wild data.
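The remixing idea in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes two reference mixtures, a fixed number of estimated sources, and a plain mean-squared-error reconstruction loss chosen only for simplicity. The function name `mixit_loss` and the brute-force search over assignments are illustrative choices. Each estimated source is assigned to exactly one reference mixture, and the assignment whose remix best matches the references defines the loss.

```python
import itertools

import numpy as np


def mixit_loss(mixtures, est_sources):
    """Illustrative mixture-invariant loss (hypothetical helper).

    mixtures:    array of shape (N, T), the N reference mixtures.
    est_sources: array of shape (M, T), sources estimated from the
                 sum of the reference mixtures.
    Returns (best_loss, best_assignment), where best_assignment[m] is
    the index of the mixture that estimated source m is remixed into.
    """
    n_mix = mixtures.shape[0]
    n_src = est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    # Enumerate every way of assigning each source to one mixture.
    for assign in itertools.product(range(n_mix), repeat=n_src):
        # Binary mixing matrix: each column has exactly one 1.
        mixing = np.zeros((n_mix, n_src))
        mixing[list(assign), np.arange(n_src)] = 1.0
        remixed = mixing @ est_sources
        # MSE is an assumed stand-in for the reconstruction loss.
        loss = np.mean((mixtures - remixed) ** 2)
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


# Toy usage: four random "sources" form two mixtures, and a perfect
# separator would return the four sources unchanged.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 64))
mixtures = np.stack([sources[0] + sources[2], sources[1] + sources[3]])
loss, assignment = mixit_loss(mixtures, sources)
```

The exhaustive search grows as N^M, so it is only practical for a handful of output sources. In training, the model input would be the sum of the reference mixtures, and the loss would be backpropagated through the best assignment.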
