Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training

10/20/2021
by Aswin Sivaraman, et al.

Mixture invariant training (MixIT) is a recently proposed method for training single-channel sound separation models that is unsupervised in the sense that it does not require ground-truth isolated reference sources. In this paper, we investigate using MixIT to adapt a separation model to real far-field recordings of overlapping, reverberant, and noisy speech from the AMI Corpus. The models are tested on real AMI recordings containing overlapping speech and are evaluated subjectively by human listeners. To evaluate our models objectively, we also devise a synthetic AMI test set. For human evaluations on real recordings, we propose a modification of the standard MUSHRA protocol that handles imperfect reference signals, which we call MUSHIRA. Holding network architectures constant, we find that a fine-tuned semi-supervised model yields the largest SI-SNR improvement and the highest PESQ scores and human listening ratings across synthetic and real datasets, outperforming unadapted generalist models trained on orders of magnitude more data. Our results show that unsupervised learning through MixIT enables model adaptation on real-world unlabeled spontaneous speech recordings.
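
As a rough illustration of why MixIT needs no isolated reference sources, the sketch below computes a per-example MixIT-style loss in plain Python. It rests on assumptions not stated in the abstract: a separation model (not shown) has already produced the estimated sources from the sum of two reference mixtures, and a plain negative SNR in dB stands in for whatever signal-level loss the paper actually uses. The names `snr_db` and `mixit_loss` are hypothetical.

```python
import itertools
import math


def snr_db(reference, estimate, eps=1e-8):
    """Signal-to-noise ratio in dB between two equal-length sample lists."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def mixit_loss(mix1, mix2, est_sources):
    """Return (loss, assignment) for one training example.

    est_sources: list of M estimated sources, assumed to come from a
    separation model applied to the mixture of mixtures mix1 + mix2.
    assignment[m] is 0 or 1: the reference mixture that source m is
    routed to. Only the two mixtures serve as training targets.
    """
    n = len(mix1)
    best_loss, best_assign = float("inf"), None
    # Exhaustively try every way of routing each source to one mixture.
    for assign in itertools.product((0, 1), repeat=len(est_sources)):
        remix = [[0.0] * n, [0.0] * n]
        for src, target in zip(est_sources, assign):
            for t in range(n):
                remix[target][t] += src[t]
        # Assumed loss: negative SNR summed over both remixed mixtures.
        loss = -(snr_db(mix1, remix[0]) + snr_db(mix2, remix[1]))
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign
```

Calling `mixit_loss(mix1, mix2, est_sources)` returns the lowest loss together with the source-to-mixture assignment that achieves it. The exhaustive search visits 2^M assignments, which is only practical for a small number of output sources.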

research
05/18/2023

Unsupervised Multi-channel Separation and Adaptation

A key challenge in machine learning is to generalize from training data ...
research
06/23/2020

Unsupervised Sound Separation Using Mixtures of Mixtures

In recent years, rapid progress has been made on the problem of single-c...
research
04/23/2022

Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation

Recently, supervised speech separation has made great progress. However,...
research
10/22/2019

WHAMR!: Noisy and Reverberant Single-Channel Speech Separation

While significant advances have been made in recent years in the separat...
research
11/15/2022

Reverberation as Supervision for Speech Separation

This paper proposes reverberation as supervision (RAS), a novel unsuperv...
research
05/25/2023

Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation

Speech separation is very important in real-world applications such as h...
research
06/01/2021

Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation

Supervised neural network training has led to significant progress on si...
