REAL-M: Towards Speech Separation on Real Mixtures

10/20/2021
by Cem Subakan, et al.

In recent years, deep learning-based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper helps fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Second, we address the problem of evaluating separation performance on real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates the separation performance on real mixtures: its SI-SNR predictions correlate well with human opinions. Moreover, we observe that the performance trends predicted by our estimator on the REAL-M dataset closely follow those achieved on synthetic benchmarks when evaluating popular speech separation models.
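
For readers unfamiliar with the metric, the following is a minimal sketch of the standard reference-based SI-SNR computation, i.e. the quantity that is unavailable on real mixtures because it needs the clean source. It is not the paper's neural estimator, which predicts this value blindly from the mixture and the separated signals. The function name `si_snr`, its signature, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Reference-based SI-SNR in dB for two equal-length sample sequences.

    Illustrative sketch only: the name, signature, and eps value are
    assumptions, not taken from the paper.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean from both signals.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything not explained by the scaled reference counts as noise.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    # Synthetic toy signals, used only to exercise the function.
    ref = [math.sin(0.1 * i) for i in range(200)]
    est = [r + 0.1 * math.cos(0.37 * i) for i, r in enumerate(ref)]
    print(f"SI-SNR: {si_snr(est, ref):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is what "scale-invariant" refers to.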


