Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation

05/25/2023
by   Rawad Melhem, et al.
0

Speech separation is very important in real-world applications such as human-machine interaction, hearing aids devices, and automatic meeting transcription. In recent years, a significant improvement occurred towards the solution based on deep learning. In fact, much attention has been drawn to supervised learning methods using synthetic mixtures datasets despite their being not representative of real-world mixtures. The difficulty in building a realistic dataset led researchers to use unsupervised learning methods, because of their ability to handle realistic mixtures directly. The results of unsupervised learning methods are still unconvincing. In this paper, a method is introduced to create a realistic dataset with ground truth sources for speech separation. The main challenge in designing a realistic dataset is the unavailability of ground truths for speakers signals. To address this, we propose a method for simultaneously recording two speakers and obtaining the ground truth for each. We present a methodology for benchmarking our realistic dataset using a deep learning model based on Bidirectional Gated Recurrent Units (BGRU) and clustering algorithm. The experiments show that our proposed dataset improved SI-SDR (Scale Invariant Signal to Distortion Ratio) by 1.65 dB and PESQ (Perceptual Evaluation of Speech Quality) by approximately 0.5. We also evaluated the effectiveness of our method at different distances between the microphone and the speakers and found that it improved the stability of the learned model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/23/2020

Unsupervised Sound Separation Using Mixtures of Mixtures

In recent years, rapid progress has been made on the problem of single-c...
research
03/29/2018

Cracking the cocktail party problem by multi-beam deep attractor network

While recent progresses in neural network approaches to single-channel s...
research
04/23/2022

Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation

Recently, supervised speech separation has made great progress. However,...
research
10/20/2021

REAL-M: Towards Speech Separation on Real Mixtures

In recent years, deep learning based source separation has achieved impr...
research
10/22/2019

WHAMR!: Noisy and Reverberant Single-Channel Speech Separation

While significant advances have been made in recent years in the separat...
research
10/20/2021

Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training

The recently-proposed mixture invariant training (MixIT) is an unsupervi...
research
08/18/2015

Deep clustering: Discriminative embeddings for segmentation and separation

We address the problem of acoustic source separation in a deep learning ...

Please sign up or login with your details

Forgot password? Click here to reset