Supervised Speech Separation Based on Deep Learning: An Overview

08/24/2017
by DeLiang Wang et al.

Speech separation is the task of separating target speech from background interference. Traditionally, speech separation has been studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This article provides a comprehensive overview of the research on deep learning-based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. We then discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is devoted to separation algorithms, where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation, as well as multi-microphone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.
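As a concrete illustration of the "training targets" component mentioned above, the sketch below computes one widely used target, the ideal ratio mask, from the short-time Fourier transforms of clean speech and noise. It is a minimal example assuming access to separate clean and noise signals; the function name, the librosa-based STFT, and the parameter choices (n_fft, hop_length, the exponent beta, the small epsilon) are illustrative assumptions, not prescribed by the overview.

```python
# Minimal sketch (not code from the paper) of a common training target
# for supervised speech separation: the ideal ratio mask (IRM).
import numpy as np
import librosa


def ideal_ratio_mask(clean, noise, n_fft=512, hop_length=256, beta=0.5):
    """IRM computed per time-frequency unit from clean speech and noise.

    `clean` and `noise` are time-domain signals of equal length; `beta`
    (commonly 0.5) and the STFT settings are illustrative choices.
    """
    speech_pow = np.abs(librosa.stft(clean, n_fft=n_fft, hop_length=hop_length)) ** 2
    noise_pow = np.abs(librosa.stft(noise, n_fft=n_fft, hop_length=hop_length)) ** 2
    # Mask values lie in [0, 1]; a DNN is trained to predict them from
    # acoustic features of the noisy mixture, and the estimated mask is
    # applied to the mixture spectrogram before resynthesis.
    return (speech_pow / (speech_pow + noise_pow + 1e-8)) ** beta
```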

