Single-Microphone Speech Enhancement and Separation Using Deep Learning

08/31/2018
by Morten Kolbæk, et al.

The cocktail party problem comprises the challenging task of understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest. A signal processing algorithm that can effectively increase the intelligibility and quality of speech signals in such complicated acoustic situations is highly desirable, especially for applications involving mobile communication devices and hearing assistive devices. Due to the re-emergence of machine learning techniques, today known as deep learning, the challenges involved in designing such algorithms might be overcome. In this PhD thesis, we study and develop deep learning-based techniques for two sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically, we conduct an in-depth empirical analysis of the generalization capability of modern deep learning-based single-microphone speech enhancement algorithms. We show that the performance of such algorithms is closely linked to the training data, and that good generalization can be achieved with carefully designed training data. Furthermore, we propose utterance-level permutation invariant training (uPIT), a deep learning-based algorithm for single-microphone speech separation, and we report state-of-the-art results on a speaker-independent multi-talker speech separation task. Additionally, we show that uPIT works well for joint speech separation and enhancement without explicit prior knowledge about the noise type or the number of speakers. Finally, we show that deep learning-based speech enhancement algorithms designed to minimize the classical short-time spectral amplitude mean squared error lead to enhanced speech signals that are essentially optimal in terms of STOI, a state-of-the-art speech intelligibility estimator.
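The central idea behind permutation invariant training, which uPIT builds on, is that in speaker-independent separation there is no fixed correspondence between network outputs and reference speakers, so the training loss is evaluated under every possible output-to-target assignment and the minimum is used. The sketch below illustrates this idea only; the function name, the use of plain MSE on sample sequences, and the brute-force search over permutations are illustrative assumptions, not the thesis's actual implementation (which operates on short-time spectral representations at the utterance level).

```python
import itertools

def pit_mse_loss(estimates, targets):
    """Illustrative permutation invariant training (PIT) loss.

    Evaluates the mean squared error under every assignment of
    estimated sources to reference sources and returns the minimum,
    making the loss independent of speaker/output ordering.
    `estimates` and `targets` are lists (one entry per source) of
    equal-length sample sequences.
    """
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    n = len(estimates)
    best = float("inf")
    for perm in itertools.permutations(range(n)):
        # Average the per-source MSE under this particular assignment.
        loss = sum(mse(estimates[i], targets[perm[i]]) for i in range(n)) / n
        best = min(best, loss)
    return best
```

For two sources the search covers only two permutations, so if the network's outputs happen to match the references in swapped order, the swapped assignment is selected and the loss is zero; naive per-output MSE would instead penalize the (arbitrary) label ordering.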
