Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification

09/06/2017
by   Daniel Michelsanti, et al.
0

Improving speech system performance in noisy environments remains a challenging task, and speech enhancement (SE) is one of the effective techniques to solve the problem. Motivated by the promising results of generative adversarial networks (GANs) in a variety of image processing tasks, we explore the potential of conditional GANs (cGANs) for SE, and in particular, we make use of the image processing framework proposed by Isola et al. [1] to learn a mapping from the spectrogram of noisy speech to an enhanced counterpart. The SE cGAN consists of two networks, trained in an adversarial manner: a generator that tries to enhance the input noisy spectrogram, and a discriminator that tries to distinguish between enhanced spectrograms provided by the generator and clean ones from the database using the noisy spectrogram as a condition. We evaluate the performance of the cGAN method in terms of perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and equal error rate (EER) of speaker verification (an example application). Experimental results show that the cGAN method overall outperforms the classical short-time spectral amplitude minimum mean square error (STSA-MMSE) SE algorithm, and is comparable to a deep neural network-based SE approach (DNN-SE).

READ FULL TEXT
research
11/15/2017

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition

We investigate the effectiveness of generative adversarial networks (GAN...
research
02/02/2018

Monaural Speech Enhancement using Deep Neural Networks by Maximizing a Short-Time Objective Intelligibility Measure

In this paper we propose a Deep Neural Network (DNN) based Speech Enhanc...
research
10/27/2022

A Versatile Diffusion-based Generative Refiner for Speech Enhancement

Although deep neural network (DNN)-based speech enhancement (SE) methods...
research
04/02/2020

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

The intelligibility of natural speech is seriously degraded when exposed...
research
12/16/2021

Towards Robust Real-time Audio-Visual Speech Enhancement

The human brain contextually exploits heterogeneous sensory information ...
research
06/16/2021

A Flow-Based Neural Network for Time Domain Speech Enhancement

Speech enhancement involves the distinction of a target speech signal fr...
research
10/29/2020

UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition

Speech enhancement at extremely low signal-to-noise ratio (SNR) conditio...

Please sign up or login with your details

Forgot password? Click here to reset