iSEGAN: Improved Speech Enhancement Generative Adversarial Networks

02/20/2020
by Deepak Baby, et al.

Popular neural network-based speech enhancement systems operate on the magnitude spectrogram and ignore the phase mismatch between the noisy and clean speech signals. Conditional generative adversarial networks (cGANs) show promise in addressing the phase mismatch problem by directly mapping the raw noisy speech waveform to the underlying clean speech signal. However, stabilizing and training cGAN systems is difficult and they still fall short of the performance achieved by the spectral enhancement approaches. This paper investigates whether different normalization strategies and one-sided label smoothing can further stabilize the cGAN-based speech enhancement model. In addition, we propose incorporating a Gammatone-based auditory filtering layer and a trainable pre-emphasis layer to further improve the performance of the cGAN framework. Simulation results show that the proposed approaches improve the speech enhancement performance of cGAN systems in addition to yielding improved stability and reduced computational effort.
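Two of the ingredients named in the abstract, a pre-emphasis layer and one-sided label smoothing, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the first-order filter form y[n] = x[n] - a*x[n-1], the coefficient 0.95, and the smoothed target 0.9 are all assumptions chosen for illustration. In a trainable layer the coefficient would be a learned parameter rather than a fixed argument.

```python
def pre_emphasis(signal, coeff=0.95):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1].

    The first sample is passed through unchanged. The value of `coeff`
    is an assumed default; a trainable layer would learn it instead.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out


def discriminator_targets(num_real, num_fake, smooth=0.9):
    """One-sided label smoothing for discriminator targets.

    Only the targets for real examples are softened (1.0 -> `smooth`);
    the targets for generated examples stay at 0.0. The value 0.9 is an
    assumed, commonly used choice.
    """
    return [smooth] * num_real + [0.0] * num_fake


if __name__ == "__main__":
    print(pre_emphasis([1.0, 1.0, 1.0], coeff=0.5))
    print(discriminator_targets(num_real=2, num_fake=1))
```

Smoothing only the real-example targets keeps the discriminator from becoming overconfident on real data while leaving the targets for generated samples untouched, which is the sense in which the smoothing is "one-sided".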

research
02/04/2021

VSEGAN: Visual Speech Enhancement Generative Adversarial Network

Speech enhancement is an essential task of improving speech quality in n...
research
11/02/2022

Analysis of Noisy-target Training for DNN-based speech enhancement

Deep neural network (DNN)-based speech enhancement usually uses a clean ...
research
06/16/2021

A Flow-Based Neural Network for Time Domain Speech Enhancement

Speech enhancement involves the distinction of a target speech signal fr...
research
03/28/2017

SEGAN: Speech Enhancement Generative Adversarial Network

Current speech enhancement techniques operate on the spectral domain and...
research
12/18/2017

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

Speech enhancement deep learning systems usually require large amounts o...
research
03/24/2022

HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

Generative adversarial networks have recently demonstrated outstanding p...
research
07/01/2020

Instantaneous PSD Estimation for Speech Enhancement based on Generalized Principal Components

Power spectral density (PSD) estimates of various microphone signal comp...
