UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition

by Xiang Hao et al.

Speech enhancement under extremely low signal-to-noise ratio (SNR) conditions is a very challenging problem that has rarely been investigated in previous work. This paper proposes a robust speech enhancement approach (UNetGAN) based on U-Net and generative adversarial learning to address this problem. The approach consists of a generator network and a discriminator network, both of which operate directly in the time domain. The generator adopts a U-Net-like structure and employs dilated convolution in its bottleneck. We evaluate UNetGAN at low SNR conditions (down to -20 dB) on a public benchmark. The results demonstrate that it significantly improves speech quality and substantially outperforms representative deep learning models, including SEGAN, cGAN for SE, a bidirectional LSTM using the phase-sensitive spectrum approximation cost function (PSA-BLSTM), and Wave-U-Net, in terms of Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ).
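Evaluating at SNRs as low as -20 dB means mixing clean speech with noise whose mean power is 100 times that of the speech (10^(20/10) = 100). The mixing step can be sketched as follows. This sketch assumes SNR is defined as the ratio of mean speech power to mean noise power over the whole utterance. The paper's exact mixing procedure is not given here, and the sine tone and Gaussian noise are hypothetical stand-ins for real recordings.

```python
import math
import random


def power(signal):
    """Mean power (average squared amplitude) of a 1-D signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` and add it to `speech` so the mixture has the given SNR (dB).

    From SNR_dB = 10 * log10(P_speech / P_scaled_noise), the noise gain is
    sqrt(P_speech / (P_noise * 10 ** (snr_db / 10))).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def snr_db(speech, noise):
    """SNR in dB between a clean signal and an additive noise component."""
    return 10 * math.log10(power(speech) / power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 16000  # assumed sampling rate in Hz (illustrative only)
    # Hypothetical stand-ins: a 440 Hz tone as "speech", white Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    mixture, scaled_noise = mix_at_snr(speech, noise, -20.0)
    print(round(snr_db(speech, scaled_noise), 6))
```

At -20 dB, the scaled noise dominates the mixture, which is why the task is hard for enhancement models.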

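Dilated convolution, used in the generator's bottleneck, widens the receptive field without adding parameters. For stacked stride-1 layers with kernel size k and dilation rates d_i, the receptive field is 1 + Σ (k − 1)·d_i samples. The following sketch illustrates this in plain Python. The kernel size of 3 and the dilation rates 1, 2, 4, 8 are hypothetical choices for illustration, not UNetGAN's published configuration.

```python
def dilated_conv1d(x, kernel, dilation):
    """Same-length 1-D dilated convolution (cross-correlation, as in deep
    learning frameworks) with zero padding at the edges.

    out[t] = sum_i kernel[i] * x[t + (i - center) * dilation]
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size so the output stays centred")
    center = k // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - center) * dilation
            if 0 <= idx < len(x):  # zero padding outside the signal
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    dilations = [1, 2, 4, 8]  # hypothetical bottleneck dilation schedule
    kernel = [1.0, 1.0, 1.0]  # hypothetical all-ones kernel (k = 3)
    # Push a unit impulse through the stack: the nonzero support of the
    # output equals the receptive field of the stack.
    y = [0.0] * 64
    y[32] = 1.0
    for d in dilations:
        y = dilated_conv1d(y, kernel, d)
    print(receptive_field(len(kernel), dilations), sum(1 for v in y if v != 0.0))
```

Doubling the dilation at each layer grows the receptive field exponentially with depth, while the parameter count grows only linearly.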

SNR-based teachers-student technique for speech enhancement

It is very challenging for speech enhancement methods to achieve robust...

Time-domain Speech Enhancement with Generative Adversarial Learning

Speech enhancement aims to obtain speech signals with high intelligibili...

PercepNet+: A Phase and SNR Aware PercepNet for Real-Time Speech Enhancement

PercepNet, a recent extension of the RNNoise, an efficient, high-quality...

A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement

Deep learning technology has been widely applied to speech enhancement. ...

DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

Generative adversarial network (GAN) still has some problems in deali...

DDS: A new device-degraded speech dataset for speech enhancement

A large and growing amount of speech content in real-life scenarios is b...

A scalable noisy speech dataset and online subjective test framework

Background noise is a major source of quality impairments in Voice over ...