MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

by Szu-Wei Fu, et al.

The discrepancy between the cost function used to train a speech enhancement model and human auditory perception usually makes the quality of the enhanced speech unsatisfactory. Objective evaluation metrics that account for human perception can therefore serve as a bridge to narrow this gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric to a discriminator. Because only the scores of the target evaluation functions are needed during training, the metrics can even be non-differentiable. In this study, we propose MetricGAN+, which incorporates three training techniques based on domain knowledge of speech processing. With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ increases the PESQ score by 0.3 compared with the previous MetricGAN and achieves state-of-the-art results (PESQ = 3.15).
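To make the "only the scores are needed" idea concrete, here is a toy sketch, not the authors' implementation: a discriminator is regressed onto scores from a non-differentiable metric, and the generator is then updated through the discriminator's gradient toward the maximum normalized score of 1. Everything here is a hypothetical stand-in: `black_box_metric` is a made-up quantized score rather than PESQ, the "generator" is a single parameter `theta`, and a quadratic least-squares fit replaces the neural discriminator.

```python
import random


def black_box_metric(theta: float) -> float:
    """Hypothetical normalized quality score in [0, 1] (a stand-in, not PESQ).

    Rounding to two decimals makes it piecewise constant, so its gradient is
    zero almost everywhere and cannot be backpropagated directly.
    """
    score = 1.0 - (theta - 0.7) ** 2
    return round(min(max(score, 0.0), 1.0), 2)


def solve3(a, b):
    """Solve the 3x3 system a @ w = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    w = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        w[r] = (m[r][3] - sum(m[r][c] * w[c] for c in range(r + 1, 3))) / m[r][r]
    return w


def fit_discriminator(samples):
    """Least-squares fit of a hypothetical surrogate D(t) = w0 + w1*t + w2*t**2."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for t, q in samples:
        phi = (1.0, t, t * t)
        for i in range(3):
            atb[i] += phi[i] * q
            for j in range(3):
                ata[i][j] += phi[i] * phi[j]
    return solve3(ata, atb)


def train(steps=20, lr=0.5, batch=8, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # hypothetical one-parameter "generator"
    seen = []  # every (output, score) pair the discriminator has been fit on
    for _ in range(steps):
        # Outputs vary with the (unmodeled) input; mimic that with small noise.
        outputs = [theta + rng.gauss(0.0, 0.05) for _ in range(batch)]
        # Discriminator step: regress the black-box scores; no metric gradient needed.
        seen.extend((t, black_box_metric(t)) for t in outputs)
        w0, w1, w2 = fit_discriminator(seen)
        # Generator step: minimize mean (D(t) - 1)^2 via the surrogate's gradient
        # (dt/dtheta = 1 because each output is theta plus noise).
        grad = sum(
            2.0 * (w0 + w1 * t + w2 * t * t - 1.0) * (w1 + 2.0 * w2 * t)
            for t in outputs
        ) / batch
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    theta = train()
    print(round(theta, 3), black_box_metric(theta))
```

Rounding makes `black_box_metric` useless for direct differentiation, so the fitted surrogate supplies the gradient instead; that is the role the abstract assigns to the discriminator when it says the metric may be non-differentiable.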

Code Repositories


MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement (ICML 2019, with Travel awards)