MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

04/08/2021
by Szu-Wei Fu, et al.

The discrepancy between the cost function used to train a speech enhancement model and human auditory perception often leaves the quality of enhanced speech unsatisfactory. Objective evaluation metrics that account for human perception can therefore help bridge this gap. Our previously proposed MetricGAN optimizes objective metrics by connecting the metric to a discriminator. Because training requires only the scores of the target evaluation functions, the metrics can even be non-differentiable. In this study, we propose MetricGAN+, which introduces three training techniques that incorporate domain knowledge of speech processing. With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ increases the PESQ score by 0.3 compared with the previous MetricGAN and achieves state-of-the-art results (PESQ score = 3.15).
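
To make the idea of connecting a metric to a discriminator concrete, here is a minimal sketch in Python. It assumes the least-squares objectives described for the original MetricGAN: the discriminator D is trained to output 1 for clean speech and the normalized metric score for enhanced speech, while the generator is trained to push D's output toward 1. The `toy_metric` function is a hypothetical stand-in for a normalized PESQ score, and all names here are illustrative. None of this is taken from the authors' code, and the three MetricGAN+ training techniques are not shown.

```python
from typing import Sequence

Signal = Sequence[float]


def toy_metric(enhanced: Signal, clean: Signal) -> float:
    """Hypothetical black-box score in [0, 1] (assumption: stands in for
    a normalized PESQ). The rounding step makes it non-differentiable,
    which is fine because training only ever reads the returned score."""
    err = sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)
    return round(max(0.0, 1.0 - err), 1)


def discriminator_loss(d_clean: float, d_enhanced: float, score: float) -> float:
    """Least-squares loss for D (assumed form).

    d_clean    -- D's output for the (clean, clean) pair, target 1.0
    d_enhanced -- D's output for the (enhanced, clean) pair
    score      -- black-box metric score of the enhanced signal
    """
    return (d_clean - 1.0) ** 2 + (d_enhanced - score) ** 2


def generator_loss(d_enhanced: float, target: float = 1.0) -> float:
    """Least-squares loss for G (assumed form): push D's predicted score
    for the enhanced signal toward the best possible value."""
    return (d_enhanced - target) ** 2


if __name__ == "__main__":
    clean = [0.0, 0.5, -0.5, 0.25]
    enhanced = [0.4, 0.5, -0.5, 0.25]  # pretend generator output
    score = toy_metric(enhanced, clean)

    # Pretend discriminator outputs; a real D would be a neural network.
    d_clean, d_enhanced = 0.95, 0.70
    print("metric score:", score)
    print("D loss:", discriminator_loss(d_clean, d_enhanced, score))
    print("G loss:", generator_loss(d_enhanced))
```

In this sketch the metric is only called as a function and never differentiated. Gradients would reach the generator through the discriminator's prediction, which is why a non-differentiable metric can still serve as the training target.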

05/06/2019

Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

Utilizing a human-perception-related objective function to train a speec...
03/23/2022

MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data

Training of speech enhancement systems often does not incorporate knowle...
02/14/2020

Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

Improving subjective sound quality of enhanced signals is one of the mos...
05/06/2021

Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

Single channel speech enhancement is a challenging task in speech commun...
09/12/2017

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

Speech enhancement model is used to map a noisy speech to a clean speech...
10/02/2021

Processing Phoneme Specific Segments for Cleft Lip and Palate Speech Enhancement

The cleft lip and palate (CLP) speech intelligibility is distorted due t...
08/20/2017

An evaluation of intrusive instrumental intelligibility metrics

Instrumental intelligibility metrics are commonly used as an alternative...

Code Repositories

MetricGAN

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement (ICML 2019, with Travel awards)

