MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data

03/23/2022
by   George Close, et al.
0

Training of speech enhancement systems often does not incorporate knowledge of human perception and thus can lead to unnatural sounding results. Incorporating psychoacoustically motivated speech perception metrics as part of model training via a predictor network has recently gained interest. However, the performance of such predictors is limited by the distribution of metric scores that appear in the training data. In this work, we propose MetricGAN+/- (an extension of MetricGAN+, one such metric-motivated system) which introduces an additional network - a "de-generator" which attempts to improve the robustness of the prediction network (and by extension of the generator) by ensuring observation of a wider range of metric scores in training. Experimental results on the VoiceBank-DEMAND dataset show relative improvement in PESQ score of 3.8 generalisation to unseen noise and speech.

READ FULL TEXT
research
04/08/2021

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

The discrepancy between the cost function used for training a speech enh...
research
05/06/2019

Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

Utilizing a human-perception-related objective function to train a speec...
research
12/18/2017

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

Speech enhancement deep learning systems usually require large amounts o...
research
05/13/2019

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

Adversarial loss in a conditional generative adversarial network (GAN) i...
research
10/26/2018

Scaling Speech Enhancement in Unseen Environments with Noise Embeddings

We address the problem of speech enhancement generalisation to unseen en...
research
11/05/2019

Speech Enhancement via Deep Spectrum Image Translation Network

Quality and intelligibility of speech signals are degraded under additiv...

Please sign up or login with your details

Forgot password? Click here to reset