Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet

11/06/2021
by Ziyi Xu, et al.

Speech enhancement that employs deep neural networks (DNNs) for denoising is called deep noise suppression (DNS). DNS methods are typically trained with mean squared error (MSE) type loss functions, which do not guarantee good perceptual quality. Perceptual evaluation of speech quality (PESQ) is a widely used metric for evaluating speech quality. However, the original PESQ algorithm is non-differentiable and therefore cannot be used directly as an optimization criterion for gradient-based learning. In this work, we propose an end-to-end non-intrusive PESQNet DNN to estimate the PESQ scores of the enhanced speech signal. By providing a reference-free perceptual loss, it serves as a mediator for DNS training, allowing the PESQ score of the enhanced speech signal to be maximized. We illustrate the potential of our proposed PESQNet-mediated training on the basis of an already strong baseline DNS. As a further novelty, we propose to train the DNS and the PESQNet alternatingly, which keeps the PESQNet up to date and ensures that it performs well specifically for the DNS under training. Our proposed method is compared to the same DNS trained with an MSE-based loss for joint denoising and dereverberation, and to the Interspeech 2021 DNS Challenge baseline. Detailed analysis shows that the PESQNet mediation further increases DNS performance by about 0.1 PESQ points on synthetic test data and by 0.03 DNSMOS points on real test data, compared to training with the MSE-based loss. Our proposed method also outperforms the Challenge baseline by 0.2 PESQ points on synthetic test data and 0.1 DNSMOS points on real test data.
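
The alternating scheme described in the abstract can be illustrated with a toy sketch. The pure-Python example below is not the authors' implementation and does not compute real PESQ. All names in it are hypothetical: `blackbox_score` is a staircase-quantized, reference-based stand-in for a non-differentiable metric, the "DNS" is reduced to a single gain `w`, and the "PESQNet" is reduced to a line fitted on the RMS of the enhanced signal only (so it needs no clean reference when it is used as a loss). Each round first refits the surrogate on outputs near the current enhancer, then updates the enhancer by gradient ascent on the surrogate's prediction.

```python
import math

# Toy data (assumption): a clean signal and noise that is orthogonal to it.
CLEAN = [1.0, -1.0, 1.0, -1.0] * 8
NOISE = [0.5, 0.5, -0.5, -0.5] * 8
NOISY = [s + n for s, n in zip(CLEAN, NOISE)]


def enhance(w):
    """Toy 'DNS': a single trainable gain applied to the noisy signal."""
    return [w * x for x in NOISY]


def rms(signal):
    """Reference-free feature of the enhanced signal seen by the surrogate."""
    return math.sqrt(sum(y * y for y in signal) / len(signal))


def blackbox_score(enhanced):
    """Stand-in for a non-differentiable metric (NOT real PESQ).

    Needs the clean reference and is quantized to steps of 0.05, so its
    gradient is zero almost everywhere. The code never differentiates it.
    """
    mse = sum((y - s) ** 2 for y, s in zip(enhanced, CLEAN)) / len(CLEAN)
    raw = 4.5 - 4.0 * min(mse, 1.0)
    return math.floor(raw * 20) / 20


def fit_surrogate(w, delta=0.1):
    """Step A (enhancer frozen): least-squares fit of q_hat(r) = a + b * r.

    Training targets are black-box scores of outputs near the current
    enhancer, so the surrogate stays up to date for the model being trained.
    """
    gains = [w - delta, w, w + delta]
    feats = [rms(enhance(g)) for g in gains]
    scores = [blackbox_score(enhance(g)) for g in gains]
    r_mean = sum(feats) / len(feats)
    q_mean = sum(scores) / len(scores)
    var = sum((r - r_mean) ** 2 for r in feats)
    cov = sum((r - r_mean) * (q - q_mean) for r, q in zip(feats, scores))
    b = cov / var
    a = q_mean - b * r_mean
    return a, b


def enhancer_step(w, surrogate, lr=0.05):
    """Step B (surrogate frozen): gradient ascent on the predicted score."""
    _, b = surrogate
    y = enhance(w)
    # Chain rule: d q_hat / d w = b * d r / d w, with d r / d w = mean(y * x) / r.
    dr_dw = sum(yi * xi for yi, xi in zip(y, NOISY)) / (len(y) * rms(y))
    return w + lr * b * dr_dw


def train(rounds=20, w=0.2):
    """Alternate surrogate fitting and enhancer updates; track the true score."""
    history = [blackbox_score(enhance(w))]
    for _ in range(rounds):
        surrogate = fit_surrogate(w)       # update the mediator
        w = enhancer_step(w, surrogate)    # update the enhancer through it
        history.append(blackbox_score(enhance(w)))
    return w, history


if __name__ == "__main__":
    final_w, scores = train()
    print("final gain:", final_w)
    print("black-box score, first and last:", scores[0], scores[-1])
```

In this toy setting the gain that minimizes the error to the clean signal is 0.8, and the alternation moves `w` toward it without ever differentiating `blackbox_score`. A real system would replace the gain with a DNS network, the fitted line with a PESQNet, and the probe points with enhanced training utterances labeled by the actual PESQ algorithm.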


research
09/05/2023

Employing Real Training Data for Deep Noise Suppression

Most deep noise suppression (DNS) models are trained with reference-base...
research
05/06/2019

Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

Utilizing a human-perception-related objective function to train a speec...
research
05/04/2022

Does a PESQNet (Loss) Require a Clean Reference Input? The Original PESQ Does, But ACR Listening Tests Don't

Perceptual evaluation of speech quality (PESQ) requires a clean speech r...
research
01/29/2018

On Psychoacoustically Weighted Cost Functions Towards Resource-Efficient Deep Neural Networks for Speech Denoising

We present a psychoacoustically enhanced cost function to balance networ...
research
03/31/2021

Y^2-Net FCRN for Acoustic Echo and Noise Suppression

In recent years, deep neural networks (DNNs) were studied as an alternat...
research
08/14/2019

Components Loss for Neural Networks in Mask-Based Speech Enhancement

Estimating time-frequency domain masks for single-channel speech enhance...
research
10/23/2019

End-to-End Multi-Task Denoising for the Joint Optimization of Perceptual Speech Metrics

Although supervised learning based on a deep neural network has recently...
