Does a PESQNet (Loss) Require a Clean Reference Input? The Original PESQ Does, But ACR Listening Tests Don't

05/04/2022
by Ziyi Xu, et al.

Perceptual evaluation of speech quality (PESQ) requires a clean speech reference as input, yet it predicts the results of (reference-free) absolute category rating (ACR) tests. In this work, we train a fully convolutional recurrent neural network (FCRN) as a deep noise suppression (DNS) model, with either a non-intrusive or an intrusive PESQNet, where only the latter has access to a clean speech reference. The PESQNet serves as a mediator that provides a perceptual loss during DNS training to maximize the PESQ score of the enhanced speech signal. For the intrusive PESQNet, we investigate two topologies, called early-fusion (EF) and middle-fusion (MF) PESQNet, and compare them to the non-intrusive PESQNet to evaluate and quantify the benefits of employing a clean speech reference input during DNS training. Detailed analyses show that the DNS trained with the MF-intrusive PESQNet outperforms the Interspeech 2021 DNS Challenge baseline and the same DNS trained with an MSE loss by 0.23 and 0.12 PESQ points, respectively. Furthermore, we show that this is only a marginal benefit over the DNS trained with the non-intrusive PESQNet. Therefore, like ACR listening tests, the PESQNet does not necessarily require a clean speech reference input, which opens the possibility of using real data for DNS training.
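
The three PESQNet variants named in the abstract differ in which signals they receive and where those signals are combined. The following is a toy sketch of that difference only, not the authors' architecture. It assumes that early fusion joins the enhanced and clean features at the input, and that middle fusion encodes each signal separately and joins the embeddings before the output stage. The `encode` and `head` functions, the score range, and the squared-distance loss are all hypothetical stand-ins chosen for illustration.

```python
import math
from typing import List

Features = List[List[float]]  # frames x feature bins (toy layout, an assumption)

SCORE_MIN, SCORE_MAX = 1.0, 5.0  # arbitrary toy range, not the real PESQ scale


def encode(feats: Features) -> List[float]:
    """Stand-in for learned layers: one mean value per frame."""
    return [sum(frame) / len(frame) for frame in feats]


def head(embedding: List[float]) -> float:
    """Stand-in for the output stage: pool over time, squash into the toy range."""
    pooled = sum(embedding) / len(embedding)
    return SCORE_MIN + (SCORE_MAX - SCORE_MIN) / (1.0 + math.exp(-pooled))


def non_intrusive(enhanced: Features) -> float:
    """Predict a score from the enhanced signal alone (no clean reference)."""
    return head(encode(enhanced))


def early_fusion(enhanced: Features, clean: Features) -> float:
    """Assumed EF: concatenate both feature sets per frame, then encode jointly."""
    fused = [e + c for e, c in zip(enhanced, clean)]
    return head(encode(fused))


def middle_fusion(enhanced: Features, clean: Features) -> float:
    """Assumed MF: encode each signal separately, then concatenate embeddings."""
    return head(encode(enhanced) + encode(clean))


def perceptual_loss(predicted: float, target: float = SCORE_MAX) -> float:
    """Hypothetical loss: squared distance of the predicted score from the best score."""
    return (target - predicted) ** 2


# Toy features: two frames with two bins each.
enhanced = [[0.2, 0.4], [0.1, 0.3]]
clean = [[0.3, 0.5], [0.2, 0.4]]

scores = {
    "non-intrusive": non_intrusive(enhanced),
    "early-fusion": early_fusion(enhanced, clean),
    "middle-fusion": middle_fusion(enhanced, clean),
}
losses = {name: perceptual_loss(score) for name, score in scores.items()}
```

Only `non_intrusive` can be evaluated without the `clean` argument, which is the property that would allow training on real recordings for which no clean reference exists.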

research
09/05/2023

Employing Real Training Data for Deep Noise Suppression

Most deep noise suppression (DNS) models are trained with reference-base...
research
11/06/2021

Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet

Speech enhancement employing deep neural networks (DNNs) for denoising a...
research
04/18/2023

Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN

Wideband codecs such as AMR-WB or EVS are widely used in (mobile) speech...
research
08/16/2018

Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM

Nowadays, most of the objective speech quality assessment tools (e.g., p...
research
05/03/2021

Full-Reference Speech Quality Estimation with Attentional Siamese Neural Networks

In this paper, we present a full-reference speech quality prediction mod...
research
11/09/2020

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

The calculation of most objective speech intelligibility assessment metr...
research
11/24/2021

Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations

Non-intrusive speech intelligibility (SI) prediction from binaural signa...
