Predicting score distribution to improve non-intrusive speech quality estimation

04/13/2022
by   Abu Zaher Md Faridee, et al.

Deep noise suppressors (DNS) have become an attractive solution for removing background noise, reverberation, and distortions from speech and are widely used in telephony and voice applications. However, they occasionally introduce artifacts and lower the perceptual quality of the speech. Subjective listening tests, in which multiple human judges rate each sample to derive a mean opinion score (MOS), are a popular way to measure these models' performance. Deep neural network-based non-intrusive MOS estimation models have recently emerged as a cost-efficient alternative to these tests. These models are trained with only the MOS labels, often discarding the secondary statistics of the opinion scores. In this paper, we investigate several ways to integrate the distribution of opinion scores (e.g. variance, histogram information) to improve MOS estimation performance. Our model is trained on a corpus of 419K denoised samples produced by 320 different DNS models and model variations and evaluated on 18K test samples from DNSMOS. We show that with only a minor modification of a single-task MOS estimation pipeline, these freely available labels can provide up to a 0.016 RMSE and 1
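To make the "secondary statistics" mentioned in the abstract concrete, the following is a minimal sketch of how the MOS, the variance, and a histogram could be derived from one clip's raw ratings, together with the RMSE metric used to compare predictions with labels. The function names, the 1 to 5 integer rating scale, and the example votes are assumptions for illustration only; they are not taken from the paper's pipeline.

```python
from collections import Counter
from math import sqrt

# Assumed rating scale: integer opinion scores from 1 (bad) to 5 (excellent).
SCALE = (1, 2, 3, 4, 5)


def score_statistics(votes, scale=SCALE):
    """Return (mos, variance, histogram) for one clip's list of ratings.

    The histogram is normalized so that its bins sum to 1.
    """
    if not votes:
        raise ValueError("at least one rating is required")
    n = len(votes)
    mos = sum(votes) / n
    # Population variance of the ratings around the MOS.
    variance = sum((v - mos) ** 2 for v in votes) / n
    counts = Counter(votes)
    histogram = [counts.get(s, 0) / n for s in scale]
    return mos, variance, histogram


def mos_from_histogram(histogram, scale=SCALE):
    """Recover the MOS as the expected value of a normalized histogram."""
    return sum(p * s for p, s in zip(histogram, scale))


def rmse(predicted, target):
    """Root-mean-square error between two equally long score sequences."""
    if len(predicted) != len(target) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return sqrt(squared / len(target))


if __name__ == "__main__":
    # Hypothetical ratings from five judges for a single denoised clip.
    mos, variance, histogram = score_statistics([4, 4, 5, 3, 4])
    print(mos, variance, histogram)
    print(mos_from_histogram(histogram))
```

Because the MOS is the expected value of the normalized histogram, a model that predicts the histogram also yields a MOS estimate, which is one reason the extra labels can be added to an existing pipeline at little cost.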

research
10/05/2021

DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

Human subjective evaluation is the gold standard to evaluate speech qual...
research
03/04/2019

Analysing Deep Learning-Spectral Envelope Prediction Methods for Singing Synthesis

We conduct an investigation on various hyper-parameters regarding neural...
research
12/04/2022

Speech MOS multi-task learning and rater bias correction

Perceptual speech quality is an important performance metric for telecon...
research
01/19/2022

Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis

This paper introduces Opencpop, a publicly available high-quality Mandar...
research
03/27/2019

CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

We describe our development of CSS10, a collection of single speaker spe...
research
03/02/2022

Parameterized Image Quality Score Distribution Prediction

Recently, image quality has been generally described by a mean opinion sc...
research
10/13/2021

Considering user agreement in learning to predict the aesthetic quality

How to robustly rank the aesthetic quality of given images has been a lo...
