A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals

07/31/2020
by Xuan Dong, et al.

The real-world capabilities of objective speech quality measures are limited because current measures either (1) are developed from simulated data that does not adequately model real environments, or (2) predict objective scores that are not always strongly correlated with subjective ratings. Additionally, no large dataset of real-world signals with listener quality ratings currently exists; such a dataset would facilitate real-world assessment. In this paper, we collect and predict the perceptual quality of real-world speech signals that are evaluated by human listeners. We first collect a large quality rating dataset by conducting crowdsourced listening studies on two real-world corpora. We then develop a novel approach that predicts human quality ratings using a pyramid bidirectional long short-term memory (pBLSTM) network with an attention mechanism. The results show that the proposed model achieves statistically lower estimation errors than prior assessment approaches, and that its predicted scores strongly correlate with human judgments.
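
As a rough illustration of the "pyramid" idea behind a pBLSTM, the sketch below shows only the time-reduction step that such networks commonly apply between recurrent layers: each pair of consecutive frames is concatenated, so the sequence length halves while the feature dimension doubles. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `pyramid_reduce` and `stack_levels` are hypothetical, the recurrent layers and the attention mechanism are omitted, and dropping a trailing unpaired frame is an assumed convention (padding is an equally plausible choice).

```python
from typing import List

Frame = List[float]


def pyramid_reduce(frames: List[Frame]) -> List[Frame]:
    """Concatenate each pair of consecutive frames, halving the sequence length.

    Assumption: a trailing unpaired frame is dropped rather than padded.
    """
    usable = len(frames) - (len(frames) % 2)
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]


def stack_levels(frames: List[Frame], levels: int) -> List[Frame]:
    """Apply the reduction `levels` times.

    In a full pBLSTM a bidirectional LSTM layer would sit between
    reductions; it is left out here to isolate the pyramid step.
    """
    out = frames
    for _ in range(levels):
        out = pyramid_reduce(out)
    return out


if __name__ == "__main__":
    # Nine one-dimensional frames: 0.0, 1.0, ..., 8.0
    frames = [[float(i)] for i in range(9)]
    reduced = stack_levels(frames, levels=2)
    print(len(reduced), len(reduced[0]))
```

With two levels, nine input frames become two frames of four values each, which is why the pyramid structure shortens the sequence that later layers must process.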


research
04/17/2019

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

Existing objective evaluation metrics for voice conversion (VC) are not ...
research
10/05/2021

DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

Human subjective evaluation is the gold standard to evaluate speech qual...
research
11/10/2021

HASA-net: A non-intrusive hearing-aid speech assessment network

Without the need of a clean reference, non-intrusive speech assessment m...
research
09/16/2021

NORESQA – A Framework for Speech Quality Assessment using Non-Matching References

The perceptual task of speech quality assessment (SQA) is a challenging ...
research
11/09/2020

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

The calculation of most objective speech intelligibility assessment metr...
research
06/26/2023

Automatic Assessment of Divergent Thinking in Chinese Language with TransDis: A Transformer-Based Language Model Approach

Language models have been increasingly popular for automatic creativity ...
research
08/23/2017

Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence

Aesthetic quality prediction is a challenging task in the computer visio...
