AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech

11/28/2016
by Brian Patton, et al.

Developers of text-to-speech (TTS) synthesizers often rely on human raters to assess the quality of synthesized speech. We demonstrate that we can model human raters' mean opinion scores (MOS) of synthesized speech using a deep recurrent neural network whose inputs consist solely of a raw waveform. Our best models provide utterance-level estimates of MOS that are only moderately inferior to sampled human ratings, as measured by Pearson and Spearman correlations. When multiple utterances are scored and averaged, a scenario common in synthesizer quality assessment, AutoMOS achieves correlations approaching those of human raters. The AutoMOS model has a number of applications, such as exploring the parameter space of a speech synthesizer without requiring a human in the loop.
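
The two evaluation settings mentioned in the abstract (utterance-level correlation, and correlation after averaging scores over multiple utterances) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the grouping of utterances by synthesizer, and all scores below are hypothetical values chosen only to show how Pearson and Spearman correlations would be computed in each setting.

```python
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def per_group_means(groups, scores):
    """Average the scores of all utterances that share a group label."""
    buckets = defaultdict(list)
    for group, score in zip(groups, scores):
        buckets[group].append(score)
    return {group: mean(vals) for group, vals in buckets.items()}


# Hypothetical data: six utterances from three synthesizers (A, B, C).
systems = ["A", "A", "B", "B", "C", "C"]
human_mos = [4.2, 3.8, 3.1, 2.9, 2.0, 2.4]      # assumed human ratings
predicted_mos = [3.9, 4.0, 3.3, 2.6, 2.5, 2.1]  # assumed model outputs

# Utterance-level agreement between model and raters.
print("utterance Pearson :", pearson(human_mos, predicted_mos))
print("utterance Spearman:", spearman(human_mos, predicted_mos))

# Agreement after averaging over each synthesizer's utterances.
human_avg = per_group_means(systems, human_mos)
pred_avg = per_group_means(systems, predicted_mos)
labels = sorted(human_avg)
print("averaged Pearson  :", pearson([human_avg[s] for s in labels],
                                     [pred_avg[s] for s in labels]))
```

Averaging over several utterances reduces per-utterance noise in both the human and the predicted scores, which is one plausible reason aggregated correlations are higher than utterance-level ones.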

research
08/09/2020

Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling

While deep learning has made impressive progress in speech synthesis and...
research
02/27/2021

MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network

Mean opinion score (MOS) is a popular subjective metric to assess the qu...
research
04/17/2019

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

Existing objective evaluation metrics for voice conversion (VC) are not ...
research
06/24/2022

Speech Quality Assessment through MOS using Non-Matching References

Human judgments obtained through Mean Opinion Scores (MOS) are the most ...
research
05/24/2023

PLCMOS – a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms

Speech quality assessment is a problem for every researcher working on m...
research
09/03/2020

Detection of AI-Synthesized Speech Using Cepstral Bispectral Statistics

Digital technology has made possible unimaginable applications come true...
research
07/15/2021

Objective Metrics to Evaluate Residual-Echo Suppression During Double-Talk

Human subjective evaluation is optimal to assess speech quality for huma...
