LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech

10/18/2021
by Wen-Chin Huang, et al.
An effective approach to automatically predicting subjective ratings of synthetic speech is to train on a listening test dataset with human-annotated scores. Although each speech sample in the dataset is rated by several listeners, most previous work used only the mean score as the training target. In this work, we present LDNet, a unified framework for mean opinion score (MOS) prediction that predicts the listener-wise perceived quality given the input speech and the listener identity. We incorporate recent advances in listener-dependent (LD) modeling, including design choices of the model architecture, and propose two inference methods that provide more stable results and more efficient computation. We conduct systematic experiments on the voice conversion challenge (VCC) 2018 benchmark and a newly collected large-scale MOS dataset, providing an in-depth analysis of the proposed framework. Results show that the mean listener inference method is a better way to utilize the mean scores, and that its advantage is more pronounced when more ratings are available per sample.
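
The abstract does not spell out how the mean scores enter listener-dependent training. One plausible reading, sketched here purely as an assumption, is to add a virtual "mean listener" whose rating for each sample is the average of the real listeners' ratings, so that a listener-conditioned model can later be queried with that single identity at inference time. The function name `add_mean_listener`, the ID `MEAN_LISTENER`, and the tuple layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ID for the virtual listener (an assumption, not from the paper).
MEAN_LISTENER = "MEAN_LISTENER"


def add_mean_listener(ratings):
    """Append one (sample_id, MEAN_LISTENER, mean_score) row per sample.

    ratings: list of (sample_id, listener_id, score) tuples.
    Returns a new list; the input is left unchanged.
    """
    scores_by_sample = defaultdict(list)
    for sample_id, _listener_id, score in ratings:
        scores_by_sample[sample_id].append(score)

    augmented = list(ratings)
    for sample_id, scores in scores_by_sample.items():
        augmented.append((sample_id, MEAN_LISTENER, mean(scores)))
    return augmented


# Toy listening-test data: two utterances, each rated by two listeners.
ratings = [
    ("utt1", "L1", 4.0), ("utt1", "L2", 2.0),
    ("utt2", "L1", 5.0), ("utt2", "L3", 3.0),
]
augmented = add_mean_listener(ratings)
```

Under this assumption, inference would need only one forward pass per utterance, conditioned on the virtual listener, rather than one pass per real listener followed by averaging.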


research
02/27/2021

MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network

Mean opinion score (MOS) is a popular subjective metric to assess the qu...
research
04/17/2019

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

Existing objective evaluation metrics for voice conversion (VC) are not ...
research
08/09/2020

Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling

While deep learning has made impressive progress in speech synthesis and...
research
04/20/2021

Bias-Aware Loss for Training Image and Speech Quality Prediction Models from Multiple Datasets

The ground truth used for training image, video, or speech quality predi...
research
06/28/2022

Comparison of Speech Representations for the MOS Prediction System

Automatic methods to predict Mean Opinion Score (MOS) of listeners have ...
research
06/18/2023

MOSPC: MOS Prediction Based on Pairwise Comparison

As a subjective metric to evaluate the quality of synthesized speech, Me...
research
03/21/2022

The VoiceMOS Challenge 2022

We present the first edition of the VoiceMOS Challenge, a scientific eve...
