Fusion of Self-supervised Learned Models for MOS Prediction

04/11/2022
by Zhengdong Yang, et al.

We participated in the mean opinion score (MOS) prediction challenge 2022. The challenge aims to predict the MOS of synthetic speech on two tracks: a main track and a more challenging out-of-domain (OOD) sub-track. To improve prediction accuracy, we explored several model fusion strategies and propose a fused framework that combines seven pretrained self-supervised learned (SSL) models. These pretrained SSL models are derived from three ASR frameworks: Wav2Vec, HuBERT, and WavLM. For the OOD track, we reused the seven SSL models selected on the main track and adopted a semi-supervised learning method to exploit the unlabeled data. According to the official analysis results, our system ranked first on 6 of the 16 metrics and was among the top three systems on 13 of the 16. Specifically, it achieved the highest LCC, SRCC, and KTAU scores at the system level on the main track, and the best LCC, SRCC, and KTAU scores at the utterance level on the OOD track. Compared with the basic SSL models, the fused system is substantially more accurate, especially on the OOD sub-track.
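To make the terms in the abstract concrete, the following is a minimal Python sketch of how predictions from several models might be fused and then scored at the utterance level and the system level. It is an illustration under stated assumptions, not the authors' method: the abstract does not say how the seven models are fused, so simple averaging stands in here, and all names (`fuse_by_mean`, `system_level`) and all numbers are hypothetical toy values. LCC is the Pearson linear correlation and SRCC is the Spearman rank correlation; KTAU (Kendall's tau) is omitted for brevity.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Linear correlation coefficient (LCC) between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def fuse_by_mean(per_model_preds):
    """ASSUMPTION: fuse by averaging each utterance's score across models."""
    return [sum(col) / len(col) for col in zip(*per_model_preds)]


def system_level(system_ids, scores):
    """Average utterance scores per synthesis system."""
    sums, counts = defaultdict(float), defaultdict(int)
    for sid, score in zip(system_ids, scores):
        sums[sid] += score
        counts[sid] += 1
    return {sid: sums[sid] / counts[sid] for sid in sums}


# Hypothetical toy data: six utterances from three synthesis systems.
system_ids = ["A", "A", "B", "B", "C", "C"]
true_mos = [4.0, 4.5, 3.0, 2.5, 1.5, 2.0]
per_model_preds = [
    [3.8, 4.2, 3.2, 2.9, 2.0, 1.8],  # stand-in for one SSL-based predictor
    [4.1, 4.4, 2.7, 2.6, 1.7, 2.3],  # stand-in for a second predictor
    [3.9, 4.6, 3.1, 2.4, 1.6, 2.1],  # stand-in for a third predictor
]

fused = fuse_by_mean(per_model_preds)

# Utterance level: correlate per-utterance predictions with per-utterance MOS.
print("utterance LCC :", pearson(fused, true_mos))
print("utterance SRCC:", spearman(fused, true_mos))

# System level: average per system first, then correlate the system means.
true_sys = system_level(system_ids, true_mos)
pred_sys = system_level(system_ids, fused)
systems = sorted(true_sys)
print("system LCC    :", pearson([pred_sys[s] for s in systems],
                                 [true_sys[s] for s in systems]))
```

The distinction matters when reading the reported results: system-level metrics compare per-system averages, so they reward ranking synthesis systems correctly, while utterance-level metrics score every individual prediction.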
