Improving Self-Supervised Learning-based MOS Prediction Networks

04/23/2022
by Bálint Gyires-Tóth, et al.

MOS (Mean Opinion Score) is a subjective method used to evaluate a system's quality. Telecommunications (for voice and video) and speech synthesis systems (for generated speech) are a few of its many applications. While MOS tests are widely accepted, they are time-consuming and costly because they require human input. In addition, since the systems and subjects of the tests differ, the results are not directly comparable. On the other hand, the results of a large number of previous tests allow us to train machine learning models that can predict MOS values. Automatic MOS prediction can resolve both of the aforementioned issues. The present work introduces data-, training- and post-training-specific improvements to a previous self-supervised learning-based MOS prediction model. We used a wav2vec 2.0 model pre-trained on LibriSpeech, extended with LSTM and non-linear dense layers. We introduced transfer learning, target data preprocessing, a two- and three-phase training method with different batch formulations, dropout accumulation (for larger batch sizes) and quantization of the predictions. The methods are evaluated using the shared synthetic speech dataset of the first VoiceMOS Challenge.
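The abstract names target data preprocessing and quantization of the predictions but does not specify them. The following is only a minimal sketch under assumed choices, not the authors' method. It assumes a 1 to 5 rating scale, a linear min-max rescaling of targets to [0, 1], and rounding predictions to a grid with a hypothetical step of 0.25. All function names and parameters are illustrative.

```python
from statistics import mean

# Assumption: a 5-point rating scale (1 = bad, 5 = excellent).
MOS_MIN, MOS_MAX = 1.0, 5.0


def mean_opinion_score(ratings):
    """MOS of one sample: the arithmetic mean of its listener ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(ratings)


def scale_target(mos):
    """Hypothetical target preprocessing: map [MOS_MIN, MOS_MAX] linearly to [0, 1]."""
    return (mos - MOS_MIN) / (MOS_MAX - MOS_MIN)


def unscale_prediction(y):
    """Invert scale_target so a network output is expressed on the MOS scale again."""
    return y * (MOS_MAX - MOS_MIN) + MOS_MIN


def quantize_prediction(mos, step=0.25):
    """Hypothetical post-processing: clip to the scale, then snap to the nearest grid point.

    The grid is MOS_MIN + k * step. `step` is an assumed value. Python's round()
    resolves exact ties to the even integer.
    """
    clipped = min(max(mos, MOS_MIN), MOS_MAX)
    return MOS_MIN + round((clipped - MOS_MIN) / step) * step


if __name__ == "__main__":
    print(mean_opinion_score([4, 5, 3, 4]))          # 4
    print(quantize_prediction(3.37))                 # 3.25
    print(unscale_prediction(scale_target(3.0)))     # 3.0
```

Snapping to a grid only helps if the ground-truth scores themselves lie on a similar grid. For example, a MOS averaged over a fixed number of integer ratings can only take multiples of one over that number. The step size would therefore need to be chosen from the data rather than fixed as it is here.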
