Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning With Spoofing Detection and Spoofing Type Classification

07/16/2020
by   Yeunju Choi, et al.
0

Several papers have proposed deep-learning-based models to predict the mean opinion score (MOS) of synthesized speech, showing the possibility of replacing human raters. However, inter- and intra-rater variability in MOSs makes it hard to ensure the generalization ability of the models. In this paper, we propose a method using multi-task learning (MTL) with spoofing detection (SD) and spoofing type classification (STC) to improve the generalization ability of a MOS prediction model. Besides, we use the focal loss to maximize the synergy between SD and STC for MOS prediction. Experiments using the results of the Voice Conversion Challenge 2018 show that proposed MTL with two auxiliary tasks improves MOS prediction.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset