Deep Learning Based Assessment of Synthetic Speech Naturalness

04/23/2021
by   Gabriel Mittag, et al.
0

In this paper, we present a new objective prediction model for synthetic speech naturalness. It can be used to evaluate Text-To-Speech or Voice Conversion systems and works language independently. The model is trained end-to-end and based on a CNN-LSTM network that previously showed to give good results for speech quality estimation. We trained and tested the model on 16 different datasets, such as from the Blizzard Challenge and the Voice Conversion Challenge. Further, we show that the reliability of deep learning-based naturalness prediction can be improved by transfer learning from speech quality prediction models that are trained on objective POLQA scores. The proposed model is made publicly available and can, for example, be used to evaluate different TTS system configurations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/11/2023

Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion

Electrolarynx is a commonly used assistive device to help patients with ...
research
11/02/2020

Perceptually Guided End-to-End Text-to-Speech

Several fast text-to-speech (TTS) models have been proposed for real-tim...
research
04/19/2021

NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets

In this paper, we present an update to the NISQA speech quality predicti...
research
03/31/2022

DeepFry: Identifying Vocal Fry Using Deep Neural Networks

Vocal fry or creaky voice refers to a voice quality characterized by irr...
research
04/23/2022

Improving Self-Supervised Learning-based MOS Prediction Networks

MOS (Mean Opinion Score) is a subjective method used for the evaluation ...
research
02/25/2020

Towards Learning a Universal Non-Semantic Representation of Speech

The ultimate goal of transfer learning is to reduce labeled data require...
research
12/12/2021

Visualising and Explaining Deep Learning Models for Speech Quality Prediction

Estimating quality of transmitted speech is known to be a non-trivial ta...

Please sign up or login with your details

Forgot password? Click here to reset