Deep Learning Based Assessment of Synthetic Speech Naturalness

04/23/2021
by Gabriel Mittag, et al.

In this paper, we present a new objective prediction model for synthetic speech naturalness. It can be used to evaluate Text-To-Speech (TTS) or Voice Conversion systems and is language-independent. The model is trained end-to-end and is based on a CNN-LSTM network that has previously been shown to give good results for speech quality estimation. We trained and tested the model on 16 different datasets, including datasets from the Blizzard Challenge and the Voice Conversion Challenge. Further, we show that the reliability of deep learning-based naturalness prediction can be improved by transfer learning from speech quality prediction models that are trained on objective POLQA scores. The proposed model is publicly available and can, for example, be used to evaluate different TTS system configurations.
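
The abstract does not say how agreement between predictions and listener ratings is measured across the 16 datasets. A common convention for objective naturalness and quality predictors is to compute, per dataset, Pearson's correlation coefficient and the root-mean-square error (RMSE) between predicted and subjective MOS. The sketch below assumes that convention; the function names (pearson_r, rmse, evaluate) and the toy data are hypothetical and are not taken from the paper or the NISQA repository.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson_r(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Pearson correlation between predicted and subjective MOS (assumed metric)."""
    if len(pred) != len(mos) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_m = sum(mos) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(pred, mos))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_m = sum((m - mean_m) ** 2 for m in mos)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


def rmse(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Root-mean-square error between predicted and subjective MOS."""
    if len(pred) != len(mos) or not pred:
        raise ValueError("need two equal-length, non-empty sequences")
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, mos)) / len(pred))


def evaluate(
    datasets: dict[str, tuple[list[float], list[float]]],
) -> dict[str, tuple[float, float]]:
    """Return (Pearson r, RMSE) for each dataset name -> (predictions, MOS)."""
    return {name: (pearson_r(p, m), rmse(p, m)) for name, (p, m) in datasets.items()}


if __name__ == "__main__":
    # Hypothetical toy numbers, not results from the paper.
    toy = {
        "dataset_a": ([3.1, 2.4, 4.0, 3.6], [3.3, 2.1, 4.2, 3.5]),
        "dataset_b": ([1.8, 2.9, 3.7], [2.0, 3.1, 3.4]),
    }
    for name, (r, err) in evaluate(toy).items():
        print(f"{name}: r={r:.3f}, RMSE={err:.3f}")
```

Reporting both numbers per dataset is a deliberate choice: Pearson's r captures how well a model ranks systems, while RMSE also penalizes a constant offset on the MOS scale.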

Code Repositories

NISQA

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
