Analysing Shortcomings of Statistical Parametric Speech Synthesis

07/28/2018
by   Gustav Eje Henter, et al.
0

Output from statistical parametric speech synthesis (SPSS) remains noticeably worse than natural speech recordings in terms of quality, naturalness, speaker similarity, and intelligibility in noise. There are many hypotheses regarding the origins of these shortcomings, but these hypotheses are often kept vague and presented without empirical evidence that could confirm and quantify how a specific shortcoming contributes to imperfections in the synthesised speech. Throughout speech synthesis literature, surprisingly little work is dedicated towards identifying the perceptually most important problems in speech synthesis, even though such knowledge would be of great value for creating better SPSS systems. In this book chapter, we analyse some of the shortcomings of SPSS. In particular, we discuss issues with vocoding and present a general methodology for quantifying the effect of any of the many assumptions and design choices that hold SPSS back. The methodology is accompanied by an example that carefully measures and compares the severity of perceptual limitations imposed by vocoding as well as other factors such as the statistical model and its use.

READ FULL TEXT
research
11/08/2018

Speaker-adaptive neural vocoders for statistical parametric speech synthesis systems

This paper proposes speaker-adaptive neural vocoders for statistical par...
research
10/14/2019

The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach

As part of the Human-Computer Interaction field, Expressive speech synth...
research
01/02/2020

Eigenresiduals for improved Parametric Speech Synthesis

Statistical parametric speech synthesizers have recently shown their abi...
research
05/26/2020

A comparison of Vietnamese Statistical Parametric Speech Synthesis Systems

In recent years, statistical parametric speech synthesis (SPSS) systems ...
research
11/09/2018

ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems

This paper proposes a WaveNet-based neural excitation model (ExcitNet) f...
research
02/18/2015

F0 Modeling In Hmm-Based Speech Synthesis System Using Deep Belief Network

In recent years multilayer perceptrons (MLPs) with many hid- den layers ...

Please sign up or login with your details

Forgot password? Click here to reset