The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach

10/14/2019
by Noé Tits, et al.

As part of the field of Human-Computer Interaction, expressive speech synthesis is a rich domain that draws on machine learning, signal processing, sociology, and psychology. In this chapter, we focus mostly on the technical side. From the recording of expressive speech to its modeling, the reader will get an overview of the main paradigms used in this field through some of the most prominent systems and methods. We explain how speech can be represented and encoded with audio features. We present a history of the main methods of Text-to-Speech synthesis: concatenative, parametric, and statistical parametric speech synthesis. Finally, we focus on the last of these, and on the latest techniques, which model Text-to-Speech synthesis as a sequence-to-sequence problem. This enables the use of Deep Learning building blocks such as Convolutional and Recurrent Neural Networks as well as attention mechanisms. The last part of the chapter brings together the different aspects of the theory and summarizes the concepts.


research
08/20/2020

Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

Despite the growing interest for expressive speech synthesis, synthesis ...
research
03/27/2019

Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis

The field of Text-to-Speech has experienced huge improvements last years...
research
07/28/2018

Analysing Shortcomings of Statistical Parametric Speech Synthesis

Output from statistical parametric speech synthesis (SPSS) remains notic...
research
11/28/2019

Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech

We propose a Text-to-Speech method to create an unseen expressive style ...
research
06/19/2021

Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters

Vocoders received renewed attention as main components in statistical pa...
research
10/06/2022

An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

Speech is the fundamental mode of human communication, and its synthesis...
research
08/03/2020

Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis

Attention-based seq2seq text-to-speech systems, especially those use sel...
