Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention

10/29/2018
by Bajibabu Bollepalli, et al.

Interest in using sequence-to-sequence models with attention for text-to-speech (TTS) synthesis is increasing. These models are end-to-end, meaning that they learn both co-articulation and duration properties directly from text and speech. Since these models are entirely data-driven, they need large amounts of data to generate synthetic speech of good quality. However, for challenging speaking styles, such as Lombard speech, it is difficult to record sufficiently large speech corpora. Therefore, in this study we propose a transfer learning method to adapt a sequence-to-sequence based TTS system from normal speaking style to Lombard style. Moreover, we experiment with a WaveNet vocoder in the synthesis of Lombard speech. We conducted subjective evaluations to assess the performance of the adapted TTS systems. The results indicated that an adapted system with the WaveNet vocoder clearly outperformed the conventional deep neural network based TTS system in the synthesis of Lombard speech.

research
08/20/2020

Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

Despite the growing interest for expressive speech synthesis, synthesis ...
research
11/01/2017

Uncovering Latent Style Factors for Expressive Speech Synthesis

Prosodic modeling is a core problem in speech synthesis. The key challen...
research
09/23/2019

Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities

Modern sequence to sequence neural TTS systems provide close to natural ...
research
03/29/2022

Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis

End-to-end text-to-speech synthesis (TTS), which generates speech sounds...
research
10/31/2022

The Importance of Accurate Alignments in End-to-End Speech Synthesis

Unit selection synthesis systems required accurate segmentation and labe...
research
06/29/2020

Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis

Recent advances in deep learning methods have elevated synthetic speech ...
research
09/14/2020

Controllable neural text-to-speech synthesis using intuitive prosodic features

Modern neural text-to-speech (TTS) synthesis can generate speech that is...
