Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning

07/30/2020
by   Jae-Sung Bae, et al.
0

This paper proposes a controllable end-to-end text-to-speech (TTS) system to control the speaking speed (speed-controllable TTS; SCTTS) of synthesized speech with sentence-level speaking-rate value as an additional input. The speaking-rate value, the ratio of the number of input phonemes to the length of input speech, is adopted in the proposed system to control the speaking speed. Furthermore, the proposed SCTTS system can control the speaking speed while retaining other speech attributes, such as the pitch, by adopting the global style token-based style encoder. The proposed SCTTS does not require any additional well-trained model or an external speech database to extract phoneme-level duration information and can be trained in an end-to-end manner. In addition, our listening tests on fast-, normal-, and slow-speed speech showed that the SCTTS can generate more natural speech than other phoneme duration control approaches which increase or decrease duration at the same rate for the entire sentence, especially in the case of slow-speed speech.

READ FULL TEXT

page 3

page 4

research
06/05/2023

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Regressive Text-to-Speech (TTS) system utilizes attention mechanism to g...
research
05/22/2019

FastSpeech: Fast, Robust and Controllable Text to Speech

Neural network based end-to-end text to speech (TTS) has significantly i...
research
08/06/2020

PPSpeech: Phrase based Parallel End-to-End TTS System

Current end-to-end autoregressive TTS systems (e.g. Tacotron 2) have out...
research
03/23/2018

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

In this work, we propose "global style tokens" (GSTs), a bank of embeddi...
research
06/29/2020

Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis

Recent advances in deep learning methods have elevated synthetic speech ...
research
10/16/2018

Hierarchical Generative Modeling for Controllable Speech Synthesis

This paper proposes a neural end-to-end text-to-speech (TTS) model which...
research
10/29/2020

DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech

With the number of smart devices increasing, the demand for on-device te...

Please sign up or login with your details

Forgot password? Click here to reset