Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

06/05/2023
by Dengfeng Ke, et al.

Regressive text-to-speech (TTS) systems use an attention mechanism to generate the alignment between the text and the acoustic feature sequence. The alignment determines synthesis robustness (e.g., the occurrence of skipping, repeating, and collapse) and, through duration control, rhythm. However, current attention algorithms used in speech synthesis cannot control rhythm with external duration information to generate natural speech while ensuring robustness. In this study, we propose Rhythm-controllable Attention (RC-Attention) based on Tacotron2, which improves robustness and naturalness simultaneously. The proposed attention adopts a trainable scalar learned from four kinds of information to achieve rhythm control, which makes rhythm control more robust and natural, even when the synthesized sentences are much longer than those in the training corpus. We use word error counting and an AB preference test to measure the robustness of the proposed method and the naturalness of the synthesized speech, respectively. Results show that RC-Attention has the lowest word error rate of nearly 0.6 with 11.8 [...] speech synthesized with RC-Attention to that with Forward Attention, because the former has more natural rhythm.
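
The abstract does not say how word errors are counted, so the following is only a minimal sketch of one common convention: word error rate as the word-level edit distance between the input text and a transcript of the synthesized audio, divided by the number of input words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper. Under this convention, a skipped word shows up as a deletion and a repeated word as an insertion.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: this is the standard WER definition, not necessarily the
    counting procedure used by the paper's authors.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing (skipping)
            insertion = curr[j - 1] + 1   # extra word in output (repeating)
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    text = "the cat sat on the mat"
    skipped = "the cat on the mat"           # one word skipped
    repeated = "the cat sat sat on the mat"  # one word repeated
    print(word_error_rate(text, skipped))
    print(word_error_rate(text, repeated))
```

Both example transcripts differ from the six-word input by a single edit, so each scores one error in six words. A rate reported as a percentage would be this value multiplied by 100.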

research
07/30/2020

Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning

This paper proposes a controllable end-to-end text-to-speech (TTS) syste...
research
01/30/2021

Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time Lpcnet

In this work, a robust and efficient text-to-speech system, named Triple...
research
10/08/2020

Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling

This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-...
research
04/12/2022

Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch

The recently developed pitch-controllable text-to-speech (TTS) model, i....
research
08/07/2020

Controllable Neural Prosody Synthesis

Speech synthesis has recently seen significant improvements in fidelity,...
research
08/13/2020

Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

Recent neural speech synthesis systems have gradually focused on the con...
research
09/23/2019

Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities

Modern sequence to sequence neural TTS systems provide close to natural ...
