FeatherTTS: Robust and Efficient attention based Neural TTS

11/02/2020
by   Qiao Tian, et al.

Attention-based neural TTS is an elegant speech synthesis pipeline and has shown a powerful ability to generate natural speech. However, it is still not robust enough to meet the stability requirements of industrial products. It also suffers from slow inference owing to its autoregressive generation process. In this work, we propose FeatherTTS, a robust and efficient attention-based neural TTS system. First, we propose a novel Gaussian attention mechanism that exploits the interpretability of Gaussian attention and the strictly monotonic nature of alignment in TTS. With this mechanism, we replace the commonly used stop-token prediction architecture with attentive stop prediction. Second, we apply block sparsity to the autoregressive decoder to speed up speech synthesis. The experimental results show that FeatherTTS nearly eliminates word skipping and repetition on particularly hard texts while preserving the naturalness of the generated speech, and also speeds up acoustic feature generation by 3.5 times over Tacotron. Overall, FeatherTTS can run 35x faster than real time on a single CPU.
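
The abstract names three ideas without giving formulas: a monotonic Gaussian attention, a stop decision read from the attention state, and block sparsity in the decoder. The Python sketch below is one illustrative reading of them, not the authors' implementation. Every name and rule in it is an assumption: the softplus step for the mean, the fixed `sigma`, the `margin` stop threshold, and magnitude-based block pruning are all hypothetical choices.

```python
import math


def softplus(x):
    # Numerically stable softplus. The result is always positive, so the
    # attention mean can only move forward (assumed monotonic update).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_attention_step(prev_mean, delta_logit, sigma, num_tokens):
    """Advance the mean and return normalized Gaussian weights over tokens."""
    mean = prev_mean + softplus(delta_logit)
    exponents = [-((j - mean) ** 2) / (2.0 * sigma ** 2) for j in range(num_tokens)]
    shift = max(exponents)  # shift before exp so the sum never underflows to 0
    scores = [math.exp(e - shift) for e in exponents]
    total = sum(scores)
    return mean, [s / total for s in scores]


def should_stop(mean, num_tokens, margin=0.5):
    # Hypothetical attentive stop rule: stop once the mean has moved past
    # the last encoder position, instead of predicting a stop token.
    return mean >= (num_tokens - 1) + margin


def decode_alignment(delta_logits, num_tokens, sigma=1.0):
    """Run attention steps until the stop rule fires or the inputs run out."""
    mean, alignment = 0.0, []
    for delta_logit in delta_logits:
        mean, weights = gaussian_attention_step(mean, delta_logit, sigma, num_tokens)
        alignment.append(weights)
        if should_stop(mean, num_tokens):
            break
    return alignment


def block_sparsify(weights, block, keep_ratio):
    """Zero all but the highest-magnitude square blocks (assumed criterion)."""
    rows, cols = len(weights), len(weights[0])
    blocks = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            cells = [(i, j)
                     for i in range(r, min(r + block, rows))
                     for j in range(c, min(c + block, cols))]
            score = sum(abs(weights[i][j]) for i, j in cells) / len(cells)
            blocks.append((score, cells))
    blocks.sort(key=lambda item: item[0], reverse=True)
    keep = max(1, math.ceil(keep_ratio * len(blocks)))
    pruned = [row[:] for row in weights]
    for _, cells in blocks[keep:]:
        for i, j in cells:
            pruned[i][j] = 0.0
    return pruned
```

Because the mean only ever increases in this sketch, the alignment cannot jump backwards or stall indefinitely, which is the kind of behavior that would avoid repeated or skipped words. Zeroing whole blocks, rather than scattered individual weights, is what would let a CPU kernel skip contiguous chunks of the matrix multiply.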


