AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment

03/04/2020
by   Zhen Zeng, et al.

Targeting both high efficiency and high performance, we propose AlignTTS, which predicts the mel-spectrum in parallel. AlignTTS is based on a Feed-Forward Transformer that generates the mel-spectrum from a sequence of characters, with the duration of each character determined by a duration predictor. Instead of adopting the attention mechanism of Transformer TTS to align text to mel-spectrum, we present an alignment loss that considers all possible alignments during training by means of dynamic programming. Experiments on the LJSpeech dataset show that our model achieves not only state-of-the-art performance, outperforming Transformer TTS by 0.03 in mean opinion score (MOS), but also high efficiency, running more than 50 times faster than real time.
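
The abstract does not spell out the recursion behind the alignment loss, so the following is only a minimal sketch of how a loss over all monotonic alignments could be computed with dynamic programming. It assumes an HMM-style forward algorithm in which each mel frame is assigned to one character, the alignment starts at the first character and ends at the last, and each step either stays on the current character or advances to the next. The function name `alignment_loss` and the input `log_prob` (a table of per-frame, per-character log-likelihoods) are hypothetical and not taken from the paper.

```python
import math

NEG_INF = float("-inf")


def _logaddexp(a, b):
    """Return log(exp(a) + exp(b)) without overflow; handles -inf inputs."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def alignment_loss(log_prob):
    """Negative log of the summed likelihood over all monotonic alignments.

    log_prob[t][i] is an assumed log-likelihood of mel frame t given
    character i (T frames, N characters). Illustrative sketch only.
    """
    num_frames = len(log_prob)
    num_chars = len(log_prob[0])
    if num_frames < num_chars:
        raise ValueError("need at least one frame per character")

    # alpha[i]: log-sum over all partial alignments that end at character i
    # on the current frame. The first frame must align to the first character.
    alpha = [NEG_INF] * num_chars
    alpha[0] = log_prob[0][0]

    for t in range(1, num_frames):
        new_alpha = [NEG_INF] * num_chars
        for i in range(num_chars):
            stay = alpha[i]                               # same character
            advance = alpha[i - 1] if i > 0 else NEG_INF  # next character
            prev = _logaddexp(stay, advance)
            if prev != NEG_INF:
                new_alpha[i] = prev + log_prob[t][i]
        alpha = new_alpha

    # The last frame must align to the last character.
    return -alpha[num_chars - 1]
```

With three frames, two characters, and all log-likelihoods set to zero, there are exactly two valid alignments (the switch to the second character happens at frame 1 or frame 2), so this sketch returns `-log(2)`. The cost is proportional to the number of frames times the number of characters, which is what makes summing over every alignment tractable.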

research
05/15/2020

JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment

We propose Jointly trained Duration Informed Transformer (JDI-T), a feed...
research
08/09/2021

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

Neural painting refers to the procedure of producing a series of strokes...
research
10/22/2019

Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer

We propose a sequence-to-sequence singing synthesizer, which avoids the ...
research
09/19/2018

Close to Human Quality TTS with Transformer

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotro...
research
06/05/2020

End-to-End Adversarial Text-to-Speech

Modern text-to-speech synthesis pipelines typically involve multiple pro...
research
02/12/2020

GLU Variants Improve Transformer

Gated Linear Units (arXiv:1612.08083) consist of the component-wise prod...
research
07/09/2020

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

In this paper, we develop DeepSinger, a multi-lingual multi-singer singi...
