Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

10/24/2017
by Hideyuki Tachibana, et al.
This paper describes a novel text-to-speech (TTS) technique based on deep convolutional neural networks (CNNs), without any recurrent units. Recurrent neural networks (RNNs) have recently become a standard technique for modeling sequential data, and they are used in several cutting-edge neural TTS systems. However, training the RNN components often requires a very powerful computer or a very long time, typically several days or weeks. Other recent studies, on the other hand, have shown that CNN-based sequence synthesis can be much faster than RNN-based techniques because of its high parallelizability. The objective of this paper is to present an alternative neural TTS system, based only on CNNs, that alleviates these economic costs of training. In our experiment, the proposed Deep Convolutional TTS was sufficiently trained overnight (15 hours) on an ordinary gaming PC equipped with two GPUs, while the quality of the synthesized speech was almost acceptable.
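
The parallelizability argument can be illustrated with a minimal sketch. This is not the paper's architecture: the function names `causal_conv1d` and `simple_recurrence`, the scalar inputs, and the weights are all hypothetical and chosen only for illustration. In the convolution, each output depends only on the inputs, so all time steps could be computed independently. In the recurrence, each step needs the previous hidden state, which forces sequential evaluation.

```python
from typing import List


def causal_conv1d(x: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k * dilation].

    Inputs before the start of the sequence are treated as zero.
    Each y[t] reads only from x, never from another y, so the outer
    loop could run in any order (or in parallel on a GPU).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            i = t - k * dilation
            if i >= 0:
                acc += w * x[i]
        y.append(acc)
    return y


def simple_recurrence(x: List[float], a: float, b: float) -> List[float]:
    """Toy recurrent update: h[t] = a * h[t-1] + b * x[t], with h[-1] = 0.

    Each h[t] needs h[t-1], so the steps must run one after another.
    """
    h = []
    prev = 0.0
    for v in x:
        prev = a * prev + b * v
        h.append(prev)
    return h


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    print(causal_conv1d(signal, [1.0, 1.0]))
    print(causal_conv1d(signal, [1.0, 1.0], dilation=2))
    print(simple_recurrence([1.0, 1.0, 1.0], a=0.5, b=1.0))
```

The pure-Python loops here are written for readability only; a real system would express the convolution as a batched tensor operation, which is where the training-time advantage would come from.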
