UFANS: U-shaped Fully-Parallel Acoustic Neural Structure For Statistical Parametric Speech Synthesis With 20X Faster

11/28/2018
by   Dabiao Ma, et al.

Neural networks with auto-regressive structures, such as Recurrent Neural Networks (RNNs), have become the most appealing structures for acoustic modeling in parametric text-to-speech synthesis (TTS) in recent studies. Despite their prominent capacity to capture long-term dependencies, these models consist of massive sequential computations that cannot be fully parallelized. In this paper, we propose a U-shaped Fully-parallel Acoustic Neural Structure (UFANS), which is a deconvolutional alternative to RNNs for Statistical Parametric Speech Synthesis (SPSS). Experiments verify that the proposed model is over 20 times faster than an RNN-based acoustic model in both training and inference on GPU, with comparable speech quality. Furthermore, we investigate how much long-term information dependence actually matters to synthesized speech quality.
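The parallelism argument in the abstract can be illustrated with a toy sketch. This is not the UFANS architecture, whose layers are not described here. It is only a minimal contrast, under assumed names (`recurrent_pass`, `conv_frame`, `conv_pass`) and assumed toy weights, between a recurrence whose steps must run in order and a 1-D convolution whose output frames depend only on the input.

```python
from typing import List, Sequence


def recurrent_pass(frames: Sequence[float],
                   w_in: float = 0.5,
                   w_rec: float = 0.5) -> List[float]:
    """Toy linear recurrence (hypothetical weights).

    h[t] depends on h[t-1], so frame t cannot be computed
    before frame t-1 has finished.
    """
    hidden = 0.0
    outputs = []
    for x in frames:
        hidden = w_rec * hidden + w_in * x
        outputs.append(hidden)
    return outputs


def conv_frame(frames: Sequence[float], t: int,
               kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> float:
    """One output frame of a zero-padded 1-D convolution.

    Only input frames are read, never other outputs, so every t
    could be evaluated independently (for example, in parallel on a GPU).
    """
    half = len(kernel) // 2
    total = 0.0
    for i, k in enumerate(kernel):
        j = t + i - half
        if 0 <= j < len(frames):  # out-of-range positions act as zero padding
            total += k * frames[j]
    return total


def conv_pass(frames: Sequence[float],
              kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> List[float]:
    """All output frames; the order of evaluation does not matter."""
    return [conv_frame(frames, t, kernel) for t in range(len(frames))]


if __name__ == "__main__":
    toy_input = [1.0, 2.0, 3.0, 4.0]
    print(recurrent_pass(toy_input))
    print(conv_pass(toy_input))
```

In the recurrence, the loop body reads the previous `hidden` value, which is the sequential dependency the abstract refers to. In the convolution, `conv_frame` can be called for any `t` in any order, which is what makes a fully parallel implementation possible.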


research
01/26/2016

Recurrent Neural Network Postfilters for Statistical Parametric Speech Synthesis

In the last two years, there have been numerous papers that have looked ...
research
04/12/2019

RNN-based speech synthesis using a continuous sinusoidal model

Recently in statistical parametric speech synthesis, we proposed a conti...
research
06/20/2016

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

Acoustic models based on long short-term memory recurrent neural network...
research
04/08/2019

GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram

Recent advances in neural network-based text-to-speech have reached hum...
research
08/31/2018

Self-Attention Linguistic-Acoustic Decoder

The conversion from text to speech relies on the accurate mapping from l...
research
10/24/2017

Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

This paper describes a novel text-to-speech (TTS) technique based on dee...
research
09/13/2023

Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

Recent strides in neural speech synthesis technologies, while enjoying w...
