Text-To-Speech Conversion with Neural Networks: A Recurrent TDNN Approach

11/24/1998
by Orhan Karaali, et al.

This paper describes the design of a neural network that performs the phonetic-to-acoustic mapping in a speech synthesis system. The use of a time-domain neural network architecture limits discontinuities that occur at phone boundaries. Recurrent data input also helps smooth the output parameter tracks. Independent testing has demonstrated that the voice quality produced by this system compares favorably with speech from existing commercial text-to-speech systems.
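The abstract describes two ideas: a time-delay window over phonetic input frames, and feeding the previous acoustic output frame back in as a recurrent input to smooth the output parameter tracks. A minimal sketch of that combination is below; it is not the paper's implementation, and all layer sizes, weight names, and the single-layer structure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PHONETIC = 8     # phonetic input features per frame (assumed size)
N_ACOUSTIC = 4     # acoustic output parameters per frame (assumed size)
CONTEXT = 2        # frames of left/right context in the time-delay window

# Random untrained weights, just to make the data flow concrete.
W_in = rng.standard_normal((N_ACOUSTIC, N_PHONETIC * (2 * CONTEXT + 1))) * 0.1
W_rec = rng.standard_normal((N_ACOUSTIC, N_ACOUSTIC)) * 0.1

def synthesize(phonetic_frames):
    """Map a (T, N_PHONETIC) phonetic track to a (T, N_ACOUSTIC) acoustic track."""
    T = len(phonetic_frames)
    # Pad the edges so every frame sees a full context window.
    padded = np.pad(phonetic_frames, ((CONTEXT, CONTEXT), (0, 0)), mode="edge")
    prev_out = np.zeros(N_ACOUSTic) if False else np.zeros(N_ACOUSTIC)
    outputs = []
    for t in range(T):
        # Time-delay input: the frame plus its left/right context, flattened.
        window = padded[t:t + 2 * CONTEXT + 1].ravel()
        # Recurrent term: the previous output frame is an extra input,
        # which couples adjacent frames and smooths the parameter tracks.
        out = np.tanh(W_in @ window + W_rec @ prev_out)
        outputs.append(out)
        prev_out = out
    return np.array(outputs)

acoustic = synthesize(rng.standard_normal((10, N_PHONETIC)))
print(acoustic.shape)  # → (10, 4)
```

Because each output frame depends on both a multi-frame input window and the previous output, abrupt jumps at phone boundaries are attenuated rather than passed straight through.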


Related research:

- Speech Synthesis with Neural Networks (11/24/1998): Text-to-speech conversion has traditionally been performed either by con...
- A Fully Time-domain Neural Model for Subband-based Speech Synthesizer (10/12/2018): This paper introduces a deep neural network model for subband-based spee...
- Naturalization of Text by the Insertion of Pauses and Filler Words (11/07/2020): In this article, we introduce a set of methods to naturalize text based ...
- Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (12/16/2017): This paper describes Tacotron 2, a neural network architecture for speec...
- Towards Robust Neural Vocoding for Speech Generation: A Survey (12/05/2019): Recently, neural vocoders have been widely used in speech synthesis task...
- Deep Voice: Real-time Neural Text-to-Speech (02/25/2017): We present Deep Voice, a production-quality text-to-speech system constr...
- A Cyclical Post-filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-speech Systems (05/18/2020): Recently, the effectiveness of text-to-speech (TTS) systems combined wit...
