Text-To-Speech Conversion with Neural Networks: A Recurrent TDNN Approach

11/24/1998
by Orhan Karaali et al.

This paper describes the design of a neural network that performs the phonetic-to-acoustic mapping in a speech synthesis system. The use of a time-domain neural network architecture limits discontinuities that occur at phone boundaries. Recurrent data input also helps smooth the output parameter tracks. Independent testing has demonstrated that the voice quality produced by this system compares favorably with speech from existing commercial text-to-speech systems.
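The architecture described above can be sketched in miniature: each output frame is computed from a window of neighboring phonetic frames (the time-delay part) together with the previous output frame fed back as input (the recurrent part), which is what smooths the acoustic parameter tracks across phone boundaries. The following NumPy sketch is purely illustrative; the dimensions, weights, and function names are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical dimensions: 10 phonetic features in, 4 acoustic parameters out,
# with a context window of 2 frames on each side of the current frame.
PHONE_DIM, ACOUSTIC_DIM, CONTEXT = 10, 4, 2

rng = np.random.default_rng(0)
# Randomly initialised weights stand in for a trained network.
W_in = rng.standard_normal(((2 * CONTEXT + 1) * PHONE_DIM, ACOUSTIC_DIM)) * 0.1
W_rec = rng.standard_normal((ACOUSTIC_DIM, ACOUSTIC_DIM)) * 0.1


def synthesize(phone_frames):
    """Map a (T, PHONE_DIM) phonetic track to a (T, ACOUSTIC_DIM) acoustic track.

    Each output frame sees a window of input frames (time-delay context)
    plus the previous output frame (recurrent feedback), so successive
    outputs vary smoothly rather than jumping at phone boundaries.
    """
    T = len(phone_frames)
    # Replicate edge frames so the window is defined at both ends.
    padded = np.pad(phone_frames, ((CONTEXT, CONTEXT), (0, 0)), mode="edge")
    prev = np.zeros(ACOUSTIC_DIM)
    out = np.zeros((T, ACOUSTIC_DIM))
    for t in range(T):
        window = padded[t : t + 2 * CONTEXT + 1].ravel()
        prev = np.tanh(window @ W_in + prev @ W_rec)
        out[t] = prev
    return out


track = synthesize(rng.standard_normal((50, PHONE_DIM)))
```

Because each frame's output depends on the previous frame's output through `W_rec`, adjacent acoustic frames are correlated, which is the smoothing effect the abstract attributes to the recurrent data input.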

