LSTM Deep Neural Networks Postfiltering for Improving the Quality of Synthetic Voices

02/08/2016
by Marvin Coto-Jiménez, et al.

Recent developments in speech synthesis have yielded systems capable of producing intelligible speech, and researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents. HMM-based speech synthesis is of great interest to many researchers because of its ability to produce sophisticated features with a small footprint. Despite such progress, its quality has not yet reached the level of the predominant unit-selection approaches, which select and concatenate recordings of real speech. Recent efforts have been made to improve these systems. In this paper we present the application of Long Short-Term Memory (LSTM) deep neural networks as a postfiltering step for HMM-based speech synthesis, in order to obtain spectral characteristics closer to those of natural speech. The results show how HMM-based voices can be improved using this approach.
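The abstract does not specify the network architecture, so the following is only a minimal sketch of what an LSTM postfilter could look like: a single LSTM layer reads the spectral frames produced by the HMM synthesizer one at a time, and a linear layer emits one enhanced frame per input frame. The class name `LSTMPostfilter`, the layer sizes, the three-coefficient frames, and the random untrained weights are all assumptions made for illustration. In practice the weights would presumably be trained on pairs of synthetic and natural frames, which is not shown here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMPostfilter:
    """Hypothetical one-layer LSTM plus linear output (untrained, illustrative)."""

    GATES = ("i", "f", "o", "c")  # input, forget, output, candidate

    def __init__(self, n_feat, n_hidden, seed=0):
        rng = random.Random(seed)
        n_in = n_feat + n_hidden  # each gate sees [current frame, previous h]
        self.W = {k: rand_matrix(n_hidden, n_in, rng) for k in self.GATES}
        self.b = {k: [0.0] * n_hidden for k in self.GATES}
        self.W_out = rand_matrix(n_feat, n_hidden, rng)
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate input frame and hidden state
        pre = {
            k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
            for k in self.GATES
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        # Standard LSTM cell update: keep part of the old memory, add new content.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def enhance(self, frames):
        # Map a sequence of synthetic frames to a same-length sequence of outputs.
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        enhanced = []
        for x in frames:
            h, c = self.step(x, h, c)
            enhanced.append(matvec(self.W_out, h))
        return enhanced


# Toy input: five frames of three made-up spectral coefficients each.
synthetic = [[0.1 * t, -0.2, 0.05] for t in range(5)]
postfilter = LSTMPostfilter(n_feat=3, n_hidden=4)
enhanced = postfilter.enhance(synthetic)
print(len(enhanced), len(enhanced[0]))
```

Because the hidden and cell states carry over from frame to frame, each output frame can depend on the preceding context, which is the property that makes a recurrent model a natural candidate for frame-sequence postfiltering.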

research
07/01/2022

Building African Voices

Modern speech synthesis techniques can produce natural-sounding speech g...
research
04/22/2020

Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit

This paper presents a deep Gaussian process (DGP) model with a recurrent...
research
09/20/2018

LSTM-based Whisper Detection

This article presents a whisper speech detector in the far-field domain....
research
06/19/2021

Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters

Vocoders received renewed attention as main components in statistical pa...
research
02/22/2016

Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Generation Error Training

We propose two novel techniques --- stacking bottleneck features and min...
research
12/02/2019

Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

Recent advances in Text-to-Speech (TTS) have improved quality and natura...
research
01/21/2016

On Structured Sparsity of Phonological Posteriors for Linguistic Parsing

The speech signal conveys information on different time scales from shor...
