The Importance of Accurate Alignments in End-to-End Speech Synthesis

10/31/2022
by Anusha Prakash, et al.

Unit selection synthesis systems required accurate segmentation and labeling of the speech signal owing to their concatenative nature. Hidden Markov model-based speech synthesis accommodated some transcription errors, but it was later shown that accurate transcriptions yield highly intelligible speech with smaller amounts of training data. With the arrival of end-to-end (E2E) systems, it was observed that very good quality speech could be synthesised with large amounts of data. As end-to-end synthesis progressed from Tacotron to FastSpeech2, it has become evident that features representing prosody are important for good-quality synthesis. In particular, the durations of the sub-word units are important. Variants of FastSpeech use a teacher model or forced alignments to obtain good-quality synthesis. In this paper, we focus on duration prediction, using signal processing cues in tandem with forced alignment to produce accurate phone durations during training. The current work aims to highlight the importance of accurate alignments for good-quality synthesis. To this end, E2E systems are trained on accurately labeled data and compared with systems trained on approximately labeled data.
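
To make the role of phone durations concrete, the following is a minimal sketch (not the authors' pipeline) of how phone boundaries from an alignment might be converted into the per-phone frame counts that a FastSpeech-style duration predictor is trained on. The function name, the example segments, and the sample rate and hop length (22050 Hz, 256 samples) are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical segment type: (phone label, start time in s, end time in s).
Segment = Tuple[str, float, float]


def boundaries_to_frame_durations(
    segments: List[Segment],
    sample_rate: int = 22050,  # assumed value, not taken from the paper
    hop_length: int = 256,     # assumed value, not taken from the paper
) -> List[int]:
    """Convert time-aligned phone segments to per-phone frame counts.

    Boundaries are rounded to frame indices cumulatively, so the durations
    always sum to the frame index of the final boundary.
    """
    if not segments:
        return []
    frames_per_second = sample_rate / hop_length
    prev_frame = round(segments[0][1] * frames_per_second)
    durations = []
    for phone, start, end in segments:
        if end < start:
            raise ValueError(f"segment for {phone!r} ends before it starts")
        end_frame = round(end * frames_per_second)
        durations.append(max(end_frame - prev_frame, 0))
        prev_frame = max(prev_frame, end_frame)
    return durations


# Illustrative alignment for a short utterance (made-up times).
accurate = [("sil", 0.00, 0.10), ("k", 0.10, 0.16),
            ("ae", 0.16, 0.30), ("t", 0.30, 0.36)]
# Same utterance with one boundary misplaced by 20 ms.
approximate = [("sil", 0.00, 0.10), ("k", 0.10, 0.18),
               ("ae", 0.18, 0.30), ("t", 0.30, 0.36)]

accurate_durations = boundaries_to_frame_durations(accurate)
approximate_durations = boundaries_to_frame_durations(approximate)
print(accurate_durations, approximate_durations)
```

With these assumed settings one frame spans roughly 11.6 ms, so a boundary error of 20 ms moves about two frames from one phone to its neighbour while leaving the total frame count unchanged. That is the kind of labeling discrepancy the comparison between accurately and approximately labeled data is concerned with.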

