Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis

03/29/2022
by Kei Furukawa, et al.

End-to-end text-to-speech synthesis (TTS), which generates speech directly from strings of text or phonemes, has improved synthesis quality over conventional TTS. However, most previous studies have been evaluated based on subjective naturalness and have not objectively examined whether the models reproduce the pitch patterns of phonological phenomena that reflect syntactic structure in Japanese, such as downstep, rhythmic boost, and initial lowering. These phenomena can be explained linguistically by phonological constraints and the syntax–prosody mapping hypothesis (SPMH), which assumes projections from syntactic structures onto the phonological hierarchy. Although some psycholinguistic experiments have verified the validity of the SPMH, it remains crucial to investigate whether it can be implemented in TTS. To synthesize linguistic phenomena involving syntactic or phonological constraints, we propose a model that uses phonological symbols based on the SPMH and prosodic well-formedness constraints. Experimental results showed that, for initial lowering and rhythmic boost, the proposed method synthesized pitch patterns similar to those reported in linguistic experiments. The proposed model also efficiently synthesizes phonological phenomena in the test data that were not explicitly included in the training data.


