What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS

by Brooke Stephenson, et al.

In incremental text-to-speech synthesis (iTTS), the synthesizer produces audio output before it has access to the entire input sentence. In this paper, we study the behavior of a neural sequence-to-sequence TTS system used in incremental mode, i.e. when generating speech output for token n, the system has access to n + k tokens from the text sequence. We first analyze the impact of this incremental policy on the evolution of the encoder representations of token n for different values of k (the lookahead parameter). The results show that, on average, tokens travel 88% of the way to their full-context representation with a one-word lookahead and 94% with a two-word lookahead. We then investigate which text features most influence the evolution towards the final representation, using a random forest analysis. The results show that the most salient factors are related to token length. Finally, we evaluate the effect of lookahead k at the decoder level with a MUSHRA listening test. Its results contrast with the high figures above: speech synthesis quality obtained with a two-word lookahead is significantly lower than that obtained with the full sentence.
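The incremental policy described above (output for token n is produced while only n + k tokens of text are visible) can be illustrated with a minimal sketch. It assumes word-level tokens and 0-based indexing, so "n + k tokens" for a 1-based n becomes `tokens[: n + k + 1]` here; the function name `lookahead_schedule` is hypothetical and not taken from the paper. In a real system, the encoder would be re-run on each visible prefix and only the representation of token n would be used at that step.

```python
from typing import List, Tuple


def lookahead_schedule(tokens: List[str], k: int) -> List[Tuple[int, List[str]]]:
    """Hypothetical helper: for each output position n (0-based), return the
    text prefix an incremental synthesizer with lookahead k may see.

    Token n is visible together with the k tokens that follow it, i.e.
    tokens[0 .. n + k], clipped at the end of the sentence.
    """
    if k < 0:
        raise ValueError("lookahead k must be non-negative")
    schedule = []
    for n in range(len(tokens)):
        # Clip so that the last positions simply see the whole sentence.
        end = min(n + k + 1, len(tokens))
        schedule.append((n, tokens[:end]))
    return schedule


if __name__ == "__main__":
    sentence = "what the future brings".split()
    # With k = 1, each word is synthesized while one word of future text is known.
    for n, visible in lookahead_schedule(sentence, k=1):
        print(n, " ".join(visible))
```

Setting k to the sentence length makes every step see the full sentence, which corresponds to the non-incremental (full-context) baseline against which the lookahead conditions are compared.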


Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input

The prosody of a spoken word is determined by its surrounding context. I...

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

We present TokenSplit, a speech separation model that acts on discrete t...

Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework

Text-to-speech synthesis (TTS) has witnessed rapid progress in recent ye...

Attending to Future Tokens For Bidirectional Sequence Generation

Neural sequence generation is typically performed token-by-token and lef...

Understanding Neural Abstractive Summarization Models via Uncertainty

An advantage of seq2seq abstractive summarization models is that they ge...

Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes

In sequence modeling tasks the token order matters, but this information...

How To Control Text Simplification? An Empirical Study of Control Tokens for Meaning Preserving Controlled Simplification

Text simplification rewrites text to be more readable for a specific aud...
