What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS

09/04/2020
by Brooke Stephenson, et al.

In incremental text to speech synthesis (iTTS), the synthesizer produces an audio output before it has access to the entire input sentence. In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e. when generating speech output for token n, the system has access to n + k tokens from the text sequence. We first analyze the impact of this incremental policy on the evolution of the encoder representations of token n for different values of k (the lookahead parameter). The results show that, on average, tokens travel 88% of the way to their final representation with a one-word lookahead and 94% with a two-word lookahead. We then investigate which text features are the most influential on the evolution towards the final representation using a random forest analysis. The results show that the most salient factors are related to token length. We finally evaluate the effects of lookahead k at the decoder level, using a MUSHRA listening test. This test shows results that contrast with the above high figures: the speech synthesis quality obtained with a two-word lookahead is significantly lower than that obtained with the full sentence.
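
As a rough illustration of this incremental policy, the following is a minimal, self-contained sketch (not the authors' code) of how one might measure the fraction of the way a token's encoder representation has travelled towards its final, full-sentence value under a k-word lookahead. The embed and encode functions below are toy stand-ins introduced purely for illustration; in the paper's setting they would be replaced by the encoder of a real seq2seq TTS model, and the distance metric used here is an assumption.

import numpy as np

def embed(token, dim=16):
    """Deterministic toy embedding for a token (stand-in for real model inputs)."""
    seed = sum(ord(c) for c in token)
    return np.random.default_rng(seed).standard_normal(dim)

def encode(tokens):
    """Toy context-dependent encoder: each token vector is mixed with the mean of
    all currently visible tokens, so it keeps evolving as more context arrives.
    A real iTTS setup would use the encoder of a seq2seq TTS model instead."""
    embs = np.stack([embed(t) for t in tokens])
    return 0.7 * embs + 0.3 * embs.mean(axis=0)

def lookahead_progress(tokens, n, k):
    """How far (0..1) the representation of token n has travelled towards its
    full-sentence representation when only the first n + 1 + k tokens are visible."""
    final = encode(tokens)[n]
    start = encode(tokens[: n + 1])[n]          # no lookahead at all
    partial = encode(tokens[: n + 1 + k])[n]    # k-word lookahead
    total = np.linalg.norm(final - start)
    if total == 0:
        return 1.0
    return 1.0 - np.linalg.norm(final - partial) / total

if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    for k in (1, 2):
        scores = [lookahead_progress(sentence, n, k)
                  for n in range(len(sentence) - k)]
        print(f"k={k}: mean progress {np.mean(scores):.2f}")

With a real encoder in place of the toy one, averaging this progress measure over many sentences would yield the kind of per-lookahead statistics reported in the abstract (88% at k = 1, 94% at k = 2).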

Related research

02/19/2021 - Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input
The prosody of a spoken word is determined by its surrounding context. I...

08/21/2023 - TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
We present TokenSplit, a speech separation model that acts on discrete t...

11/07/2019 - Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework
Text-to-speech synthesis (TTS) has witnessed rapid progress in recent ye...

08/16/2019 - Attending to Future Tokens For Bidirectional Sequence Generation
Neural sequence generation is typically performed token-by-token and lef...

10/15/2020 - Understanding Neural Abstractive Summarization Models via Uncertainty
An advantage of seq2seq abstractive summarization models is that they ge...

09/18/2019 - Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes
In sequence modeling tasks the token order matters, but this information...

05/24/2023 - How To Control Text Simplification? An Empirical Study of Control Tokens for Meaning Preserving Controlled Simplification
Text simplification rewrites text to be more readable for a specific aud...
