Forward-Backward Decoding for Regularizing End-to-End TTS

07/18/2019
by   Yibin Zheng, et al.

Neural end-to-end TTS can generate very high-quality synthesized speech, close to human recordings for in-domain text. However, it performs unsatisfactorily when scaled to challenging test sets. One concern is that the attention-based encoder-decoder network is an autoregressive generative sequence model and therefore suffers from "exposure bias". To address this issue, we propose two novel methods that learn to predict the future by improving the agreement between the forward and backward decoding sequences. The first introduces divergence regularization terms into the model training objective to reduce the mismatch between two directional models, namely L2R and R2L (which generate targets from left-to-right and right-to-left, respectively). The second operates at the decoder level and exploits future information during decoding. In addition, we employ a joint training strategy that allows forward and backward decoding to improve each other in an interactive process. Experimental results show that our proposed methods, especially the second one (bidirectional decoder regularization), lead to a significant improvement in both robustness and overall naturalness, outperforming the baseline (a revised version of Tacotron 2) with a MOS gap of 0.14 on a challenging test set, and achieving close to human quality (4.42 vs. 4.49 MOS) on a general test set.
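To make the first method concrete, the sketch below shows one plausible form of the agreement term under the joint training strategy: a divergence (assumed here to be an L2/MSE distance; the paper's exact divergence may differ) between the L2R decoder's mel-spectrogram output and the time-reversed output of the R2L decoder, added to each decoder's ordinary reconstruction loss. All names and the choice of MSE are illustrative, not taken from the paper.

```python
# Minimal sketch (not the authors' code) of agreement regularization
# between a left-to-right (L2R) and a right-to-left (R2L) decoder.
import torch
import torch.nn.functional as F

def agreement_regularization(mel_l2r: torch.Tensor,
                             mel_r2l: torch.Tensor,
                             weight: float = 1.0) -> torch.Tensor:
    """mel_l2r, mel_r2l: (batch, frames, n_mels) decoder outputs.
    The R2L output is flipped along the time axis so both sequences
    are compared frame-by-frame in left-to-right order."""
    mel_r2l_aligned = torch.flip(mel_r2l, dims=[1])
    return weight * F.mse_loss(mel_l2r, mel_r2l_aligned)

def joint_training_loss(mel_target, mel_l2r, mel_r2l, reg_weight=1.0):
    """Joint objective: each directional decoder fits the target in its
    own direction, plus a term pulling the two predictions together."""
    loss_l2r = F.mse_loss(mel_l2r, mel_target)
    loss_r2l = F.mse_loss(torch.flip(mel_r2l, dims=[1]), mel_target)
    return loss_l2r + loss_r2l + agreement_regularization(
        mel_l2r, mel_r2l, reg_weight)
```

In this reading, the regularizer lets each direction act as a teacher for the other, which is what allows the joint training to proceed as an interactive process.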

Related research

Regularizing Neural Machine Translation by Target-bidirectional Agreement (08/13/2018)
Although Neural Machine Translation (NMT) has achieved remarkable progre...

Sequence Generation: From Both Sides to the Middle (06/23/2019)
The encoder-decoder framework has achieved promising progress for many se...

Bidirectional Attentional Encoder-Decoder Model and Bidirectional Beam Search for Abstractive Summarization (09/18/2018)
Sequence generative models with RNN variants, such as LSTM, GRU, show pr...

Pseudo-Bidirectional Decoding for Local Sequence Transduction (01/31/2020)
Local sequence transduction (LST) tasks are sequence transduction tasks ...

Regularized Forward-Backward Decoder for Attention Models (06/15/2020)
Nowadays, attention models are one of the popular candidates for speech ...

A Framework for Bidirectional Decoding: Case Study in Morphological Inflection (05/21/2023)
Transformer-based encoder-decoder models that generate outputs in a left...

Middle-Out Decoding (10/28/2018)
Despite being virtually ubiquitous, sequence-to-sequence models are chal...
