Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation

07/13/2022
by Jinyi Hu, et al.

The past several years have witnessed the superiority of Variational Auto-Encoders (VAEs) in various text generation tasks. However, due to the sequential nature of text, auto-regressive decoders tend to ignore latent variables and degenerate into plain language models, a failure known as the KL vanishing problem, which worsens further when VAEs are combined with Transformer-based architectures. To ameliorate this problem, we propose DELLA, a novel variational Transformer framework. DELLA learns a series of layer-wise latent variables, each inferred from those of the lower layers and tightly coupled with the hidden states through a low-rank tensor product. In this way, DELLA forces the posterior latent variables to be fused deeply along the whole computation path and hence to incorporate more information. We theoretically show that our method can be regarded as entangling latent variables to prevent posterior information from decaying through the layers, enabling DELLA to attain higher non-zero KL values even without any annealing or thresholding tricks. Experiments on four unconditional and three conditional generation tasks show that DELLA better alleviates KL vanishing and improves both the quality and diversity of generated text compared with several strong baselines.
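To make the layer-wise fusion idea concrete, the NumPy sketch below shows one way a latent variable might be coupled with a hidden state through a low-rank (bilinear) product at each layer, with the next layer's latent inferred from the fused state. All names, dimensions, the random projections, and the residual form are illustrative assumptions for a single vector, not the paper's actual DELLA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def low_rank_fuse(h, z, U, V, W):
    # Low-rank bilinear fusion: project the hidden state h and the latent z
    # into a shared rank-r space, combine them elementwise, and map the
    # result back to the hidden size. (Illustrative form, not DELLA's exact one.)
    return W @ ((U @ h) * (V @ z))

d_h, d_z, rank, n_layers = 8, 4, 3, 3  # hypothetical sizes

# Hypothetical per-layer parameters, random for illustration only.
layers = [
    dict(U=rng.normal(size=(rank, d_h)),   # hidden-state projection
         V=rng.normal(size=(rank, d_z)),   # latent projection
         W=rng.normal(size=(d_h, rank)),   # back-projection to hidden size
         M=rng.normal(size=(d_z, d_h)))    # toy "inference" map for the next latent
    for _ in range(n_layers)
]

h = rng.normal(size=d_h)  # decoder hidden state
z = rng.normal(size=d_z)  # bottom-layer latent sample

for p in layers:
    # Fuse the current latent into the hidden state (residual connection),
    # then infer the next layer's latent from the fused state, so each
    # latent depends on those of the layers below it.
    h = h + low_rank_fuse(h, z, p["U"], p["V"], p["W"])
    z = np.tanh(p["M"] @ h)

print(h.shape, z.shape)
```

Because every layer's latent is recomputed from a state that already contains the lower-layer latents, the decoder cannot simply bypass them, which is the intuition behind the deep fusion described in the abstract.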


Related research

10/22/2022: Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation
05/24/2019: mu-Forcing: Training Variational Recurrent Autoencoders for Text Generation
03/26/2019: Improve Diverse Text Generation by Self Labeling Conditional Variational Auto Encoder
02/06/2018: Improving Variational Encoder-Decoders in Dialogue Generation
04/04/2019: Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling
05/18/2017: Spatial Variational Auto-Encoding via Matrix-Variate Normal Distributions
03/25/2019: Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing
