Amortized Context Vector Inference for Sequence-to-Sequence Networks

05/23/2018
by Sotirios Chatzis, et al.

Neural attention (NA) is an effective mechanism for inferring complex structural dependencies in data that span long temporal horizons. As a consequence, it has become a key component of sequence-to-sequence models that achieve state-of-the-art performance in tasks as hard as abstractive document summarization (ADS), machine translation (MT), and video captioning (VC). NA mechanisms infer context vectors: weighted sums of deterministic input sequence encodings, adaptively sourced over long temporal horizons. However, recent work on amortized variational inference (AVI) has shown that it is often useful to treat the representations generated by deep networks as latent random variables, as this allows the models to better explore the space of possible representations. Motivated by this insight, in this work we introduce a novel perspective on a popular NA mechanism, namely soft attention (SA). Our approach treats the context vectors generated by SA models as latent variables whose posteriors are inferred via AVI. Both the means and the covariance matrices of the inferred posteriors are parameterized by deep network mechanisms similar to those employed in standard SA. To illustrate our method, we implement it in the context of popular sequence-to-sequence model variants with SA. We conduct an extensive experimental evaluation on challenging ADS, VC, and MT benchmarks, and show how our approach compares to the baselines.
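The abstract gives no code, so the sketch below is only meant to make the idea concrete: soft attention whose context vector is treated as a Gaussian latent variable with an amortized, attention-like posterior. The additive (Bahdanau-style) scoring network, the diagonal covariance, the standard-normal prior, and all names (VariationalSoftAttention, score, log_var) are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): variational soft attention.
# The posterior mean is the usual SA context vector; a second amortized
# network produces a (diagonal) covariance, and the context is sampled
# via the reparameterization trick.
import torch
import torch.nn as nn

class VariationalSoftAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        # Additive (Bahdanau-style) scoring network -- an assumption here.
        self.score = nn.Sequential(
            nn.Linear(enc_dim + dec_dim, attn_dim), nn.Tanh(),
            nn.Linear(attn_dim, 1))
        # Amortized network for the (diagonal) posterior covariance.
        self.log_var = nn.Linear(enc_dim + dec_dim, enc_dim)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        src_len = enc_states.size(1)
        dec_exp = dec_state.unsqueeze(1).expand(-1, src_len, -1)
        scores = self.score(torch.cat([enc_states, dec_exp], dim=-1))   # (B, T, 1)
        weights = torch.softmax(scores, dim=1)                          # attention weights
        mu = (weights * enc_states).sum(dim=1)                          # standard SA context = posterior mean
        log_var = self.log_var(torch.cat([mu, dec_state], dim=-1))
        std = torch.exp(0.5 * log_var)
        context = mu + std * torch.randn_like(std)                      # reparameterized sample
        # KL divergence to a standard-normal prior (prior choice is assumed).
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=-1)
        return context, kl
```

In such a setup the KL term would typically be weighted and added to the sequence loss during training, while at inference time one could simply use the posterior mean mu as the context vector; these are standard AVI practices rather than details taken from the paper.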


Related research

Sequence-to-Sequence Resources for Catalan (02/14/2022)
In this work, we introduce sequence-to-sequence language resources for C...

Deep Recurrent Generative Decoder for Abstractive Text Summarization (08/02/2017)
We propose a new framework for abstractive text summarization based on a...

Variational Attention for Sequence-to-Sequence Models (12/21/2017)
The variational encoder-decoder (VED) encodes source information as a se...

Quantum Statistics-Inspired Neural Attention (09/17/2018)
Sequence-to-sequence (encoder-decoder) models with attention constitute ...

Prior Attention for Style-aware Sequence-to-Sequence Models (06/25/2018)
We extend sequence-to-sequence models with the possibility to control th...

Flow-Adapter Architecture for Unsupervised Machine Translation (04/26/2022)
In this work, we propose a flow-adapter architecture for unsupervised NM...

t-Exponential Memory Networks for Question-Answering Machines (09/04/2018)
Recent advances in deep learning have brought to the fore models that ca...
