Quantum Statistics-Inspired Neural Attention

09/17/2018
by Aristotelis Charalampous, et al.

Sequence-to-sequence (encoder-decoder) models with attention constitute a cornerstone of deep learning research, as they have enabled unprecedented sequential data modeling capabilities. This effectiveness largely stems from the capacity of these models to infer salient temporal dynamics over long horizons, which are encoded into the obtained neural attention (NA) distributions. However, existing NA formulations essentially constitute point-wise selection mechanisms over the observed source sequences; that is, attention weight computation relies on the assumption that each source sequence element is independent of the rest. Although convenient, this assumption fails to account for higher-order dependencies that may be prevalent in real-world data. This paper addresses these limitations by leveraging quantum-statistical modeling arguments. Specifically, our work broadens the notion of NA by accounting for cases where the NA model is inherently incapable of discerning between individual source elements, owing to higher-order temporal dynamics. In such cases, we postulate that selection may be feasible only at the level of pairs of source sequence elements. To this end, we cast NA as inference of an attention density matrix (ADM) approximation. We derive effective training and inference algorithms, and evaluate our approach in the context of a machine translation (MT) application. In experiments on challenging benchmark datasets, our approach yields favorable outcomes in terms of several evaluation metrics.
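
The abstract contrasts point-wise NA with the proposed ADM but does not spell out the ADM parameterization; the NumPy sketch below is a minimal illustration of that contrast under assumed constructions, not the paper's actual algorithm. It builds a positive semi-definite, unit-trace matrix rho from amplitude-scaled keys, so that the diagonal of rho yields the marginal attention weights while the off-diagonal entries couple pairs of source elements. All names here (pointwise_attention, adm_style_attention, amp, rho) are hypothetical.

    import numpy as np

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    def pointwise_attention(query, keys, values):
        """Standard NA: each source element is scored independently,
        and a point-wise softmax distribution selects among them."""
        scores = keys @ query            # (T,): one score per source element
        weights = softmax(scores)        # independence assumption baked in
        return weights @ values          # context vector

    def adm_style_attention(query, keys, values):
        """Illustrative density-matrix-inspired variant (not the paper's
        exact algorithm): a PSD, unit-trace matrix rho over source
        positions, whose diagonal gives per-element weights and whose
        off-diagonal entries couple pairs of source elements."""
        scores = keys @ query
        amp = np.exp(0.5 * (scores - scores.max()))  # "amplitudes": sqrt of softmax numerators
        M = amp[:, None] * keys                      # scale each key vector by its amplitude
        rho = M @ M.T                                # Gram matrix: PSD, encodes pairwise structure
        rho = rho / np.trace(rho)                    # unit trace, as a density matrix requires
        weights = np.diag(rho)                       # marginal weights; sum to tr(rho) == 1
        return weights @ values

    # Toy usage with a hypothetical 4-step source sequence of dimension 3.
    rng = np.random.default_rng(0)
    K = rng.normal(size=(4, 3))   # encoder keys
    V = rng.normal(size=(4, 3))   # encoder values
    q = rng.normal(size=3)        # decoder query
    print(pointwise_attention(q, K, V))
    print(adm_style_attention(q, K, V))

The Gram-matrix construction M @ M.T guarantees that rho is positive semi-definite, and dividing by its trace makes it a valid density matrix; since the diagonal of a unit-trace matrix sums to one, it can stand in directly for a softmax-style attention distribution while the off-diagonal mass records the pairwise dependence the abstract refers to.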

