Semantics-aware Attention Improves Neural Machine Translation

10/13/2021
by Aviv Slobodkin et al.

The integration of syntactic structures into Transformer machine translation has shown positive results, but to our knowledge, no work has attempted to do so with semantic structures. In this work we propose two novel parameter-free methods for injecting semantic information into Transformers, both of which rely on semantics-aware masking of (some of) the attention heads. One method operates on the encoder, through a Scene-Aware Self-Attention (SASA) head; the other operates on the decoder, through a Scene-Aware Cross-Attention (SACrA) head. We show a consistent improvement over the vanilla Transformer and over syntax-aware models for four language pairs. For some language pairs, we further show an additional gain when using both semantic and syntactic structures.
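To make the masking idea concrete, here is a minimal sketch of a scene-aware self-attention head, assuming each source token has been assigned to a semantic scene (e.g., by a UCCA parser) and that the masked head simply restricts each token to attending within its own scene. The function names, the hard within-scene mask, and the toy scene assignment are illustrative assumptions, not the paper's exact scheme:

```python
import torch
import torch.nn.functional as F

def scene_mask(scene_ids: torch.Tensor) -> torch.Tensor:
    """Boolean [seq, seq] mask: True where the query token and the
    key token belong to the same semantic scene."""
    return scene_ids.unsqueeze(1) == scene_ids.unsqueeze(0)

def scene_aware_self_attention(q, k, v, scene_ids):
    """A single attention head whose scores are masked so that each
    token attends only to tokens in its own scene (a hypothetical
    reading of the SASA idea)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # [seq, seq]
    scores = scores.masked_fill(~scene_mask(scene_ids), float("-inf"))
    weights = F.softmax(scores, dim=-1)             # rows sum to 1
    return weights @ v

# Toy example: 6 source tokens split into two scenes.
torch.manual_seed(0)
seq, d = 6, 8
q, k, v = (torch.randn(seq, d) for _ in range(3))
scene_ids = torch.tensor([0, 0, 0, 1, 1, 1])  # e.g. from a semantic parse
out = scene_aware_self_attention(q, k, v, scene_ids)
print(out.shape)  # torch.Size([6, 8])
```

Since only some of the attention heads are masked this way, the remaining heads keep the vanilla attention pattern; the same masking principle applied to encoder-decoder attention yields the SACrA variant on the decoder side.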

Related research:

- 09/06/2019: Improving Neural Machine Translation with Parent-Scaled Self-Attention. "Most neural machine translation (NMT) models operate on source and targe..."
- 10/21/2022: Syntax-guided Localized Self-attention by Constituency Syntactic Distance. "Recent works have revealed that Transformers are implicitly learning the..."
- 07/08/2019: An Intrinsic Nearest Neighbor Analysis of Neural Machine Translation Architectures. "Earlier approaches indirectly studied the information captured by the hi..."
- 09/21/2020: Alleviating the Inequality of Attention Heads for Neural Machine Translation. "Recent studies show that the attention heads in Transformer are not equa..."
- 06/05/2019: From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions. "We inspect the multi-head self-attention in Transformer NMT encoders for..."
- 05/02/2020: Hard-Coded Gaussian Attention for Neural Machine Translation. "Recent work has questioned the importance of the Transformer's multi-hea..."
- 05/12/2022: Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with VAEs. "We propose a generative model for text generation, which exhibits disent..."
