Multiple Structural Priors Guided Self Attention Network for Language Understanding

12/29/2020
by Le Qi, et al.

Self-attention networks (SANs) have been widely utilized in recent NLP studies. Unlike CNNs or RNNs, standard SANs are position-independent and thus cannot capture the structural priors among sequences of words. Existing studies commonly apply a single mask strategy to SANs to incorporate structural priors, but fail to model the richer structural information of texts. In this paper, we aim to introduce multiple types of structural priors into SAN models, proposing the Multiple Structural Priors Guided Self Attention Network (MS-SAN), which maps different structural priors onto different attention heads through a novel multi-mask based multi-head attention mechanism. In particular, we integrate two categories of structural priors: the sequential order and the relative position of words. To capture the latent hierarchical structure of texts, we extract this information not only from word contexts but also from dependency syntax trees. Experimental results on two tasks show that MS-SAN achieves significant improvements over other strong baselines.
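
The abstract describes a multi-mask based multi-head attention mechanism in which each structural prior guides its own attention head. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' released code: every head's attention scores receive an additive mask encoding one prior, e.g. a forward or backward sequential-order mask or a relative-position window mask. All names (MultiMaskSelfAttention, make_order_mask, make_window_mask) and the particular choice of masks are illustrative assumptions; the syntax-tree-derived priors mentioned in the abstract would require a dependency parser and are omitted here.

```python
# Minimal sketch (assumed names, not the authors' implementation) of multi-mask
# multi-head self-attention: head h is guided by structural mask h.
import math
import torch
import torch.nn as nn


def make_order_mask(seq_len: int, forward: bool = True) -> torch.Tensor:
    """Sequential-order prior: 0 where attention is allowed, -inf where blocked."""
    upper = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
    blocked = upper if forward else upper.T          # block future or past words
    return torch.zeros(seq_len, seq_len).masked_fill(blocked, float("-inf"))


def make_window_mask(seq_len: int, window: int = 3) -> torch.Tensor:
    """Relative-position prior: attend only within a window around each word."""
    idx = torch.arange(seq_len)
    blocked = (idx[None, :] - idx[:, None]).abs() > window
    return torch.zeros(seq_len, seq_len).masked_fill(blocked, float("-inf"))


class MultiMaskSelfAttention(nn.Module):
    """Self-attention in which each head receives its own structural mask."""

    def __init__(self, d_model: int, masks: list):
        super().__init__()
        self.n_heads = len(masks)
        assert d_model % self.n_heads == 0
        self.d_head = d_model // self.n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # (n_heads, seq_len, seq_len): one additive prior per head
        self.register_buffer("masks", torch.stack(masks))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        qkv = self.qkv(x).reshape(b, t, 3, self.n_heads, self.d_head)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each: (b, heads, t, d_head)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        scores = scores + self.masks[:, :t, :t]       # inject per-head structural prior
        ctx = (scores.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(ctx)


# Toy usage: four heads with forward-order, backward-order, window, and plain masks.
seq_len, d_model = 10, 64
masks = [make_order_mask(seq_len, True), make_order_mask(seq_len, False),
         make_window_mask(seq_len, 3), torch.zeros(seq_len, seq_len)]
layer = MultiMaskSelfAttention(d_model, masks)
print(layer(torch.randn(2, seq_len, d_model)).shape)  # torch.Size([2, 10, 64])
```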

Related research

09/01/2019 · Self-Attention with Structural Position Representations
02/18/2019 · Self-Attention Aligner: A Latency-Control End-to-End Model for ASR Using Self-Attention Network and Chunk-Hopping
10/31/2018 · Convolutional Self-Attention Network
12/27/2020 · SG-Net: Syntax Guided Transformer for Language Representation
12/06/2017 · Distance-based Self-Attention Network for Natural Language Inference
08/27/2018 · Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
03/25/2021 · Mask Attention Networks: Rethinking and Strengthen Transformer
