Mask Attention Networks: Rethinking and Strengthen Transformer

03/25/2021
by Zhihao Fan, et al.

Transformer is an attention-based neural network that consists of two sublayers: the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). Existing research has explored enhancing the two sublayers separately to improve the capability of Transformer for text representation. In this paper, we present a novel understanding of SAN and FFN as Mask Attention Networks (MANs) and show that they are two special cases of MANs with static mask matrices. However, these static mask matrices limit the capability for localness modeling in text representation learning. We therefore introduce a new layer, the Dynamic Mask Attention Network (DMAN), with a learnable mask matrix that is able to model localness adaptively. To incorporate the advantages of DMAN, SAN, and FFN, we propose a sequential layered structure that combines the three types of layers. Extensive experiments on various tasks, including neural machine translation and text summarization, demonstrate that our model outperforms the original Transformer.
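As a rough illustration of the MAN framing described in the abstract, the sketch below shows attention modulated by a mask matrix: an all-ones mask recovers ordinary self-attention (SAN), while an identity mask restricts every token to attend only to itself, mirroring the position-wise behavior of an FFN sublayer. The function name, the multiplicative mask-then-renormalize form, and the dimensions are illustrative assumptions, not the paper's exact parameterization.

```python
# Minimal sketch of a Mask Attention Network (MAN), assuming a
# multiplicative mask applied to attention weights before renormalization.
import torch

def man_attention(q, k, v, mask):
    # q, k, v: (seq_len, d_model); mask: (seq_len, seq_len) with entries in [0, 1]
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    # The mask reweights attention multiplicatively, then rows are renormalized,
    # i.e. A_ij = M_ij * exp(s_ij) / sum_j M_ij * exp(s_ij).
    weights = mask * torch.softmax(scores, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return weights @ v

seq_len, d = 5, 8
q = k = v = torch.randn(seq_len, d)

san_mask = torch.ones(seq_len, seq_len)  # all-ones mask: ordinary self-attention (SAN)
ffn_mask = torch.eye(seq_len)            # identity mask: each token attends only to itself

san_out = man_attention(q, k, v, san_mask)
ffn_out = man_attention(q, k, v, ffn_mask)

# With the identity mask, attention collapses to a per-token value lookup,
# the pointwise behavior characteristic of an FFN sublayer.
assert torch.allclose(ffn_out, v, atol=1e-5)
```

A DMAN layer would replace the static mask above with one predicted from positions and token representations; the two static cases show how SAN and FFN sit at opposite extremes of the same family, which is the generalization the paper builds on.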


Related research

05/02/2018 · Accelerating Neural Transformer via an Average Attention Network
With parallelizable attention networks, the neural Transformer is very f...

12/06/2017 · Distance-based Self-Attention Network for Natural Language Inference
Attention mechanism has been used as an ancillary means to help RNN or C...

11/22/2018 · Mask R-CNN with Pyramid Attention Network for Scene Text Detection
In this paper, we present a new Mask R-CNN based text detection approach...

12/27/2020 · SG-Net: Syntax Guided Transformer for Language Representation
Understanding human language is one of the key themes of artificial inte...

01/03/2021 · An Efficient Transformer Decoder with Compressed Sub-layers
The large attention-based encoder-decoder network (Transformer) has beco...

04/28/2023 · MASK-CNN-Transformer For Real-Time Multi-Label Weather Recognition
Weather recognition is an essential support for many practical life appl...

12/29/2020 · Multiple Structural Priors Guided Self Attention Network for Language Understanding
Self attention networks (SANs) have been widely utilized in recent NLP s...
