Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling

01/31/2018
by Tao Shen et al.

Many natural language processing tasks rely solely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies via soft probabilities between every pair of tokens, but they are neither effective nor efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens, but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into a single context fusion model, "reinforced self-attention (ReSA)", so that each benefits the other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate training of the hard one. For this purpose, we develop a novel hard attention called "reinforced sequence sampling (RSS)", which selects tokens in parallel and is trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. Finally, we propose an RNN/CNN-free sentence-encoding model, "reinforced self-attention network (ReSAN)", based solely on ReSA. It achieves state-of-the-art performance on both the Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.
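
The following is a minimal, hypothetical PyTorch sketch (not the authors' released code) of the two ingredients the abstract describes: a reinforced sequence sampling (RSS) head that samples keep/drop decisions for all tokens in parallel via per-token Bernoulli variables and is trained with a policy-gradient (REINFORCE) term, and a soft self-attention that attends only over the kept tokens. For brevity it uses a single RSS head (the paper uses two) and a placeholder reward; all module and variable names are illustrative assumptions, not the paper's API.

```python
# Illustrative sketch only: one RSS head + masked soft self-attention.
import torch
import torch.nn as nn


class ReinforcedSequenceSampler(nn.Module):
    """Samples a keep/drop decision for every token in parallel (hard attention)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)   # one keep-logit per token

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        logits = self.scorer(x).squeeze(-1)    # (batch, seq_len)
        probs = torch.sigmoid(logits)
        mask = torch.bernoulli(probs)          # hard 0/1 selection, sampled in parallel
        mask[:, 0] = 1.0                       # illustrative safeguard: never drop every token
        # log-probability of the sampled actions, needed for the REINFORCE estimator
        log_prob = (mask * torch.log(probs + 1e-8)
                    + (1 - mask) * torch.log(1 - probs + 1e-8)).sum(-1)
        return mask, log_prob


class MaskedSoftSelfAttention(nn.Module):
    """Scaled dot-product self-attention restricted to tokens kept by the sampler."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x, keep_mask):           # keep_mask: (batch, seq_len) of 0/1
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / x.size(-1) ** 0.5
        # dropped tokens cannot be attended to: soft attention runs on the trimmed sequence
        scores = scores.masked_fill(keep_mask.unsqueeze(1) == 0, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v


# Usage sketch: in ReSAN the reward would come from the downstream task
# (e.g. SNLI classification quality); here it is a random placeholder.
if __name__ == "__main__":
    x = torch.randn(2, 6, 16)                  # toy batch of embedded tokens
    rss = ReinforcedSequenceSampler(16)
    attn = MaskedSoftSelfAttention(16)
    mask, log_prob = rss(x)
    context = attn(x, mask)                    # (2, 6, 16) context features for the encoder
    reward = torch.rand(2)                     # placeholder per-example reward
    policy_loss = -(reward * log_prob).mean()  # REINFORCE term that trains the sampler
    policy_loss.backward()
```

In the full model this policy-gradient term would be added to the supervised task loss, so the soft attention's downstream performance supplies the reward that trains the otherwise non-differentiable hard selection.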

Related research:

09/14/2017
DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding
Recurrent neural nets (RNN) and convolutional neural nets (CNN) are wide...

12/06/2017
Distance-based Self-Attention Network for Natural Language Inference
Attention mechanism has been used as an ancillary means to help RNN or C...

05/24/2021
Self-Attention Networks Can Process Bounded Hierarchical Languages
Despite their impressive performance in NLP, self-attention networks wer...

08/24/2019
Enhancing Neural Sequence Labeling with Position-Aware Self-Attention
Sequence labeling is a fundamental task in natural language processing a...

06/28/2020
Self-Attention Networks for Intent Detection
Self-attention networks (SAN) have shown promising performance in variou...

05/18/2021
Progressively Normalized Self-Attention Network for Video Polyp Segmentation
Existing video polyp segmentation (VPS) models typically employ convolut...

05/10/2018
Obligation and Prohibition Extraction Using Hierarchical RNNs
We consider the task of detecting contractual obligations and prohibitio...
