DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

09/14/2017
by   Tao Shen, et al.
University of Technology Sydney
University of Washington

Recurrent neural networks (RNN) and convolutional neural networks (CNN) are widely used in NLP tasks to capture long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly shorter training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, the "Directional Self-Attention Network (DiSAN)", is then proposed to learn sentence embeddings based solely on the proposed attention, without any RNN/CNN structure. DiSAN is composed only of a directional self-attention block with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models in both prediction quality and time efficiency. It achieves the best test accuracy among all sentence-encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre Natural Language Inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.
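
To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released implementation (see the DiSAN repository linked further down), of (1) directional self-attention, where a triangular mask restricts each token to attend in one temporal direction, and (2) multi-dimensional attention, which assigns a separate weight to every feature dimension and compresses the sequence into a single vector. All class and variable names here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DirectionalSelfAttention(nn.Module):
    """Self-attention with a directional (forward or backward) mask.

    The mask encodes temporal order without any RNN/CNN: each position
    may attend only to itself and to positions on one side of it.
    """
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x, forward_direction=True):
        # x: (batch, seq_len, d_model)
        n = x.size(1)
        # Multi-dimensional (feature-wise) alignment: one score per feature
        # dimension for every token pair, not a single scalar per pair.
        scores = torch.tanh(self.q(x).unsqueeze(2) + self.k(x).unsqueeze(1))  # (B, n, n, d)
        # Lower-triangular mask = attend to self and earlier tokens;
        # transpose it for the backward direction.
        mask = torch.tril(torch.ones(n, n, device=x.device, dtype=torch.bool))
        if not forward_direction:
            mask = mask.t()
        scores = scores.masked_fill(~mask.reshape(1, n, n, 1), float('-inf'))
        attn = F.softmax(scores, dim=2)  # normalize over key positions, per feature
        return torch.einsum('bijd,bjd->bid', attn, self.v(x))  # (B, n, d)

class MultiDimCompress(nn.Module):
    """Feature-wise attention that compresses a sequence into one vector."""
    def __init__(self, d_model):
        super().__init__()
        self.score = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        weights = F.softmax(self.score(x), dim=1)  # per-feature weights over time
        return (weights * x).sum(dim=1)            # (B, d) sentence embedding

# Toy usage: embed a batch of two 10-token "sentences".
x = torch.randn(2, 10, 64)
h = DirectionalSelfAttention(64)(x)
sentence_vec = MultiDimCompress(64)(h)  # shape: (2, 64)
```

In the paper, DiSAN concatenates the outputs of a forward-masked and a backward-masked self-attention block before the compression step; the fusion gates and scaled activations it also uses are omitted here for brevity.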


Related Research

01/31/2018 · Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling
Many natural language processing tasks solely rely on sparse dependencie...

12/06/2017 · Distance-based Self-Attention Network for Natural Language Inference
Attention mechanism has been used as an ancillary means to help RNN or C...

04/03/2018 · Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling
Recurrent neural networks (RNN), convolutional neural networks (CNN) and...

06/07/2020 · FMA-ETA: Estimating Travel Time Entirely Based on FFN With Attention
Estimated time of arrival (ETA) is one of the most important services in...

11/02/2019 · FCEM: A Novel Fast Correlation Extract Model For Real Time Steganalysis of VoIP Stream via Multi-head Attention
Extracting correlation features between code-words with high computatio...

08/12/2020 · Text Classification based on Multi-granularity Attention Hybrid Neural Network
Neural network-based approaches have become the driving forces for Natura...

03/06/2022 · CNN self-attention voice activity detector
In this work we present a novel single-channel Voice Activity Detector (...

Code Repositories

DiSAN

Code of Directional Self-Attention Network (DiSAN)

