Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling

04/03/2018
by Tao Shen, et al.

Recurrent neural networks (RNN), convolutional neural networks (CNN) and self-attention networks (SAN) are commonly used to produce context-aware representations. An RNN can capture long-range dependencies, but it is hard to parallelize and therefore time-inefficient. A CNN focuses on local dependencies and performs poorly on tasks that require long-range context. A SAN can model both kinds of dependency with highly parallelizable computation, but its memory requirement grows quadratically with sequence length. In this paper, we propose a model, called "bi-directional block self-attention network (Bi-BloSAN)", for RNN/CNN-free sequence encoding. It requires as little memory as an RNN while retaining all the merits of a SAN. Bi-BloSAN splits the entire sequence into blocks, applies an intra-block SAN to each block to model local context, then applies an inter-block SAN to the outputs of all blocks to capture long-range dependencies. Each SAN thus only needs to process a short sequence, so only a small amount of memory is required. Additionally, we use feature-level attention to handle the variation of contexts around the same word, and forward/backward masks to encode temporal order information. On nine benchmark datasets for different NLP tasks, Bi-BloSAN achieves or improves upon state-of-the-art accuracy, and shows a better efficiency-memory trade-off than existing RNN/CNN/SAN models.
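To make the memory argument concrete, here is a minimal PyTorch sketch of the two-level intra-block/inter-block pattern the abstract describes. It illustrates only the block decomposition, not the authors' exact model: it assumes standard multi-head scaled dot-product attention (torch.nn.MultiheadAttention) in place of Bi-BloSAN's masked, feature-level attention, uses mean pooling where the paper pools each block with attention, and omits the forward/backward masks. The class and variable names are illustrative, not from the authors' code.

# Minimal sketch of block self-attention; all names are hypothetical.
import torch
import torch.nn as nn


class BlockSelfAttention(nn.Module):
    def __init__(self, d_model: int, block_size: int, n_heads: int = 4):
        super().__init__()
        self.block_size = block_size
        # Intra-block SAN: models local context within each block.
        self.intra = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Inter-block SAN: models long-range dependencies across blocks.
        self.inter = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); seq_len assumed divisible by block_size.
        b, n, d = x.shape
        m = self.block_size
        blocks = x.reshape(b * (n // m), m, d)
        # 1) Attend within each short block: one m-by-m attention map per
        #    block, i.e. O(n * m) memory in total instead of O(n^2).
        local, _ = self.intra(blocks, blocks, blocks)
        # 2) Compress each block to a single summary vector.
        summaries = local.mean(dim=1).reshape(b, n // m, d)
        # 3) Attend across the n/m block summaries: O((n/m)^2) memory.
        context, _ = self.inter(summaries, summaries, summaries)
        # 4) Broadcast each block's context back to its tokens and combine.
        return local.reshape(b, n, d) + context.repeat_interleave(m, dim=1)


x = torch.randn(2, 64, 128)                      # batch 2, length 64
out = BlockSelfAttention(d_model=128, block_size=8)(x)
print(out.shape)                                 # torch.Size([2, 64, 128])

In this sketch, choosing the block length m close to the square root of n makes the intra-block cost O(n * m) = O(n^1.5) and the inter-block cost O((n/m)^2) = O(n), compared with O(n^2) for a single full-sequence SAN, which is the trade-off the abstract refers to.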


Related research

DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding (09/14/2017)
Recurrent neural nets (RNN) and convolutional neural nets (CNN) are wide...

Explore Long-Range Context feature for Speaker Verification (12/14/2021)
Capturing long-range dependency and modeling long temporal contexts is p...

Nested-block self-attention for robust radiotherapy planning segmentation (02/26/2021)
Although deep convolutional networks have been widely studied for head a...

Increasing diversity of omni-directional images generated from single image using cGAN based on MLPMixer (09/15/2023)
This paper proposes a novel approach to generating omni-directional imag...

Bi-directional Masks for Efficient N:M Sparse Training (02/13/2023)
We focus on addressing the dense backward propagation issue for training...

Efficient Movie Scene Detection using State-Space Transformers (12/29/2022)
The ability to distinguish between different movie scenes is critical fo...

Invertible Attention (06/16/2021)
Attention has been proved to be an efficient mechanism to capture long-r...
