Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time

07/18/2019
by Karlis Freivalds, et al.

A key requirement in sequence-to-sequence processing is the modeling of long-range dependencies. To this end, the vast majority of state-of-the-art models use an attention mechanism of O(n^2) complexity, which leads to slow execution for long sequences. We introduce a new Shuffle-Exchange neural network model for sequence-to-sequence tasks that has O(log n) depth and O(n log n) total complexity. We show that this model is powerful enough to infer efficient algorithms for common algorithmic benchmarks, including sorting, addition and multiplication. We evaluate our architecture on the challenging LAMBADA question answering dataset and compare it with state-of-the-art models that use attention. Our model achieves competitive accuracy and scales to sequences with more than a hundred thousand elements. We are confident that the proposed model has the potential for building more efficient architectures for processing large interrelated data in language modeling, music generation and other application domains.

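For readers unfamiliar with the underlying wiring, the sketch below illustrates the connectivity pattern a shuffle-exchange network is built on: each of the log2(n) layers applies a switch operation to adjacent pairs (O(n) work) followed by the perfect-shuffle permutation, so a whole block costs O(n log n) while letting information from any input position reach any output position. This is a minimal illustration, not the authors' implementation; the gated `toy_switch` is a hypothetical stand-in for the paper's learned Switch Unit.

```python
# Minimal sketch of the shuffle-exchange wiring behind the O(n log n) cost.
# Each of the log2(n) layers: switch adjacent pairs (O(n)), then perfect shuffle.
import numpy as np

rng = np.random.default_rng(0)

def perfect_shuffle(x):
    """Riffle the two halves along axis 0: a0, b0, a1, b1, ..."""
    n = x.shape[0]
    out = np.empty_like(x)
    out[0::2] = x[: n // 2]
    out[1::2] = x[n // 2:]
    return out

def toy_switch(x, W):
    """Hypothetical stand-in for a learned Switch Unit: gated mix of each adjacent pair."""
    n, d = x.shape
    pairs = x.reshape(n // 2, 2 * d)                          # (x[2i], x[2i+1]) side by side
    swapped = np.concatenate([pairs[:, d:], pairs[:, :d]], axis=1)
    gates = 1.0 / (1.0 + np.exp(-pairs @ W))                  # sigmoid gate, shape (n/2, 2d)
    return (gates * pairs + (1.0 - gates) * swapped).reshape(n, d)

def shuffle_exchange_block(x, weights):
    """log2(n) switch + shuffle layers: O(n) work each, O(n log n) in total."""
    for W in weights:                                         # len(weights) == log2(n)
        x = perfect_shuffle(toy_switch(x, W))
    return x

n, d = 8, 4                                                   # n must be a power of two
x = rng.normal(size=(n, d))
weights = [0.1 * rng.normal(size=(2 * d, 2 * d)) for _ in range(int(np.log2(n)))]
y = shuffle_exchange_block(x, weights)
print(y.shape)  # (8, 4); after log2(n) layers every output can depend on every input
```

In the paper itself, such layers are arranged into Beneš-style blocks (a forward pass of shuffle layers followed by a mirrored reverse pass), which is what yields the stated O(log n) depth.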