Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

04/06/2020
by Andis Draguns, et al.

Attention is a commonly used mechanism in sequence processing, but its O(n^2) complexity prevents its application to long sequences. The recently introduced Neural Shuffle-Exchange network offers a computationally efficient alternative, enabling the modelling of long-range dependencies in O(n log n) time. The model, however, is quite complex, involving a sophisticated gating mechanism derived from the Gated Recurrent Unit. In this paper, we present a simple and lightweight variant of the Shuffle-Exchange network, based on a residual network employing GELU and Layer Normalization. The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy. It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription while using significantly fewer parameters. We show how to combine the Shuffle-Exchange network with convolutional layers, establishing it as a useful building block in long sequence processing applications.
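The core architectural idea summarized above is to replace the GRU-derived switch unit of the Shuffle-Exchange network with a residual block built from Layer Normalization and GELU, applied over log n alternating shuffle and exchange layers. The PyTorch sketch below only illustrates that structure; the module names, hidden sizes, and exact residual layout are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a shuffle-exchange block whose "switch" step is a
# residual feed-forward unit with LayerNorm and GELU (illustrative only).
import torch
import torch.nn as nn


def perfect_shuffle(x: torch.Tensor) -> torch.Tensor:
    """Riffle-shuffle the sequence dimension of x with shape (batch, length, dim).

    Positions are reordered by interleaving the first and second halves of the
    sequence, which is the routing step of a shuffle-exchange network.
    """
    b, n, d = x.shape
    return x.reshape(b, 2, n // 2, d).transpose(1, 2).reshape(b, n, d)


class ResidualSwitch(nn.Module):
    """Exchange step: mix each adjacent pair of positions with a residual MLP."""

    def __init__(self, dim: int, hidden_mult: int = 2):
        super().__init__()
        self.norm = nn.LayerNorm(2 * dim)
        self.ff = nn.Sequential(
            nn.Linear(2 * dim, hidden_mult * 2 * dim),
            nn.GELU(),
            nn.Linear(hidden_mult * 2 * dim, 2 * dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        pairs = x.reshape(b, n // 2, 2 * d)        # concatenate adjacent positions
        pairs = pairs + self.ff(self.norm(pairs))  # pre-norm residual update
        return pairs.reshape(b, n, d)


class ShuffleExchangeBlock(nn.Module):
    """log2(n) alternating switch/shuffle layers, giving O(n log n) total work."""

    def __init__(self, dim: int, seq_len: int):
        super().__init__()
        assert seq_len & (seq_len - 1) == 0, "sequence length must be a power of two"
        depth = seq_len.bit_length() - 1           # log2(seq_len)
        self.switches = nn.ModuleList([ResidualSwitch(dim) for _ in range(depth)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for switch in self.switches:
            x = perfect_shuffle(switch(x))
        return x


if __name__ == "__main__":
    x = torch.randn(4, 64, 32)                     # (batch, length, channels)
    print(ShuffleExchangeBlock(dim=32, seq_len=64)(x).shape)  # torch.Size([4, 64, 32])
```

Each layer does O(n) work on adjacent pairs, and log n shuffle/exchange layers suffice to route information between any two positions, which is where the O(n log n) cost quoted in the abstract comes from; the padding of sequences to a power-of-two length is an assumption of this sketch.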


