Simplifying and Understanding State Space Models with Diagonal Linear RNNs

12/01/2022
by Ankit Gupta, et al.

Sequence models based on linear state spaces (SSMs) have recently emerged as a promising architecture for modeling long-range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispense with the discretization step and propose a model based on vanilla Diagonal Linear RNNs (DLR). We empirically show that, despite being conceptually much simpler, DLR is as performant as previously proposed SSMs in the presence of strong supervision. Moreover, we characterize the expressivity of SSMs (including DLR) and attention-based models via a suite of 13 synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs achieve near-perfect performance on tasks that can be modeled via a few convolutional kernels, they struggle on tasks requiring many such kernels, especially when the desired sequence manipulation is context-dependent. For example, DLR learns to perfectly shift a 0.5M-long input by an arbitrary number of positions, but fails when the shift size depends on context. Despite these limitations, DLR reaches high performance on two higher-order reasoning tasks, ListOpsSubTrees and PathfinderSegmentation-256, with input lengths 8K and 65K respectively, and gives encouraging performance on PathfinderSegmentation-512 (input length 262K), for which attention is not a viable choice.
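As a concrete illustration of the model class the abstract describes, below is a minimal sketch of a single-output-channel diagonal linear RNN, assuming one common parameterization: a complex diagonal transition vector lam (with |lam_n| < 1 for stability) and complex mixing weights w. The names and toy parameters are ours, not the paper's exact formulation. The sketch also checks the convolutional view behind the "few convolutional kernels" analysis: unrolled over time, the recurrence is a causal convolution with kernel K_j = Re(sum_n w_n * lam_n^j).

import numpy as np

def dlr_recurrent(u, lam, w):
    """Run a diagonal linear RNN over a 1-D input sequence u.

    State update: x_k = lam * x_{k-1} + u_k   (elementwise; x_k in C^N)
    Output:       y_k = Re(<w, x_k>)
    """
    x = np.zeros(lam.shape[0], dtype=np.complex128)
    ys = []
    for u_k in u:
        x = lam * x + u_k          # diagonal transition: N independent scalar recurrences
        ys.append(np.real(w @ x))  # mix the N states into one real output
    return np.array(ys)

def dlr_kernel(L, lam, w):
    """Equivalent causal convolution kernel: K_j = Re(sum_n w_n * lam_n^j)."""
    j = np.arange(L)
    return np.real((lam[None, :] ** j[:, None]) @ w)

# Toy check (illustrative parameters, not from the paper) that both views agree.
rng = np.random.default_rng(0)
N, L = 8, 64
lam = 0.9 * np.exp(2j * np.pi * rng.random(N))        # stable: |lam_n| < 1
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
u = rng.standard_normal(L)

y_rec = dlr_recurrent(u, lam, w)
y_conv = np.convolve(u, dlr_kernel(L, lam, w))[:L]    # causal convolution
assert np.allclose(y_rec, y_conv)

In this view, a layer that must realize many distinct kernels needs correspondingly many state dimensions, which is one way to read the failure modes reported in the abstract.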


