Decision S4: Efficient Sequence-Based RL via State Spaces Layers

06/08/2023
by Shmuel Bar-David, et al.

Sequence-learning methods have recently been applied to off-policy reinforcement learning, most notably in the seminal work on Decision Transformers, which employs transformers for this task. Because transformers are parameter-heavy, cannot benefit from history beyond a fixed window size, and are not computed with recurrence, we investigate the suitability of the S4 family of models, which are based on state-space layers and have been shown to outperform transformers, especially at modeling long-range dependencies. We present two main algorithms: (i) an off-policy training procedure that operates on trajectories while retaining the training efficiency of the S4 model, and (ii) an on-policy training procedure that is trained recurrently, benefits from long-range dependencies, and relies on a novel stable actor-critic mechanism. Our results indicate that our method outperforms multiple variants of Decision Transformers, as well as other baseline methods, on most tasks, while reducing latency, parameter count, and training time by several orders of magnitude, making our approach more suitable for real-world RL.
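As background intuition for why the recurrent view matters here: a state-space layer is built on a discretized linear recurrence, x_k = A x_{k-1} + B u_k with output y_k = C x_k, so a trained policy can consume an arbitrarily long history at constant cost per step, whereas attention must recompute over its fixed context window. The sketch below illustrates that recurrence in plain NumPy; the class name, shapes, and random diagonal parameterization are illustrative assumptions, not the paper's actual S4 layer.

    import numpy as np

    # Minimal sketch of the recurrence behind a state-space layer:
    #   x_k = A x_{k-1} + B u_k,   y_k = C x_k
    # A stable diagonal A (eigenvalues inside the unit circle) keeps the
    # hidden state bounded; the update cost per step is constant no matter
    # how long the history is.
    class SSMRecurrentCell:
        def __init__(self, state_dim: int, rng: np.random.Generator):
            # Illustrative random parameters (a real S4 layer learns these,
            # starting from a structured HiPPO-based initialization).
            self.A = np.diag(rng.uniform(0.0, 0.99, size=state_dim))
            self.B = rng.standard_normal((state_dim, 1))
            self.C = rng.standard_normal((1, state_dim))
            self.x = np.zeros((state_dim, 1))  # hidden state carried across steps

        def step(self, u: float) -> float:
            # One recurrent update: O(state_dim^2) with a dense matrix as
            # written, O(state_dim) if A is stored as a vector; in either
            # case independent of sequence length.
            self.x = self.A @ self.x + self.B * u
            return (self.C @ self.x).item()

    rng = np.random.default_rng(0)
    cell = SSMRecurrentCell(state_dim=16, rng=rng)
    ys = [cell.step(u) for u in (0.5, -1.0, 0.25)]  # a stream of scalar observations

At training time the same linear recurrence can instead be unrolled as a convolution over the whole trajectory, which is the efficiency the abstract refers to when it says the off-policy procedure retains S4's training efficiency.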

Related research

Efficiently Modeling Long Sequences with Structured State Spaces (10/31/2021)
A central goal of sequence modeling is designing a single principled mod...

Hungry Hungry Hippos: Towards Language Modeling with State Space Models (12/28/2022)
State space models (SSMs) have demonstrated state-of-the-art sequence mo...

Compressive Transformers for Long-Range Sequence Modelling (11/13/2019)
We present the Compressive Transformer, an attentive sequence model whic...

Dynamic Evaluation of Transformer Language Models (04/17/2019)
This research note combines two methods that have recently improved the ...

Structured State Space Models for In-Context Reinforcement Learning (03/07/2023)
Structured state space sequence (S4) models have recently achieved state...

Updater-Extractor Architecture for Inductive World State Representations (04/12/2021)
Developing NLP models traditionally involves two stages - training and a...

Cooperation Is All You Need (05/16/2023)
Going beyond 'dendritic democracy', we introduce a 'democracy of local p...
