StARformer: Transformer with State-Action-Reward Representations

10/12/2021
by Jinghuan Shang, et al.

Reinforcement Learning (RL) can be framed as a sequence modeling task: given a sequence of past state-action-reward experiences, a model autoregressively predicts a sequence of future actions. Transformers have recently been applied successfully to this problem. In this work, we propose the State-Action-Reward Transformer (StARformer), which explicitly models local causal relations to improve action prediction over long sequences. StARformer first extracts local representations (StAR-representations) from each group of state-action-reward tokens within a very short time span. A sequence of these local representations, combined with state representations, is then used to predict actions over a long time span. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on Atari (image) and Gym (state vector) benchmarks, in both offline RL and imitation learning settings. StARformer also handles longer input sequences better than the baseline. Our code is available at https://github.com/elicassion/StARformer.
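To make the two-level idea concrete, the sketch below illustrates one possible data flow. It is not the released implementation at the GitHub link above, and every name in it is hypothetical. Several assumptions are made: each local group at step t is taken to be (a_{t-1}, r_{t-1}, s_t), with zero placeholders at t = 0; the learned local Transformer is replaced by plain mean pooling; token embeddings are replaced by zero padding; and StAR-representations are "combined with state representations" by simple interleaving. Because the group for step t contains a_{t-1} but not a_t, the causal context used to predict a_t never includes the action being predicted.

```python
from typing import List, Sequence

Vector = List[float]


def pad_to(values: Sequence[float], dim: int) -> Vector:
    """Hypothetical stand-in for a learned token embedding: zero-pad or truncate to `dim`."""
    out = list(values[:dim])
    return out + [0.0] * (dim - len(out))


def star_groups(states: Sequence[Vector], actions: Sequence[int],
                rewards: Sequence[float], dim: int) -> List[List[Vector]]:
    """Build one local group (a_{t-1}, r_{t-1}, s_t) per step t.

    Assumption: step 0 has no previous action or reward, so zero tokens are used.
    """
    groups = []
    for t, state in enumerate(states):
        prev_action = float(actions[t - 1]) if t > 0 else 0.0
        prev_reward = rewards[t - 1] if t > 0 else 0.0
        groups.append([pad_to([prev_action], dim),
                       pad_to([prev_reward], dim),
                       pad_to(state, dim)])
    return groups


def local_representation(group: List[Vector]) -> Vector:
    """Mean-pool one group's tokens (a simple stand-in for a learned local model)."""
    dim = len(group[0])
    return [sum(tok[i] for tok in group) / len(group) for i in range(dim)]


def long_sequence(states, actions, rewards, dim: int) -> List[Vector]:
    """Interleave local and state representations: [g_0, s_0, g_1, s_1, ...]."""
    seq = []
    for group in star_groups(states, actions, rewards, dim):
        seq.append(local_representation(group))
        seq.append(group[-1])  # the (padded) state token s_t
    return seq


def causal_context(seq: List[Vector], t: int) -> List[Vector]:
    """Tokens visible when predicting a_t: everything up to and including step t."""
    return seq[: 2 * (t + 1)]


# Usage on a toy 3-step trajectory with 2-dimensional states.
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
actions = [1, 0, 2]
rewards = [0.5, 1.0, 0.0]
seq = long_sequence(states, actions, rewards, dim=2)
print(len(seq))                     # 6 tokens: 2 per step
print(len(causal_context(seq, 1)))  # 4 tokens visible when predicting a_1
```

In the actual model, both the local and the long-range stages are learned. The sketch shows only how short-span grouping produces one local token per step, and how the long sequence keeps one state token per step alongside it.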
