Transformer-based World Models Are Happy With 100k Interactions

03/13/2023
by Jan Robine, et al.

Deep neural networks have been successful in many reinforcement learning settings. However, compared to human learners, they are overly data-hungry. To build a sample-efficient world model, we apply a transformer to real-world episodes in an autoregressive manner: not only the compact latent states and the taken actions but also the experienced or predicted rewards are fed into the transformer, so that it can attend flexibly to all three modalities at different time steps. The transformer allows our world model to access previous states directly, instead of viewing them through a compressed recurrent state. By utilizing the Transformer-XL architecture, it is able to learn long-term dependencies while staying computationally efficient. Our transformer-based world model (TWM) generates meaningful new experience, which is used to train a policy that outperforms previous model-free and model-based reinforcement learning algorithms on the Atari 100k benchmark.
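The core idea in the abstract, interleaving latent states, actions, and rewards into a single token sequence so a causal transformer can attend to all three modalities at every step, can be illustrated with a minimal PyTorch sketch. This is not the paper's TWM (which uses Transformer-XL with recurrence over segments); it uses a plain causal encoder, and all layer sizes, heads, and head names (`next_state_head`) are made-up placeholders.

```python
import torch
import torch.nn as nn

class TinyWorldModelSketch(nn.Module):
    """Illustrative sketch, not the paper's architecture: embed each
    modality, interleave tokens as (s_1, a_1, r_1, s_2, a_2, r_2, ...),
    and run a causally masked transformer over the combined sequence."""

    def __init__(self, latent_dim=32, num_actions=6, d_model=64):
        super().__init__()
        self.state_emb = nn.Linear(latent_dim, d_model)
        self.action_emb = nn.Embedding(num_actions, d_model)
        self.reward_emb = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # hypothetical head: predict the next latent state
        self.next_state_head = nn.Linear(d_model, latent_dim)

    def forward(self, states, actions, rewards):
        # states: (B, T, latent_dim); actions: (B, T) int; rewards: (B, T)
        B, T, _ = states.shape
        s = self.state_emb(states)
        a = self.action_emb(actions)
        r = self.reward_emb(rewards.unsqueeze(-1))
        # interleave the three modalities -> (B, 3T, d_model)
        tokens = torch.stack([s, a, r], dim=2).reshape(B, 3 * T, -1)
        # causal mask so each token only attends to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.transformer(tokens, mask=mask)
        # read next-state predictions from each reward token's position
        return self.next_state_head(h[:, 2::3])

model = TinyWorldModelSketch()
out = model(torch.randn(2, 5, 32), torch.randint(0, 6, (2, 5)), torch.randn(2, 5))
print(out.shape)  # torch.Size([2, 5, 32])
```

Because every time step contributes three tokens, the model can condition next-state predictions on past rewards and actions directly, rather than through a compressed recurrent state; swapping the encoder for a Transformer-XL would add segment-level recurrence for longer horizons.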


Related research

06/02/2021  Decision Transformer: Reinforcement Learning via Sequence Modeling
06/02/2022  Deep Transformer Q-Networks for Partially Observable Reinforcement Learning
05/26/2023  Emergent Agentic Transformer from Chain of Hindsight Experience
04/18/2020  Modeling Survival in Model-Based Reinforcement Learning
10/10/2021  DCT: Dynamic Compressive Transformer for Modeling Unbounded Sequence
09/01/2022  Transformers are Sample Efficient World Models
09/20/2022  Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL
