Supervised Pretraining Can Learn In-Context Reinforcement Learning

06/26/2023
by Jonathan N. Lee, et al.

Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action given a query state and an in-context dataset of interactions, across a diverse set of tasks. This procedure, while simple, produces a model with several surprising capabilities. We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline, despite not being explicitly trained to do so. The model also generalizes beyond the pretraining distribution to new tasks and automatically adapts its decision-making strategies to unknown structure. Theoretically, we show DPT can be viewed as an efficient implementation of Bayesian posterior sampling, a provably sample-efficient RL algorithm. We further leverage this connection to provide guarantees on the regret of the in-context algorithm yielded by DPT, and prove that it can learn faster than algorithms used to generate the pretraining data. These results suggest a promising yet simple path towards instilling strong in-context decision-making abilities in transformers.
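To make the pretraining procedure concrete, here is a minimal sketch of DPT-style supervised pretraining on toy Bernoulli bandit tasks (in bandits the query state is trivial, so it is omitted). All names, sizes, and the uniform behavior policy are illustrative assumptions, not the paper's exact setup: a transformer encodes an in-context dataset of (action, reward) pairs and is trained with cross-entropy to predict the task's optimal arm.

```python
# Hedged sketch of DPT-style pretraining: predict the optimal action from an
# in-context dataset of interactions, supervised across many sampled tasks.
# Hyperparameters and architecture details are illustrative, not the paper's.
import torch
import torch.nn as nn

torch.manual_seed(0)
N_ARMS, CTX_LEN, D = 5, 20, 32  # arms per bandit, context length, model width

class DPTSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # each context element is a (one-hot action, reward) pair
        self.embed = nn.Linear(N_ARMS + 1, D)
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, N_ARMS)  # logits over the optimal action

    def forward(self, ctx):
        h = self.encoder(self.embed(ctx))  # (batch, CTX_LEN, D)
        return self.head(h.mean(dim=1))    # pool the context, predict a*

def sample_batch(batch=64):
    # sample a task (arm means), then an in-context dataset from a
    # uniform behavior policy; the label is the task's optimal arm
    means = torch.rand(batch, N_ARMS)
    acts = torch.randint(N_ARMS, (batch, CTX_LEN))
    rewards = torch.bernoulli(means.gather(1, acts))
    ctx = torch.cat(
        [nn.functional.one_hot(acts, N_ARMS).float(), rewards.unsqueeze(-1)],
        dim=-1,
    )
    return ctx, means.argmax(dim=1)

model = DPTSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    ctx, a_star = sample_batch()
    loss = nn.functional.cross_entropy(model(ctx), a_star)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At deployment, the same forward pass is run on a context the model gathers itself: sampling actions from the predicted distribution yields the posterior-sampling-like exploratory behavior the paper analyzes, even though the model was only ever trained on the supervised action-prediction objective above.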


Related research

- 05/26/2023: Future-conditioned Unsupervised Pretraining for Decision Transformer. Recent research in offline reinforcement learning (RL) has demonstrated ...
- 06/26/2023: Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression. Pretrained transformers exhibit the remarkable ability of in-context lea...
- 11/23/2022: Masked Autoencoding for Scalable and Generalizable Decision Making. We are interested in learning scalable agents for reinforcement learning...
- 02/11/2022: Online Decision Transformer. Recent work has shown that offline reinforcement learning (RL) can be fo...
- 12/30/2022: Transformer in Transformer as Backbone for Deep Reinforcement Learning. Designing better deep networks and better reinforcement learning (RL) al...
- 06/27/2021: A Reinforcement Learning Approach for Sequential Spatial Transformer Networks. Spatial Transformer Networks (STN) can generate geometric transformation...
- 05/16/2023: Cooperation Is All You Need. Going beyond 'dendritic democracy', we introduce a 'democracy of local p...
