In-context Reinforcement Learning with Algorithm Distillation

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
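The across-episode prediction setup described above can be illustrated with a minimal pure-Python sketch of how training examples might be laid out. It assumes discrete observations and actions encoded as integers, with transitions stored as (observation, action, reward) tuples. The names `flatten_history`, `make_training_examples`, and `action_nll` are hypothetical and do not come from the paper. Episodes are concatenated in the order the source RL algorithm produced them, so the context for a given step can reach back into earlier episodes. Each example asks the model to predict the action the source algorithm took at that step. The uniform policy in the usage block is only a placeholder for the causal transformer, and the sketch omits tokenization, batching, and every architectural detail of the actual method.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: (observation, action, reward) with discrete integer ids.
Transition = Tuple[int, int, float]
# (preceding context transitions, current observation, target action)
Example = Tuple[List[Transition], int, int]


def flatten_history(episodes: Sequence[Sequence[Transition]]) -> List[Transition]:
    """Concatenate episodes in the order the source RL algorithm produced them,
    so that a context window can span episode boundaries."""
    return [step for episode in episodes for step in episode]


def make_training_examples(history: List[Transition], context_len: int) -> List[Example]:
    """Pair each step's observation and up to `context_len` preceding transitions
    with the action the source algorithm actually took at that step."""
    examples: List[Example] = []
    for t, (obs_t, action_t, _reward_t) in enumerate(history):
        context = history[max(0, t - context_len):t]
        examples.append((context, obs_t, action_t))
    return examples


def action_nll(examples: List[Example],
               policy: Callable[[List[Transition], int], Sequence[float]]) -> float:
    """Mean negative log-likelihood of the target actions under `policy`,
    which maps (context, observation) to one probability per discrete action."""
    total = 0.0
    for context, obs, action in examples:
        probs = policy(context, obs)
        total -= math.log(probs[action])
    return total / len(examples)


if __name__ == "__main__":
    # Two toy episodes from an imagined learning history (earlier episode first).
    episodes = [[(0, 1, 0.0), (1, 0, 1.0)],
                [(0, 0, 0.0), (2, 1, 1.0)]]
    history = flatten_history(episodes)
    examples = make_training_examples(history, context_len=3)
    # Placeholder policy: uniform over 2 actions. A trained causal transformer
    # conditioned on the context would take its place.
    uniform = lambda context, obs: [0.5, 0.5]
    print(len(examples), action_nll(examples, uniform))
```

Because the context is drawn from the whole learning history rather than from a single episode, a model trained on such examples is exposed to how the source algorithm's behavior changed from one episode to the next, which is the sense in which the task is framed as across-episode sequential prediction.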

MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation (09/19/2022)

We present Multiple Scenario Verifiable Reinforcement Learning via Polic...

Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning (12/30/2021)

Online reinforcement learning (RL) algorithms are often difficult to dep...

Reinforcement Learning in Presence of Discrete Markovian Context Evolution (02/14/2022)

We consider a context-dependent Reinforcement Learning (RL) setting, whi...

Learning to Plan via a Multi-Step Policy Regression Method (06/18/2021)

We propose a new approach to increase inference performance in environme...

Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing (04/16/2020)

We propose a novel method for analyzing and visualizing the complexity o...

Synthesis of separation processes with reinforcement learning (11/03/2022)

This paper shows the implementation of reinforcement learning (RL) in co...

Transformer Network-based Reinforcement Learning Method for Power Distribution Network (PDN) Optimization of High Bandwidth Memory (HBM) (03/29/2022)

In this article, for the first time, we propose a transformer network-ba...
