RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning

11/09/2016
by   Yan Duan, et al.
0

Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL^2, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP. We evaluate RL^2 experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-arm bandit problems and finite MDPs. After RL^2 is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL^2 on a vision-based navigation task and show that it scales up to high-dimensional problems.

READ FULL TEXT
research
05/11/2023

On Practical Robust Reinforcement Learning: Practical Uncertainty Set and Double-Agent Algorithm

We study a robust reinforcement learning (RL) with model uncertainty. Gi...
research
12/05/2019

Reinforcement Learning with Non-Markovian Rewards

The standard RL world model is that of a Markov Decision Process (MDP). ...
research
06/05/2018

Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning

Exogenous state variables and rewards can slow down reinforcement learni...
research
04/29/2021

What is Going on Inside Recurrent Meta Reinforcement Learning Agents?

Recurrent meta reinforcement learning (meta-RL) agents are agents that e...
research
02/29/2012

Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization

The use of Reinforcement Learning in real-world scenarios is strongly li...
research
09/09/2020

Solving the k-sparse Eigenvalue Problem with Reinforcement Learning

We examine the possibility of using a reinforcement learning (RL) algori...

Please sign up or login with your details

Forgot password? Click here to reset