Backplay: "Man muss immer umkehren"

07/18/2018
by Cinjon Resnick, et al.

A long-standing problem in model-free reinforcement learning (RL) is that it requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to increase the sample efficiency of RL when we have access to demonstrations. Our approach, which we call Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards over the course of training until we reach the initial state. We perform experiments in a competitive four-player game (Pommerman) and a path-finding maze game. We find that this weak form of guidance provides significant gains in sample complexity, with a stark advantage in sparse-reward environments. In some cases, standard RL did not yield any improvement, while Backplay reached success rates greater than 50% and generalized to unseen initial conditions in the same amount of training time. Additionally, we see that agents trained via Backplay can learn policies superior to those of the original demonstration.
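The curriculum described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `(first_epoch, lo, hi)` schedule format, and the concrete window values are all assumptions chosen for illustration. The idea shown is only the one stated above: early in training, episodes start a few steps before the end of the demonstration, and the start point moves back toward the initial state as training proceeds.

```python
import random
from typing import Sequence, Tuple

# Hypothetical schedule format (an assumption, not taken from the paper):
# each entry is (first_epoch, lo, hi), meaning that from `first_epoch` onward
# the episode starts between `lo` and `hi` steps before the end of the demo.
# Entries must be sorted by `first_epoch`.
Schedule = Sequence[Tuple[int, int, int]]


def steps_back_window(epoch: int, schedule: Schedule) -> Tuple[int, int]:
    """Return the (lo, hi) window that is active at the given epoch."""
    lo, hi = schedule[0][1], schedule[0][2]
    for first_epoch, entry_lo, entry_hi in schedule:
        if epoch >= first_epoch:
            lo, hi = entry_lo, entry_hi
    return lo, hi


def sample_start_state(demo_states, epoch: int, schedule: Schedule, rng: random.Random):
    """Pick the demonstration state from which to start this episode."""
    last = len(demo_states) - 1
    lo, hi = steps_back_window(epoch, schedule)
    back = rng.randint(lo, hi)      # inclusive on both ends
    back = min(back, last)          # never step back past the initial state
    return demo_states[last - back]


# Illustrative usage with a 10-step demonstration whose states are just 0..9,
# where state 0 is the environment's initial state.
demo = list(range(10))
schedule = [(0, 1, 2), (5, 2, 5), (10, 5, 9), (15, 9, 9)]
rng = random.Random(0)
early_start = sample_start_state(demo, epoch=0, schedule=schedule, rng=rng)
late_start = sample_start_state(demo, epoch=100, schedule=schedule, rng=rng)
```

With this illustrative schedule, early epochs start from state 7 or 8 (close to the end of the demonstration), while late epochs always start from state 0, at which point training matches the standard setting of starting from the initial state. In a real environment, `demo_states` would hold whatever snapshot the simulator needs in order to be reset to that point.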

