Fast Policy Learning through Imitation and Reinforcement

05/26/2018
by Ching-An Cheng, et al.

Imitation learning (IL) consists of a set of tools that leverage expert demonstrations to quickly learn policies. However, if the expert is suboptimal, IL can yield policies that perform worse than those learned by reinforcement learning (RL). In this paper, we aim to provide an algorithm that combines the best aspects of RL and IL. We accomplish this by formulating several popular RL and IL algorithms in a common mirror descent framework, showing that these algorithms can be viewed as variations on a single approach. We then propose LOKI, a strategy for policy learning that first performs a small, random number of IL iterations and then switches to a policy gradient RL method. We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient from scratch. Finally, we evaluate the performance of LOKI experimentally in several simulated environments.
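
The two-phase schedule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract only says that the number of IL iterations is small and random, so the sampling rule used here (probability proportional to k**d over a fixed range) is an assumption. The names `sample_switch_time`, `run_loki`, `il_step`, and `rl_step` are hypothetical, and the one-dimensional toy problem (a suboptimal "expert" at 1.0 and an optimum at 2.0) exists only to make the sketch runnable.

```python
import random
from typing import Callable, List, Tuple


def sample_switch_time(n_min: int, n_max: int, d: float, rng: random.Random) -> int:
    """Draw K from {n_min, ..., n_max} with P(K = k) proportional to k**d.

    This particular distribution is an assumption made for illustration.
    """
    ks = list(range(n_min, n_max + 1))
    weights = [k ** d for k in ks]
    return rng.choices(ks, weights=weights, k=1)[0]


def run_loki(
    theta: float,
    il_step: Callable[[float], float],
    rl_step: Callable[[float], float],
    total_iters: int,
    n_min: int,
    n_max: int,
    d: float = 3.0,
    seed: int = 0,
) -> Tuple[float, int, List[str]]:
    """Run K imitation updates, then reinforcement updates for the remaining iterations."""
    rng = random.Random(seed)
    k = sample_switch_time(n_min, n_max, d, rng)
    phases: List[str] = []
    for t in range(total_iters):
        if t < k:
            theta = il_step(theta)   # imitation phase: follow the expert
            phases.append("IL")
        else:
            theta = rl_step(theta)   # RL phase: follow the true objective
            phases.append("RL")
    return theta, k, phases


# Toy stand-ins (hypothetical): the "policy" is a single number.
EXPERT = 1.0    # suboptimal expert
OPTIMUM = 2.0   # minimizer of the true cost (theta - OPTIMUM) ** 2


def il_step(theta: float) -> float:
    """Move halfway toward the expert's parameter."""
    return theta + 0.5 * (EXPERT - theta)


def rl_step(theta: float) -> float:
    """One gradient-descent step on the true cost, step size 0.1."""
    return theta - 0.1 * 2.0 * (theta - OPTIMUM)


if __name__ == "__main__":
    theta, k, phases = run_loki(0.0, il_step, rl_step, total_iters=60, n_min=5, n_max=10)
    print(f"switched after {k} IL iterations; final theta = {theta:.4f}")
```

In this toy setting the imitation phase quickly brings the parameter close to the expert's value, and the subsequent gradient steps carry it past the expert toward the optimum, which mirrors the behavior the abstract claims for LOKI at a qualitative level only.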

research
01/15/2019

Transfer Learning for Prosthetics Using Imitation Learning

In this paper, we apply reinforcement learning (RL) techniques to train ...
research
07/01/2020

Policy Improvement from Multiple Experts

Despite its promise, reinforcement learning's real-world adoption has be...
research
04/01/2019

Guided Meta-Policy Search

Reinforcement learning (RL) algorithms have demonstrated promising resul...
research
02/16/2023

Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods

It is well known that Reinforcement Learning (RL) can be formulated as a...
research
01/31/2020

Preventing Imitation Learning with Adversarial Policy Ensembles

Imitation learning can reproduce policies by observing experts, which po...
research
07/11/2019

Imitation-Projected Policy Gradient for Programmatic Reinforcement Learning

We present Imitation-Projected Policy Gradient (IPPG), an algorithmic fr...
research
11/29/2020

Hybrid Imitation Learning for Real-Time Service Restoration in Resilient Distribution Systems

Self-healing capability is one of the most critical factors for a resili...
