Imitation-Projected Policy Gradient for Programmatic Reinforcement Learning

07/11/2019
by Abhinav Verma, et al.

We present Imitation-Projected Policy Gradient (IPPG), an algorithmic framework for learning policies that are parsimoniously represented in a structured programming language. Such programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for programmatic policies remains a challenge. IPPG, our response to this challenge, is based on three insights. First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation, and solve this optimization problem using a "lift-and-project" perspective that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space. Second, we view the unconstrained policy space as mixing neural and programmatic representations, which enables employing state-of-the-art deep policy gradient approaches. Third, we cast the projection step as program synthesis via imitation learning, and exploit contemporary combinatorial methods for this task. We present theoretical convergence results for IPPG, as well as an empirical evaluation in three continuous control domains. The experiments show that IPPG can significantly outperform state-of-the-art approaches for learning programmatic policies.
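
The "lift-and-project" loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes a one-dimensional state, a reward equal to the negative squared distance to a made-up best action `tanh(3*s)`, and a "programmatic" policy class consisting only of linear rules `action = w*s + b`. The lift step takes a gradient step on a table of per-state actions, which stands in for the unconstrained policy space. The projection step is plain least-squares behavioural cloning, which stands in for the program synthesis via imitation learning that the abstract mentions.

```python
import math


def fit_linear(states, actions):
    """Project onto the constrained class action = w*s + b.

    Least-squares fit to (state, action) pairs, i.e. a very simple
    form of imitation. Assumption: this replaces the paper's
    program-synthesis step, which is not specified in the abstract.
    """
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    var_s = sum((s - mean_s) ** 2 for s in states)
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    w = cov / var_s
    b = mean_a - w * mean_s
    return w, b


def lift_and_project(states, best_action, steps=40, lr=0.5):
    """Alternate an unconstrained gradient step with a projection."""
    w, b = 0.0, 0.0  # start from the trivial program action = 0
    for _ in range(steps):
        # Lift: evaluate the current program on every state, then take a
        # gradient-ascent step on the reward -(a - best_action(s))**2 for
        # each state independently (constant factors folded into lr).
        current = [w * s + b for s in states]
        lifted = [a + lr * (best_action(s) - a)
                  for s, a in zip(states, current)]
        # Project: imitate the lifted (non-programmatic) policy with a
        # member of the constrained class.
        w, b = fit_linear(states, lifted)
    return w, b


# Hypothetical setup: 21 evenly spaced states and a nonlinear target.
states = [i / 10 for i in range(-10, 11)]
target = lambda s: math.tanh(3 * s)

w, b = lift_and_project(states, target)
print(f"learned program: action = {w:.4f} * s + {b:.4f}")
```

In this toy setting the projection is a linear operator, so the iterates approach the best linear imitation of the target. The setting in the abstract is harder, because the program space is combinatorial and the gradient step uses deep policy gradient methods.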

research
07/11/2019

Imitation-Projected Programmatic Reinforcement Learning

We study the problem of programmatic reinforcement learning, in which po...
research
01/31/2020

Preventing Imitation Learning with Adversarial Policy Ensembles

Imitation learning can reproduce policies by observing experts, which po...
research
06/24/2020

When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence

Generative adversarial imitation learning (GAIL) is a popular inverse re...
research
05/26/2018

Fast Policy Learning through Imitation and Reinforcement

Imitation learning (IL) consists of a set of tools that leverage expert ...
research
11/03/2017

Genetic Policy Optimization

Genetic algorithms have been widely used in many practical optimization ...
research
06/19/2019

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

Policy gradient (PG) methods are a widely used reinforcement learning me...
research
05/07/2019

Object Exchangeability in Reinforcement Learning: Extended Abstract

Although deep reinforcement learning has advanced significantly over the...
