Imitation-Projected Programmatic Reinforcement Learning

07/11/2019
by   Abhinav Verma, et al.
0

We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language. Such programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for programmatic policies remains a challenge. Our approach to this challenge - a meta-algorithm called PROPEL - is based on three insights. First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation, and solve this optimization problem using a form of mirror descent that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space. Second, we view the unconstrained policy space as mixing neural and programmatic representations, which enables employing state-of-the-art deep policy gradient approaches. Third, we cast the projection step as program synthesis via imitation learning, and exploit contemporary combinatorial methods for this task. We present theoretical convergence results for PROPEL and empirically evaluate the approach in three continuous control domains. The experiments show that PROPEL can significantly outperform state-of-the-art approaches for learning programmatic policies.

READ FULL TEXT
research
07/11/2019

Imitation-Projected Policy Gradient for Programmatic Reinforcement Learning

We present Imitation-Projected Policy Gradient (IPPG), an algorithmic fr...
research
07/01/2018

Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Deep reinforcement learning has led to several recent breakthroughs, tho...
research
01/31/2020

Preventing Imitation Learning with Adversarial Policy Ensembles

Imitation learning can reproduce policies by observing experts, which po...
research
01/18/2022

Programmatic Policy Extraction by Iterative Local Search

Reinforcement learning policies are often represented by neural networks...
research
07/03/2019

Co-training for Policy Learning

We study the problem of learning sequential decision-making policies in ...
research
09/22/2021

Imitation Learning of Stabilizing Policies for Nonlinear Systems

There has been a recent interest in imitation learning methods that are ...
research
12/29/2017

Smoothed Dual Embedding Control

We revisit the Bellman optimality equation with Nesterov's smoothing tec...

Please sign up or login with your details

Forgot password? Click here to reset