Learning to Synthesize Programs as Interpretable and Generalizable Policies

08/31/2021
by Dweep Trivedi, et al.

Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning programmatic policies that are more interpretable and structured for generalization. Yet, these works either employ limited policy representations (e.g. decision trees, state machines, or predefined program templates) or require stronger supervision (e.g. input/output state pairs or expert demonstrations). We present a framework that instead learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner, solely from reward signals. To alleviate the difficulty of learning to compose programs to induce the desired agent behavior from scratch, we propose to first learn a program embedding space that continuously parameterizes diverse behaviors in an unsupervised manner and then search over the learned program embedding space to yield a program that maximizes the return for a given task. Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines while producing interpretable and more generalizable policies. We also justify the necessity of the proposed two-stage learning scheme as well as analyze various methods for learning the program embedding.
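
The second stage described above, searching a learned embedding space for a program that maximizes return, can be illustrated with a minimal sketch. The abstract does not name the search algorithm, so the cross-entropy method is used here only as one plausible choice. The decoder, the token names, the target program, and the return function are all hypothetical stand-ins: in the actual framework the decoder would be learned in the first stage and the return would come from executing the program in an environment.

```python
import random
import statistics

# Hypothetical four-action DSL; the names are illustrative only.
TOKENS = ["move", "turnLeft", "turnRight", "pickMarker"]
TARGET = ["move", "turnRight", "pickMarker", "turnLeft"]  # toy task-solving program


def decode(z):
    """Toy stand-in for a learned decoder: one token per latent coordinate."""
    return [TOKENS[min(len(TOKENS) - 1, max(0, round(x)))] for x in z]


def episode_return(program):
    """Toy return: fraction of positions that match the target program."""
    return sum(a == b for a, b in zip(program, TARGET)) / len(TARGET)


def cem_search(return_fn, dim, iterations=30, population=64, n_elite=8, seed=0):
    """Cross-entropy method over a continuous latent space."""
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [2.0] * dim
    best_z, best_ret = None, float("-inf")
    for _ in range(iterations):
        # Sample candidate latents from the current diagonal Gaussian.
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(population)]
        # Score each candidate once, then keep the highest-return elites.
        scored = sorted(((return_fn(z), z) for z in candidates),
                        key=lambda pair: pair[0], reverse=True)
        elites = [z for _, z in scored[:n_elite]]
        if scored[0][0] > best_ret:
            best_ret, best_z = scored[0]
        # Refit the sampling distribution to the elites.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elites)]
    return best_z, best_ret


best_z, best_ret = cem_search(lambda z: episode_return(decode(z)), dim=len(TARGET))
print(decode(best_z), best_ret)
```

The search only ever calls the return function, which mirrors the reward-only supervision the abstract describes: no input/output pairs or demonstrations are needed, and the result is a readable token sequence rather than network weights.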


research
01/30/2023

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Aiming to produce reinforcement learning (RL) policies that are human-in...
research
03/09/2023

Hierarchical Neural Program Synthesis

Program synthesis aims to automatically construct human-readable program...
research
06/16/2019

MoËT: Interpretable and Verifiable Reinforcement Learning via Mixture of Expert Trees

Deep Reinforcement Learning (DRL) has led to many recent breakthroughs o...
research
01/10/2018

Neural Program Synthesis with Priority Queue Training

We consider the task of program synthesis in the presence of a reward fu...
research
04/06/2018

Programmatically Interpretable Reinforcement Learning

We study the problem of generating interpretable and verifiable policies...
research
04/24/2019

Neural Logic Reinforcement Learning

Deep reinforcement learning (DRL) has achieved significant breakthroughs...
research
07/16/2018

Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees

Deep Reinforcement Learning (DRL) has achieved impressive success in man...
