Neural Program Synthesis with Priority Queue Training

01/10/2018
by Daniel A. Abolafia, et al.

We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme in which we train an RNN on a dataset of the K best programs from a priority queue of the programs generated so far. We then synthesize new programs by sampling from the RNN and add them to the priority queue. We benchmark our algorithm, called priority queue training (PQT), against genetic algorithm and reinforcement learning baselines on a simple but expressive Turing-complete programming language called BF. Our experimental results show that our simple PQT algorithm significantly outperforms the baselines. By adding a program length penalty to the reward function, we are able to synthesize short, human-readable programs.
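The loop described in the abstract (sample programs, keep the K best in a priority queue, train on those K, repeat) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the RNN is swapped for a per-position table of token counts so the example needs no dependencies, and the names `run_bf`, `reward`, and `pqt`, the target output `(3,)`, the tape size, the step limit, and the penalty coefficient are all hypothetical choices made for this example.

```python
import heapq
import random

ALPHABET = "><+-.[]"   # BF commands used here; "," (input) is omitted for simplicity
EOS = "$"              # hypothetical end-of-program token

def run_bf(code, max_steps=1000, tape_len=32):
    """Run a BF program; return its output list, or None if brackets are unbalanced."""
    stack, jump = [], {}
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                return None
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        return None
    tape, ptr, pc, out, steps = [0] * tape_len, 0, 0, [], 0
    while pc < len(code) and steps < max_steps:
        c = code[pc]
        if c == ">":
            ptr = (ptr + 1) % tape_len
        elif c == "<":
            ptr = (ptr - 1) % tape_len
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(tape[ptr])
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]          # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]          # jump back to the loop start
        pc += 1
        steps += 1
    return out

def reward(program, target=(3,), length_penalty=0.01):
    """Toy reward on program output, minus an assumed per-character length penalty."""
    out = run_bf(program)
    if out is None:
        return -100.0
    dist = 10 * abs(len(out) - len(target))
    dist += sum(abs(a - b) for a, b in zip(out, target))
    return -dist - length_penalty * len(program)

def pqt(reward_fn, iterations=50, batch=32, k=5, max_len=8, seed=0):
    """Priority queue training with a count-based model standing in for the RNN."""
    rng = random.Random(seed)
    tokens = list(ALPHABET) + [EOS]
    # counts[t][tok] is the pseudo-count of token tok at position t (uniform at start).
    counts = [{tok: 1.0 for tok in tokens} for _ in range(max_len)]
    queue = {}   # program -> reward; a dict keeps entries unique
    top = []
    for _ in range(iterations):
        # 1. Sample new programs from the current model and score them.
        for _ in range(batch):
            prog = []
            for t in range(max_len):
                weights = [counts[t][tok] for tok in tokens]
                tok = rng.choices(tokens, weights=weights)[0]
                if tok == EOS:
                    break
                prog.append(tok)
            prog = "".join(prog)
            queue[prog] = reward_fn(prog)
        # 2. Keep only the K best programs seen so far.
        top = heapq.nlargest(k, queue.items(), key=lambda kv: kv[1])
        queue = dict(top)
        # 3. "Train": raise the likelihood of the top-K programs under the model.
        for prog, _ in top:
            for t, tok in enumerate(list(prog) + [EOS]):
                if t < max_len:
                    counts[t][tok] += 1.0
    return top
```

Calling `pqt(reward)` returns the K best `(program, reward)` pairs found, best first. With the length penalty in place, shorter programs that emit the same output score higher, which mirrors the abstract's point about encouraging short programs.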

research
06/20/2018

A Faster External Memory Priority Queue with DecreaseKeys

A priority queue is a fundamental data structure that maintains a dynami...
research
07/14/2020

Programming by Rewards

We formalize and study “programming by rewards” (PBR), a new approach fo...
research
08/31/2021

Learning to Synthesize Programs as Interpretable and Generalizable Policies

Recently, deep reinforcement learning (DRL) methods have achieved impres...
research
06/08/2018

Program Synthesis Through Reinforcement Learning Guided Tree Search

Program Synthesis is the task of generating a program from a provided sp...
research
04/27/2021

Inductive Program Synthesis over Noisy Datasets using Abstraction Refinement Based Optimization

We present a new synthesis algorithm to solve program synthesis over noi...
research
12/05/2019

Learning Human Objectives by Evaluating Hypothetical Behavior

We seek to align agent behavior with a user's objectives in a reinforcem...
research
01/30/2023

Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Aiming to produce reinforcement learning (RL) policies that are human-in...
