Memory Augmented Policy Optimization for Program Synthesis with Generalization

by Chen Liang, et al.
Tel Aviv University

This paper presents Memory Augmented Policy Optimization (MAPO), a novel policy optimization formulation that incorporates a memory buffer of promising trajectories to reduce the variance of policy gradient estimates for deterministic environments with discrete actions. The formulation expresses the expected return objective as a weighted sum of two terms: an expectation over a memory of trajectories with high rewards, and a separate expectation over the trajectories outside the memory. We propose three techniques to make an efficient training algorithm for MAPO: (1) distributed sampling from inside and outside the memory with an actor-learner architecture; (2) a marginal likelihood constraint over the memory to accelerate training; (3) systematic exploration to discover high-reward trajectories. MAPO improves the sample efficiency and robustness of policy gradient, especially on tasks with a sparse reward. We evaluate MAPO on weakly supervised program synthesis from natural language with an emphasis on generalization. On the WikiTableQuestions benchmark, we improve the state-of-the-art by 2.5%. On the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines with full supervision. Our code is open sourced at
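The decomposition of the expected return into an inside-memory term and an outside-memory term can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a toy setting in which every trajectory can be enumerated, and the names `softmax`, `memory_decomposed_return`, and `pi_b` (the total policy probability assigned to trajectories in the memory), as well as the logits and rewards, are chosen only for this illustration.

```python
import math


def softmax(logits):
    """Turn unnormalized scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def expected_return(probs, rewards):
    """Direct expected return: sum over all trajectories of p(a) * R(a)."""
    return sum(p * r for p, r in zip(probs, rewards))


def memory_decomposed_return(probs, rewards, memory):
    """Expected return written as a weighted sum of two expectations.

    `memory` is a set of trajectory indices (hypothetical representation).
    Returns the decomposed objective and the memory weight pi_b.
    """
    pi_b = sum(probs[i] for i in memory)  # probability mass inside memory
    outside_mass = 1.0 - pi_b             # probability mass outside memory

    # Expectation under the policy renormalized over the memory.
    inside = 0.0
    if pi_b > 0:
        inside = sum(probs[i] / pi_b * rewards[i] for i in memory)

    # Expectation under the policy renormalized over everything else.
    outside = 0.0
    if outside_mass > 0:
        outside = sum(
            probs[i] / outside_mass * rewards[i]
            for i in range(len(probs))
            if i not in memory
        )

    return pi_b * inside + outside_mass * outside, pi_b


# Toy problem: five trajectories, sparse binary rewards (assumed values).
logits = [0.5, -1.0, 2.0, 0.0, 1.5]
rewards = [1.0, 0.0, 0.0, 1.0, 0.0]
memory = {0, 3}  # the high-reward trajectories found so far

probs = softmax(logits)
direct = expected_return(probs, rewards)
decomposed, pi_b = memory_decomposed_return(probs, rewards, memory)
print(abs(direct - decomposed) < 1e-9)
```

In this toy case the two ways of computing the objective agree up to floating-point error. In a realistic setting the outside-memory expectation cannot be enumerated and would have to be estimated from samples, which is where the variance reduction described in the abstract becomes relevant.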



Code Repositories


Neural Symbolic Machines is a framework to integrate neural networks and symbolic representations using reinforcement learning, with applications in program synthesis and semantic parsing.



A PyTorch Implementation of Neural Symbolic Machines by Liang et al. (2018)



Official code for AAAI'20 paper "Merging Weak and Active Supervision for Semantic Parsing"
