Memory Augmented Policy Optimization for Program Synthesis with Generalization

07/06/2018
by Chen Liang, et al. (Google, Mosaix, Tel Aviv University)

This paper presents Memory Augmented Policy Optimization (MAPO): a novel policy optimization formulation that incorporates a memory buffer of promising trajectories to reduce the variance of policy gradient estimates in deterministic environments with discrete actions. The formulation expresses the expected return objective as a weighted sum of two terms: an expectation over a memory of high-reward trajectories, and a separate expectation over the trajectories outside the memory. We propose three techniques to make MAPO training efficient: (1) distributed sampling from inside and outside the memory with an actor-learner architecture; (2) a marginal likelihood constraint over the memory to accelerate training; (3) systematic exploration to discover high-reward trajectories. MAPO improves the sample efficiency and robustness of policy gradient methods, especially on tasks with sparse rewards. We evaluate MAPO on weakly supervised program synthesis from natural language, with an emphasis on generalization. On the WikiTableQuestions benchmark we improve the state-of-the-art by 2.5%; on the WikiSQL benchmark, MAPO achieves an accuracy of 74.9%, outperforming several strong baselines trained with full supervision. Our code is open sourced at https://github.com/crazydonkey200/neural-symbolic-machines.
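The weighted-sum objective above can be made concrete with a small sketch. The following is a minimal, illustrative MAPO-style gradient estimator on a toy one-step environment: it computes the expectation over the memory buffer exactly, weights it by the (clipped) total memory probability, and estimates the remaining expectation by sampling trajectories outside the memory. The environment, reward table, and clipping constant are assumptions for illustration, not the authors' implementation.

```python
import random

# Toy deterministic environment: a "trajectory" is one discrete action in
# {0..4}; the reward is sparse (1.0 for the correct action, else 0.0).
# Both the action space and reward table are illustrative assumptions.
ACTIONS = range(5)
REWARD = {a: 1.0 if a == 3 else 0.0 for a in ACTIONS}

def softmax(logits):
    m = max(logits)
    exps = [2.718281828459045 ** (l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mapo_gradient(logits, memory, clip_alpha=0.1, n_samples=100):
    """Estimate d(expected reward)/d(logits) as a weighted sum of an
    exact expectation over the memory buffer and a sampled expectation
    over trajectories outside the memory."""
    probs = softmax(logits)
    pi_b = sum(probs[a] for a in memory)   # total probability mass of memory
    w = max(pi_b, clip_alpha)              # clip the memory weight from below
    grad = [0.0] * len(logits)

    # Term 1: exact expectation over the (small) memory buffer.
    for a in memory:
        weight = w * (probs[a] / pi_b) * REWARD[a]
        for i in range(len(logits)):
            # d log softmax(a) / d logit_i = 1[i == a] - probs[i]
            grad[i] += weight * ((1.0 if i == a else 0.0) - probs[i])

    # Term 2: Monte Carlo expectation over trajectories outside the memory.
    outside = [a for a in ACTIONS if a not in memory]
    pi_out = sum(probs[a] for a in outside)
    if pi_out > 0:
        for _ in range(n_samples):
            a = random.choices(outside, weights=[probs[x] for x in outside])[0]
            weight = (1.0 - w) * REWARD[a] / n_samples
            for i in range(len(logits)):
                grad[i] += weight * ((1.0 if i == a else 0.0) - probs[i])
    return grad
```

With a uniform policy and the single rewarded trajectory stored in memory, the estimator pushes probability toward the memorized trajectory (positive gradient on its logit, negative on the rest) with zero variance from the in-memory term, which is the point of enumerating the buffer exactly instead of sampling it.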


Code Repositories

neural-symbolic-machines

Neural Symbolic Machines is a framework to integrate neural networks and symbolic representations using reinforcement learning, with applications in program synthesis and semantic parsing.


pytorch_neural_symbolic_machines

A PyTorch Implementation of Neural Symbolic Machines by Liang et al. (2018)


wassp

Official code for AAAI'20 paper "Merging Weak and Active Supervision for Semantic Parsing"

