Memory-Constrained Policy Optimization

04/20/2022
by   Hung Le, et al.

We introduce a new constrained optimization method for policy gradient reinforcement learning that uses two trust regions to regulate each policy update. In addition to using proximity to a single old policy as the first trust region, as done in prior work, we propose forming a second trust region by constructing a virtual policy that represents a wide range of past policies. We then constrain the new policy to stay closer to the virtual policy, which is beneficial when the old policy performs badly. More importantly, we propose a mechanism that automatically builds the virtual policy from a memory buffer of past policies, providing a new capability for dynamically selecting appropriate trust regions during optimization. Our proposed method, dubbed Memory-Constrained Policy Optimization (MCPO), is evaluated on a diverse suite of environments, including robotic locomotion control, navigation with sparse rewards, and Atari games, and consistently demonstrates competitive performance against recent on-policy constrained policy gradient methods.
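
To make the two-trust-region idea concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not say how the virtual policy is built or how the constraints enter the objective, so everything specific here is an assumption made for illustration: policies are discrete action distributions for a single state, the virtual policy is a weighted average of the policies in memory (uniform by default), and both trust regions are imposed as KL penalties with hypothetical coefficients `beta_old` and `beta_virtual`.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def virtual_policy(memory, weights=None):
    """Assumed construction: weighted average of the past policies in memory.

    Uniform weights are used when none are given; the abstract does not
    specify how the actual method weights past policies.
    """
    if weights is None:
        weights = [1.0 / len(memory)] * len(memory)
    n_actions = len(memory[0])
    return [sum(w * pol[a] for w, pol in zip(weights, memory))
            for a in range(n_actions)]


def two_region_objective(new, old, virtual, advantages,
                         beta_old=1.0, beta_virtual=1.0):
    """Importance-weighted surrogate minus two KL penalties (assumed form).

    The first penalty keeps `new` near the old policy, the second keeps it
    near the virtual policy. Requires old[a] > 0 for every action.
    """
    surrogate = sum(o * (n / o) * adv
                    for n, o, adv in zip(new, old, advantages))
    return (surrogate
            - beta_old * kl(old, new)
            - beta_virtual * kl(virtual, new))


# Toy example: three past policies over two actions.
memory = [[0.7, 0.3], [0.5, 0.5], [0.3, 0.7]]
old = memory[-1]                  # most recent policy
virtual = virtual_policy(memory)  # summary of all stored policies
advantages = [1.0, -1.0]          # hypothetical advantage estimates

candidate = [0.6, 0.4]
score = two_region_objective(candidate, old, virtual, advantages)
```

In this toy setting the old policy favors the action with negative advantage, while the virtual policy is more balanced, so the second penalty pulls candidate policies toward the broader history rather than toward the single most recent policy.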

research
03/09/2020

Stable Policy Optimization via Off-Policy Divergence Regularization

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization...
research
03/19/2019

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep re...
research
08/04/2020

Faded-Experience Trust Region Policy Optimization for Model-Free Power Allocation in Interference Channel

Policy gradient reinforcement learning techniques enable an agent to dir...
research
05/21/2020

Novel Policy Seeking with Constrained Optimization

In this work, we address the problem of learning to seek novel policies ...
research
06/25/2023

Provably Convergent Policy Optimization via Metric-aware Trust Region Methods

Trust-region methods based on Kullback-Leibler divergence are pervasivel...
research
06/13/2019

Jacobian Policy Optimizations

Recently, natural policy gradient algorithms gained widespread recogniti...
research
07/29/2019

Hindsight Trust Region Policy Optimization

As reinforcement learning continues to drive machine intelligence beyond...
