Improving Actor-Critic Reinforcement Learning via Hamiltonian Policy

03/22/2021
by Duo Xu, et al.
Many real-world scenarios require approximating optimal policies in reinforcement learning (RL), a task known as policy optimization. When RL is viewed from the perspective of variational inference (VI), the policy network is trained to approximate the posterior over actions given the optimality criteria. In practice, however, policy optimization may yield suboptimal policy estimates because of the amortization gap and insufficient exploration. In this work, inspired by previous uses of Hamiltonian Monte Carlo (HMC) in VI, we propose to integrate policy optimization with HMC. Specifically, actions sampled from the base policy are evolved according to HMC, which has two benefits: i) HMC can improve the policy distribution so that it better approximates the posterior, and hence reduces the amortization gap; ii) HMC can also steer exploration toward regions with higher action values, improving exploration efficiency. Instead of applying HMC to RL directly, we propose a new leapfrog operator to simulate the Hamiltonian dynamics. With comprehensive experiments on continuous control benchmarks, including MuJoCo and PyBullet Roboschool, we show that the proposed approach is a data-efficient and easy-to-implement improvement over previous policy optimization methods. The proposed approach also outperforms previous methods on the DeepMind Control Suite, which has an image-based, high-dimensional observation space.
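To make the idea of evolving actions under Hamiltonian dynamics concrete, here is a minimal sketch of the standard leapfrog integrator used in textbook HMC. It is not the new leapfrog operator proposed in the paper, which the abstract does not specify. The quadratic `toy_q`, its maximizer `TARGET`, the standard-normal base policy, and the step size and step count are all hypothetical choices for illustration; the potential energy is taken to be U(a) = -Q(a).

```python
import numpy as np


def leapfrog(action, momentum, grad_potential, step_size, n_steps):
    """Standard leapfrog integration of Hamiltonian dynamics with
    H(a, p) = U(a) + 0.5 * ||p||^2 (textbook HMC, not the paper's operator)."""
    a = np.array(action, dtype=float)
    p = np.array(momentum, dtype=float)
    # Initial half step for the momentum.
    p = p - 0.5 * step_size * grad_potential(a)
    for i in range(n_steps):
        # Full step for the action.
        a = a + step_size * p
        # Full momentum step, except after the last action update.
        if i < n_steps - 1:
            p = p - step_size * grad_potential(a)
    # Final half step for the momentum.
    p = p - 0.5 * step_size * grad_potential(a)
    return a, p


# Hypothetical maximizer of the toy action-value function below.
TARGET = np.array([1.0, -0.5])


def toy_q(a):
    """Toy stand-in for a learned critic Q(s, a) at a fixed state."""
    return -0.5 * np.sum((a - TARGET) ** 2)


def grad_potential(a):
    """Gradient of U(a) = -toy_q(a), which is a - TARGET."""
    return a - TARGET


def hamiltonian(a, p):
    """Total energy: potential -Q(a) plus kinetic 0.5 * ||p||^2."""
    return -toy_q(a) + 0.5 * np.sum(p ** 2)


rng = np.random.default_rng(0)
a0 = rng.normal(size=2)  # action from a hypothetical base policy N(0, I)
p0 = rng.normal(size=2)  # auxiliary momentum, resampled for each proposal
a1, p1 = leapfrog(a0, p0, grad_potential, step_size=0.1, n_steps=10)

print("initial action:", a0, "Q =", toy_q(a0))
print("evolved action:", a1, "Q =", toy_q(a1))
print("energy drift:", hamiltonian(a1, p1) - hamiltonian(a0, p0))
```

The integrator approximately conserves the Hamiltonian and is exactly reversible (up to floating-point error) when the momentum is negated, which are the properties HMC relies on. In an actor-critic setting the gradient of the potential would presumably come from differentiating a learned critic with respect to the action, but that wiring is an assumption here rather than something stated in the abstract.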

