Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning

07/13/2021
by Lingwei Zhu, et al.

In this paper, we propose cautious policy programming (CPP), a novel value-based reinforcement learning (RL) algorithm that ensures monotonic policy improvement during learning. Exploiting the structure of entropy-regularized RL, we derive a new entropy-regularization-aware lower bound on policy improvement that requires estimating only the expected policy advantage function. CPP uses this lower bound as a criterion for adjusting the degree of each policy update, alleviating policy oscillation. Unlike similar algorithms, which are mostly theory-oriented, we also propose a novel interpolation scheme that lets CPP scale to high-dimensional control problems. We demonstrate that the proposed algorithm trades off performance and stability in both didactic classic control problems and challenging high-dimensional Atari games.
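
To make the mechanism concrete, below is a minimal tabular sketch of a CPP-style cautious update, assuming a Boltzmann (entropy-regularized) greedy policy and a CPI-style quadratic lower bound in the mixing rate zeta. The bound's shape, the penalty constant `eps`, and the function names are illustrative stand-ins, not the paper's exact entropy-regularization-aware bound or its interpolation scheme for deep networks.

```python
import numpy as np

def softmax(x, beta=1.0):
    """Boltzmann distribution over the last axis of x."""
    z = beta * (x - x.max(axis=-1, keepdims=True))
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cpp_style_update(q, pi_old, d_s, beta=1.0, gamma=0.99, eps=1.0):
    """One tabular, CPP-style cautious policy update (illustrative only).

    q      : (S, A) action-value estimates under pi_old
    pi_old : (S, A) current stochastic policy
    d_s    : (S,)   state distribution used for the expectation
    eps    : hypothetical penalty constant; the paper derives an
             entropy-regularization-aware constant instead
    """
    # Entropy-regularized greedy improvement of pi_old.
    pi_greedy = softmax(q, beta)

    # Expected policy advantage E_{s~d}[<pi_greedy(.|s) - pi_old(.|s), Q(s,.)>],
    # the only quantity the CPP lower bound asks us to estimate.
    adv = ((pi_greedy - pi_old) * q).sum(axis=-1)
    exp_adv = float(d_s @ adv)

    # CPI-style quadratic lower bound in the mixing rate zeta:
    #   improvement(zeta) >= a*zeta - b*zeta**2
    # (this shape is a placeholder for the paper's entropy-aware bound).
    a = exp_adv / (1.0 - gamma)
    b = 2.0 * gamma * eps / (1.0 - gamma) ** 2
    zeta = float(np.clip(a / (2.0 * b), 0.0, 1.0))

    # Cautious update: interpolate instead of jumping to the greedy policy.
    pi_new = zeta * pi_greedy + (1.0 - zeta) * pi_old
    return pi_new, zeta

# Toy usage: 3 states, 2 actions, uniform initial policy.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 2))
pi_old = np.full((3, 2), 0.5)
d_s = np.full(3, 1.0 / 3.0)
pi_new, zeta = cpp_style_update(q, pi_old, d_s)
```

Note that a non-positive estimated expected advantage clips zeta to zero and leaves the policy unchanged; that conservatism is exactly what the lower-bound criterion is meant to enforce.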

Related research

08/25/2020 · Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning
This paper aims to establish an entropy-regularized value-based reinforc...

05/16/2022 · q-Munchausen Reinforcement Learning
The recently successful Munchausen Reinforcement Learning (M-RL) feature...

10/15/2022 · When to Update Your Model: Constrained Model-based Reinforcement Learning
Designing and analyzing model-based RL (MBRL) algorithms with guaranteed...

07/16/2021 · Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning
The recent booming of entropy-regularized literature reveals that Kullba...

10/16/2022 · Entropy Regularized Reinforcement Learning with Cascading Networks
Deep Reinforcement Learning (Deep RL) has had incredible achievements on...

02/29/2016 · Easy Monotonic Policy Iteration
A key problem in reinforcement learning for control with general functio...

07/24/2023 · A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning
As with any machine learning problem with limited data, effective offlin...
