Easy Monotonic Policy Iteration

02/29/2016
by Joshua Achiam, et al.

A key problem in reinforcement learning for control with general function approximators (such as deep neural networks and other nonlinear functions) is that, for many algorithms employed in practice, updates to the policy or Q-function may fail to improve performance or, worse, cause performance to degrade. Prior work has addressed this for policy iteration by deriving tight policy improvement bounds; by optimizing the lower bound on policy improvement, a better policy is guaranteed. However, the bounds in existing approaches are hard to optimize in practice because they include sup norm terms that cannot be efficiently estimated or differentiated. In this work, we derive a better policy improvement bound in which the sup norm of the policy divergence is replaced with an average divergence; this leads to an algorithm, Easy Monotonic Policy Iteration, that generates sequences of policies with guaranteed non-decreasing returns and is easy to implement in a sample-based framework.
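To illustrate why an average divergence is easier to handle than a sup norm, the following minimal sketch compares the two on a toy tabular problem. It is not the paper's algorithm or its bound: the choice of total variation distance as the divergence, the two policies, the state visitation weights, and all function names are assumptions made only for this illustration. The sup-norm quantity needs the divergence at every state, including states that are almost never visited, while the average can be estimated as a plain sample mean over visited states.

```python
import random


def tv_divergence(p, q):
    """Total variation distance between two discrete action distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sup_divergence(pi_old, pi_new):
    """Worst-case divergence over all states (requires the full state space)."""
    return max(tv_divergence(pi_old[s], pi_new[s]) for s in pi_old)


def average_divergence(pi_old, pi_new, sampled_states):
    """Sample-mean divergence over states drawn from a visitation distribution."""
    total = sum(tv_divergence(pi_old[s], pi_new[s]) for s in sampled_states)
    return total / len(sampled_states)


# Hypothetical 3-state, 2-action policies (state -> action probabilities).
pi_old = {0: [0.5, 0.5], 1: [0.9, 0.1], 2: [0.2, 0.8]}
pi_new = {0: [0.5, 0.5], 1: [0.8, 0.2], 2: [0.9, 0.1]}

# Hypothetical state visitation: state 2, where the policies differ most,
# is rarely visited.
rng = random.Random(0)
sampled_states = rng.choices([0, 1, 2], weights=[0.6, 0.35, 0.05], k=1000)

sup_div = sup_divergence(pi_old, pi_new)
avg_div = average_divergence(pi_old, pi_new, sampled_states)

print(f"sup-norm divergence: {sup_div:.3f}")
print(f"average divergence:  {avg_div:.3f}")
```

In this toy setup the sup-norm value is driven entirely by the rarely visited state, whereas the average reflects where the policy actually spends its time and can be computed from the sampled states alone.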


research
10/10/2017

On- and Off-Policy Monotonic Policy Improvement

Monotonic policy improvement and off-policy learning are two main desira...
research
08/25/2020

Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning

This paper aims to establish an entropy-regularized value-based reinforc...
research
07/13/2021

Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning

In this paper, we propose cautious policy programming (CPP), a novel val...
research
10/15/2022

When to Update Your Model: Constrained Model-based Reinforcement Learning

Designing and analyzing model-based RL (MBRL) algorithms with guaranteed...
research
02/10/2018

Beyond the One Step Greedy Approach in Reinforcement Learning

The famous Policy Iteration algorithm alternates between policy improvem...
research
10/07/2020

Projection-Based Constrained Policy Optimization

We consider the problem of learning control policies that optimize a rew...
research
05/21/2018

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

Multiple-step lookahead policies have demonstrated high empirical compet...
