Deep Conservative Policy Iteration

06/24/2019
by   Nino Vieillard, et al.

Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP). Its core principle is to stabilize greediness through stochastic mixtures of consecutive policies. It comes with strong theoretical guarantees and has inspired approaches in deep Reinforcement Learning (RL). However, CPI itself has rarely been implemented, never with neural networks, and has been tested only on toy problems. In this paper, we show how CPI can be practically combined with deep RL with discrete actions. We also introduce adaptive mixture rates inspired by the theory. We thoroughly evaluate the resulting algorithm on the simple Cartpole problem, and validate the proposed method on a representative subset of Atari games. Overall, this work suggests that revisiting classic ADP may lead to improved and more stable deep RL algorithms.
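
To make the "stochastic mixture of consecutive policies" concrete, here is a minimal tabular sketch of a conservative update, in which the new policy is (1 - alpha) times the current policy plus alpha times the greedy policy. It is an illustration under stated assumptions, not the paper's deep RL implementation: the function names, the toy Q-values, and the fixed mixture rate `alpha` are all hypothetical, and the adaptive rates mentioned in the abstract are not modeled.

```python
import numpy as np


def greedy_policy(q_values):
    """Return the one-hot policy that is greedy with respect to q_values.

    q_values has shape (n_states, n_actions).
    """
    n_states, n_actions = q_values.shape
    greedy = np.zeros((n_states, n_actions))
    greedy[np.arange(n_states), np.argmax(q_values, axis=1)] = 1.0
    return greedy


def conservative_update(policy, q_values, alpha):
    """Mix the current policy with the greedy one.

    Returns (1 - alpha) * policy + alpha * greedy(q_values).
    alpha = 1 recovers a fully greedy step; a small alpha changes
    the policy only slightly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * policy + alpha * greedy_policy(q_values)


# Hypothetical toy problem: 2 states, 2 actions, uniform starting policy.
policy = np.full((2, 2), 0.5)
q_values = np.array([[1.0, 0.0],
                     [0.0, 2.0]])
new_policy = conservative_update(policy, q_values, alpha=0.1)
```

Because the update is a convex combination of two distributions, each row of `new_policy` remains a valid probability distribution over actions, and the policy moves only a fraction `alpha` of the way toward the greedy choice.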

research
03/04/2021

Conservative Optimistic Policy Optimization via Multiple Importance Sampling

Reinforcement Learning (RL) has been able to solve hard problems such as...
research
10/18/2019

On Connections between Constrained Optimization and Reinforcement Learning

Dynamic Programming (DP) provides standard algorithms to solve Markov De...
research
10/28/2020

Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning

Despite empirical success, the theory of reinforcement learning (RL) wit...
research
02/27/2021

Revisiting Peng's Q(λ) for Modern Reinforcement Learning

Off-policy multi-step reinforcement learning algorithms consist of conse...
research
09/06/2019

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

Trust region policy optimization (TRPO) is a popular and empirically suc...
research
10/16/2022

Entropy Regularized Reinforcement Learning with Cascading Networks

Deep Reinforcement Learning (Deep RL) has had incredible achievements on...
research
10/27/2021

A Subgame Perfect Equilibrium Reinforcement Learning Approach to Time-inconsistent Problems

In this paper, we establish a subgame perfect equilibrium reinforcement ...
