A Theory of Regularized Markov Decision Processes

01/31/2019
by Matthieu Geist, et al.

Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or on Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes that include classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming, or variants of them, as special cases. This also draws connections to proximal convex optimization, especially to Mirror Descent.
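To make the two building blocks concrete, here is a minimal sketch for one particular regularizer, the scaled negative entropy. For that choice the Legendre-Fenchel transform of the regularizer is a log-sum-exp, and the maximizing policy is a softmax, so the regularized Bellman optimality operator replaces the usual hard max over actions with a smooth one. The function names, the temperature `tau`, and the two-state toy MDP (`P_TOY`, `R_TOY`) are illustrative assumptions, not taken from the paper.

```python
import math

def lse_conjugate(q, tau=1.0):
    """Legendre-Fenchel transform of the scaled negative entropy at q:
    max over pi in the simplex of <pi, q> + tau * H(pi),
    which equals tau * log(sum(exp(q / tau)))."""
    m = max(q)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((x - m) / tau) for x in q))

def softmax_policy(q, tau=1.0):
    """The policy attaining the maximum above (gradient of the conjugate)."""
    m = max(q)
    weights = [math.exp((x - m) / tau) for x in q]
    total = sum(weights)
    return [w / total for w in weights]

def regularized_bellman(v, P, R, gamma, tau):
    """One application of the entropy-regularized optimality operator.
    P[s][a] is a list of next-state probabilities, R[s][a] is a reward."""
    new_v = []
    for s in range(len(v)):
        q = [
            R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
            for a in range(len(R[s]))
        ]
        new_v.append(lse_conjugate(q, tau))
    return new_v

def regularized_value_iteration(P, R, gamma=0.9, tau=0.5, iters=500):
    """Iterate the regularized operator from v = 0 (value iteration case)."""
    v = [0.0] * len(R)
    for _ in range(iters):
        v = regularized_bellman(v, P, R, gamma, tau)
    return v

# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
P_TOY = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
R_TOY = [[0.0, 1.0], [0.5, 2.0]]

if __name__ == "__main__":
    print(regularized_value_iteration(P_TOY, R_TOY))
```

As `tau` shrinks toward zero, the log-sum-exp approaches the hard max and the sketch reduces to ordinary value iteration; other regularizers would swap in a different conjugate and a different maximizing policy.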

research
10/18/2019

On Connections between Constrained Optimization and Reinforcement Learning

Dynamic Programming (DP) provides standard algorithms to solve Markov De...
research
05/22/2017

A unified view of entropy-regularized Markov decision processes

We propose a general framework for entropy-regularized average-reward re...
research
07/02/2019

Modified Actor-Critics

Robot Learning, from a control point of view, often involves continuous ...
research
07/06/2019

Entropic Regularization of Markov Decision Processes

An optimal feedback controller for a given Markov decision process (MDP)...
research
10/16/2012

Sparse Q-learning with Mirror Descent

This paper explores a new framework for reinforcement learning based on ...
research
05/22/2023

A Convex Optimization Framework for Regularized Geodesic Distances

We propose a general convex optimization problem for computing regulariz...
research
03/31/2020

Leverage the Average: an Analysis of Regularization in RL

Building upon the formalism of regularized Markov decision processes, we...
