A unified view of entropy-regularized Markov decision processes

05/22/2017
by Gergely Neu, et al.

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally, we illustrate empirically the effects of using various regularization techniques on learning performance in a simple reinforcement learning setup.
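To make the "dual problem resembling the Bellman optimality equations" concrete, here is a minimal sketch of a soft Bellman backup, in which the hard maximum over actions is replaced by a log-sum-exp smoothing controlled by a parameter eta. It is illustrative only and makes several assumptions that are not taken from the paper: it uses a discounted setting (the paper treats average reward), a uniform reference policy, and a made-up two-state MDP. The names soft_backup and soft_value_iteration are hypothetical.

```python
import math

def soft_backup(q_values, eta):
    """Smoothed maximum: (1/eta) * log of the uniform average of exp(eta * q)."""
    m = max(q_values)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(eta * (q - m)) for q in q_values) / len(q_values)
    return m + math.log(mean_exp) / eta

def q_table(P, r, V, gamma):
    """Q(x, a) = r(x, a) + gamma * sum_y P(y | x, a) * V(y)."""
    n = len(r)
    return [[r[x][a] + gamma * sum(P[x][a][y] * V[y] for y in range(n))
             for a in range(len(r[x]))] for x in range(n)]

def soft_value_iteration(P, r, eta, gamma=0.9, iters=500):
    """Iterate the soft backup; return values and the softmax policy.

    Assumption: discounted setting with a uniform reference policy,
    chosen for simplicity rather than taken from the paper.
    """
    V = [0.0] * len(r)
    for _ in range(iters):
        Q = q_table(P, r, V, gamma)
        V = [soft_backup(q_row, eta) for q_row in Q]
    Q = q_table(P, r, V, gamma)
    policy = []
    for q_row in Q:
        m = max(q_row)
        weights = [math.exp(eta * (q - m)) for q in q_row]
        total = sum(weights)
        policy.append([w / total for w in weights])  # pi(a|x) ∝ exp(eta * Q(x, a))
    return V, policy

# Hypothetical toy MDP: P[x][a] is a distribution over next states.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
r = [[0.0, 0.1],
     [0.5, 1.0]]

V_soft, pi_soft = soft_value_iteration(P, r, eta=1.0)     # strong smoothing
V_sharp, pi_sharp = soft_value_iteration(P, r, eta=200.0)  # close to a hard max
```

With small eta the resulting policy stays close to uniform; as eta grows, the soft backup approaches the usual maximum and the policy concentrates on the greedy action.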

research
03/02/2019

A Unified Framework for Regularized Reinforcement Learning

We propose and study a general framework for regularized Markov decision...
research
10/21/2020

Logistic Q-Learning

We propose a new reinforcement learning algorithm derived from a regular...
research
07/13/2020

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Natural policy gradient (NPG) methods are among the most widely used pol...
research
12/22/2021

Entropy-Regularized Partially Observed Markov Decision Processes

We investigate partially observed Markov decision processes (POMDPs) wit...
research
01/18/2022

Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime

We study the global convergence of policy gradient for infinite-horizon,...
research
01/31/2019

A Theory of Regularized Markov Decision Processes

Many recent successful (deep) reinforcement learning algorithms make use...
research
02/21/2022

Accelerating Primal-dual Methods for Regularized Markov Decision Processes

Entropy regularized Markov decision processes have been widely used in r...