Leverage the Average: an Analysis of Regularization in RL

03/31/2020
by Nino Vieillard, et al.

Building upon the formalism of regularized Markov decision processes, we study the effect of Kullback-Leibler (KL) and entropy regularization in reinforcement learning. Through an equivalent formulation of the related approximate dynamic programming (ADP) scheme, we show that a KL penalty amounts to averaging q-values. This equivalence allows drawing connections between a priori disconnected methods from the literature, and proving that KL regularization indeed leads to averaging the errors made at each iteration of the value-function update. With the proposed theoretical analysis, we also study the interplay between KL and entropy regularization. When the considered ADP scheme is combined with neural-network-based stochastic approximations, the equivalence is lost, which suggests a number of different ways to do regularization. Because this goes beyond what we can analyse theoretically, we extensively study this aspect empirically.
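As a rough illustration of the averaging claim in the abstract, the sketch below writes the standard KL-regularized greedy step in its mirror-descent form and unrolls the recursion; the symbols q_k, \pi_k and the regularization weight \lambda are notation introduced here for illustration, not taken from the page.

\pi_{k+1} = \arg\max_{\pi} \; \langle \pi, q_k \rangle - \lambda \, \mathrm{KL}(\pi \,\|\, \pi_k)
\quad \Longrightarrow \quad
\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s) \, \exp\!\big(q_k(s,a)/\lambda\big).

Unrolling from a uniform \pi_0 gives

\pi_{k+1}(a \mid s) \propto \exp\!\Big(\tfrac{1}{\lambda} \sum_{j=0}^{k} q_j(s,a)\Big)
= \exp\!\Big(\tfrac{k+1}{\lambda} \cdot \tfrac{1}{k+1} \sum_{j=0}^{k} q_j(s,a)\Big),

i.e., under these assumptions the new policy is a softmax of the averaged q-values with an effectively decreasing temperature, so the estimation errors entering each q_j are averaged rather than compounded.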


research  07/16/2021  Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning
The recent booming of entropy-regularized literature reveals that Kullba...

research  05/16/2022  q-Munchausen Reinforcement Learning
The recently successful Munchausen Reinforcement Learning (M-RL) feature...

research  10/17/2021  A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization
We study entropy-regularized constrained Markov decision processes (CMDP...

research  08/16/2021  Implicitly Regularized RL with Implicit Q-Values
The Q-function is a central quantity in many Reinforcement Learning (RL)...

research  01/31/2019  A Theory of Regularized Markov Decision Processes
Many recent successful (deep) reinforcement learning algorithms make use...

research  12/08/2021  ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives
We present ShinRL, an open-source library specialized for the evaluation...

research  02/21/2022  Accelerating Primal-dual Methods for Regularized Markov Decision Processes
Entropy regularized Markov decision processes have been widely used in r...
