Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning

07/16/2021 · by Toshinori Kitamura, et al.

The recent boom in the entropy-regularized RL literature has shown that Kullback-Leibler (KL) regularization benefits Reinforcement Learning (RL) algorithms by canceling out errors under mild assumptions. However, existing analyses focus on fixed regularization with a constant weighting coefficient and do not consider the case where the coefficient may change dynamically. In this paper, we study the dynamic-coefficient scheme and present the first asymptotic error bound for it. Based on this bound, we propose an effective scheme for tuning the coefficient according to the magnitude of the error, in favor of more robust learning. Building on this development, we propose a novel algorithm, Geometric Value Iteration (GVI), which features a dynamic error-aware KL coefficient designed to mitigate the impact of errors on performance. Our experiments demonstrate that GVI effectively exploits the trade-off between learning speed and robustness, compared with the uniform averaging induced by a constant KL coefficient. The combination of GVI and deep networks shows stable learning behavior even in the absence of a target network, where algorithms with a constant KL coefficient oscillate heavily or even fail to converge.
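To make the idea concrete, below is a minimal sketch of KL-regularized value iteration on a small tabular MDP with a per-iteration coefficient. The random MDP, the injected-noise error model, and the error-proportional `alpha_rule` heuristic are all illustrative assumptions for this sketch; they stand in for, and do not reproduce, the paper's exact error-aware schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small random MDP for illustration (not from the paper).
nS, nA, gamma = 20, 4, 0.95
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition kernel P[s, a, s']
r = rng.uniform(0.0, 1.0, size=(nS, nA))        # reward table r[s, a]

def kl_vi(alpha_rule, iters=300, noise=0.3):
    """KL-regularized value iteration with a per-iteration coefficient.

    Each step applies the mirror-descent policy update
        pi_{k+1}(a|s) ∝ pi_k(a|s) * exp(Q_k(s,a) / alpha_k)
    followed by a noisy Bellman backup; `alpha_rule` maps an error estimate
    to the next coefficient (a heuristic stand-in, not the paper's rule).
    """
    Q = np.zeros((nS, nA))
    log_pi = np.full((nS, nA), -np.log(nA))     # uniform initial policy
    alpha = 1.0
    for _ in range(iters):
        # KL-regularized greedy step: log-sum-exp against the prior policy.
        logits = log_pi + Q / alpha
        lse = np.logaddexp.reduce(logits, axis=1, keepdims=True)
        V = alpha * lse[:, 0]                   # soft (KL-regularized) value
        log_pi = logits - lse                   # updated log-policy
        # Exact Bellman backup plus injected noise to simulate errors.
        Q_next = r + gamma * (P @ V)
        Q_next += noise * rng.standard_normal(Q.shape)
        err = np.abs(Q_next - Q).max()          # crude error proxy (residual)
        alpha = alpha_rule(err)                 # dynamic coefficient update
        Q = Q_next
    return Q, np.exp(log_pi)

# Constant coefficient vs. an error-proportional heuristic coefficient.
Q_const, _ = kl_vi(lambda err: 1.0)
Q_dyn, _   = kl_vi(lambda err: max(0.05, 0.5 * err))
```

Under this heuristic, a large estimated error pushes the coefficient up (heavier regularization toward the averaged past policy), while a small error shrinks it toward a near-greedy update, mirroring the speed-versus-robustness trade-off described in the abstract.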

Related research

03/31/2020 · Leverage the Average: an Analysis of Regularization in RL
Building upon the formalism of regularized Markov decision processes, we...

07/13/2021 · Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning
In this paper, we propose cautious policy programming (CPP), a novel val...

01/27/2023 · Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence
Many policy optimization approaches in reinforcement learning incorporat...

05/16/2022 · q-Munchausen Reinforcement Learning
The recently successful Munchausen Reinforcement Learning (M-RL) feature...

02/11/2021 · Optimization Issues in KL-Constrained Approximate Policy Iteration
Many reinforcement learning algorithms can be seen as versions of approx...

05/16/2022 · Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning
Maximum Tsallis entropy (MTE) framework in reinforcement learning has ga...

04/27/2022 · KL-Mat: Fair Recommender System via Information Geometry
Recommender system has intrinsic problems such as sparsity and fairness....
