Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning

05/16/2022
by Lingwei Zhu, et al.

The maximum Tsallis entropy (MTE) framework in reinforcement learning has recently gained popularity by virtue of its flexible modeling choices, which include the widely used Shannon entropy and sparse entropy as special cases. However, non-Shannon entropies suffer from approximation error and consequent underperformance, either because of their sensitivity to that error or because they lack a closed-form policy expression. To improve the tradeoff between flexibility and empirical performance, we propose to strengthen their error-robustness by enforcing implicit Kullback-Leibler (KL) regularization in MTE, motivated by Munchausen DQN (MDQN). We do so by drawing a connection between MDQN and advantage learning, which shows that MDQN fails to generalize to the MTE framework. The proposed method, Tsallis Advantage Learning (TAL), is verified in extensive experiments to not only significantly improve upon Tsallis-DQN for various Tsallis entropies without closed-form policies, but also to exhibit performance comparable to state-of-the-art maximum Shannon entropy algorithms.
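To make the advantage-learning mechanism concrete, the sketch below shows a generic gap-increasing Bellman target in the style of Baird's advantage learning, which the abstract identifies as the device enforcing implicit KL regularization. This is an illustrative sketch, not the authors' implementation: the helper `value_fn` and the coefficient `alpha` are assumed placeholders, with `np.max` standing in for the state value that, in the MTE setting, would be replaced by the appropriate Tsallis value (e.g., the sparsemax value at entropic index 2).

```python
import numpy as np

def advantage_learning_target(reward, action, q_curr, q_next,
                              gamma=0.99, alpha=0.9, value_fn=np.max):
    """Gap-increasing (advantage-learning) Bellman target.

    Hypothetical sketch: `value_fn` stands in for the state value used by
    the agent -- np.max for plain DQN, or a Tsallis value in the MTE
    setting. `alpha` scales the advantage term whose addition implicitly
    enforces KL regularization between successive greedy policies.
    """
    v_next = value_fn(q_next)              # bootstrapped next-state value
    v_curr = value_fn(q_curr)              # current-state value
    advantage = q_curr[action] - v_curr    # non-positive action gap
    return reward + gamma * v_next + alpha * advantage
```

Because the advantage term is non-positive for non-greedy actions, it widens the gap between the best action and the rest, which is what makes the induced regularization robust to approximation error even when the Tsallis policy has no closed form.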


