Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning

01/31/2019
by   Kyungjae Lee, et al.

In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, which maximize the Tsallis entropy and generalize existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known Shannon-Gibbs (SG) entropy, through an additional real-valued parameter called the entropic index. Varying the entropic index yields different types of entropy, and each type of entropy induces a different class of optimal policies in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs, including the optimality condition, performance error bounds, and convergence. Our theoretical results enable the use of any positive entropic index in RL. To handle complex and large-scale problems, we propose a model-free actor-critic RL method based on Tsallis entropy maximization. We evaluate the regularization effect of the Tsallis entropy for various values of the entropic index and show that the entropic index controls the exploration tendency of the proposed method. We find that different types of RL problems favor different values of the entropic index. The proposed method is evaluated in the MuJoCo simulator and achieves state-of-the-art performance.
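
As a rough illustration of how an entropic index selects a type of entropy, the sketch below uses the standard textbook form of the Tsallis entropy for a discrete distribution, S_q(p) = (1 - sum_i p_i^q) / (q - 1), which tends to the SG entropy -sum_i p_i ln p_i as q approaches 1. This is an assumption for illustration: the paper's own definition may differ in scaling or notation, and the function name `tsallis_entropy` is hypothetical, not the authors' code.

```python
import math


def tsallis_entropy(probs, q):
    """Standard Tsallis entropy of a discrete distribution (illustrative sketch).

    Assumes `probs` is a sequence of non-negative numbers summing to 1 and
    `q` is a positive entropic index. This is the textbook definition, not
    necessarily the exact form used in the paper.
    """
    if q <= 0:
        raise ValueError("entropic index q must be positive")
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: Shannon-Gibbs entropy, skipping zero-probability terms.
        return -sum(p * math.log(p) for p in probs if p > 0)
    # General case: (1 - sum_i p_i^q) / (q - 1).
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    # Print the entropy of a uniform 4-action policy for a few entropic indices.
    for q in (0.5, 1.0, 2.0, 5.0):
        print(q, tsallis_entropy(uniform, q))
```

Under this definition a deterministic distribution has zero entropy for every q, so the entropy bonus only rewards policies that spread probability over several actions; how strongly it does so depends on q.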

