Cascaded Gaps: Towards Gap-Dependent Regret for Risk-Sensitive Reinforcement Learning

by   Yingjie Fei, et al.

In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.


page 1

page 2

page 3

page 4


Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

We study risk-sensitive reinforcement learning (RL) based on the entropi...

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

We provide improved gap-dependent regret bounds for reinforcement learni...

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

We study risk-sensitive reinforcement learning in episodic Markov decisi...

Gap-Dependent Bounds for Two-Player Markov Games

As one of the most popular methods in the field of reinforcement learnin...

Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

In an effort to better understand the different ways in which the discou...

Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

We propose a new learning framework that captures the tiered structure o...

On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk

In order to model risk aversion in reinforcement learning, an emerging l...