Cascaded Gaps: Towards Gap-Dependent Regret for Risk-Sensitive Reinforcement Learning

03/07/2022
by   Yingjie Fei, et al.
0

In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/06/2021

Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

We study risk-sensitive reinforcement learning (RL) based on the entropi...
research
07/02/2021

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

We provide improved gap-dependent regret bounds for reinforcement learni...
research
06/04/2023

Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz Dynamic Risk Measures

We study finite episodic Markov decision processes incorporating dynamic...
research
10/25/2022

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

We study the regret guarantee for risk-sensitive reinforcement learning ...
research
06/03/2019

Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

In an effort to better understand the different ways in which the discou...
research
07/01/2021

Gap-Dependent Bounds for Two-Player Markov Games

As one of the most popular methods in the field of reinforcement learnin...
research
05/25/2022

Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

We propose a new learning framework that captures the tiered structure o...

Please sign up or login with your details

Forgot password? Click here to reset