Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation

07/06/2023
by Yu Chen, et al.

Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance expected reward against risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximation. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step.

For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm, ICVaR-L, which achieves a regret of O(√(α^(-(H+1)) (d^2H^4 + dH^6) K)), where α is the risk level, d is the dimension of the state-action features, H is the length of each episode, and K is the number of episodes. We also establish a matching lower bound of Ω(√(α^(-(H-1)) d^2 K)), which shows that ICVaR-L is optimal with respect to d and K.

For ICVaR-RL with general function approximation, we propose algorithm ICVaR-G, which achieves a regret of O(√(α^(-(H+1)) D H^4 K)), where D is a dimensional parameter that depends on the eluder dimension and the covering number.

Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
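
To make the Iterated CVaR objective concrete, here is a minimal sketch of its Bellman recursion in the simplest possible setting: a small tabular MDP with known transition probabilities, with no learning and no function approximation. It is not ICVaR-L or ICVaR-G, which must also estimate the model from data and explore. The sketch assumes rewards are maximized, so CVaR at level α is taken over the lower tail (the mean of the worst α-fraction of next-state values), and it uses the recursion Q_h(s,a) = r(s,a) + CVaR_α over s' ~ P(·|s,a) of V_{h+1}(s'), with V_h(s) = max_a Q_h(s,a) and V_{H+1} = 0 (written 0-indexed in the code, so V[H] = 0). The function names and the two-state toy MDP are hypothetical illustrations.

```python
import numpy as np


def cvar_lower_tail(values, probs, alpha):
    """CVaR_alpha of a discrete random variable, taken over the lower tail.

    Assumption: rewards are maximized, so 'risk' is the worst (lowest)
    alpha-fraction of outcomes. Returns the mean of that tail.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(values)          # worst outcomes first
    remaining, total = alpha, 0.0
    for v, p in zip(values[order], probs[order]):
        take = min(p, remaining)        # possibly a partial atom at the boundary
        total += take * v
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def iterated_cvar_value_iteration(P, R, H, alpha):
    """Backward induction for the Iterated CVaR objective with a known model.

    P: (S, A, S) transition probabilities, R: (S, A) rewards, H: horizon.
    Returns V with shape (H + 1, S) (V[H] = 0) and a greedy policy (H, S).
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Risk is evaluated at every step, not only on the final return.
                Q[s, a] = R[s, a] + cvar_lower_tail(V[h + 1], P[s, a], alpha)
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, pi


# Hypothetical toy MDP: state 0 is "good", state 1 is an absorbing zero-reward "bad" state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0            # safe action: stay in the good state, reward 0.5
P[0, 1] = [0.9, 0.1]        # risky action: reward 1.0, but a 10% chance of falling into the bad state
P[1, :, 1] = 1.0            # the bad state is absorbing
R = np.array([[0.5, 1.0], [0.0, 0.0]])

for alpha in (1.0, 0.1):
    V, pi = iterated_cvar_value_iteration(P, R, H=2, alpha=alpha)
    print(f"alpha={alpha}: V_0(good)={V[0, 0]:.3f}, first action={pi[0, 0]}")
```

With α = 1 the CVaR reduces to the expectation, and the planner takes the higher-reward risky action at the first step. With α = 0.1 the 10% chance of reaching the zero-reward absorbing state fills the entire tail, so the safe action is preferred. This per-step tail evaluation is one way to read the claim that the formulation guards against risk at each decision step.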

research
06/06/2022

Risk-Sensitive Reinforcement Learning: Iterated CVaR and the Worst Path

In this paper, we study a novel episodic risk-sensitive Reinforcement Le...
research
02/07/2023

Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

In this paper, we study risk-sensitive Reinforcement Learning (RL), focu...
research
02/22/2023

Provably Efficient Reinforcement Learning via Surprise Bound

Value function approximation is important in modern reinforcement learni...
research
05/23/2022

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

We study human-in-the-loop reinforcement learning (RL) with trajectory p...
research
02/16/2022

Branching Reinforcement Learning

In this paper, we propose a novel Branching Reinforcement Learning (Bran...
research
03/25/2021

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

This paper considers batch Reinforcement Learning (RL) with general valu...
research
12/28/2021

Exponential Family Model-Based Reinforcement Learning via Score Matching

We propose an optimistic model-based algorithm, dubbed SMRL, for finite-...