Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation

07/06/2023

Risk-sensitive reinforcement learning (RL) aims to learn policies that balance expected reward against risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximations. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm, ICVaR-L, which achieves an O(√(α^{-(H+1)}(d^2H^4+dH^6)K)) regret, where α is the risk level, d is the dimension of state-action features, H is the length of each episode, and K is the number of episodes. We also establish a matching lower bound of Ω(√(α^{-(H-1)}d^2K)) to validate the optimality of ICVaR-L with respect to d and K. For ICVaR-RL with general function approximation, we propose an algorithm, ICVaR-G, which achieves an O(√(α^{-(H+1)}DH^4K)) regret, where D is a dimensional parameter that depends on the eluder dimension and covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
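
To make the Iterated CVaR objective concrete, the following is a minimal sketch of the planning recursion in a small tabular MDP with known transitions. It is not ICVaR-L or ICVaR-G, and it involves no function approximation or learning. It assumes the reward-maximizing convention in which CVaR at risk level α is the mean of the worst (lowest) α-fraction of next-state values, applied at every step. The names `cvar` and `iterated_cvar_values`, and the array layout of `P` and `r`, are hypothetical choices made for illustration.

```python
import numpy as np

def cvar(values, probs, alpha):
    """Mean of the lowest alpha-fraction of a discrete distribution (assumed convention)."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(values)          # worst (smallest) outcomes first
    v, p = values[order], probs[order]
    cum_before = np.cumsum(p) - p       # probability mass strictly below each outcome
    # Take mass from each outcome until a total of alpha has been used.
    mass = np.clip(alpha - cum_before, 0.0, p)
    return float(mass @ v / alpha)

def iterated_cvar_values(P, r, alpha, H):
    """Backward recursion V_h(s) = max_a [ r(s,a) + CVaR_alpha over s' of V_{h+1}(s') ].

    P: transitions of shape (S, A, S); r: rewards of shape (S, A).
    Returns an array V of shape (H + 1, S) with V[H] = 0.
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        Q = np.array([[r[s, a] + cvar(V[h + 1], P[s, a], alpha)
                       for a in range(A)] for s in range(S)])
        V[h] = Q.max(axis=1)
    return V

# Two states, one action, uniform transitions; reward 0 in state 0 and 1 in state 1.
P = np.full((2, 1, 2), 0.5)
r = np.array([[0.0], [1.0]])
V_neutral = iterated_cvar_values(P, r, alpha=1.0, H=2)  # alpha = 1 recovers the expectation
V_averse = iterated_cvar_values(P, r, alpha=0.5, H=2)   # only the worse half of outcomes counts
```

With α = 1 the recursion reduces to the usual expected-value Bellman backup, while smaller α discounts everything except the worst next states at each step, which is the per-step safety the formulation is meant to capture.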
