Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

02/07/2023
by Kaiwen Wang, et al.

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance τ. Starting with multi-armed bandits (MABs), we show that the minimax CVaR regret rate is Ω(√(τ^{-1}AK)), where A is the number of actions and K is the number of episodes, and that this rate is achieved by an Upper Confidence Bound algorithm with a novel Bernstein bonus. For online RL in tabular Markov Decision Processes (MDPs), we show a minimax regret lower bound of Ω(√(τ^{-1}SAK)) (with normalized cumulative rewards), where S is the number of states, and we propose a novel bonus-driven Value Iteration procedure. We show that our algorithm achieves the optimal regret of O(√(τ^{-1}SAK)) under a continuity assumption and, in general, attains a near-optimal regret of O(τ^{-1}√(SAK)), which is minimax-optimal for constant τ. This improves on the best available bounds. By discretizing rewards appropriately, our algorithms are computationally efficient.
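To make the CVaR objective concrete, here is a minimal sketch of an empirical CVaR estimate at risk tolerance τ. It assumes the standard lower-tail (Rockafellar-Uryasev) form for rewards, CVaR_τ(X) = max_b { b - τ^{-1} E[(b - X)^+] }; the abstract does not spell out the definition, so treat this form as an assumption. The function name `empirical_cvar` is hypothetical and is not taken from the paper.

```python
def empirical_cvar(returns, tau):
    """Estimate lower-tail CVaR at risk tolerance tau from sampled returns.

    Assumed form (not stated in the abstract):
        CVaR_tau(X) = max_b { b - E[(b - X)^+] / tau }
    For an empirical distribution the objective is concave and piecewise
    linear in b, so the maximum is attained at one of the sample points.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    xs = [float(x) for x in returns]
    if not xs:
        raise ValueError("need at least one sample")
    n = len(xs)

    def objective(b):
        # Average shortfall of the samples below the threshold b.
        shortfall = sum(max(b - x, 0.0) for x in xs) / n
        return b - shortfall / tau

    # Search over the sample points only (breakpoints of the objective).
    return max(objective(b) for b in xs)


if __name__ == "__main__":
    samples = [0.0, 1.0, 2.0, 3.0]
    print(empirical_cvar(samples, 1.0))   # tau = 1 recovers the plain mean
    print(empirical_cvar(samples, 0.5))   # average of the worst half
```

With τ = 1 the estimate reduces to the ordinary mean, and as τ shrinks it concentrates on the worst outcomes, which is consistent with the τ^{-1} factor in the regret bounds above.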


