A General Approach to Multi-Armed Bandits Under Risk Criteria

06/04/2018
by   Asaf Cassel, et al.

Different risk-related criteria have received recent interest in learning problems, where each case is typically treated in a customized manner. In this paper we provide a more systematic approach to analyzing such risk criteria within a stochastic multi-armed bandit (MAB) formulation. We identify a set of general conditions that yield a simple characterization of the oracle rule (which serves as the regret benchmark) and facilitate the design of upper confidence bound (UCB) learning policies. The conditions are derived from problem primitives, primarily the relation between the arm reward distributions and the (risk-criterion) performance metric. Among other things, the work highlights some possibly non-intuitive subtleties that differentiate various criteria in conjunction with statistical properties of the arms. Our main findings are illustrated on several widely used objectives, such as conditional value-at-risk, mean-variance, Sharpe ratio, and more.
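To make the idea concrete, here is a minimal sketch of a risk-aware UCB-style index in the spirit the abstract describes, using conditional value-at-risk (CVaR) as the performance metric. The estimator `empirical_cvar` and the exploration-bonus form are illustrative assumptions, not the paper's exact construction; the paper derives criterion-specific confidence widths from its general conditions.

```python
import numpy as np

def empirical_cvar(samples, alpha=0.05):
    """Empirical CVaR at level alpha: the average reward over the
    worst alpha-fraction of observed outcomes (lower is riskier).
    Hypothetical helper for illustration only."""
    sorted_s = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_s))))
    return sorted_s[:k].mean()

def ucb_index(samples, t, c=1.0, alpha=0.05):
    """Optimistic index for one arm at round t: the empirical risk
    metric plus a generic sqrt(log t / n) exploration bonus.
    The bonus form is a placeholder for the criterion-specific
    confidence bound analyzed in the paper."""
    n = len(samples)
    return empirical_cvar(samples, alpha) + c * np.sqrt(np.log(t) / n)
```

A UCB policy would pull, at each round, the arm maximizing this index over its observed reward history, exactly as classical UCB does with the sample mean.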


Related research

- Generalized Risk-Aversion in Stochastic Multi-Armed Bandits (05/05/2014)
- Near-Optimal MNL Bandits Under Risk Criteria (09/26/2020)
- Risk-Aware Algorithms for Combinatorial Semi-Bandits (12/02/2021)
- Constrained regret minimization for multi-criterion multi-armed bandits (06/17/2020)
- Risk-Averse Explore-Then-Commit Algorithms for Finite-Time Bandits (04/30/2019)
- On the bias, risk and consistency of sample means in multi-armed bandits (02/02/2019)
- Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health (09/09/2022)
