Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards

06/03/2019
by Anmol Kagrecha, et al.

Classical multi-armed bandit problems use the expected value of an arm as a metric to evaluate its goodness. However, the expected value is a risk-neutral metric. In many applications like finance, one is interested in balancing the expected return of an arm (or portfolio) with the risk associated with that return. In this paper, we consider the problem of selecting the arm that optimizes a linear combination of the expected reward and the associated Conditional Value at Risk (CVaR) in a fixed budget best-arm identification framework. We allow the reward distributions to be unbounded or even heavy-tailed. For this problem, our goal is to devise algorithms that are entirely distribution oblivious, i.e., the algorithm is not aware of any information on the reward distributions, including bounds on the moments/tails, or the suboptimality gaps across arms. In this paper, we provide a class of such algorithms with provable upper bounds on the probability of incorrect identification. In the process, we develop a novel estimator for the CVaR of unbounded (including heavy-tailed) random variables and prove a concentration inequality for the same, which could be of independent interest. We also compare the error bounds for our distribution oblivious algorithms with those corresponding to standard non-oblivious algorithms. Finally, numerical experiments reveal that our algorithms perform competitively when compared with non-oblivious algorithms, suggesting that distribution obliviousness can be realised in practice without incurring a significant loss of performance.
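The two quantities the abstract combines — an empirical CVaR estimate and a linear mean-CVaR criterion — can be sketched in a few lines. This is a minimal illustration, not the paper's actual estimator: the paper develops a refined truncation-based CVaR estimator with a concentration guarantee, whereas the clipping bound `b`, level `alpha`, and risk weight `xi` below are hypothetical parameters chosen for the example.

```python
import numpy as np

def empirical_cvar(losses, alpha, b=None):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of loss
    samples. Clipping at +/- b is one simple device for taming heavy
    tails (the paper's estimator is more refined)."""
    x = np.asarray(losses, dtype=float)
    if b is not None:
        x = np.clip(x, -b, b)          # truncate extreme samples
    x = np.sort(x)[::-1]               # largest losses first
    k = max(1, int(np.ceil(alpha * x.size)))
    return x[:k].mean()

def mean_cvar_score(rewards, alpha, xi):
    """Linear combination of expected reward and the CVaR of the loss
    (negative reward); higher is better, xi >= 0 trades expected
    return against tail risk."""
    r = np.asarray(rewards, dtype=float)
    return r.mean() - xi * empirical_cvar(-r, alpha)
```

In a fixed-budget best-arm identification scheme of the kind the abstract describes, one would split the sampling budget across arms and return the arm maximizing `mean_cvar_score` on its samples, with no knowledge of moment or tail bounds required.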


