
Risk-aware Multi-armed Bandits Using Conditional Value-at-Risk

by Ravi Kumar Kolla, et al.
Indian Institute Of Technology, Madras

Traditional multi-armed bandit problems are geared towards finding the arm with the highest expected value, an objective that is risk-neutral. In several practical applications, e.g., finance, the objective is instead risk-sensitive: to control worst-case losses. Conditional Value-at-Risk (CVaR) is a popular risk measure for modelling this objective. We consider the CVaR optimization problem in a best-arm identification framework under a fixed budget. First, we derive a novel two-sided concentration bound for a well-known CVaR estimator based on the empirical distribution function, assuming that the underlying distribution is unbounded, but either sub-Gaussian or light-tailed. This bound may be of independent interest. Second, we adapt the well-known successive rejects algorithm to incorporate a CVaR-based criterion and derive an upper bound on the probability of incorrect identification for our proposed algorithm.
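
To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that samples represent losses (so a lower CVaR is better), that the "well-known" estimator is the standard order-statistic form (empirical VaR plus the scaled average excess above it), and that the phase lengths follow the usual successive rejects schedule. The names `empirical_cvar`, `cvar_successive_rejects`, `arms`, `budget`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math


def empirical_cvar(samples, alpha):
    """Empirical CVaR at level alpha in (0, 1), treating samples as losses.

    Assumed estimator: VaR is the ceil(n * alpha)-th order statistic, and
    CVaR = VaR + (1 / (n * (1 - alpha))) * sum of excesses above VaR.
    """
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(n * alpha))  # 1-indexed order statistic
    var = xs[k - 1]
    excess = sum(max(x - var, 0.0) for x in xs)
    return var + excess / (n * (1.0 - alpha))


def cvar_successive_rejects(arms, budget, alpha):
    """Fixed-budget successive rejects with a CVaR-based rejection rule.

    `arms` is a list of zero-argument callables, each returning one loss
    sample (a hypothetical interface). The function returns the index of
    the arm that survives all K - 1 rejection phases. Requires budget > K.
    """
    K = len(arms)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active = list(range(K))
    samples = [[] for _ in range(K)]
    n_prev = 0
    for phase in range(1, K):
        # Cumulative number of pulls per surviving arm by the end of this phase.
        n_k = math.ceil((budget - K) / (log_bar * (K + 1 - phase)))
        for a in active:
            samples[a].extend(arms[a]() for _ in range(n_k - n_prev))
        # Reject the arm whose empirical CVaR (tail loss) is largest.
        worst = max(active, key=lambda a: empirical_cvar(samples[a], alpha))
        active.remove(worst)
        n_prev = n_k
    return active[0]


if __name__ == "__main__":
    # The upper half of [1, 2, 3, 4] averages 3.5.
    print(empirical_cvar([1.0, 2.0, 3.0, 4.0], 0.5))  # → 3.5
```

With stochastic arms, one might pass closures such as `lambda: random.gauss(0.0, 1.0)` for each arm; the surviving index is then the sketch's guess at the arm with the lowest CVaR under the assumptions stated above.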


Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards

Classical multi-armed bandit problems use the expected value of an arm a...

Concentration bounds for empirical conditional value-at-risk: The unbounded case

In several real-world applications involving decision making under uncer...

A Survey of Risk-Aware Multi-Armed Bandits

In several applications such as clinical trials and financial portfolio ...

Risk averse non-stationary multi-armed bandits

This paper tackles the risk averse multi-armed bandits problem when incu...

Towards an efficient and risk aware strategy for guiding farmers in identifying best crop management

Identification of best performing fertilizer practices among a set of co...

Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed Bandits

Traditional multi-armed bandit (MAB) formulations usually make certain a...

Quantile Bandits for Best Arms Identification with Concentration Inequalities

We consider a variant of the best arm identification task in stochastic ...