Risk-aware linear bandits with convex loss

09/15/2022
by   Patrick Saux, et al.

In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a feedback signal. While the mean reward criterion has been extensively studied, other measures that reflect an aversion to adverse outcomes, such as mean-variance or conditional value-at-risk (CVaR), can be of interest for critical applications (healthcare, agriculture). Algorithms have been proposed for such risk-aware measures under bandit feedback without contextual information. In this work, we study contextual bandits where such risk measures can be elicited as linear functions of the contexts through the minimization of a convex loss. A typical example that fits within this framework is the expectile measure, which is obtained as the solution of an asymmetric least-squares problem. Using the method of mixtures for supermartingales, we derive confidence sequences for the estimation of such risk measures. We then propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits. This approach requires solving a convex problem at each round of the algorithm, which we can relax by allowing only approximate solutions obtained by online gradient descent, at the cost of slightly higher regret. We conclude by evaluating the resulting algorithms on numerical experiments.
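To make the expectile example concrete, the sketch below fits a linear expectile model by minimizing the asymmetric squared loss, which weights positive residuals by tau and negative residuals by 1 - tau. It is only an illustration under stated assumptions: the function names `expectile_loss` and `fit_linear_expectile` are hypothetical, and the solver used here (iteratively reweighted least squares) is a standard choice for this loss that stands in for whichever convex solver or online gradient descent scheme the paper actually uses.

```python
import numpy as np


def expectile_loss(residual, tau):
    """Asymmetric squared loss: weight tau on residuals >= 0, 1 - tau otherwise."""
    weight = np.where(residual >= 0, tau, 1.0 - tau)
    return weight * residual ** 2


def fit_linear_expectile(X, y, tau, n_iter=100, tol=1e-10):
    """Minimize sum_i expectile_loss(y_i - x_i @ theta, tau) over theta.

    Illustrative solver (an assumption, not the paper's method):
    iteratively reweighted least squares on the weighted normal equations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the ordinary least-squares solution (the tau = 0.5 case).
    theta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        residual = y - X @ theta
        w = np.where(residual >= 0, tau, 1.0 - tau)
        # Solve (X^T W X) theta = X^T W y with W = diag(w).
        XtW = X.T * w
        new_theta = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


if __name__ == "__main__":
    # Synthetic data, made up purely for illustration.
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(200), rng.normal(size=200)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
    print("tau=0.5:", fit_linear_expectile(X, y, 0.5))
    print("tau=0.9:", fit_linear_expectile(X, y, 0.9))
```

With tau = 0.5 the loss is symmetric and the fit coincides with ordinary least squares, so the expectile generalizes the mean; values of tau away from 0.5 shift the fit toward one tail of the reward distribution.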

