Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

08/04/2022
by Jingwei Ji, et al.

Motivated by practical considerations in machine learning for financial decision-making, such as risk aversion and large action spaces, we initiate the study of risk-aware linear bandits. Specifically, we consider regret minimization under the mean-variance measure when facing a set of actions whose rewards can be expressed as linear functions of (initially) unknown parameters. Driven by the variance-minimizing G-optimal design, we propose the Risk-Aware Explore-then-Commit (RISE) algorithm and the Risk-Aware Successive Elimination (RISE++) algorithm. We then rigorously analyze their regret upper bounds and show that, by leveraging the linear structure, the algorithms can dramatically reduce regret compared to existing methods. Finally, we demonstrate the performance of the algorithms through extensive numerical experiments in a synthetic smart order routing setup. Our results show that both RISE and RISE++ can outperform the competing methods, especially in complex decision-making scenarios.
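
To make the mean-variance objective concrete, the following is a minimal sketch of a generic explore-then-commit loop that scores arms by an empirical mean-variance criterion. It is not the RISE or RISE++ algorithm from the paper: it ignores the linear reward structure and the G-optimal design entirely. The function names, the `rho` risk-tolerance parameter, and the sign convention (`rho * mean - variance`, higher is better) are assumptions made for illustration only.

```python
from statistics import fmean, pvariance


def mean_variance(rewards, rho=1.0):
    """Empirical mean-variance score of a list of rewards.

    Assumed convention (not taken from the paper): rho * mean - variance,
    so a larger score is better and rho controls risk tolerance.
    """
    return rho * fmean(rewards) - pvariance(rewards)


def explore_then_commit(arms, pulls_per_arm, rho=1.0):
    """Pull every arm the same number of times, then pick the best score.

    `arms` is a list of zero-argument callables, each returning one reward.
    Returns (index of the arm to commit to, list of per-arm scores).
    """
    scores = []
    for pull in arms:
        samples = [pull() for _ in range(pulls_per_arm)]
        scores.append(mean_variance(samples, rho))
    best = max(range(len(arms)), key=scores.__getitem__)
    return best, scores
```

Under this convention, an arm with a high average reward can still lose to a steadier arm when its variance is large, which is the kind of trade-off a risk-aware learner has to resolve.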

research
09/15/2022

Risk-aware linear bandits with convex loss

In decision-making problems such as the multi-armed bandit, an agent lea...
research
02/01/2020

Thompson Sampling Algorithms for Mean-Variance Bandits

The multi-armed bandit (MAB) problem is a classical learning task that e...
research
05/26/2022

Variance-Aware Sparse Linear Bandits

It is well-known that the worst-case minimax regret for sparse linear ba...
research
11/16/2020

Risk-Constrained Thompson Sampling for CVaR Bandits

The multi-armed bandit (MAB) problem is a ubiquitous decision-making pro...
research
02/28/2022

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

We consider learning a stochastic bandit model, where the reward functio...
research
12/12/2022

Autoregressive Bandits

Autoregressive processes naturally arise in a large variety of real-worl...
research
04/17/2019

X-Armed Bandits: Optimizing Quantiles and Other Risks

We propose and analyze StoROO, an algorithm for risk optimization on sto...
