
Risk-Constrained Thompson Sampling for CVaR Bandits

by Joel Q. L. Chang et al.

The multi-armed bandit (MAB) problem is a ubiquitous decision-making problem that exemplifies the exploration-exploitation tradeoff. Standard formulations do not account for risk in decision making. Risk notably complicates the basic reward-maximising objective, in part because there is no universally agreed definition of it. In this paper, we consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR). We explore the performance of a Thompson Sampling-based algorithm, CVaR-TS, under this risk measure. We provide comprehensive comparisons between our regret bounds and those of state-of-the-art L/UCB-based algorithms in comparable settings, and show that our bounds are a clear improvement. We also include numerical simulations to empirically verify that CVaR-TS outperforms other L/UCB-based algorithms.
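To make the idea concrete, here is a minimal sketch of a CVaR-driven Thompson Sampling loop. The abstract specifies neither the reward model nor the CVaR sign convention, so this sketch makes three assumptions. First, each arm's rewards take values on a known finite support. Second, each arm has a Dirichlet posterior over that support, a common Thompson Sampling construction for multinomial arms. Third, CVaR at level `alpha` is the expected reward over the worst `alpha`-fraction of outcomes, where higher is better. The function names `cvar_discrete`, `sample_dirichlet` and `cvar_ts` are hypothetical illustrations, not the authors' implementation.

```python
import random


def cvar_discrete(support, probs, alpha):
    """Lower-tail CVaR at level alpha of a discrete reward distribution.

    Assumption: rewards (higher is better), so CVaR_alpha is the mean
    reward over the worst alpha-fraction of the probability mass.
    """
    remaining = alpha
    total = 0.0
    # Walk outcomes from the lowest reward upward, consuming alpha mass.
    for value, p in sorted(zip(support, probs)):
        take = min(p, remaining)
        total += take * value
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


def sample_dirichlet(counts, rng):
    """Draw one probability vector from Dirichlet(counts) via Gamma draws."""
    draws = [rng.gammavariate(c, 1.0) for c in counts]
    s = sum(draws)
    return [d / s for d in draws]


def cvar_ts(arms, support, alpha, horizon, seed=0):
    """Hypothetical CVaR Thompson Sampling loop on simulated discrete arms.

    arms[i] holds the true outcome probabilities of arm i over `support`;
    they are used only to simulate observed rewards.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arms)
    # Uniform Dirichlet(1, ..., 1) prior for every arm.
    counts = [[1.0] * len(support) for _ in range(k)]
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a plausible distribution per arm and score it by its CVaR.
        scores = [
            cvar_discrete(support, sample_dirichlet(counts[i], rng), alpha)
            for i in range(k)
        ]
        chosen = max(range(k), key=lambda j: scores[j])
        # Simulate an observation and update that arm's Dirichlet counts.
        idx = rng.choices(range(len(support)), weights=arms[chosen])[0]
        counts[chosen][idx] += 1.0
        pulls[chosen] += 1
    return pulls


if __name__ == "__main__":
    support = [0.0, 0.5, 1.0]
    safe = [0.0, 1.0, 0.0]    # always pays 0.5
    risky = [0.5, 0.0, 0.5]   # same mean 0.5, but a heavy lower tail
    print(cvar_ts([safe, risky], support, alpha=0.2, horizon=2000))
```

In the usage example, both arms have mean 0.5, so a mean-maximising bandit has no reason to prefer either one. At `alpha = 0.2`, however, the safe arm has CVaR 0.5 and the risky arm has CVaR 0, so a CVaR-driven sampler should concentrate its pulls on the safe arm.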




Thompson Sampling for Gaussian Entropic Risk Bandits

The multi-armed bandit (MAB) problem is a ubiquitous decision-making pro...

Thompson Sampling Algorithms for Mean-Variance Bandits

The multi-armed bandit (MAB) problem is a classical learning task that e...

A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

This paper unifies the design and simplifies the analysis of risk-averse...

A Survey of Risk-Aware Multi-Armed Bandits

In several applications such as clinical trials and financial portfolio ...

Blind Exploration and Exploitation of Stochastic Experts

We present blind exploration and exploitation (BEE) algorithms for ident...

Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

Motivated by practical considerations in machine learning for financial ...

Robust and Adaptive Planning under Model Uncertainty

Planning under model uncertainty is a fundamental problem across many ap...