Thompson Sampling for Gaussian Entropic Risk Bandits

05/14/2021
by Ming Liang Ang, et al.

The multi-armed bandit (MAB) problem is a ubiquitous decision-making problem that exemplifies the exploration-exploitation tradeoff. Standard formulations exclude risk in decision making. Risk notably complicates the basic reward-maximising objective, in part because there is no universally agreed-upon definition of it. In this paper, we consider an entropic risk (ER) measure and explore the performance of a Thompson sampling-based algorithm, ERTS, under this risk measure by providing regret bounds for ERTS and corresponding instance-dependent lower bounds.
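Since only the abstract is available here, the following is a minimal, hypothetical sketch of Thompson sampling driven by an entropic risk criterion for Gaussian arms; it is not the authors' ERTS algorithm. It assumes known arm variances, a standard-normal prior on each unknown mean, and the convention ER_λ(X) = (1/λ) log E[e^{λX}], which equals μ + λσ²/2 for X ~ N(μ, σ²). The function name erts_sketch and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def erts_sketch(true_means, true_vars, risk_param=-0.5, horizon=1000):
    """Thompson sampling that ranks arms by a posterior sample of entropic
    risk rather than of the mean. risk_param < 0 encodes risk aversion under
    the convention ER(X) = mu + (risk_param / 2) * sigma^2 for X ~ N(mu,
    sigma^2). Arm variances are assumed known (a simplifying assumption)."""
    true_means = np.asarray(true_means, dtype=float)
    true_vars = np.asarray(true_vars, dtype=float)
    k = len(true_means)
    counts = np.zeros(k)   # number of pulls per arm
    sums = np.zeros(k)     # running sum of rewards per arm
    prior_var = 1.0        # N(0, 1) prior on each unknown mean (assumed)
    for _ in range(horizon):
        # Conjugate Gaussian posterior over each arm's mean.
        post_var = 1.0 / (1.0 / prior_var + counts / true_vars)
        post_mean = post_var * (sums / true_vars)
        # Sample a plausible mean per arm, then score by entropic risk.
        sampled_means = rng.normal(post_mean, np.sqrt(post_var))
        er_scores = sampled_means + 0.5 * risk_param * true_vars
        arm = int(np.argmax(er_scores))
        reward = rng.normal(true_means[arm], np.sqrt(true_vars[arm]))
        counts[arm] += 1
        sums[arm] += reward
    return counts

# A risk-averse criterion can prefer a lower-mean but lower-variance arm.
print(erts_sketch(true_means=[0.0, 0.1], true_vars=[0.05, 1.0]))
```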


Related research

Risk-Constrained Thompson Sampling for CVaR Bandits (11/16/2020)
The multi-armed bandit (MAB) problem is a ubiquitous decision-making pro...

X-Armed Bandits: Optimizing Quantiles and Other Risks (04/17/2019)
We propose and analyze StoROO, an algorithm for risk optimization on sto...

A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits (08/25/2021)
This paper unifies the design and simplifies the analysis of risk-averse...

Blind Exploration and Exploitation of Stochastic Experts (04/02/2021)
We present blind exploration and exploitation (BEE) algorithms for ident...

A Distribution Optimization Framework for Confidence Bounds of Risk Measures (06/12/2023)
We present a distribution optimization framework that significantly impr...

Robust and Adaptive Planning under Model Uncertainty (01/09/2019)
Planning under model uncertainty is a fundamental problem across many ap...

A framework for optimizing COVID-19 testing policy using a Multi Armed Bandit approach (07/28/2020)
Testing is an important part of tackling the COVID-19 pandemic. Availabi...
