Risk-Averse Explore-Then-Commit Algorithms for Finite-Time Bandits

04/30/2019
by Ali Yekkehkhany, et al.

In this paper, we study multi-armed bandit problems in the explore-then-commit setting. In our proposed explore-then-commit setting, the goal is to identify the best arm after a pure experimentation (exploration) phase and then exploit it once or a given finite number of times. We observe that although the arm with the highest expected reward is the most desirable objective for infinitely many exploitations, it is not necessarily the arm that is most likely to yield the highest reward in a single exploitation or in finitely many exploitations. Instead, we advocate risk-aversion, where the objective is to compete against the arm with the best risk-return trade-off. We then propose two algorithms that aim to select the arm most likely to yield the highest reward. Using a new notion of finite-time exploitation regret, we derive an upper bound on the minimum number of experiments required before commitment in order to guarantee an upper bound on the regret. Compared with existing risk-averse bandit algorithms, our algorithms do not rely on hyper-parameters, which results in more robust behavior in practice, as verified by numerical evaluation.
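To make the selection criterion concrete, here is a minimal sketch of an explore-then-commit loop that commits to the arm most likely to win a single fresh draw, estimated from the exploration samples. This illustrates the idea described in the abstract and is not the paper's two algorithms; the names (risk_averse_etc, n_explore) and the bootstrap-style plug-in estimate are assumptions of this sketch.

    import numpy as np

    def risk_averse_etc(arms, n_explore, n_boot=2000, rng=None):
        # Pure exploration: pull every arm n_explore times.
        rng = rng or np.random.default_rng()
        samples = [np.array([arm(rng) for _ in range(n_explore)])
                   for arm in arms]

        # Estimate, for each arm, the probability that one fresh draw
        # from it exceeds simultaneous fresh draws from all other arms,
        # by resampling the empirical exploration samples.
        wins = np.zeros(len(arms))
        for _ in range(n_boot):
            draws = [rng.choice(s) for s in samples]
            wins[int(np.argmax(draws))] += 1

        # Commit to the arm that is most often the single-draw winner.
        return int(np.argmax(wins))

    # Hypothetical example: the risky arm has the higher mean (2.0 vs. 1.0)
    # but pays zero 98% of the time, so the safe arm wins almost every
    # single-draw comparison.
    safe  = lambda rng: rng.normal(1.0, 0.1)
    risky = lambda rng: 100.0 * float(rng.random() < 0.02)
    print(risk_averse_etc([safe, risky], n_explore=200))  # almost surely 0 (safe)

In the single-exploitation regime this sketch picks the safe arm, whereas a mean-maximizing explore-then-commit rule would pick the risky one; this gap is exactly what the paper's risk-averse objective targets.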
