Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk

04/01/2022
by Tianrui Chen, et al.

We investigate a natural but surprisingly unstudied approach to the multi-armed bandit problem under safety risk constraints. Each arm is associated with an unknown law on safety risks and rewards, and the learner's goal is to maximise reward whilst not playing unsafe arms, as determined by a given threshold on the mean risk. We formulate a pseudo-regret for this setting that enforces this safety constraint in a per-round way by softly penalising any violation, regardless of any gain in reward that the violation yields. This has practical relevance to scenarios such as clinical trials, where safety must be maintained in each round rather than in an aggregated sense. We describe doubly optimistic strategies for this scenario, which maintain optimistic indices for both safety risk and reward. We show that schemes based on both frequentist and Bayesian indices satisfy tight gap-dependent logarithmic regret bounds, and further that these schemes play unsafe arms only logarithmically many times in total. This theoretical analysis is complemented by simulation studies that demonstrate the effectiveness of the proposed schemes and probe the domains in which their use is appropriate.
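The two central ideas of the abstract, a per-round safety-penalised pseudo-regret and a doubly optimistic selection rule, can be illustrated with a short simulation. What follows is a minimal Python sketch under stated assumptions, not the authors' algorithm: it uses a frequentist UCB-style index with a Hoeffding bonus (the Bayesian variant mentioned above is not shown), Bernoulli rewards and risks in [0, 1], and an assumed per-round penalty equal to the larger of the reward gap to the best safe arm and the amount by which the played arm's mean risk exceeds the threshold alpha. The function names `doubly_optimistic_choice`, `safe_pseudo_regret` and `simulate`, and the exploration constant `c`, are hypothetical.

```python
import math
import random


def doubly_optimistic_choice(reward_sums, risk_sums, counts, t, alpha, c=2.0):
    """Choose an arm with optimistic indices for both reward and risk.

    Hypothetical frequentist sketch: a Hoeffding-style bonus, assuming
    rewards and risks lie in [0, 1]; c is an illustrative exploration constant.
    """
    # Play every arm once before using any index.
    for k, n in enumerate(counts):
        if n == 0:
            return k
    bonus = [math.sqrt(c * math.log(t) / n) for n in counts]
    # Optimism for reward: upper confidence bound on the mean reward.
    reward_ucb = [s / n + b for s, n, b in zip(reward_sums, counts, bonus)]
    # Optimism for safety: lower confidence bound on the mean risk.
    risk_lcb = [s / n - b for s, n, b in zip(risk_sums, counts, bonus)]
    # Arms that are still plausibly safe under the risk threshold alpha.
    plausible = [k for k, r in enumerate(risk_lcb) if r <= alpha]
    if not plausible:
        # No arm looks safe: fall back to the one with the lowest risk index.
        return min(range(len(counts)), key=lambda k: risk_lcb[k])
    # Among plausibly safe arms, play the one with the highest reward index.
    return max(plausible, key=lambda k: reward_ucb[k])


def safe_pseudo_regret(k, mu, nu, alpha):
    """Assumed per-round penalty for arm k: max(reward gap, risk excess)."""
    safe_means = [m for m, v in zip(mu, nu) if v <= alpha]
    assert safe_means, "this sketch assumes at least one safe arm"
    mu_star = max(safe_means)
    reward_gap = max(mu_star - mu[k], 0.0)  # zero if arm k out-earns the best safe arm
    risk_excess = max(nu[k] - alpha, 0.0)  # charged regardless of any reward gained
    return max(reward_gap, risk_excess)


def simulate(mu, nu, alpha, horizon, seed=0):
    """Run the sketch on Bernoulli rewards and risks (an illustrative assumption)."""
    rng = random.Random(seed)
    n_arms = len(mu)
    reward_sums = [0.0] * n_arms
    risk_sums = [0.0] * n_arms
    counts = [0] * n_arms
    regret, unsafe_plays = 0.0, 0
    for t in range(1, horizon + 1):
        k = doubly_optimistic_choice(reward_sums, risk_sums, counts, t, alpha)
        reward_sums[k] += 1.0 if rng.random() < mu[k] else 0.0
        risk_sums[k] += 1.0 if rng.random() < nu[k] else 0.0
        counts[k] += 1
        regret += safe_pseudo_regret(k, mu, nu, alpha)
        if nu[k] > alpha:
            unsafe_plays += 1
    return regret, unsafe_plays, counts


if __name__ == "__main__":
    # Arm 0 has the highest reward but is unsafe; arm 1 is the best safe arm.
    mu, nu, alpha = [0.9, 0.6, 0.4], [0.8, 0.2, 0.1], 0.3
    regret, unsafe_plays, counts = simulate(mu, nu, alpha, horizon=5000)
    print(f"regret={regret:.1f} unsafe_plays={unsafe_plays} counts={counts}")
```

In this toy instance one would expect arm 1 to receive most of the pulls and the unsafe arm 0 to be played only a modest number of times; the exact counts depend on the seed. Using a lower confidence bound on risk keeps an arm in play until the data rule it out as unsafe, which is what makes the rule optimistic about safety as well as about reward.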
