Best Arm Identification with Safety Constraints

11/23/2021
by Zhenlin Wang, et al.

The best-arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that, in the real world, safety constraints often must be met while learning. In this work we study best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many while exploring in a way that guarantees that certain initially unknown safety constraints are met. We first analyze this problem in the setting where the reward and safety constraint take a linear structure, and show nearly matching upper and lower bounds. We then analyze a much more general version of the problem, in which we assume only that the reward and safety constraint can be modeled by monotonic functions, and propose an algorithm for this setting that is guaranteed to learn safely. We conclude with experimental results demonstrating the effectiveness of our approaches in scenarios such as safely identifying the best of many drugs for treating an illness.
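To make the linear setting concrete, here is a minimal sketch of safe exploration followed by best-arm selection. It is not the algorithm from the paper. It is a naive round-robin heuristic under assumptions that the abstract does not state: each arm i is pulled at a dose a in [0, 1], the expected reward is a * theta[i], the expected safety value is a * mu[i] and must stay at or below a threshold gamma, and a small dose a0 is known in advance to be safe for every arm. The function name, parameters, and confidence width are all hypothetical.

```python
import math
import random


def safe_best_arm(theta, mu, gamma, a0, rounds, noise_std=0.1, delta=0.05, seed=0):
    """Naive safe exploration sketch (illustrative, not the paper's method).

    theta, mu : true reward / safety slopes per arm (hidden from the learner,
                used here only to simulate observations).
    Returns (best_arm, final_doses, worst_true_safety_value_incurred).
    """
    rng = random.Random(seed)
    k = len(theta)
    dose = [a0] * k      # dose currently believed safe for each arm
    sum_aa = [0.0] * k   # sum of a^2 (least-squares denominator)
    sum_ar = [0.0] * k   # sum of a * observed reward
    sum_as = [0.0] * k   # sum of a * observed safety value
    worst = 0.0          # largest true safety value ever incurred

    for _ in range(rounds):
        for i in range(k):
            a = dose[i]
            reward = a * theta[i] + rng.gauss(0.0, noise_std)
            safety = a * mu[i] + rng.gauss(0.0, noise_std)
            worst = max(worst, a * mu[i])

            sum_aa[i] += a * a
            sum_ar[i] += a * reward
            sum_as[i] += a * safety

            # Least-squares slope through the origin, plus a heuristic
            # Gaussian confidence width (assumed, with a crude union bound).
            mu_hat = sum_as[i] / sum_aa[i]
            width = noise_std * math.sqrt(
                2.0 * math.log(k * rounds / delta) / sum_aa[i]
            )
            mu_ucb = mu_hat + width

            # Expand the dose only as far as the pessimistic estimate allows;
            # a0 is assumed safe, so never go below it.
            if mu_ucb > 0:
                dose[i] = max(a0, min(1.0, gamma / mu_ucb))
            else:
                dose[i] = 1.0  # pessimistic slope is non-positive: any dose passes

    # Score each arm by its estimated reward at the largest dose believed safe.
    values = [dose[i] * sum_ar[i] / sum_aa[i] for i in range(k)]
    best = max(range(k), key=lambda i: values[i])
    return best, dose, worst


if __name__ == "__main__":
    # Three hypothetical drugs; arm 1 has the largest reward slope but is
    # only safe at a quarter dose.
    print(safe_best_arm([1.0, 2.0, 1.5], [0.5, 4.0, 1.0],
                        gamma=1.0, a0=0.1, rounds=200))
```

The sketch illustrates why safety changes the problem: the arm with the largest reward slope need not be the best safe arm, because its admissible dose may be small, and the learner can only discover that admissible dose by expanding cautiously from a0.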

research
06/22/2022

Active Learning with Safety Constraints

Active learning methods have shown great promise in reducing the number ...
research
09/15/2023

Price of Safety in Linear Best Arm Identification

We introduce the safe best-arm identification framework with linear feed...
research
05/23/2023

Disincentivizing Polarization in Social Networks

On social networks, algorithmic personalization drives users into filter...
research
11/02/2019

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints

Recent advances in contextual bandit optimization and reinforcement lear...
research
04/01/2022

Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk

We investigate a natural but surprisingly unstudied approach to the mult...
research
02/22/2018

Collaboratively Learning the Best Option, Using Bounded Memory

We consider multi-armed bandit problems in social groups wherein each in...
research
03/14/2023

Best arm identification in rare events

We consider the best arm identification problem in the stochastic multi-...
