Linear Stochastic Bandits Under Safety Constraints

08/16/2019
by Sanae Amani, et al.

Bandit algorithms have many applications in safety-critical systems, where it is important to respect, at every round, system constraints that depend on the bandit's unknown parameters. In this paper, we formulate a linear stochastic multi-armed bandit problem with safety constraints that depend (linearly) on an unknown parameter vector. As such, the learner is unable to identify all safe actions and must act conservatively to ensure that her actions satisfy the safety constraint at all rounds (at least with high probability). For these bandits, we propose a new UCB-based algorithm called Safe-LUCB, which includes the modifications necessary to respect safety constraints. The algorithm has two phases. During the pure exploration phase, the learner chooses her actions at random from a restricted set of safe actions, with the goal of learning a good approximation of the entire unknown safe set. Once this goal is achieved, the algorithm begins a safe exploration-exploitation phase, in which the learner gradually expands her estimate of the set of safe actions while controlling the growth of regret. We provide a general regret bound for the algorithm, as well as a problem-dependent bound that is connected to the location of the optimal action within the safe set. We then propose a modified heuristic that exploits our problem-dependent analysis to improve the regret.
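The idea of acting conservatively under an unknown linear constraint can be illustrated with a small sketch. This is not the Safe-LUCB algorithm itself, only one plausible way to certify actions as safe. Everything specific here is an assumption made for illustration: the constraint form x·mu <= c, the noisy measurements y of x·mu, the ridge parameter `lam`, the confidence radius `beta` (taken as given rather than derived), and the hypothetical names `SafetyEstimator`, `update`, and `is_certified_safe`.

```python
import math


def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


class SafetyEstimator:
    """Ridge estimate of an unknown vector mu from measurements y ~ x . mu.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, dim, lam=1.0):
        # Inverse of the regularized Gram matrix V = lam * I + sum x x^T.
        self.V_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        # Running sum of y * x.
        self.b = [0.0] * dim

    def update(self, x, y):
        """Record one (action, measurement) pair via Sherman-Morrison."""
        Vx = mat_vec(self.V_inv, x)
        denom = 1.0 + dot(x, Vx)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.V_inv[i][j] -= Vx[i] * Vx[j] / denom
        self.b = [b_i + y * x_i for b_i, x_i in zip(self.b, x)]

    def is_certified_safe(self, x, c, beta):
        """Return True if the worst case over the ellipsoid satisfies x . mu <= c.

        Assumes mu lies in {mu : ||mu - mu_hat||_V <= beta}.
        """
        mu_hat = mat_vec(self.V_inv, self.b)
        width = beta * math.sqrt(dot(x, mat_vec(self.V_inv, x)))
        return dot(x, mu_hat) + width <= c
```

In this sketch an action is certified only when the upper confidence bound on its constraint value stays below the threshold. Before any data arrives, only small actions pass the check; as measurements accumulate along a direction, the confidence width shrinks there and larger actions in that direction become certifiable, which mirrors the gradual expansion of the estimated safe set described above.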

research
05/05/2020

Regret Bounds for Safe Gaussian Process Bandit Optimization

Many applications require a learner to make sequential decisions given u...
research
09/27/2022

A Doubly Optimistic Strategy for Safe Linear Bandits

We propose a doubly optimistic strategy for the safe-linear-bandit probl...
research
11/21/2019

Safe Linear Stochastic Bandits

We introduce the safe linear stochastic bandit framework—a generalizatio...
research
08/29/2023

Exploiting Problem Geometry in Safe Linear Bandits

The safe linear bandit problem is a version of the classic linear bandit...
research
11/14/2021

Safe Online Convex Optimization with Unknown Linear Safety Constraints

We study the problem of safe online convex optimization, where the actio...
research
09/15/2023

Price of Safety in Linear Best Arm Identification

We introduce the safe best-arm identification framework with linear feed...
research
12/13/2021

Safe Linear Leveling Bandits

Multi-armed bandits (MAB) are extensively studied in various settings wh...
