Safe Linear Thompson Sampling

11/06/2019
by Ahmadreza Moradipari, et al.
The design and performance analysis of bandit algorithms under stage-wise safety or reliability constraints have recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional linear safety constraints that must be satisfied at each round. We provide a new safe algorithm for this problem based on linear Thompson Sampling (TS) and show a frequentist regret of order $O(d^{3/2}\log^{1/2} d \cdot T^{1/2}\log^{3/2} T)$, which remarkably matches the bound provided by [Abeille et al., 2017] for the standard linear TS algorithm in the absence of safety constraints. We compare the performance of our algorithm with a UCB-based safe algorithm and highlight how the inherently randomized nature of TS leads to superior performance in expanding the set of safe actions available to the algorithm at each round.
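To make the idea concrete, here is a minimal sketch of how a safe variant of linear TS could be organized. It is not the authors' algorithm as specified in the paper. Everything below is an assumption chosen for illustration: the constraint form x^T B θ ≤ c with known `B` and `c`, a finite action set in two dimensions that contains the zero action (trivially safe when c ≥ 0), a constant confidence radius `beta` rather than a time-varying one, and Gaussian perturbations. The function name `run_safe_lts` and all helper names are hypothetical.

```python
import math
import random

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def matvec(m, v):
    return (dot(m[0], v), dot(m[1], v))

def transpose(m):
    return ((m[0][0], m[1][0]), (m[0][1], m[1][1]))

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def chol2(m):
    """Lower-triangular Cholesky factor of a symmetric positive definite 2x2 matrix."""
    l11 = math.sqrt(m[0][0])
    l21 = m[1][0] / l11
    l22 = math.sqrt(m[1][1] - l21 * l21)
    return ((l11, 0.0), (l21, l22))

def is_safe(x, theta_hat, V_inv, B, c, beta):
    """Conservative check: x^T B v <= c for every v in the ellipsoid
    {v : ||v - theta_hat||_V <= beta} (assumed constraint form)."""
    z = matvec(transpose(B), x)  # B^T x
    width = beta * math.sqrt(max(0.0, dot(z, matvec(V_inv, z))))
    return dot(z, theta_hat) + width <= c

def run_safe_lts(theta_star, B, c, actions, T=100, lam=1.0, beta=1.0,
                 noise=0.1, seed=0):
    """Simulate T rounds; theta_star is used only to generate rewards."""
    rng = random.Random(seed)
    V = [[lam, 0.0], [0.0, lam]]  # regularized Gram matrix
    b = [0.0, 0.0]                # sum of reward-weighted actions
    played = []
    for _ in range(T):
        V_inv = inv2(V)
        theta_hat = matvec(V_inv, b)  # regularized least-squares estimate
        # TS step: perturb the estimate with noise shaped by V^{-1}.
        L = chol2(V_inv)
        eta = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        shift = matvec(L, eta)
        theta_tilde = (theta_hat[0] + beta * shift[0],
                       theta_hat[1] + beta * shift[1])
        # Safety step: keep only actions that pass the conservative check.
        safe = [x for x in actions if is_safe(x, theta_hat, V_inv, B, c, beta)]
        # Play the safe action that looks best under the sampled parameter.
        x = max(safe, key=lambda a: dot(a, theta_tilde))
        reward = dot(x, theta_star) + rng.gauss(0.0, noise)
        for i in range(2):
            b[i] += reward * x[i]
            for j in range(2):
                V[i][j] += x[i] * x[j]
        played.append(x)
    return played
```

In this sketch the estimated safe set can only be as large as the confidence ellipsoid allows, so it starts small around the zero action and may grow as the played actions shrink the uncertainty in different directions. A call such as `run_safe_lts((0.6, 0.3), ((1.0, 0.0), (0.0, 1.0)), 0.5, actions)` with `actions` a grid of points in the unit square returns the list of played actions.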
