A Doubly Optimistic Strategy for Safe Linear Bandits

09/27/2022
by Tianrui Chen, et al.

We propose DOSLB, a doubly optimistic strategy for the safe linear bandit problem. The safe linear bandit problem is to optimise an unknown linear reward whilst satisfying unknown round-wise safety constraints on actions, using stochastic bandit feedback on the rewards and safety risks of actions. In contrast to prior work on aggregated resource constraints, our formulation explicitly demands control of round-wise safety risks. Unlike existing optimistic-pessimistic paradigms for safe bandits, DOSLB exercises supreme optimism, using optimistic estimates of both reward and safety scores to select actions. Yet, surprisingly, we show that DOSLB rarely takes risky actions, and obtains Õ(d√T) regret, where our notion of regret accounts for both the inefficiency and the lack of safety of actions. Specialising to polytopal domains, we first show, notably, that the √T regret bound cannot be improved even with large gaps, and then identify a slackened notion of regret for which we show tight instance-dependent O(log² T) bounds. We further argue that in such domains, the number of times an overly risky action is played is also bounded as O(log² T).
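To make the double optimism concrete, below is a minimal Python sketch of one round of such a selection rule over a finite set of candidate actions, assuming a standard linear-bandit setup with ridge estimates and a shared confidence-ellipsoid radius. All names here (doslb_step, theta_hat, gamma_hat, beta, safety_threshold) are illustrative assumptions rather than the paper's notation, and the rule shown is a generic instance of the doubly optimistic idea, not the exact DOSLB algorithm.

```python
import numpy as np

def doslb_step(actions, theta_hat, gamma_hat, V_inv, beta, safety_threshold):
    """One round of a doubly optimistic selection rule (illustrative sketch).

    actions          -- (n, d) array of candidate actions
    theta_hat        -- ridge estimate of the unknown reward parameter
    gamma_hat        -- ridge estimate of the unknown safety parameter
    V_inv            -- inverse of the regularised design matrix
    beta             -- confidence-ellipsoid radius
    safety_threshold -- round-wise cap on admissible safety risk
    """
    best_action, best_ucb = None, -np.inf
    for a in actions:
        # Confidence width of the linear estimates at action a.
        width = beta * np.sqrt(a @ V_inv @ a)
        # Optimism on reward: highest plausible reward for a.
        reward_ucb = a @ theta_hat + width
        # Optimism on safety: lowest plausible risk for a, i.e. a is
        # treated as feasible if ANY plausible parameter deems it safe.
        risk_lcb = a @ gamma_hat - width
        if risk_lcb <= safety_threshold and reward_ucb > best_ucb:
            best_action, best_ucb = a, reward_ucb
    return best_action
```

For contrast, an optimistic-pessimistic design of the kind the abstract mentions would check feasibility against the upper estimate a @ gamma_hat + width, shrinking the playable set; double optimism instead enlarges it, which is what makes the rarity of risky plays surprising.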
