Conservative Contextual Linear Bandits

11/19/2016
by Abbas Kazerouni, et al.

Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits, which have applications in many fields, including personalized ad recommendation in online marketing. We formulate a notion of safety for this class of algorithms. We develop a safe contextual linear bandit algorithm, called conservative linear UCB (CLUCB), that simultaneously minimizes its regret and satisfies the safety constraint, i.e., maintains its performance above a fixed percentage of the performance of a baseline strategy, uniformly over time. We prove an upper bound on the regret of CLUCB and show that it can be decomposed into two terms: 1) an upper bound for the regret of the standard linear UCB algorithm, which grows with the time horizon, and 2) a constant term (one that does not grow with the time horizon) that accounts for the loss incurred by being conservative in order to satisfy the safety constraint. We empirically show that our algorithm is safe and validate our theoretical analysis.
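To make the safety constraint concrete, the following is a minimal sketch (not the exact CLUCB algorithm from the paper) of a conservative LinUCB-style loop: at each round, the optimistic arm is played only if a pessimistic estimate of cumulative reward stays above a (1 - alpha) fraction of the baseline's cumulative reward; otherwise the baseline action is played. The parameter values, the fixed confidence radius `beta`, the known baseline reward, and the synthetic contexts are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, T, alpha = 3, 500, 0.1                 # dimension, horizon, allowed performance loss
theta = np.array([0.8, 0.5, 0.2])         # unknown true parameter (hypothetical)
baseline_arm = np.array([0.3, 0.3, 0.3])  # fixed baseline action (hypothetical)

A = np.eye(d)          # ridge-regularized Gram matrix
b = np.zeros(d)
beta = 1.0             # confidence radius; a tuning constant in this sketch

lower_so_far = 0.0     # sum of lower confidence bounds of actions played so far
baseline_so_far = 0.0  # cumulative baseline reward (assumed known here)
n_baseline = 0         # how often the conservative check forced the baseline

for t in range(T):
    arms = rng.normal(size=(5, d))        # 5 random context vectors this round
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                 # ridge-regression estimate of theta

    # Optimistic arm: maximize UCB = x . theta_hat + beta * ||x||_{A^{-1}}
    widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
    i = int(np.argmax(arms @ theta_hat + beta * widths))
    x = arms[i]
    lcb = x @ theta_hat - beta * widths[i]

    r_baseline = baseline_arm @ theta     # expected baseline reward

    # Conservative check: pessimistic cumulative reward must remain above
    # (1 - alpha) times the baseline's cumulative reward.
    if lower_so_far + lcb >= (1 - alpha) * (baseline_so_far + r_baseline):
        chosen, chosen_lcb = x, lcb
    else:
        chosen, chosen_lcb = baseline_arm, r_baseline
        n_baseline += 1

    reward = chosen @ theta + 0.1 * rng.normal()  # noisy linear reward
    A += np.outer(chosen, chosen)
    b += reward * chosen
    lower_so_far += chosen_lcb
    baseline_so_far += r_baseline

print(f"baseline plays: {n_baseline} of {T}")
```

In this sketch the conservative plays are the source of the constant regret term in the bound: they occur while the confidence set is still wide, and become rarer as the estimate of theta sharpens.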


