Thompson Sampling for Linearly Constrained Bandits

04/20/2020
by Vidit Saxena, et al.

We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under a probabilistic linear constraint. For a few real-world instances of this problem, constrained extensions of the well-known Thompson Sampling (TS) heuristic have recently been proposed. However, finite-time analysis of constrained TS is challenging; as a result, only O(√T) bounds on the cumulative reward loss (i.e., the regret) are available. In this paper, we describe LinConTS, a TS-based algorithm for bandits with a linear constraint on the probability of earning a reward in every round. We show that for LinConTS, the regret as well as the cumulative constraint violations are upper bounded by O(log T). We develop a proof technique that relies on a careful analysis of the dual problem and combine it with recent theoretical work on unconstrained TS. Through numerical experiments on two real-world datasets, we demonstrate that LinConTS outperforms an asymptotically optimal upper confidence bound (UCB) scheme at simultaneously minimizing the regret and the constraint violation.
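
The abstract gives no pseudocode, so the following is a minimal sketch of how one round of a LinConTS-style constrained Thompson Sampling loop could look. It assumes Bernoulli arms with known per-arm reward values r[i], unknown success probabilities tracked by Beta posteriors, and a threshold tau on the per-round probability of earning a reward; the specific LP formulation, variable names, and the infeasibility fallback are illustrative assumptions, not details taken from the paper.

    # Sketch of a constrained-TS round: sample from the posteriors, solve a
    # small LP over arm-selection probabilities, then play a randomized arm.
    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)

    K = 5                           # number of arms
    r = rng.uniform(0.1, 1.0, K)    # known per-arm reward values (assumption)
    mu = rng.uniform(0.2, 0.9, K)   # true, unknown success probabilities
    tau = 0.5                       # constraint: P(reward in a round) >= tau

    alpha = np.ones(K)              # Beta posterior: successes + 1
    beta = np.ones(K)               # Beta posterior: failures + 1

    T = 2000
    for t in range(T):
        theta = rng.beta(alpha, beta)      # posterior sample for each arm

        # LP over arm-selection probabilities p:
        #   maximize  sum_i p_i * r_i * theta_i
        #   subject to sum_i p_i * theta_i >= tau, sum_i p_i = 1, p_i >= 0
        res = linprog(
            c=-(r * theta),
            A_ub=-theta.reshape(1, -1), b_ub=np.array([-tau]),
            A_eq=np.ones((1, K)), b_eq=np.array([1.0]),
            bounds=[(0.0, 1.0)] * K, method="highs",
        )

        if res.success:
            p = np.clip(res.x, 0.0, None)
            p /= p.sum()
            arm = rng.choice(K, p=p)
        else:
            # Sampled problem infeasible: fall back to the arm most likely
            # to satisfy the constraint (illustrative choice).
            arm = int(np.argmax(theta))

        success = rng.random() < mu[arm]   # observe Bernoulli feedback
        alpha[arm] += success
        beta[arm] += 1 - success

The key design point this sketch illustrates is that the constraint is enforced on the sampled problem in every round, so the algorithm randomizes over arms according to an LP solution rather than always pulling the single arm with the highest sampled mean.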

Related research

06/17/2020  Stochastic Bandits with Linear Constraints
We study a constrained contextual linear bandit setting, where the goal ...

04/24/2020  Fast Thompson Sampling Algorithm with Cumulative Oversampling: Application to Budgeted Influence Maximization
We propose a cumulative oversampling (CO) technique for Thompson Samplin...

03/29/2022  On Kernelized Multi-Armed Bandits with Constraints
We study a stochastic bandit problem with a general unknown reward funct...

02/10/2021  An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear Bandits
This paper considers stochastic linear bandits with general constraints....

12/13/2021  Safe Linear Leveling Bandits
Multi-armed bandits (MAB) are extensively studied in various settings wh...

01/08/2013  Linear Bandits in High Dimension and Recommendation Systems
A large number of online services provide automated recommendations to h...

09/28/2018  Efficient Linear Bandits through Matrix Sketching
We prove that two popular linear contextual bandit algorithms, OFUL and ...
