Last Switch Dependent Bandits with Monotone Payoff Functions

06/01/2023
by Ayoub Foussoul, et al.

In a recent work, Laforgue et al. introduced the model of last switch dependent (LSD) bandits in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment. Examples include satiation, where consecutive plays of the same action lead to decreased performance, and deprivation, where the payoff of an action increases after an interval of inactivity. In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model. In particular, we design the first efficient constant-factor approximation algorithm for the problem and show that, under a natural monotonicity assumption on the payoffs, its approximation guarantee (almost) matches the state of the art for the special and well-studied class of recharging bandits (also known as delay-dependent bandits). Along the way, we develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states. We believe these novel elements could be used to approach richer classes of action-induced nonstationary bandits (e.g., special instances of restless bandits). In the case where the model parameters are initially unknown, we develop an online learning adaptation of our algorithm for which we provide sublinear regret guarantees against its full-information counterpart.
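
To make the setting concrete, below is a minimal toy sketch in Python of an LSD-style environment with monotone payoffs, based only on the description above: each arm's state is the signed time since its last switch (negative while the arm is being played consecutively, positive while it rests). The payoff function, the noise level, and the greedy baseline are illustrative assumptions, not the paper's construction or its approximation algorithm.

```python
import random

# Toy LSD-style environment, based only on the description in the abstract.
# The state of each arm is the signed time since its last switch:
# negative values count consecutive plays (satiation), positive values
# count rounds of inactivity (deprivation). The payoff function below is
# an illustrative monotone choice, not the one studied in the paper.

def payoff(state):
    """Monotone payoff in the last-switch state: consecutive plays
    (state < 0) depress the reward, inactivity (state > 0) restores it."""
    return max(0.0, min(1.0, 0.5 + 0.1 * state))

def step(states, arm):
    """Play `arm`, collect a noisy reward, and update every arm's state."""
    reward = payoff(states[arm]) + random.gauss(0.0, 0.05)
    for i in range(len(states)):
        if i == arm:
            # playing the arm starts/extends a run of consecutive plays
            states[i] = states[i] - 1 if states[i] < 0 else -1
        else:
            # resting the arm starts/extends an interval of inactivity
            states[i] = states[i] + 1 if states[i] > 0 else 1
    return reward

# Myopic baseline: always play the arm with the largest current payoff.
# The paper's planning algorithm is a constant-factor approximation to the
# optimal strategy; this greedy rule is only a simple point of comparison.
states = [0, 0, 0]
total = 0.0
for t in range(100):
    arm = max(range(len(states)), key=lambda i: payoff(states[i]))
    total += step(states, arm)
print(f"greedy cumulative reward over 100 rounds: {total:.2f}")
```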


