Linear Bandits with Memory: from Rotting to Rising

02/16/2023
by Giulia Clerici, et al.

Nonstationary phenomena, such as satiation effects in recommendation, are a common feature of sequential decision-making problems. While these phenomena have been mostly studied in the framework of bandits with finitely many arms, in many practically relevant cases linear bandits provide a more effective modeling choice. In this work, we introduce a general framework for the study of nonstationary linear bandits, where current rewards are influenced by the learner's past actions in a fixed-size window. In particular, our model includes stationary linear bandits as a special case. After showing that the best sequence of actions is NP-hard to compute in our model, we focus on cyclic policies and prove a regret bound for a variant of the OFUL algorithm that balances approximation and estimation errors. Our theoretical findings are supported by experiments (which also include misspecified settings) where our algorithm is seen to perform well against natural baselines.
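
Below is a minimal, hypothetical sketch of the kind of setting the abstract describes: a linear bandit whose reward depends on the current action and a fixed-size window of past actions, played with an OFUL-style optimistic rule. The feature map, confidence radius, and action set here are illustrative placeholders, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, T, lam = 3, 2, 500, 1.0            # dimension, window size, horizon, ridge parameter
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

# Finite action set on the unit sphere (illustrative choice).
actions = [a / np.linalg.norm(a) for a in (rng.normal(size=d) for _ in range(20))]

def memory_feature(a, window):
    """Hypothetical feature map: the action is attenuated by its overlap with
    the last m actions, a crude stand-in for the satiation/rotting effects
    mentioned in the abstract."""
    if not window:
        return a
    overlap = np.mean([abs(a @ b) for b in window])
    return (1.0 - 0.5 * overlap) * a

# OFUL-style loop: ridge-regression estimate plus an optimistic bonus per action.
V = lam * np.eye(d)
b = np.zeros(d)
window, total_reward = [], 0.0
for t in range(T):
    theta_hat = np.linalg.solve(V, b)
    V_inv = np.linalg.inv(V)
    beta = 1.0 + np.sqrt(np.log(t + 2))  # placeholder confidence radius
    feats = [memory_feature(a, window) for a in actions]
    ucb = [f @ theta_hat + beta * np.sqrt(f @ V_inv @ f) for f in feats]
    i = int(np.argmax(ucb))
    x = feats[i]
    reward = x @ theta_star + 0.1 * rng.normal()
    V += np.outer(x, x)
    b += reward * x
    window = (window + [actions[i]])[-m:]  # keep only the last m actions
    total_reward += reward

print(f"average reward over {T} rounds: {total_reward / T:.3f}")
```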


Related research:

Factored Bandits (07/04/2018)
We introduce the factored bandits model, which is a framework for learni...

Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow (07/01/2021)
In membership/subscriber acquisition and retention, we sometimes need to...

Last Switch Dependent Bandits with Monotone Payoff Functions (06/01/2023)
In a recent work, Laforgue et al. introduce the model of last switch dep...

Non-Stationary Dueling Bandits (02/02/2022)
We study the non-stationary dueling bandits problem with K arms, where t...

Linear Bandits with Feature Feedback (03/09/2019)
This paper explores a new form of the linear bandit problem in which the...

Learning with Good Feature Representations in Bandits and in RL with a Generative Model (11/18/2019)
The construction in the recent paper by Du et al. [2019] implies that se...

Generalizing Hierarchical Bayesian Bandits (05/30/2022)
A contextual bandit is a popular and practical framework for online lear...
