Break your Bandit Routine with LSD Rewards: a Last Switch Dependent Analysis of Satiation and Seasonality

10/22/2021
by Pierre Laforgue, et al.

Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore quickly get bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem in which the expected reward of an arm is fully determined by the time elapsed since the arm last took part in a switch of actions. Our model generalizes previous notions of delay-dependent rewards and relaxes most assumptions on the reward function. This makes it possible to model phenomena such as progressive satiation and periodic behaviours. Building upon the Combinatorial Semi-Bandits (CSB) framework, we design an algorithm and prove a bound on its regret with respect to the optimal non-stationary policy (which is NP-hard to compute). As in previous work, our regret analysis is based on defining and solving an appropriate trade-off between approximation and estimation. Preliminary experiments show that our algorithm outperforms both the oracle greedy approach and a vanilla CSB solver.
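To make the "time elapsed since the arm last took part in a switch" idea concrete, here is a minimal Python sketch. It is an illustration only and not the paper's algorithm or exact state definition: the signed-state convention (negative while an arm is being pulled repeatedly, positive while it sits idle), the initial state of 1, and the names `lsd_state_history` and `toy_expected_reward` are all assumptions introduced here, and the reward shape is an arbitrary toy choice.

```python
from typing import List


def lsd_state_history(actions: List[int], n_arms: int) -> List[List[int]]:
    """Return, for each round, the per-arm state observed before the pull.

    Assumed convention (hypothetical, for illustration):
      state > 0: the arm has been idle for `state` rounds since it was last switched away from
      state < 0: the arm has been pulled for `-state` consecutive rounds since it was switched to
    Every arm is assumed to start idle with state 1.
    """
    states = [1] * n_arms
    history = []
    for pulled in actions:
        history.append(list(states))
        for arm in range(n_arms):
            if arm == pulled:
                # Switching to an idle arm resets its counter; otherwise the streak grows.
                states[arm] = -1 if states[arm] > 0 else states[arm] - 1
            else:
                # Switching away from a pulled arm resets its counter; otherwise idle time grows.
                states[arm] = 1 if states[arm] < 0 else states[arm] + 1
    return history


def toy_expected_reward(state: int, base: float = 1.0) -> float:
    """Toy reward (an assumption): decays under repeated pulls, recovers while idle."""
    if state < 0:
        return base * 0.5 ** (-state)      # progressive satiation
    return base * (1.0 - 0.5 ** state)     # recovery with idle time


history = lsd_state_history([0, 0, 1, 0], n_arms=2)
```

Under this convention a single integer per arm captures both satiation (long streaks of pulls) and recovery (long idle periods), which is the kind of dependence the abstract describes; a periodic reward could be sketched the same way by swapping in a different toy function of the state.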


research
10/25/2021

On Slowly-varying Non-stationary Bandits

We consider minimisation of dynamic regret in non-stationary bandits wit...
research
01/29/2023

Smooth Non-Stationary Bandits

In many applications of online decision making, the environment is non-s...
research
05/29/2022

An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge

We propose an algorithm for non-stationary kernel bandits that does not ...
research
02/10/2020

Combinatorial Semi-Bandit in the Non-Stationary Environment

In this paper, we investigate the non-stationary combinatorial semi-band...
research
06/28/2021

Dynamic Planning and Learning under Recovering Rewards

Motivated by emerging applications such as live-streaming e-commerce, pr...
research
10/22/2019

Restless Hidden Markov Bandits with Linear Rewards

This paper presents an algorithm and regret analysis for the restless hi...
research
06/28/2022

Dynamic Memory for Interpretable Sequential Optimisation

Real-world applications of reinforcement learning for recommendation and...
