Dynamic Bandits with an Auto-Regressive Temporal Structure

10/28/2022
by Qinyi Chen et al.

Multi-armed bandit (MAB) problems are mainly studied under two extreme settings known as stochastic and adversarial. These two settings, however, do not capture realistic environments such as search engines, marketing, and advertising, in which rewards change stochastically over time. Motivated by this, we introduce and study a dynamic MAB problem with stochastic temporal structure, where the expected reward of each arm is governed by an auto-regressive (AR) model. Due to the dynamic nature of the rewards, simple "explore and commit" policies fail, as all arms have to be explored continuously over time. We formalize this by characterizing a per-round regret lower bound, where the regret is measured against a strong (dynamic) benchmark. We then present an algorithm whose per-round regret almost matches our regret lower bound. Our algorithm relies on two mechanisms: (i) alternating between recently pulled arms and unpulled arms with potential, and (ii) restarting. These mechanisms enable the algorithm to adapt dynamically to changes and discard irrelevant past information at a suitable rate. In numerical studies, we further demonstrate the strength of our algorithm under different types of non-stationary settings.
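To make the setting concrete, below is a minimal simulation sketch of the AR(1) special case of the reward dynamics described above, paired with a schematic restarting policy. This is not the paper's algorithm: the sliding-window size, restart period, and policy logic are hypothetical placeholders chosen only to illustrate why stale observations must be discarded when arm means drift under an AR process.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- AR(1) reward model (assumed form for illustration; the paper considers
# --- more general AR dynamics) ---
K = 5            # number of arms
T = 10_000       # horizon
gamma = 0.95     # AR coefficient (|gamma| < 1 keeps the process stable)
sigma = 0.1      # innovation / observation noise level

def step_means(mu):
    """One AR(1) update of every arm's latent expected reward."""
    return gamma * mu + sigma * rng.normal(size=K)

# --- Schematic restarting policy (hypothetical tuning, not the paper's) ---
window = 50          # how many recent samples per arm to trust
restart_every = 500  # periodically forget everything and re-explore

history = {a: [] for a in range(K)}
mu = rng.normal(size=K)        # latent expected rewards
total_reward = 0.0

for t in range(T):
    if t % restart_every == 0:
        history = {a: [] for a in range(K)}   # restart: discard old samples

    # Pull any arm with no recent data; otherwise exploit the best window mean.
    unexplored = [a for a in range(K) if len(history[a]) == 0]
    if unexplored:
        arm = unexplored[0]
    else:
        arm = max(range(K), key=lambda a: np.mean(history[a][-window:]))

    reward = mu[arm] + sigma * rng.normal()   # noisy observation of the arm
    history[arm].append(reward)
    total_reward += reward

    mu = step_means(mu)                        # arm means drift between rounds

print(f"average reward collected: {total_reward / T:.3f}")
```

Because the latent means keep evolving, a policy that commits to one arm after an initial exploration phase accumulates linear regret; the restart and re-exploration steps above are the simplest way to see the role of the two mechanisms (alternation and restarting) highlighted in the abstract.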


