Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

07/23/2021
by   Junpei Komiyama, et al.

We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a class of bandit algorithms that leverages adaptive windowing techniques from the data stream community. We first provide new guarantees on the quality of estimators produced by adaptive windowing techniques, which are of independent interest in the data mining community. We then conduct a finite-time analysis of ADR-bandit in two typical environments: an abrupt environment, where changes occur instantaneously, and a gradual environment, where changes occur progressively. We show that ADR-bandit has nearly optimal performance when the abrupt or gradual changes occur in a coordinated manner that we call global changes, and that forced exploration is unnecessary when attention is restricted to such global changes. Unlike existing nonstationary bandit algorithms, ADR-bandit performs optimally in stationary environments as well as in nonstationary environments with global changes. Our experiments show that the proposed algorithms outperform existing approaches in synthetic and real-world environments.
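To make the core idea concrete, here is a minimal sketch of how a UCB-style bandit can be combined with an ADWIN-style adaptive window: each arm keeps a window of recent rewards, and when the means of the two halves of a window differ by more than a Hoeffding-type threshold, the stale half is discarded. This is an illustrative simplification, not the paper's ADR-bandit; the class name `ADRUCB`, the single-midpoint split test, and the `delta` parameter are assumptions for the sketch.

```python
import math


class ADRUCB:
    """Illustrative sketch (not the paper's exact ADR-bandit): UCB1 whose
    per-arm reward windows are reset when an ADWIN-style split test
    detects a change in the reward distribution."""

    def __init__(self, n_arms, delta=0.01):
        self.n_arms = n_arms
        self.delta = delta                      # confidence level of the change test
        self.windows = [[] for _ in range(n_arms)]
        self.t = 0

    def _change_detected(self, w):
        # Simplified ADWIN-style test: split the window at its midpoint and
        # compare the two half-means against a Hoeffding-type threshold.
        n = len(w)
        if n < 4:
            return False
        m = n // 2
        left, right = w[:m], w[m:]
        mu_l = sum(left) / len(left)
        mu_r = sum(right) / len(right)
        eps = math.sqrt(0.5 * math.log(4.0 / self.delta)
                        * (1.0 / len(left) + 1.0 / len(right)))
        return abs(mu_l - mu_r) > eps

    def select(self):
        self.t += 1
        for a in range(self.n_arms):            # play each arm once first
            if not self.windows[a]:
                return a
        # UCB1 index computed over the (possibly reset) windows.
        def ucb(a):
            w = self.windows[a]
            return sum(w) / len(w) + math.sqrt(2 * math.log(self.t) / len(w))
        return max(range(self.n_arms), key=ucb)

    def update(self, arm, reward):
        self.windows[arm].append(reward)
        if self._change_detected(self.windows[arm]):
            # Adaptive reset: drop the older half, keep the recent half.
            self.windows[arm] = self.windows[arm][len(self.windows[arm]) // 2:]
```

Because each arm's estimator is rebuilt from the post-change window only, no forced exploration schedule is needed for the sketch to track a global change: the change test itself invalidates stale data.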

