Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards

05/13/2014
by   Omar Besbes, et al.

In a multi-armed bandit (MAB) problem, a gambler chooses at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are observed only when an arm is selected, and the gambler's objective is to maximize cumulative expected earnings over a given horizon of play T. To do this, the gambler must acquire information about the arms (exploration) while simultaneously optimizing immediate rewards (exploitation); the price paid for this trade-off is commonly referred to as the regret, and the main question is how small this price can be as a function of the horizon length T. The problem has been studied extensively when the reward distributions do not change over time, an assumption that supports a sharp characterization of the regret but is often violated in practical settings. In this paper, we focus on a MAB formulation that allows for a broad range of temporal uncertainties in the rewards while maintaining mathematical tractability. We fully characterize the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret. Our analysis draws connections between two rather disparate strands of literature: the adversarial and the stochastic MAB frameworks.
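To make the setting concrete, below is a minimal simulation sketch, not the authors' algorithm: a K-armed bandit whose Bernoulli reward means drift over the horizon T, played by an Exp3-style policy that periodically restarts so it can discard stale information. The restart period `batch`, the sinusoidal drift pattern in `means`, and the exploration rate `gamma` are all illustrative assumptions made for this example only.

```python
# Illustrative sketch (assumptions, not the paper's method): restarting Exp3-style
# policy on a non-stationary Bernoulli bandit. Only the chosen arm's reward is observed.
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 10_000
batch = 500  # assumed restart period: how often the policy forgets old observations
gamma = min(1.0, np.sqrt(K * np.log(K) / ((np.e - 1) * batch)))  # standard Exp3 rate per batch

def means(t):
    # Slowly varying reward means: one simple example of temporal reward "variation".
    return 0.5 + 0.4 * np.sin(2 * np.pi * np.arange(K) / K + 2 * np.pi * t / T)

weights = np.ones(K)
reward_total, oracle_total = 0.0, 0.0
for t in range(T):
    if t % batch == 0:
        weights = np.ones(K)  # restart: re-explore because the environment may have changed
    probs = (1 - gamma) * weights / weights.sum() + gamma / K
    arm = rng.choice(K, p=probs)
    mu = means(t)
    reward = float(rng.random() < mu[arm])          # bandit feedback: chosen arm only
    weights[arm] *= np.exp(gamma * (reward / probs[arm]) / K)  # importance-weighted update
    reward_total += reward
    oracle_total += mu.max()                        # benchmark: best arm at each round

print(f"dynamic regret ~ {oracle_total - reward_total:.1f} over T = {T} rounds")
```

The restart period encodes the exploration-exploitation tension the abstract describes: shorter batches track reward variation more closely but spend more rounds re-exploring, while longer batches exploit more aggressively at the risk of playing an arm whose reward distribution has already shifted.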


