On Adaptive Estimation for Dynamic Bernoulli Bandits

12/08/2017
by Xue Lu, et al.

The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total reward of a gambler who sequentially pulls arms of a multi-armed slot machine, where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm's reward distribution can change and the optimal arm can switch over time. Motivated by many real applications where rewards are binary, we focus on dynamic Bernoulli bandits. Standard methods like ϵ-Greedy and Upper Confidence Bound (UCB), which rely on the sample mean estimator, often fail to track changes in the underlying reward in dynamic problems. In this paper, we overcome this slow response to change by deploying adaptive estimation within the standard methods, and we propose a new family of algorithms: adaptive versions of ϵ-Greedy, UCB, and Thompson sampling. These new methods are simple and easy to implement. Moreover, they require no prior knowledge about the data, which is important for real applications. We examine the new algorithms numerically in different scenarios; the results show solid improvements of our algorithms in dynamic environments.
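To illustrate the core idea, the sketch below plugs an exponentially weighted (forgetting-factor) mean into ϵ-Greedy on a two-armed dynamic Bernoulli bandit whose optimal arm switches halfway through. This is a minimal sketch, not the paper's method: the fixed forgetting factor GAMMA, the exploration rate EPSILON, and the simulated change-point are illustrative assumptions (the paper's adaptive estimators are described in the full text).

```python
import random

GAMMA = 0.95   # assumed fixed forgetting factor; GAMMA = 1.0 recovers the sample mean
EPSILON = 0.1  # assumed exploration probability

def run(horizon=2000, seed=0):
    rng = random.Random(seed)
    # Two Bernoulli arms whose success probabilities swap at the midpoint.
    probs_before, probs_after = [0.8, 0.2], [0.2, 0.8]
    means = [0.5, 0.5]    # adaptive reward estimates, neutral start
    weights = [0.0, 0.0]  # discounted effective sample sizes
    total = 0
    for t in range(horizon):
        probs = probs_before if t < horizon // 2 else probs_after
        if rng.random() < EPSILON:
            arm = rng.randrange(2)                       # explore
        else:
            arm = max(range(2), key=lambda a: means[a])  # exploit
        reward = 1 if rng.random() < probs[arm] else 0
        total += reward
        # Forgetting-factor update: old evidence is discounted by GAMMA,
        # so the estimate can react when the underlying reward changes.
        weights[arm] = GAMMA * weights[arm] + 1.0
        means[arm] += (reward - means[arm]) / weights[arm]
    return total

if __name__ == "__main__":
    print("total reward:", run())
```

With GAMMA = 1.0 the update reduces to the ordinary running sample mean, which keeps averaging over pre-change rewards and responds slowly after the switch; a forgetting factor below 1 caps the effective sample size at roughly 1/(1 - GAMMA) observations, trading some variance for faster tracking.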

