Using the Value of Information to Explore Stochastic, Discrete Multi-Armed Bandits

10/08/2017
by Isaac J. Sledge, et al.

In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion, which measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the action space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. The weight given to information in this criterion is set by a parameter that can be varied during the search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to regret that is logarithmic in the number of episodes, which is optimal for this class of problems.
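The criterion itself is not reproduced on this page, so the sketch below is only a minimal illustration of the general recipe, not the authors' algorithm. It assumes the standard Gibbs-policy form: maximizing the free energy sum_a pi(a) Q(a) - (1/beta) KL(pi || uniform) over policies pi yields pi(a) proportional to exp(beta * Q(a)), with the inverse temperature beta controlling the reward-versus-information trade-off. The function name voi_bandit, the Gaussian reward model, and the geometric cooling schedule are illustrative assumptions.

```python
import numpy as np

def voi_bandit(means, episodes=10000, beta0=0.1, growth=1.001, seed=0):
    """Hypothetical VoI-style bandit loop (illustrative, not the paper's method).

    Plays the Gibbs policy pi(a) ~ exp(beta * Q(a)), the maximizer of the
    free energy sum_a pi(a) * Q(a) - (1/beta) * KL(pi || uniform), and
    anneals the inverse temperature beta upward so the policy moves from
    exploration (near-uniform) toward exploitation (near-greedy).
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    q = np.zeros(k)          # empirical mean reward of each arm
    counts = np.zeros(k)     # number of pulls per arm
    beta = beta0
    total = 0.0
    for _ in range(episodes):
        logits = beta * q
        logits -= logits.max()           # stabilize the soft-max
        pi = np.exp(logits)
        pi /= pi.sum()
        a = rng.choice(k, p=pi)          # sample an arm from the Gibbs policy
        r = rng.normal(means[a], 1.0)    # stochastic (Gaussian) reward
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]   # incremental mean update
        beta *= growth                   # simulated-annealing-like cooling step
        total += r
    return total, q

# Example: three arms; the annealed policy should concentrate on the best arm.
reward, estimates = voi_bandit([0.1, 0.5, 0.9], episodes=5000)
```

Note that the geometric schedule here is purely a placeholder: the paper's logarithmic-regret guarantee depends on the specific cooling schedule it analyzes, which this toy loop does not attempt to match.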


