Asymptotic Randomised Control with applications to bandits

10/14/2020
by Samuel N. Cohen, et al.

We consider a general multi-armed bandit problem with correlated (and simple contextual and restless) elements, formulated as a relaxed control problem. By introducing an entropy premium, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process, obtained numerically by solving a fixed-point problem, which can be interpreted as explicitly balancing an exploration-exploitation trade-off. Performance of the resulting Asymptotic Randomised Control (ARC) algorithm compares favourably with other approaches to correlated multi-armed bandits.
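The abstract describes a policy in which an entropy premium smooths the hard argmax over arm indices into a randomised choice, found by solving a fixed-point problem. The sketch below is not the authors' ARC algorithm; it is a minimal illustration of that general idea for a Gaussian bandit, under the assumption that each arm's index is its posterior mean plus an uncertainty-based exploration premium, and that the selection probabilities solve p = softmax(index(p) / eta).

```python
import numpy as np

def entropy_regularised_policy(mu, sigma, eta=0.2, n_iter=200, tol=1e-10):
    """Illustrative fixed-point iteration for an entropy-regularised
    arm-selection rule (a sketch, NOT the ARC algorithm itself).

    mu    : posterior mean reward of each arm
    sigma : posterior standard deviation of each arm (assumed premium scale)
    eta   : entropy premium; larger eta gives a more uniform (exploratory) policy
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = np.ones_like(mu) / len(mu)              # start from uniform play
    for _ in range(n_iter):
        # Assumed form: the exploration premium of an arm shrinks as the
        # policy already plays it often (large p), coupling p and the index.
        index = mu + sigma * (1.0 - p)
        z = (index - index.max()) / eta         # numerically stabilised softmax
        p_new = np.exp(z) / np.exp(z).sum()
        if np.max(np.abs(p_new - p)) < tol:     # fixed point reached
            break
        p = p_new
    return p
```

For example, `entropy_regularised_policy([1.0, 0.0, 0.0], [0.5, 0.5, 0.5])` returns a probability vector that favours the first arm while still assigning positive mass to the others, with `eta` controlling how far the policy is from a pure argmax.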

Related research

- 04/12/2017, "Value Directed Exploration in Multi-Armed Bandits with Structured Priors": Multi-armed bandits are a quintessential machine learning problem requir...
- 08/17/2020, "Using Subjective Logic to Estimate Uncertainty in Multi-Armed Bandit Problems": The multi-armed bandit problem is a classical decision-making problem wh...
- 07/20/2016, "On the Identification and Mitigation of Weaknesses in the Knowledge Gradient Policy for Multi-Armed Bandits": The Knowledge Gradient (KG) policy was originally proposed for online ra...
- 07/13/2019, "Parameterized Exploration": We introduce Parameterized Exploration (PE), a simple family of methods ...
- 02/08/2021, "Correlated Bandits for Dynamic Pricing via the ARC algorithm": The Asymptotic Randomised Control (ARC) algorithm provides a rigorous ap...
- 07/06/2023, "PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models": In this paper, we consider a general observation model for restless mult...
- 03/18/2022, "Approximate Function Evaluation via Multi-Armed Bandits": We study the problem of estimating the value of a known smooth function ...
