Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits

12/13/2021
by Sergey Garbar, et al.

We consider the upper confidence bound (UCB) strategy for Gaussian multi-armed bandits with a known control horizon size N and construct its limiting description as a system of stochastic differential equations and ordinary differential equations. Rewards for the arms are assumed to have unknown expected values and known variances. To verify the validity of the obtained description, a set of Monte Carlo simulations was performed for the case of close reward distributions, in which the mean rewards differ by a quantity of order N^(-1/2), since this case yields the highest normalized regret. We also estimate the minimal control horizon size at which the normalized regret is not noticeably larger than its maximum possible value.
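
As a rough illustration of the kind of Monte Carlo experiment the abstract describes, the following Python sketch simulates a UCB rule on a two-armed Gaussian bandit with close means. It is not the authors' code. The index form (sample mean plus a * sqrt(D * ln(t) / n_l)), the tuning parameter a, the choice of means +d * sqrt(D / N) and -d * sqrt(D / N), the normalization of the regret by sqrt(D * N), and all function names are assumptions made for this example.

```python
import math
import random


def ucb_normalized_regret(means, variance, horizon, a=1.0, seed=0):
    """Run one UCB simulation and return regret / sqrt(variance * horizon).

    Assumed index for arm l at step t: mean_l + a * sqrt(variance * ln(t) / n_l).
    The regret is computed from the true means of the chosen arms.
    """
    rng = random.Random(seed)
    k = len(means)
    sd = math.sqrt(variance)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative observed reward per arm
    expected_total = 0.0  # sum of true means of the arms actually pulled

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each index is defined
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + a * math.sqrt(variance * math.log(t) / counts[i]),
            )
        reward = rng.gauss(means[arm], sd)
        counts[arm] += 1
        sums[arm] += reward
        expected_total += means[arm]

    regret = horizon * max(means) - expected_total
    return regret / math.sqrt(variance * horizon)


def mean_normalized_regret(d, variance, horizon, runs=100, a=1.0):
    """Average the normalized regret over several runs for close means.

    The two means differ by 2 * d * sqrt(variance / horizon), i.e. by a
    quantity of order horizon ** -0.5 (an assumed parametrization).
    """
    shift = d * math.sqrt(variance / horizon)
    means = [shift, -shift]
    total = 0.0
    for run in range(runs):
        total += ucb_normalized_regret(means, variance, horizon, a=a, seed=run)
    return total / runs


if __name__ == "__main__":
    # Scan a few values of d to look for the largest normalized regret.
    for d in (0.5, 1.0, 2.0, 4.0):
        print(d, mean_normalized_regret(d, variance=1.0, horizon=1000, runs=50))
```

Repeating the scan for growing values of the horizon gives a crude way to see when the simulated normalized regret stops changing noticeably, which is the kind of comparison the abstract refers to.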


