What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

03/19/2018
by Lilian Besson, et al.

An online reinforcement learning algorithm is anytime if it does not need to know the horizon T of the experiment in advance. A well-known technique to obtain an anytime algorithm from any non-anytime algorithm is the "Doubling Trick". In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret. We study two families of sequences of growing horizons (geometric and exponential) to generalize previously known results that certain doubling tricks can be used to conserve certain regret bounds. In a broad setting, we prove that a geometric doubling trick can be used to conserve (minimax) bounds in R_T = O(√T) but cannot conserve (distribution-dependent) bounds in R_T = O(log T). We give insights as to why exponential doubling tricks may be better, as they conserve bounds in R_T = O(log T), and are close to conserving bounds in R_T = O(√T).
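The doubling trick is easy to state in code, and the usual way to bound its regret (summing the per-phase bounds) can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation. It assumes the parameterizations T_i = ⌊T_0·b^i⌋ (geometric) and T_i = ⌊T_0^(b^i)⌋ (exponential), which may differ in detail from the paper's definitions, and every function name is hypothetical. `doubling_trick` restarts a horizon-dependent algorithm, built by an assumed factory `make_algo(horizon)`, at the start of each phase, so it never needs T. `summed_bound` adds up per-phase bounds, assuming each restart satisfies R_{T_i} ≤ bound(T_i) and that regret never decreases within a phase (true for stochastic pseudo-regret).

```python
import itertools
import math
from typing import Callable, Iterator, List, TypeVar

A = TypeVar("A")  # type of the (hypothetical) non-anytime bandit algorithm


def geometric_horizons(T0: int, b: float) -> Iterator[int]:
    """Assumed geometric sequence T_i = floor(T0 * b**i); b = 2 is the classic doubling."""
    if T0 < 1 or b <= 1:
        raise ValueError("need T0 >= 1 and b > 1")
    return (math.floor(T0 * b ** i) for i in itertools.count())


def exponential_horizons(T0: int, b: float) -> Iterator[int]:
    """Assumed exponential sequence T_i = floor(T0 ** (b**i)): log T_i grows geometrically."""
    if T0 < 2 or b <= 1:
        raise ValueError("need T0 >= 2 and b > 1")
    return (math.floor(T0 ** (b ** i)) for i in itertools.count())


def doubling_trick(make_algo: Callable[[int], A], horizons: Iterator[int]) -> Iterator[A]:
    """Anytime wrapper: yields the algorithm instance to use at each time step.

    Phase i builds a fresh instance told that its horizon is T_i, uses it for
    T_i steps, then restarts. The total horizon T is never needed.
    """
    for horizon in horizons:
        algo = make_algo(horizon)
        for _ in range(horizon):
            yield algo


def phases_until(horizons: Iterator[int], T: int) -> List[int]:
    """Horizons of the phases started during the first T steps (for analysis only)."""
    started: List[int] = []
    elapsed = 0
    while elapsed < T:
        horizon = next(horizons)
        started.append(horizon)
        elapsed += horizon
    return started


def summed_bound(horizons: List[int], bound: Callable[[int], float]) -> float:
    """Upper bound on the wrapped regret after the given phases: sum of bound(T_i).

    Assumes each restart satisfies R_{T_i} <= bound(T_i) and that regret never
    decreases within a phase, so the truncated last phase costs at most bound(T_L).
    """
    return sum(bound(h) for h in horizons)


if __name__ == "__main__":
    for T in (10**6, 10**12):
        geo = phases_until(geometric_horizons(100, 2), T)
        exp_ = phases_until(exponential_horizons(2, 2), T)
        print(f"T = {T:.0e}")
        print("  geometric,   sum sqrt(T_i) / sqrt(T):", summed_bound(geo, math.sqrt) / math.sqrt(T))
        print("  geometric,   sum log(T_i) / log(T):  ", summed_bound(geo, math.log) / math.log(T))
        print("  exponential, sum log(T_i) / log(T):  ", summed_bound(exp_, math.log) / math.log(T))
```

For T ≥ T_0, the last started phase satisfies T_L < b·T in the geometric case, so the summed √T_i bound stays below b/(√b - 1)·√T. The summed log T_i bound, however, grows like (log T)^2, and its ratio to log T keeps increasing with T. Under the exponential sequence, log T_L < b·log T, which keeps that ratio below b^2/(b - 1). This only shows how the naive upper bound behaves; the paper's lower bound, which shows that the geometric trick cannot conserve O(log T), is a separate argument not reproduced here.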

research
04/03/2019

Batched Multi-armed Bandits Problem

In this paper, we study the multi-armed bandit problem in the batched se...
research
03/01/2018

The K-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates

In this paper we propose and explore the k-Nearest Neighbour UCB algorit...
research
11/18/2015

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

I analyse the frequentist regret of the famous Gittins index strategy fo...
research
11/29/2021

Online Fair Revenue Maximizing Cake Division with Non-Contiguous Pieces in Adversarial Bandits

The classic cake-cutting problem provides a model for addressing the fai...
research
04/06/2023

Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms

In this work, we derive sharp non-asymptotic deviation bounds for weight...
research
09/15/2021

Estimation of Warfarin Dosage with Reinforcement Learning

In this paper, it has attempted to use Reinforcement learning to model t...
research
06/19/2022

Nested bandits

In many online decision processes, the optimizing agent is called to cho...