On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits

09/30/2022
by Antoine Barrier, et al.

We lay the foundations of a non-parametric theory of best-arm identification in multi-armed bandits with a fixed budget T. We consider general, possibly non-parametric, models D for distributions over the arms; an overarching example is the model D = P(0,1) of all probability distributions over [0,1]. We propose upper bounds on the average log-probability of misidentifying the optimal arm based on information-theoretic quantities that correspond to infima over Kullback-Leibler divergences between some distributions in D and a given distribution. This is made possible by a refined analysis of the successive-rejects strategy of Audibert, Bubeck, and Munos (2010). Finally, we provide lower bounds on the same average log-probability, also in terms of the same new information-theoretic quantities; these lower bounds are larger when the (natural) assumptions on the considered strategies are stronger. All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.
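For readers unfamiliar with the successive-rejects strategy analyzed in the paper, here is a minimal sketch of the original Audibert, Bubeck, and Munos (2010) algorithm. The function name `successive_rejects` and the `pull` callback (which samples a reward from a given arm) are illustrative choices, not part of the paper's notation; the phase lengths follow the standard schedule with the normalizing constant log-bar(K) = 1/2 + sum_{i=2}^K 1/i.

```python
import math

def successive_rejects(pull, K, T):
    """Run successive rejects with K arms and budget T.

    pull(i) should return one reward sample from arm i.
    Returns the index of the recommended (empirically best) arm.
    """
    # Normalizing constant from Audibert, Bubeck & Munos (2010)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))

    active = list(range(K))
    counts = [0] * K
    sums = [0.0] * K

    n_prev = 0
    for k in range(1, K):  # K - 1 elimination phases
        # Cumulative number of pulls per surviving arm after phase k
        n_k = math.ceil((T - K) / (log_bar * (K + 1 - k)))
        for arm in active:
            for _ in range(n_k - n_prev):
                sums[arm] += pull(arm)
                counts[arm] += 1
        # Reject the surviving arm with the lowest empirical mean
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)
        n_prev = n_k

    return active[0]
```

The paper's contribution is not a new algorithm but a refined analysis of this strategy: the probability of the returned arm being suboptimal is bounded using infima of Kullback-Leibler divergences over the non-parametric model D, rather than only mean gaps.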


