On the complexity of All ε-Best Arms Identification

02/13/2022
by   Aymen Al Marjani, et al.

We consider the problem introduced by <cit.> of identifying all the ε-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. In the fixed-confidence setting, we give a lower bound on the number of samples required by any algorithm that returns the set of ε-good arms with a failure probability smaller than some risk level δ. This bound takes the form T_ε^*(μ)log(1/δ), where T_ε^*(μ) is a characteristic time that depends on the vector of mean rewards μ and the accuracy parameter ε. We also provide an efficient numerical method to solve the convex max-min program that defines the characteristic time. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, making our bound tighter than the one provided by <cit.>. Using this method, we propose a Track-and-Stop algorithm that identifies the set of ε-good arms with high probability and enjoys asymptotic optimality (as δ goes to zero) in terms of the expected sample complexity. Finally, using numerical simulations, we demonstrate our algorithm's advantage over state-of-the-art methods, even for moderate values of the risk parameter.
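To make the objects in the abstract concrete, here is a minimal Python sketch of a Track-and-Stop-style loop for the all ε-good arms problem under unit-variance Gaussian rewards. It is an illustration under simplifying assumptions, not the paper's algorithm: the sampling rule is plain round-robin rather than tracking the optimal weights of the max-min program, and the stopping threshold beta(t, δ) is a heuristic placeholder.

```python
# Illustrative sketch only: simplified identification of all eps-good arms
# with unit-variance Gaussian rewards. The sampling rule (round-robin) and
# the threshold beta are placeholders, not the paper's optimal procedure.
import numpy as np

def eps_good_set(mu, eps):
    """Indices of arms whose mean is within eps of the best mean."""
    return set(np.flatnonzero(mu >= mu.max() - eps))

def run_sketch(mu_true, eps=0.1, delta=0.05, max_steps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    K = len(mu_true)
    counts = np.zeros(K)
    sums = np.zeros(K)

    # Pull each arm once so every empirical mean is defined.
    for a in range(K):
        counts[a] += 1
        sums[a] += rng.normal(mu_true[a], 1.0)

    for t in range(K, max_steps):
        # Placeholder sampling rule: uniform round-robin exploration.
        a = t % K
        counts[a] += 1
        sums[a] += rng.normal(mu_true[a], 1.0)

        # Placeholder stopping rule: a crude likelihood-ratio-style check
        # that each arm is confidently above or below the bar (best - eps).
        mu_hat = sums / counts
        best = mu_hat.max()
        beta = np.log((1 + np.log(t)) / delta)  # heuristic threshold
        margins = counts * (mu_hat - (best - eps)) ** 2 / 2
        if np.all(margins >= beta):
            return eps_good_set(mu_hat, eps), int(counts.sum())

    return eps_good_set(sums / counts, eps), int(counts.sum())

if __name__ == "__main__":
    answer, samples = run_sketch(np.array([1.0, 0.95, 0.5, 0.2]), eps=0.1)
    print("estimated eps-good arms:", answer, "samples used:", samples)
```

In the paper, the round-robin rule would be replaced by tracking the sampling proportions that solve the max-min program defining T_ε^*(μ), which is what yields the asymptotically optimal sample complexity T_ε^*(μ)log(1/δ).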

