An Optimal Elimination Algorithm for Learning a Best Arm

by Avinatan Hassidim, et al.
Hebrew University of Jerusalem
Harvard University
Bar-Ilan University

We consider the classic problem of (ϵ,δ)-PAC learning a best arm: identifying, with confidence 1-δ, an arm whose mean is within ϵ of the highest mean arm in a multi-armed bandit. Despite being one of the most fundamental problems in statistics and learning theory, its worst-case sample complexity is, somewhat surprisingly, not well understood. In this paper, we propose a new approach to (ϵ,δ)-PAC learning a best arm. This approach yields an algorithm whose sample complexity converges to exactly the optimal sample complexity of (ϵ,δ)-learning the means of n arms separately, and we complement this result with a conditional matching lower bound.
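The baseline the abstract compares against is the naive strategy of estimating each arm's mean separately. A minimal sketch of that baseline (not the paper's elimination algorithm; the function name and the `arms`-as-callables interface are illustrative assumptions) follows, using the standard Hoeffding-bound sample size:

```python
import math

def naive_pac_best_arm(arms, eps, delta):
    """Naive (eps, delta)-PAC baseline, NOT the paper's algorithm.

    `arms` is a list of callables, each returning a reward in [0, 1].
    By Hoeffding's inequality, pulling each arm
    m = ceil((2 / eps^2) * ln(2n / delta)) times makes every empirical
    mean eps/2-accurate with probability at least 1 - delta (union
    bound over n arms), so the empirical best arm is eps-optimal.
    """
    n = len(arms)
    m = math.ceil((2 / eps ** 2) * math.log(2 * n / delta))
    means = [sum(arm() for _ in range(m)) / m for arm in arms]
    return max(range(n), key=lambda i: means[i])
```

This costs O((n/ϵ²) log(n/δ)) pulls in total; the point of the paper's result is an algorithm whose sample complexity converges to the optimal cost of estimating the n means separately.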




