Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits

06/03/2023
by Xiao-Yue Gong, et al.

We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high-quality arm whose average reward is, with probability 1 − δ, within ε of being among the top η-fraction of arms; this is a natural adaptation of the classical PAC guarantee to infinite action sets. We consider both the fixed-confidence and fixed-budget settings, aiming respectively for minimal expected and fixed sample complexity. For fixed confidence, we give an algorithm with expected sample complexity O(log(1/η) log(1/δ)/(ηε^2)). This is optimal except for the log(1/η) factor, and the δ-dependence closes a quadratic gap in the literature. For fixed budget, we show that the asymptotically optimal sample complexity as δ → 0 is c^-1 log(1/δ) (log log(1/δ))^2 to leading order. Equivalently, the optimal failure probability given exactly N samples decays as exp(−cN/log^2 N), up to a factor 1 ± o_N(1) inside the exponent. The constant c depends explicitly on the problem parameters (including the unknown arm distribution) through a certain Fisher information distance. Even the strictly super-linear dependence on log(1/δ) was not previously known, and resolves a question of Grossman and Moshkovitz (FOCS 2016, SIAM Journal on Computing 2020).
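As a point of reference for the fixed-confidence guarantee, the sketch below (Python; the function name and the draw_arm/pull reservoir interface are hypothetical illustrations, not the paper's algorithm) implements the classical naive two-stage baseline: sample enough arms that one lands in the top η-fraction, then estimate every sampled arm's mean via Hoeffding plus a union bound. Its total cost is O(log^2(1/δ)/(ηε^2)), exhibiting exactly the quadratic δ-dependence that the paper's O(log(1/η) log(1/δ)/(ηε^2)) algorithm improves on.

```python
import numpy as np

def naive_top_eta_pac(draw_arm, pull, eta, eps, delta):
    """Naive two-stage (η, ε, δ)-PAC baseline for an infinite reservoir.

    Illustrative sketch only; assumes rewards lie in [0, 1]. Returns an arm
    whose mean is, w.p. >= 1 - delta, within eps of the top eta-fraction.
    """
    # Stage 1: draw m arms so that, w.p. >= 1 - delta/2, at least one lies
    # in the top eta-fraction: (1 - eta)^m <= exp(-eta * m) <= delta/2
    # holds once m >= log(2/delta) / eta.
    m = int(np.ceil(np.log(2.0 / delta) / eta))
    arms = [draw_arm() for _ in range(m)]
    # Stage 2: pull each arm n times so that all m empirical means are
    # simultaneously within eps/2 of the truth (Hoeffding + union bound),
    # w.p. >= 1 - delta/2: 2m * exp(-n * eps^2 / 2) <= delta/2.
    n = int(np.ceil((2.0 / eps ** 2) * np.log(4.0 * m / delta)))
    means = [np.mean([pull(a) for _ in range(n)]) for a in arms]
    # Total samples m * n = O(log^2(1/delta) / (eta * eps^2)).
    return arms[int(np.argmax(means))]

# Example reservoir (hypothetical): Bernoulli arms whose means are
# drawn uniformly from [0, 1]; an arm is identified with its mean.
rng = np.random.default_rng(0)
best = naive_top_eta_pac(
    draw_arm=lambda: rng.uniform(),
    pull=lambda mu: float(rng.uniform() < mu),
    eta=0.05, eps=0.1, delta=0.05,
)
```

With η = ε = 0.05-0.1 scale parameters as above, Stage 1 draws m ≈ 74 arms and Stage 2 spends roughly 1,700 pulls per arm, making concrete how the log(1/δ) factor enters twice, once through m and once through n.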

research · 03/13/2018
Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence
We consider the problem of near-optimal arm identification in the fixed ...

research · 02/24/2021
Combinatorial Pure Exploration with Bottleneck Reward Function and its Extension to General Reward Functions
In this paper, we study the Combinatorial Pure Exploration problem with ...

research · 10/28/2018
Exploring k out of Top ρ Fraction of Arms in Stochastic Bandits
This paper studies the problem of identifying any k distinct arms among ...

research · 06/21/2020
An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits
This paper proposes near-optimal algorithms for the pure-exploration lin...

research · 11/15/2018
Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs
This paper considers a multi-armed bandit game where the number of arms ...

research · 02/16/2017
The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime
We propose a novel technique for analyzing adaptive sampling called the ...

research · 10/16/2017
Fully adaptive algorithm for pure exploration in linear bandits
We propose the first fully-adaptive algorithm for pure exploration in li...
