Beyond the Best: Estimating Distribution Functionals in Infinite-Armed Bandits

11/01/2022
by Yifei Wang, et al.

In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution, and each arm can be sampled further to obtain noisy estimates of the average reward of that arm. Prior work focuses on identifying the best arm, i.e., estimating the maximum of the average reward distribution. We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings, achieving optimal sample complexities. We show that online estimation, where the learner can sequentially choose whether to sample a new or existing arm, offers no advantage over the offline setting for estimating the mean functional, but significantly reduces the sample complexity for other functionals such as the median, maximum, and trimmed mean. The matching lower bounds utilize several different Wasserstein distances. For the special case of median estimation, we identify a curious thresholding phenomenon on the indistinguishability between Gaussian convolutions with respect to the noise level, which may be of independent interest.
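To make the offline setting concrete, here is a minimal illustrative sketch (not the paper's exact meta-algorithm): draw n arms whose true means come from an unknown distribution F, pull each arm m times, and apply a plug-in functional (here the median and a trimmed mean) to the empirical arm means. All parameter values and the choice of F are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_arm_means(n, m, noise_sd=1.0):
    """Offline sampling: n fresh arms, m noisy pulls each.

    Each arm's true mean mu_i is drawn from F (here standard normal,
    an assumption); each pull is mu_i plus Gaussian noise.
    Returns the empirical mean reward of each arm.
    """
    true_means = rng.normal(0.0, 1.0, size=n)
    pulls = true_means[:, None] + rng.normal(0.0, noise_sd, size=(n, m))
    return pulls.mean(axis=1)

def trimmed_mean(x, alpha=0.1):
    """Plug-in trimmed mean: average after discarding the alpha tails."""
    lo, hi = np.quantile(x, [alpha, 1 - alpha])
    return x[(x >= lo) & (x <= hi)].mean()

# Plug-in estimates of two distribution functionals of F.
est = sample_arm_means(n=5000, m=50)
print("median estimate:", np.median(est))
print("trimmed-mean estimate:", trimmed_mean(est))
```

Since F is standard normal here, both functionals have true value 0, so the printed estimates should be close to zero; the noise in each empirical arm mean (variance noise_sd**2 / m) inflates the spread of the plug-in sample relative to F, which is the bias the paper's offline/online analysis has to control.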


Related research

02/26/2018 · Best Arm Identification for Contaminated Bandits
We propose the Contaminated Best Arm Identification variant of the Multi...

06/15/2023 · Optimal Best-Arm Identification in Bandits with Access to Offline Data
Learning paradigms based purely on offline data as well as those based s...

07/14/2020 · Quantum exploration algorithms for multi-armed bandits
Identifying the best arm of a multi-armed bandit is a central problem in...

10/04/2022 · Max-Quantile Grouped Infinite-Arm Bandits
In this paper, we consider a bandit problem in which there are a number ...

07/15/2021 · A unified framework for bandit multiple testing
In bandit multiple hypothesis testing, each arm corresponds to a differe...

01/11/2021 · Learning with Comparison Feedback: Online Estimation of Sample Statistics
We study an online version of the noisy binary search problem where feed...

05/10/2014 · Functional Bandits
We introduce the functional bandit problem, where the objective is to fi...
