Best Arm Identification in Bandits with Limited Precision Sampling

05/10/2023
by Kota Srinivas Reddy, et al.

We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection. The learner can only sample arms via certain exploration bundles, which we refer to as boxes. In particular, at each sampling epoch, the learner selects a box, which in turn causes an arm to get pulled as per a box-specific probability distribution. The pulled arm and its instantaneous reward are revealed to the learner, whose goal is to find the best arm by minimising the expected stopping time, subject to an upper bound on the error probability. We present an asymptotic lower bound on the expected stopping time, which holds as the error probability vanishes. We show that the optimal allocation suggested by the lower bound is, in general, non-unique and therefore challenging to track. We propose a modified tracking-based algorithm to handle non-unique optimal allocations, and demonstrate that it is asymptotically optimal. We also present non-asymptotic lower and upper bounds on the stopping time in the simpler setting when the arms accessible from one box do not overlap with those of others.
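The sampling model described above is concrete enough to sketch in code. Below is a minimal, hypothetical simulation of the box-based interface on a toy instance, assuming Bernoulli rewards; the instance, the names (`box_to_arm`, `pull`), and the uniform box-cycling loop are all illustrative assumptions, not the paper's tracking-based algorithm or its optimal allocation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: 3 boxes over 4 arms. Row b is the
# box-specific probability distribution over arms for box b.
box_to_arm = np.array([
    [0.7, 0.3, 0.0, 0.0],   # box 0 mostly reaches arm 0
    [0.0, 0.5, 0.5, 0.0],   # box 1 splits between arms 1 and 2
    [0.0, 0.0, 0.2, 0.8],   # box 2 mostly reaches arm 3
])
arm_means = np.array([0.3, 0.5, 0.6, 0.9])  # Bernoulli reward means

def pull(box: int) -> tuple[int, int]:
    """Select a box; an arm is drawn from the box's distribution,
    and the (arm, reward) pair is revealed to the learner."""
    arm = rng.choice(len(arm_means), p=box_to_arm[box])
    reward = rng.binomial(1, arm_means[arm])
    return arm, reward

# Placeholder exploration: cycle uniformly over boxes (the paper
# instead tracks an allocation suggested by its lower bound).
counts = np.zeros(len(arm_means))
sums = np.zeros(len(arm_means))
for t in range(3000):
    arm, reward = pull(t % len(box_to_arm))
    counts[arm] += 1
    sums[arm] += reward

empirical = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
print("empirical means:", np.round(empirical, 2))
print("best-arm estimate:", int(np.argmax(empirical)))
```

Note that the learner controls only the box choice, so the per-arm sample counts are random even under a deterministic box schedule; this indirection is what makes tracking a (possibly non-unique) optimal allocation over boxes the central difficulty.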


Related research:

Learning to Detect an Odd Markov Arm (04/25/2019)
A multi-armed bandit with finitely many arms is studied when each arm is...

Selective Sampling for Online Best-arm Identification (10/28/2021)
This work considers the problem of selective-sampling for best-arm ident...

Pure Exploration under Mediators' Feedback (08/29/2023)
Stochastic multi-armed bandits are a sequential-decision-making framewor...

Unimodal Thompson Sampling for Graph-Structured Arms (11/17/2016)
We study, to the best of our knowledge, the first Bayesian algorithm for...

Thompson Sampling for Unsupervised Sequential Selection (09/16/2020)
Thompson Sampling has generated significant interest due to its better e...

Pure Exploration in Bandits with Linear Constraints (06/22/2023)
We address the problem of identifying the optimal policy with a fixed co...

Non-Asymptotic Analysis of a UCB-based Top Two Algorithm (10/11/2022)
A Top Two sampling rule for bandit identification is a method which sele...
