Pure Exploration with Multiple Correct Answers

02/09/2019
by Rémy Degenne, et al.

We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost in the multiple-answer setting. We present a new algorithm that extends Track-and-Stop to the multiple-answer case and whose asymptotic sample complexity matches the lower bound.

research
02/15/2016

Optimal Best Arm Identification with Fixed Confidence

We give a complete characterization of the complexity of best-arm identi...
research
10/17/2017

Good Arm Identification via Bandit Feedback

In this paper, we consider and discuss a new stochastic multi-armed band...
research
06/09/2022

Choosing Answers in ε-Best-Answer Identification for Linear Bandits

In pure-exploration problems, information is gathered sequentially to an...
research
02/15/2016

Maximin Action Identification: A New Bandit Framework for Games

We study an original problem of pure exploration in a strategic bandit m...
research
06/23/2020

Combinatorial Pure Exploration of Dueling Bandit

In this paper, we study combinatorial pure exploration for dueling bandi...
research
02/12/2022

Stochastic Strategic Patient Buyers: Revenue maximization using posted prices

We consider a seller faced with buyers which have the ability to delay t...
research
11/13/2017

Thresholding Bandit for Dose-ranging: The Impact of Monotonicity

We analyze the sample complexity of the thresholding bandit problem, wit...
