Best-of-K Bandits

03/09/2016
by   Max Simchowitz, et al.

This paper studies the Best-of-K Bandit game: at each time step the player chooses a subset S among all N-choose-K possible options and observes the reward max(X(i) : i in S), where X is a random vector drawn from a joint distribution. The objective is to identify, with high probability, the subset that achieves the highest expected reward, using as few queries as possible. We present distribution-dependent lower bounds based on a particular construction that forces a learner to consider all N-choose-K subsets; these bounds match naive extensions of known upper bounds in the bandit setting, obtained by treating each subset as a separate arm. Nevertheless, we present evidence that exhaustive search may be avoided for certain favorable distributions, because the influence of higher-order correlations may be dominated by lower-order statistics. Finally, we present an algorithm and analysis for independent arms, which mitigates the surprising, non-trivial information occlusion that occurs because only the max over the subset is observed. This may inform strategies for more general dependent measures, and we complement these results with independent-arm lower bounds.
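To make the setting concrete, the following is a minimal sketch of the game and of the naive baseline mentioned in the abstract, in which every K-subset is treated as a separate arm. It is not the paper's algorithm. All function names (pull, naive_best_subset, make_independent_bernoulli, expected_max_independent) are hypothetical, the uniform sampling budget pulls_per_subset is an assumed simplification, and independent Bernoulli arms are assumed only to keep the example small.

```python
import itertools
import random


def pull(subset, sample_vector):
    """Draw one fresh X and return the best-of-K reward max(X(i) : i in subset)."""
    x = sample_vector()
    return max(x[i] for i in subset)


def naive_best_subset(n, k, sample_vector, pulls_per_subset):
    """Treat each of the N-choose-K subsets as its own arm (naive baseline).

    Samples every subset the same number of times and returns the subset
    with the highest empirical mean reward, together with that mean.
    """
    best_subset, best_mean = None, float("-inf")
    for subset in itertools.combinations(range(n), k):
        total = sum(pull(subset, sample_vector) for _ in range(pulls_per_subset))
        mean = total / pulls_per_subset
        if mean > best_mean:
            best_subset, best_mean = subset, mean
    return best_subset, best_mean


def make_independent_bernoulli(means, rng):
    """Assumed reward model: X(i) ~ Bernoulli(means[i]), independent across i."""
    def sample_vector():
        return [1 if rng.random() < p else 0 for p in means]
    return sample_vector


def expected_max_independent(means, subset):
    """Exact E[max] under the independent Bernoulli assumption: 1 - prod(1 - p_i)."""
    miss = 1.0
    for i in subset:
        miss *= 1.0 - means[i]
    return 1.0 - miss


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.9, 0.8, 0.1, 0.1]  # illustrative values, not from the paper
    sampler = make_independent_bernoulli(means, rng)
    subset, estimate = naive_best_subset(len(means), 2, sampler, 2000)
    print(subset, estimate, expected_max_independent(means, subset))
```

The loop over itertools.combinations makes the cost of the naive approach visible: the number of arms grows as N-choose-K, which is the exhaustive search the abstract suggests can sometimes be avoided. Only the max is returned by pull, so the individual X(i) values are never observed by the learner.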


