Bandits with many optimal arms

03/23/2021
by Rianne de Heide, et al.

We consider a stochastic bandit problem with a possibly infinite number of arms. We write p^* for the proportion of optimal arms and Δ for the minimal mean gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative-regret setting and in the best-arm identification setting, in terms of the problem parameters T (the budget), p^* and Δ. For the objective of minimizing the cumulative regret, we provide a lower bound of order Ω(log(T)/(p^*Δ)) and a UCB-style algorithm with a matching upper bound up to a factor of log(1/Δ). Our algorithm needs p^* to calibrate its parameters, and we prove that this knowledge is necessary: adapting to an unknown p^* in this setting is impossible. For best-arm identification we also provide a lower bound of order Ω(exp(-cTΔ^2p^*)) on the probability of outputting a sub-optimal arm, where c>0 is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order log(1/Δ) in the exponential, and that does not need p^* or Δ as parameters.
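To make the role of p^* concrete, here is a minimal sketch (not the paper's exact algorithm) of the standard subsampling idea behind such upper bounds: draw enough arms that at least one is likely optimal, then run plain UCB1 on the subsample. The function `subsample_ucb` and the integer-indexed `pull` interface are hypothetical names introduced for illustration; integer indices stand in for i.i.d. draws from the arm reservoir, and rewards are assumed to lie in [0, 1].

```python
import math

def subsample_ucb(pull, p_star, T):
    """Hedged sketch: subsample K arms so that, with K = ceil(log(T)/p_star)
    i.i.d. reservoir draws, the chance that no subsampled arm is optimal is
    (1 - p_star)^K <= 1/T; then run standard UCB1 on the subsample.
    `pull(a)` returns a reward in [0, 1] for arm index a (an assumption)."""
    K = max(1, min(T, math.ceil(math.log(max(T, 2)) / p_star)))
    counts = [0] * K
    sums = [0.0] * K
    # Initialization: pull each subsampled arm once.
    for a in range(K):
        sums[a] += pull(a)
        counts[a] = 1
    # Standard UCB1 on the subsample for the remaining budget.
    for t in range(K, T):
        ucb = [sums[a] / counts[a]
               + math.sqrt(2.0 * math.log(t + 1) / counts[a])
               for a in range(K)]
        a = max(range(K), key=lambda i: ucb[i])
        sums[a] += pull(a)
        counts[a] += 1
    return counts, sums
```

Note that K is chosen from p^* alone, which is why this style of algorithm needs p^* as input; the paper's impossibility result says this dependence cannot be removed in the cumulative-regret setting.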



Related research

11/18/2021
Optimal Simple Regret in Bayesian Best Arm Identification
We consider Bayesian best arm identification in the multi-armed bandit p...

05/29/2021
Understanding Bandits with Graph Feedback
The bandit problem with graph feedback, proposed in [Mannor and Shamir, ...

03/28/2018
A Better Resource Allocation Algorithm with Semi-Bandit Feedback
We study a sequential resource allocation problem between a fixed number...

09/05/2019
Optimal UCB Adjustments for Large Arm Sizes
The regret lower bound of Lai and Robbins (1985), the gold standard for ...

05/22/2019
An Optimal Private Stochastic-MAB Algorithm Based on an Optimal Private Stopping Rule
We present a provably optimal differentially private algorithm for the s...

04/11/2022
Approximate Top-m Arm Identification with Heterogeneous Reward Variances
We study the effect of reward variance heterogeneity in the approximate ...

10/11/2022
The Typical Behavior of Bandit Algorithms
We establish strong laws of large numbers and central limit theorems for...
