
Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem
We consider the problem of best arm identification with a fixed budget T...

Near Optimal Adversarial Attack on UCB Bandits
We consider a stochastic multi-arm bandit problem where rewards are subj...

Understanding Bandits with Graph Feedback
The bandit problem with graph feedback, proposed in [Mannor and Shamir, ...

A Better Resource Allocation Algorithm with Semi-Bandit Feedback
We study a sequential resource allocation problem between a fixed number...

Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting
We design and analyze CascadeBAI, an algorithm for finding the best set ...

Optimal UCB Adjustments for Large Arm Sizes
The regret lower bound of Lai and Robbins (1985), the gold standard for ...

An Optimal Private Stochastic-MAB Algorithm Based on an Optimal Private Stopping Rule
We present a provably optimal differentially private algorithm for the s...
Bandits with many optimal arms
We consider a stochastic bandit problem with a possibly infinite number of arms. We write p^* for the proportion of optimal arms and Δ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting and in the best-arm identification setting, in terms of the problem parameters T (the budget), p^* and Δ. For the objective of minimizing the cumulative regret, we provide a lower bound of order Ω(log(T)/(p^*Δ)) and a UCB-style algorithm with a matching upper bound up to a factor of log(1/Δ). Our algorithm needs p^* to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to p^* in this setting is impossible. For best-arm identification, we also provide a lower bound of order Ω(exp(-cTΔ^2p^*)) on the probability of outputting a sub-optimal arm, where c>0 is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order log(1/Δ) in the exponential, and that does not need p^* or Δ as a parameter.
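To make the role of p^* concrete, here is a minimal simulation sketch. It is not the paper's algorithm. It illustrates one natural way a UCB-style method can use p^*: draw K = ceil(log(T)/p^*) arms at random, so that the probability of missing every optimal arm is at most (1 - p^*)^K ≤ exp(-p^* K) ≤ 1/T, then run standard UCB1 on that subsample. The two-level Bernoulli reservoir, the default `mu_star`, and the function names `subsample_size` and `simulate` are assumptions made for illustration.

```python
import math
import random


def subsample_size(p_star, T):
    """Arms to draw so that P(no optimal arm in the subsample) <= 1/T."""
    return max(1, math.ceil(math.log(T) / p_star))


def simulate(p_star, delta, T, seed=0, mu_star=0.9):
    """Run UCB1 on a random subsample of a two-level Bernoulli reservoir.

    Assumed toy model: each drawn arm is optimal (mean mu_star) with
    probability p_star, otherwise it has mean mu_star - delta.
    Returns (cumulative pseudo-regret, subsample size K).
    """
    rng = random.Random(seed)
    K = min(subsample_size(p_star, T), T)
    means = [mu_star if rng.random() < p_star else mu_star - delta
             for _ in range(K)]
    counts = [0] * K
    sums = [0.0] * K
    regret = 0.0
    for t in range(T):
        if t < K:
            arm = t  # initial round-robin: pull each subsampled arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(K),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t + 1) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += mu_star - means[arm]  # regret measured against mu_star
    return regret, K


if __name__ == "__main__":
    regret, K = simulate(p_star=0.5, delta=0.3, T=1000)
    print(f"subsample size K={K}, pseudo-regret={regret:.1f}")
```

The subsample size grows as p^* shrinks, which is consistent with the 1/p^* factor in the regret bound quoted above: UCB pays roughly log(T)/Δ per sub-optimal arm, and about log(T)/p^* arms are needed before an optimal one is likely to be among them.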