Variance-Dependent Best Arm Identification

06/19/2021
by Pinyan Lu, et al.

We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from 1 to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \cdots \geq \theta_n$. We propose an adaptive algorithm that explores the gaps and variances of the arms' rewards and makes future decisions based on the gathered information, using a novel approach called grouped median elimination. The proposed algorithm is guaranteed to output the best arm with probability $1-\delta$ and uses at most $O\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)\left(\ln \delta^{-1} + \ln\ln \Delta_i^{-1}\right)\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm, and we define $\Delta_1 = \Delta_2$. This gives a significant advantage over variance-independent algorithms in favorable scenarios, and it is the first result that removes the extra $\ln n$ factor on the best arm compared with the state of the art. We further show that $\Omega\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right) \ln \delta^{-1}\right)$ samples are necessary for any algorithm to achieve the same goal, which shows that our algorithm is optimal up to doubly logarithmic terms.
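To get a feel for when the variance-dependent bound helps, the expression inside the O(·) can be evaluated numerically. The sketch below is illustrative only and is not the grouped median elimination algorithm itself. It drops all hidden constants, the function names are hypothetical, and the double logarithm is clamped at zero for large gaps (an assumption made here so the expression stays finite). The comparison bound, the sum of Δ_i^-2 (ln δ^-1 + ln ln Δ_i^-1), is the familiar variance-independent form from earlier work and is not stated in the abstract.

```python
import math


def gaps(theta):
    """Return the gaps Delta_i, with Delta_1 defined as Delta_2.

    Assumes theta is sorted in decreasing order with a unique best arm.
    """
    if len(theta) < 2 or theta[0] <= theta[1]:
        raise ValueError("need theta[0] > theta[1] >= ... (unique best arm)")
    d = [theta[0] - t for t in theta]
    d[0] = d[1]  # convention from the abstract: Delta_1 = Delta_2
    return d


def log_factor(gap, delta):
    """ln(1/delta) + ln ln(1/gap), with the double log clamped at 0.

    The clamp is an assumption of this sketch, not part of the paper's bound.
    """
    inner = math.log(1.0 / gap)
    loglog = math.log(inner) if inner > 1.0 else 0.0
    return math.log(1.0 / delta) + loglog


def variance_dependent_bound(theta, sigma2, delta):
    """Sum of (sigma_i^2 / Delta_i^2 + 1 / Delta_i) * log factor, constants omitted."""
    return sum(
        (s / g**2 + 1.0 / g) * log_factor(g, delta)
        for s, g in zip(sigma2, gaps(theta))
    )


def variance_independent_bound(theta, delta):
    """Sum of (1 / Delta_i^2) * log factor, constants omitted (comparison only)."""
    return sum((1.0 / g**2) * log_factor(g, delta) for g in gaps(theta))


if __name__ == "__main__":
    theta = [0.9, 0.8, 0.8]      # hypothetical means
    sigma2 = [0.01, 0.01, 0.01]  # hypothetical (small) variances
    print(variance_dependent_bound(theta, sigma2, delta=0.05))
    print(variance_independent_bound(theta, delta=0.05))
```

When every variance is zero, each term shrinks from 1/Δ_i^2 to 1/Δ_i, so the ratio of the two expressions equals the common gap whenever all gaps coincide. This is the kind of favorable scenario the abstract refers to.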

research
09/10/2021

Best-Arm Identification in Correlated Multi-Armed Bandits

In this paper we consider the problem of best-arm identification in mult...
research
05/19/2017

Practical Algorithms for Best-K Identification in Multi-Armed Bandits

In the Best-K identification problem (Best-K-Arm), we are given N stocha...
research
08/19/2022

Almost Cost-Free Communication in Federated Best Arm Identification

We study the problem of best arm identification in a federated learning ...
research
02/01/2021

Doubly Robust Thompson Sampling for linear payoffs

A challenging aspect of the bandit problem is that a stochastic reward i...
research
10/06/2021

Learning the Optimal Recommendation from Explorative Users

We propose a new problem setting to study the sequential interactions be...
research
02/11/2016

Network of Bandits insure Privacy of end-users

In order to distribute the best arm identification task as close as poss...
research
06/20/2019

Stochastic One-Sided Full-Information Bandit

In this paper, we study the stochastic version of the one-sided full inf...
