Variance-Dependent Best Arm Identification

06/19/2021
by   Pinyan Lu, et al.

We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \cdots \geq \theta_n$. We propose an adaptive algorithm that explores the gaps and variances of the arms' rewards and makes future decisions based on the gathered information, using a novel approach called grouped median elimination. The proposed algorithm is guaranteed to output the best arm with probability at least $1-\delta$ and uses at most $O\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)\left(\ln \delta^{-1} + \ln\ln \Delta_i^{-1}\right)\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm, and we define $\Delta_1 = \Delta_2$. This achieves a significant advantage over variance-independent algorithms in some favorable scenarios and is the first result that removes the extra $\ln n$ factor on the best arm compared with the state of the art. We further show that $\Omega\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right) \ln \delta^{-1}\right)$ samples are necessary for an algorithm to achieve the same goal, thereby illustrating that our algorithm is optimal up to doubly logarithmic terms.
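To get a feel for the quantities in the abstract, the following is a minimal sketch that evaluates the expressions inside the O(·) and Ω(·) bounds for a toy instance. It is not the grouped median elimination algorithm, and it ignores all hidden constants. The function names, the example instance, the clamp on the doubly logarithmic term, and the variance-independent comparison term (sum of 1/Δ_i^2 times ln δ^-1) are illustrative assumptions rather than anything taken from the paper.

```python
import math


def gaps(means):
    """Gaps to the best arm; assumes means[0] is the unique largest mean.

    Follows the abstract's convention Delta_1 = Delta_2.
    """
    d = [means[0] - m for m in means]
    d[0] = d[1]
    return d


def upper_bound_term(means, variances, delta):
    """Expression inside the O(.) bound, with constants ignored."""
    total = 0.0
    for gap, var in zip(gaps(means), variances):
        # Assumption: clamp ln(1/gap) at 1 so ln ln stays non-negative
        # for large gaps; the abstract does not specify this handling.
        loglog = math.log(max(math.log(1.0 / gap), 1.0))
        total += (var / gap**2 + 1.0 / gap) * (math.log(1.0 / delta) + loglog)
    return total


def lower_bound_term(means, variances, delta):
    """Expression inside the Omega(.) bound, with constants ignored."""
    h = sum(var / gap**2 + 1.0 / gap for gap, var in zip(gaps(means), variances))
    return h * math.log(1.0 / delta)


def variance_independent_term(means, delta):
    """Hypothetical comparison: sum of 1/gap^2 times ln(1/delta)."""
    return sum(1.0 / gap**2 for gap in gaps(means)) * math.log(1.0 / delta)


# Hypothetical low-variance instance with rewards in [0, 1].
means = [0.9, 0.8, 0.5]
variances = [0.01, 0.01, 0.01]
delta = 0.05

print("upper-bound term:          ", upper_bound_term(means, variances, delta))
print("lower-bound term:          ", lower_bound_term(means, variances, delta))
print("variance-independent term: ", variance_independent_term(means, delta))
```

On instances like this one, where the variances are small relative to the gaps, the variance-dependent terms come out well below the variance-independent one, which is the kind of favorable scenario the abstract refers to.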

09/10/2021

Best-Arm Identification in Correlated Multi-Armed Bandits

In this paper we consider the problem of best-arm identification in mult...
06/20/2019

Stochastic One-Sided Full-Information Bandit

In this paper, we study the stochastic version of the one-sided full inf...
11/23/2021

Best Arm Identification with Safety Constraints

The best arm identification problem in the multi-armed bandit setting is...
02/01/2021

Doubly Robust Thompson Sampling for linear payoffs

A challenging aspect of the bandit problem is that a stochastic reward i...
10/06/2021

Learning the Optimal Recommendation from Explorative Users

We propose a new problem setting to study the sequential interactions be...
02/11/2016

Network of Bandits insure Privacy of end-users

In order to distribute the best arm identification task as close as poss...
02/18/2020

Intelligent and Reconfigurable Architecture for KL Divergence Based Online Machine Learning Algorithm

Online machine learning (OML) algorithms do not need any training phase ...