Approximate Top-m Arm Identification with Heterogeneous Reward Variances

04/11/2022

We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $\sigma_i^2$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify $m$ arms with the largest means within error $\epsilon$ out of the $n$ arms, with probability at least $1-\delta$. We show that the worst-case sample complexity of this problem is $\Theta\left( \sum_{i=1}^{n} \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in G^m} \frac{\sigma_i^2}{\epsilon^2} \ln(m) + \sum_{j \in G^l} \frac{\sigma_j^2}{\epsilon^2} \mathrm{Ent}(\sigma^2_{G^r}) \right)$, where $G^m$, $G^l$, $G^r$ are certain specific subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\mathrm{Ent}(\cdot)$ is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound of the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation.
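To give a feel for the first, δ-dependent term of the bound, the following is a minimal sketch that evaluates the sum over all arms of σ_i^2/ϵ^2 times ln(1/δ). The function name `delta_term` and the example variances are hypothetical, and the constant hidden by Θ(·) is taken to be 1 purely for illustration. The remaining two terms are not computed, because the abstract does not define G^m, G^l, G^r or Ent(·).

```python
import math


def delta_term(variances, eps, delta):
    """Evaluate sum_i sigma_i^2 / eps^2 * ln(1/delta).

    Assumption: the constant factor hidden by Theta(.) is set to 1,
    so the result is only meaningful up to a constant.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return sum(v / eps**2 for v in variances) * math.log(1.0 / delta)


# Hypothetical variance proxies for three arms.
variances = [1.0, 4.0, 0.25]

# Term using each arm's own variance proxy.
heterogeneous = delta_term(variances, eps=0.5, delta=0.1)

# Same term if every arm were charged the largest variance proxy.
homogeneous = delta_term([max(variances)] * len(variances), eps=0.5, delta=0.1)

print(f"per-arm variances: {heterogeneous:.2f}")
print(f"worst-case variance for all arms: {homogeneous:.2f}")
```

Under these assumed values, charging every arm the largest variance proxy inflates the term from 21·ln(10) to 48·ln(10), which is the kind of gap that accounting for heterogeneous variances is meant to close.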
