Covariance Adaptive Best Arm Identification

06/05/2023
by   El Mehdi Saad, et al.

We consider the problem of best arm identification in the multi-armed bandit model, under fixed confidence. Given a confidence input δ, the goal is to identify the arm with the highest mean reward with probability at least 1 – δ, while minimizing the number of arm pulls. While the literature provides solutions to this problem under the assumption of independent arm distributions, we propose a more flexible scenario where arms can be dependent and rewards can be sampled simultaneously. This framework allows the learner to estimate the covariance among the arm distributions, enabling a more efficient identification of the best arm. The relaxed setting we propose is relevant in various applications, such as clinical trials, where similarities between patients or drugs suggest underlying correlations in the outcomes. We introduce new algorithms that adapt to the unknown covariance of the arms and demonstrate through theoretical guarantees that substantial improvement can be achieved over the standard setting. Additionally, we provide new lower bounds for the relaxed setting and present numerical simulations that support our theoretical findings.
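To illustrate why simultaneous sampling of dependent arms can help, the sketch below compares the variance of the reward gap between two arms with and without correlation. It is not the algorithm from the paper, only the standard identity Var(X_a - X_b) = σ_a² + σ_b² - 2ρσ_aσ_b checked by simulation. The function names, the two-arm Gaussian setup, and the parameter values are all assumptions made for this illustration.

```python
import random
import statistics


def diff_variance(sigma_a, sigma_b, rho):
    """Variance of X_a - X_b for two arms with standard deviations
    sigma_a, sigma_b and correlation rho (standard identity)."""
    return sigma_a**2 + sigma_b**2 - 2.0 * rho * sigma_a * sigma_b


def sample_pair(rho, rng):
    """One simultaneous pull of two zero-mean, unit-variance Gaussian arms
    with correlation rho (hypothetical setup for illustration)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return z1, rho * z1 + (1.0 - rho**2) ** 0.5 * z2


def empirical_diff_variance(rho, n=20000, seed=0):
    """Sample variance of the gap X_a - X_b over n simultaneous pulls."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        a, b = sample_pair(rho, rng)
        diffs.append(a - b)
    return statistics.variance(diffs)


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        print(rho, diff_variance(1.0, 1.0, rho), empirical_diff_variance(rho))
```

With unit variances, the gap has variance 2 when the arms are independent and roughly 0.2 when ρ = 0.9, so under this assumed setup the difference in means is much easier to estimate from paired samples when the arms are strongly positively correlated.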

research
09/10/2021

Best-Arm Identification in Correlated Multi-Armed Bandits

In this paper we consider the problem of best-arm identification in mult...
research
05/19/2017

Practical Algorithms for Best-K Identification in Multi-Armed Bandits

In the Best-K identification problem (Best-K-Arm), we are given N stocha...
research
06/13/2022

Top Two Algorithms Revisited

Top Two algorithms arose as an adaptation of Thompson sampling to best a...
research
12/14/2020

Best Arm Identification in Graphical Bilinear Bandits

We introduce a new graphical bilinear bandit problem where a learner (or...
research
10/14/2019

Thresholding Bandit Problem with Both Duels and Pulls

The Thresholding Bandit Problem (TBP) aims to find the set of arms with ...
research
02/08/2019

Correlated bandits or: How to minimize mean-squared error online

While the objective in traditional multi-armed bandit problems is to fin...
research
10/06/2021

Learning the Optimal Recommendation from Explorative Users

We propose a new problem setting to study the sequential interactions be...
