Semiparametric Best Arm Identification with Contextual Information

09/15/2022
by   Masahiro Kato, et al.

We study best-arm identification with a fixed budget and contextual (covariate) information in stochastic multi-armed bandit problems. In each round, after observing contextual information, we choose a treatment arm using past observations and the current context. Our goal is to identify the best treatment arm, that is, the treatment arm with the maximal expected reward marginalized over the contextual distribution, with a minimal probability of misidentification. First, we derive semiparametric lower bounds for this problem, where we regard the gaps between the expected rewards of the best and suboptimal treatment arms as the parameters of interest and all other parameters, such as the expected rewards conditioned on contexts, as nuisance parameters. We then develop the "Contextual RS-AIPW strategy," which consists of a random sampling (RS) rule that tracks a target allocation ratio and a recommendation rule that uses the augmented inverse probability weighting (AIPW) estimator. The proposed Contextual RS-AIPW strategy is asymptotically optimal in the sense that its upper bound on the probability of misidentification matches the semiparametric lower bound as the budget goes to infinity and the gaps converge to zero.
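The recommendation step described above can be illustrated with a minimal sketch. It uses the standard textbook form of the AIPW score, which is an assumption here because the abstract does not state the estimator explicitly, and the paper's version may differ in details such as how the sampling probabilities and regression estimates are constructed from past observations. The function names `aipw_arm_means` and `recommend_arm` and the inputs `probs` and `fitted` are hypothetical and chosen only for illustration.

```python
def aipw_arm_means(arms, rewards, probs, fitted):
    """Estimate each arm's marginal expected reward with AIPW scores.

    arms[t]      : index of the arm pulled in round t
    rewards[t]   : reward observed in round t
    probs[t][a]  : probability with which arm a was sampled in round t
                   (assumed known and strictly positive)
    fitted[t][a] : regression estimate of arm a's reward given the
                   round-t context (assumed supplied by the caller)
    """
    num_rounds = len(arms)
    num_arms = len(probs[0])
    totals = [0.0] * num_arms
    for t in range(num_rounds):
        for a in range(num_arms):
            # Regression prediction is used for every arm in every round.
            score = fitted[t][a]
            # Inverse-probability-weighted residual, only for the pulled arm.
            if arms[t] == a:
                score += (rewards[t] - fitted[t][a]) / probs[t][a]
            totals[a] += score
    return [total / num_rounds for total in totals]


def recommend_arm(arms, rewards, probs, fitted):
    """Recommend the arm with the largest AIPW estimate."""
    means = aipw_arm_means(arms, rewards, probs, fitted)
    return max(range(len(means)), key=means.__getitem__)
```

With all regression estimates set to zero, the score reduces to plain inverse probability weighting. The sampling rule that tracks a target allocation ratio is not sketched here, since the abstract gives no details about it.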

research
01/12/2022

Best Arm Identification with a Fixed Budget under a Small Gap

We consider the fixed-budget best arm identification problem in the mult...
research
06/26/2021

The Role of Contextual Information in Best Arm Identification

We study the best-arm identification problem with fixed confidence when ...
research
02/06/2023

Asymptotically Minimax Optimal Fixed-Budget Best Arm Identification for Expected Simple Regret Minimization

We investigate fixed-budget best arm identification (BAI) for expected s...
research
10/14/2022

Federated Best Arm Identification with Heterogeneous Clients

We study best arm identification in a federated multi-armed bandit setti...
research
06/03/2019

Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards

Classical multi-armed bandit problems use the expected value of an arm a...
research
05/08/2021

Learning to Detect an Odd Restless Markov Arm with a Trembling Hand

This paper studies the problem of finding an anomalous arm in a multi-ar...
research
01/06/2022

Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach

Controlling antenna tilts in cellular networks is imperative to reach an...
