Statistical Inference with M-Estimators on Bandit Data

04/29/2021
by   Kelly W. Zhang, et al.

Bandit algorithms are increasingly used in real-world sequential decision-making problems, from online advertising to mobile health. As a result, more datasets are being collected with bandit algorithms, and with them comes a growing desire to use these datasets to answer scientific questions such as: Did one type of ad increase the click-through rate more or lead to more purchases? In which contexts is a mobile health intervention effective? However, it has been shown that classical statistical approaches, like those based on the ordinary least squares estimator, fail to provide reliable confidence intervals when used with bandit data. Recently, methods have been developed to conduct statistical inference using simple models fit to data collected with multi-armed bandits. However, there is a lack of general methods for conducting statistical inference using more complex models. In this work, we develop theory justifying the use of M-estimation (Van der Vaart, 2000), traditionally used with i.i.d. data, to provide inferential methods for a large class of estimators, including least squares and maximum likelihood estimators, but now with data collected with (contextual) bandit algorithms. To do this, we generalize the use of adaptive weights pioneered by Hadad et al. (2019) and Deshpande et al. (2018). Specifically, in settings in which the data is collected via a (contextual) bandit algorithm, we prove that certain adaptively weighted M-estimators are uniformly asymptotically normal and demonstrate empirically that we can use their asymptotic distribution to construct reliable confidence regions for a variety of inferential targets.
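To make the idea of an adaptively weighted M-estimator concrete, the following is a minimal sketch, not the authors' implementation. It simulates a two-armed epsilon-greedy bandit and estimates one arm's mean reward by weighted least squares. The weight used here, the square root of a fixed reference probability divided by the probability with which the bandit chose the action, is an assumed choice for illustration; the abstract does not specify the form of the weights. The names `run_bandit`, `weighted_arm_mean`, and `stable_prob` are hypothetical.

```python
import math
import random


def run_bandit(T=5000, true_means=(0.0, 0.5), eps=0.1, seed=0):
    """Simulate a two-armed epsilon-greedy bandit with Gaussian rewards.

    Returns a list of (action, reward, probability of the chosen action).
    """
    rng = random.Random(seed)
    counts = [0, 0]
    sums = [0.0, 0.0]
    history = []
    for _ in range(T):
        # Action probabilities depend only on previously observed data.
        if counts[0] == 0 or counts[1] == 0:
            probs = [0.5, 0.5]  # explore uniformly until both arms are seen
        else:
            means = [sums[a] / counts[a] for a in (0, 1)]
            greedy = 0 if means[0] >= means[1] else 1
            probs = [eps / 2, eps / 2]
            probs[greedy] = 1 - eps / 2
        action = 0 if rng.random() < probs[0] else 1
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        sums[action] += reward
        history.append((action, reward, probs[action]))
    return history


def weighted_arm_mean(history, arm, stable_prob=0.5):
    """Weighted least-squares estimate of one arm's mean reward.

    Minimizing sum_t w_t * 1{A_t = arm} * (R_t - theta)^2 over theta gives
    a weighted average of that arm's rewards. The weight
    w_t = sqrt(stable_prob / pi_t(A_t)) is an assumption of this sketch.
    """
    num = 0.0
    den = 0.0
    for action, reward, prob in history:
        if action == arm:
            w = math.sqrt(stable_prob / prob)
            num += w * reward
            den += w
    return num / den


history = run_bandit()
estimate_arm1 = weighted_arm_mean(history, arm=1)
```

Setting every weight to 1 recovers the ordinary sample mean of the arm's rewards, which is the unweighted least squares estimator that the abstract says can yield unreliable confidence intervals on bandit data.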

research
02/08/2020

Inference for Batched Bandits

As bandit algorithms are increasingly utilized in scientific studies, th...
research
02/14/2022

Statistical Inference After Adaptive Sampling in Non-Markovian Environments

There is a great desire to use adaptive sampling methods, such as reinfo...
research
06/01/2021

Post-Contextual-Bandit Inference

Contextual bandit algorithms are increasingly replacing non-adaptive A/B...
research
12/21/2022

Online Statistical Inference for Matrix Contextual Bandit

Contextual bandit has been widely used for sequential decision-making ba...
research
10/14/2020

Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting

Online decision-making problem requires us to make a sequence of decisio...
research
10/19/2022

Anytime-valid off-policy inference for contextual bandits

Contextual bandit algorithms are ubiquitous tools for active sequential ...
research
07/03/2023

Statistical Inference on Multi-armed Bandits with Delayed Feedback

Multi armed bandit (MAB) algorithms have been increasingly used to compl...
