On the bias, risk and consistency of sample means in multi-armed bandits

02/02/2019
by Jaehyeok Shin, et al.

In the classic stochastic multi-armed bandit problem, it is well known that the sample mean for a chosen arm is a biased estimator of its true mean. In this paper, we characterize the effect of four sources of this selection bias: adaptively sampling an arm at each step, adaptively stopping the data collection, adaptively choosing which arm to target for mean estimation, and adaptively rewinding the clock to focus on the sample mean of the chosen arm at some past time. We qualitatively characterize data collecting strategies for which the bias induced by adaptive sampling and stopping can be negative or positive. For general parametric and nonparametric classes of distributions with varying tail decays, we provide bounds on the risk (expected Bregman divergence between the sample and true mean) that hold for arbitrary rules for sampling, stopping, choosing and rewinding. These risk bounds are minimax optimal up to log factors, and imply tight bounds on the selection bias and sufficient conditions for their consistency.
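As a quick illustration of the selection bias discussed above (a minimal sketch, not code from the paper), the following Monte Carlo simulation estimates the bias of the sample mean of one arm in a two-armed Bernoulli bandit sampled greedily. Both arms have the same true mean, so any systematic deviation of the sample mean from it is pure adaptive-sampling bias; for greedy ("optimistic") sampling rules this bias is negative, as the paper characterizes. All names and parameters here are illustrative choices.

```python
import numpy as np

def greedy_bandit_bias(n_rounds=10, n_trials=50_000, seed=0):
    """Monte Carlo estimate of the bias of the sample mean of arm 0
    under greedy adaptive sampling in a two-armed Bernoulli bandit.

    Both arms have true mean 0.5, so the average deviation of arm 0's
    sample mean from 0.5 is an estimate of its selection bias.
    """
    rng = np.random.default_rng(seed)
    deviations = []
    for _ in range(n_trials):
        counts = np.zeros(2)
        sums = np.zeros(2)
        for t in range(n_rounds):
            if t < 2:
                arm = t  # pull each arm once to initialize
            else:
                # greedy: pull the arm with the larger empirical mean
                arm = int(np.argmax(sums / counts))
            sums[arm] += rng.random() < 0.5  # Bernoulli(0.5) reward
            counts[arm] += 1
        deviations.append(sums[0] / counts[0] - 0.5)
    return float(np.mean(deviations))
```

Running `greedy_bandit_bias()` returns a negative value: conditioning on the adaptively chosen sampling history drags the sample mean of each arm below its true mean, matching the qualitative characterization in the abstract.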


Related research

05/27/2019  The bias of the sample mean in multi-armed bandits can be positive or negative
09/21/2023  Optimal Conditional Inference in Adaptive Experiments
02/25/2021  Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits
02/19/2020  On conditional versus marginal bias in multi-armed bandits
08/17/2020  Optimal Best-Arm Identification Methods for Tail-Risk Measures
02/08/2019  Correlated bandits or: How to minimize mean-squared error online
06/04/2018  A General Approach to Multi-Armed Bandits Under Risk Criteria
