Why adaptively collected data have negative bias and how to correct for it

08/07/2017
by   Xinkun Nie, et al.

From scientific experiments to online A/B testing, previously observed data often affect how future experiments are performed, which in turn affects which data will be collected. Such adaptivity introduces complex correlations between the data and the collection procedure. In this paper, we prove that when the data collection procedure satisfies natural conditions, sample means of the data have systematic negative biases. As an example, consider an adaptive clinical trial where additional data points are more likely to be collected for treatments that show initial promise. Our surprising result implies that the average observed treatment effects would underestimate the true effects of each treatment. We quantitatively analyze the magnitude and behavior of this negative bias in a variety of settings. We also propose a novel debiasing algorithm based on selective inference techniques. In experiments, our method can effectively reduce bias and estimation error.
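The negative bias described above can be illustrated with a small Monte Carlo simulation. The sketch below is not the paper's experimental setup or its debiasing algorithm. It assumes a hypothetical two-armed setting with unit-variance Gaussian outcomes, a purely greedy collection rule (always sample the arm with the highest current sample mean), and a short horizon. The function name `simulate_bias` and all parameter values are illustrative choices. For comparison, it also runs a non-adaptive round-robin rule, under which sample means are unbiased.

```python
import random


def simulate_bias(adaptive, true_means=(0.0, 0.0), horizon=10, runs=20000, seed=0):
    """Monte Carlo estimate of E[sample mean] - true mean for each arm.

    Assumed setup (illustrative, not from the paper): unit-variance Gaussian
    outcomes, one initial pull per arm, then either a greedy rule
    (adaptive=True) or a fixed round-robin schedule (adaptive=False).
    """
    rng = random.Random(seed)
    k = len(true_means)
    total_error = [0.0] * k
    for _ in range(runs):
        sums = [0.0] * k
        counts = [0] * k
        for t in range(horizon):
            if t < k:
                arm = t  # pull every arm once so each sample mean is defined
            elif adaptive:
                # Greedy: collect more data for the arm that currently looks best.
                arm = max(range(k), key=lambda a: sums[a] / counts[a])
            else:
                arm = t % k  # non-adaptive schedule, independent of the data
            sums[arm] += rng.gauss(true_means[arm], 1.0)
            counts[arm] += 1
        for a in range(k):
            total_error[a] += sums[a] / counts[a] - true_means[a]
    return [e / runs for e in total_error]


if __name__ == "__main__":
    print("adaptive (greedy) bias per arm:", simulate_bias(adaptive=True))
    print("round-robin bias per arm:     ", simulate_bias(adaptive=False))
```

Under these assumptions, the greedy run should report a clearly negative average error for both arms, while the round-robin run should stay close to zero. Intuitively, an arm whose early draws happen to be low stops being sampled, so its unlucky average is never corrected, whereas an arm with lucky early draws keeps being sampled and regresses toward its true mean.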


