Policy Learning with Adaptively Collected Data

05/05/2021
by Ruohan Zhan, et al.

Learning optimal policies from historical data enables the gains from personalization to be realized in a wide variety of applications. The growing policy learning literature focuses on a setting where the treatment assignment policy does not adapt to the data. However, adaptive data collection is becoming more common in practice, from two primary sources: 1) data collected from adaptive experiments that are designed to improve inferential efficiency; 2) data collected from production systems that are adaptively evolving an operational policy to improve performance over time (e.g. contextual bandits). In this paper, we aim to address the challenge of learning the optimal policy with adaptively collected data and provide one of the first theoretical inquiries into this problem. We propose an algorithm based on generalized augmented inverse propensity weighted estimators and establish its finite-sample regret bound. We complement this regret upper bound with a lower bound that characterizes the fundamental difficulty of policy learning with adaptive data. Finally, we demonstrate our algorithm's effectiveness using both synthetic data and public benchmark datasets.
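
To make the idea of learning a policy from augmented inverse propensity weighted (AIPW) scores concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the logging propensities for every arm were recorded at each round, that outcome-model predictions are supplied by the caller (with adaptive data these would normally be fit on earlier rounds only), and that the policy class is a small finite list. All function names, the `weights` argument, and the uniform default weighting are hypothetical choices made for illustration; the abstract does not specify how the generalized estimator weights each round.

```python
import numpy as np


def aipw_scores(actions, rewards, propensities, mu_hat):
    """Per-round AIPW scores for every arm, shape (T, K).

    actions[t]         : arm actually assigned at round t
    rewards[t]         : observed outcome at round t
    propensities[t, a] : probability the logging policy gave arm a at round t
    mu_hat[t, a]       : outcome-model prediction for arm a at round t
                         (assumed to be supplied by the caller)
    """
    T, K = mu_hat.shape
    taken = np.zeros((T, K))
    taken[np.arange(T), actions] = 1.0
    residual = rewards[:, None] - mu_hat
    # Model prediction plus an inverse-propensity-weighted correction
    # that is non-zero only for the arm that was actually assigned.
    return mu_hat + taken / propensities * residual


def policy_value(scores, policy_actions, weights=None):
    """Weighted average of the scores of the arms a candidate policy picks.

    `weights` is a hypothetical per-round weight vector; uniform weights
    recover the ordinary AIPW average.
    """
    T = scores.shape[0]
    if weights is None:
        weights = np.ones(T)
    picked = scores[np.arange(T), policy_actions]
    return float(np.sum(weights * picked) / np.sum(weights))


def learn_policy(scores, candidate_policies, contexts, weights=None):
    """Return (index, estimated value) of the best policy in a finite class."""
    values = [
        policy_value(scores, pi(contexts), weights) for pi in candidate_policies
    ]
    best = int(np.argmax(values))
    return best, values[best]
```

A toy call with two rounds, two arms, and two constant policies could look like this:

```python
contexts = np.zeros((2, 1))
scores = aipw_scores(
    actions=np.array([0, 1]),
    rewards=np.array([1.0, 0.0]),
    propensities=np.full((2, 2), 0.5),
    mu_hat=np.zeros((2, 2)),
)
always_0 = lambda x: np.zeros(len(x), dtype=int)
always_1 = lambda x: np.ones(len(x), dtype=int)
best, value = learn_policy(scores, [always_0, always_1], contexts)
```

Non-uniform weights are left as an input because, with adaptively collected data, propensities can become very small in later rounds and the correction term can then have high variance.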

research
06/03/2021

Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

It has become increasingly common for data to be collected adaptively, f...
research
11/22/2022

Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning

We design and implement an adaptive experiment (a “contextual bandit”) t...
research
06/15/2020

Piecewise-Stationary Off-Policy Optimization

Off-policy learning is a framework for evaluating and optimizing policie...
research
12/19/2022

Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality

This paper studies offline policy learning, which aims at utilizing obse...
research
07/03/2023

Adaptive Principal Component Regression with Applications to Panel Data

Principal component regression (PCR) is a popular technique for fixed-de...
research
08/07/2017

Why adaptively collected data have negative bias and how to correct for it

From scientific experiments to online A/B testing, the previously observ...
research
02/17/2023

Post-Episodic Reinforcement Learning Inference

We consider estimation and inference with data collected from episodic r...
