Markovian Interference in Experiments

06/06/2022
by Vivek F. Farias et al.

We consider experiments in dynamical systems where interventions on some experimental units impact other units through a limiting constraint (such as a limited inventory). Despite outsize practical importance, the best estimators for this "Markovian" interference problem are largely heuristic in nature, and their bias is not well understood. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, apparently incur a large penalty in variance relative to state-of-the-art heuristics. We introduce an on-policy estimator: the Differences-In-Q's (DQ) estimator. We show that the DQ estimator can in general have exponentially smaller variance than off-policy evaluation. At the same time, its bias is second order in the impact of the intervention. This yields a striking bias-variance tradeoff so that the DQ estimator effectively dominates state-of-the-art alternatives. From a theoretical perspective, we introduce three separate novel techniques that are of independent interest in the theory of Reinforcement Learning (RL). Our empirical evaluation includes a set of experiments on a city-scale ride-hailing simulator.
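The contrast the abstract draws can be illustrated on a toy example. Below is a minimal sketch, not the paper's method or experimental setup: a hypothetical birth-death "inventory" chain where a randomized treatment raises demand, the naive difference-in-means estimator compares immediate rewards by assigned action, and a Differences-in-Q's-style estimate instead compares truncated on-policy Q-value estimates (sums of mean-centered future rewards), which also capture the downstream effect each action has through the shared inventory state. All dynamics, parameters, and the truncation horizon are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 200_000   # experiment length (steps)
K = 50        # truncation horizon for the Q-value estimates (assumption)
S_MAX = 20    # inventory capacity (hypothetical)

def step(s, a, rng):
    """One transition of the toy inventory chain (hypothetical dynamics).

    Treatment (a=1) raises the per-step request probability; a fulfilled
    request consumes one unit of shared inventory and earns reward 1.
    """
    p_req = 0.5 + 0.1 * a
    if rng.random() < p_req and s > 0:
        return s - 1, 1.0          # unit consumed: interference channel
    if s < S_MAX and rng.random() < 0.7:
        return s + 1, 0.0          # replenishment
    return s, 0.0

# Run a single chain with per-step randomized treatment assignment.
actions = rng.integers(0, 2, size=T)
rewards = np.empty(T)
s = S_MAX
for t in range(T):
    s, rewards[t] = step(s, actions[t], rng)

r_bar = rewards.mean()

# Naive estimator: difference in immediate mean reward by assigned action.
naive = rewards[actions == 1].mean() - rewards[actions == 0].mean()

# Truncated on-policy Q estimates: q_hat[t] ~ sum_{h<K} (r_{t+h} - r_bar),
# i.e. the (centered) return attributable to the state-action pair at t.
cum = np.concatenate(([0.0], np.cumsum(rewards - r_bar)))
q_hat = cum[K:] - cum[:-K]
a_head = actions[: q_hat.size]

# DQ-style estimate: difference in Q estimates by assigned action.
dq = q_hat[a_head == 1].mean() - q_hat[a_head == 0].mean()

print(f"naive: {naive:.4f}  DQ-style: {dq:.4f}")
```

Because treated requests draw down the same inventory that control units rely on, the naive estimator ignores the downstream cost of each treated action, while the Q-based contrast folds K steps of that downstream effect back into the comparison.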


Related research

- 05/04/2023: Correcting for Interference in Experiments: A Case Study at Douyin
  Interference is a ubiquitous problem in experiments conducted on two-sid...
- 11/11/2015: Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
  We study the problem of off-policy value evaluation in reinforcement lea...
- 08/07/2023: Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces
  We study Off-Policy Evaluation (OPE) in contextual bandit settings with ...
- 05/16/2016: Off-policy evaluation for slate recommendation
  This paper studies the evaluation of policies that recommend an ordered ...
- 02/23/2023: Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments
  In this work, we consider the off-policy policy evaluation problem for c...
- 05/14/2023: Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
  We study off-policy evaluation (OPE) of contextual bandit policies for l...
- 02/20/2021: An unbiased ray-marching transmittance estimator
  We present an in-depth analysis of the sources of variance in state-of-t...
