Off-policy evaluation beyond overlap: partial identification through smoothness

05/19/2023
by Samir Khan, et al.

Off-policy evaluation (OPE) is the problem of estimating the value of a target policy using historical data collected under a different logging policy. OPE methods typically assume overlap between the target and logging policies, enabling solutions based on importance weighting and/or imputation. In this work, we approach OPE without assuming either overlap or a well-specified model by considering a strategy based on partial identification under non-parametric assumptions on the conditional mean function, focusing especially on Lipschitz smoothness. Under such smoothness assumptions, we formulate a pair of linear programs whose optimal values upper- and lower-bound the contribution of the no-overlap region to the off-policy value. We show that these linear programs have a concise closed-form solution that can be computed efficiently and that their solutions converge, under the Lipschitz assumption, to the sharp partial identification bounds on the off-policy value. Furthermore, we show that the rate of convergence is minimax optimal, up to log factors. We deploy our methods on two semi-synthetic examples and obtain informative, valid bounds that are tighter than those possible without smoothness assumptions.
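To make the role of the Lipschitz assumption concrete, here is a minimal sketch of how smoothness alone bounds a conditional mean at a context where no data were logged. It is an illustration and not necessarily the estimator from the paper. It assumes the conditional mean is known exactly at the observed contexts, uses Euclidean distance, and treats the Lipschitz constant as given; the function names `lipschitz_bounds` and `no_overlap_contribution_bounds` are hypothetical.

```python
import math


def lipschitz_bounds(x_obs, mu_obs, x_new, lipschitz_const):
    """Bound an L-Lipschitz function at x_new from its values at x_obs.

    Assumptions (not from the source): points are tuples of floats,
    distance is Euclidean, and mu_obs holds exact (noise-free) values.
    """
    # Each observed point x_i limits the function at x_new to the interval
    # mu_i -/+ L * dist(x_i, x_new); intersect these intervals over all i.
    lower = max(m - lipschitz_const * math.dist(x, x_new)
                for x, m in zip(x_obs, mu_obs))
    upper = min(m + lipschitz_const * math.dist(x, x_new)
                for x, m in zip(x_obs, mu_obs))
    return lower, upper


def no_overlap_contribution_bounds(x_obs, mu_obs, x_no_overlap, lipschitz_const):
    """Average the pointwise bounds over contexts in the no-overlap region."""
    bounds = [lipschitz_bounds(x_obs, mu_obs, x, lipschitz_const)
              for x in x_no_overlap]
    n = len(bounds)
    return sum(b[0] for b in bounds) / n, sum(b[1] for b in bounds) / n


# Toy data: the mean is known at x = 0 and x = 1; x = 2 and x = 3 lack overlap.
x_obs = [(0.0,), (1.0,)]
mu_obs = [0.0, 0.5]
point_lo, point_hi = lipschitz_bounds(x_obs, mu_obs, (2.0,), lipschitz_const=1.0)
avg_lo, avg_hi = no_overlap_contribution_bounds(
    x_obs, mu_obs, [(2.0,), (3.0,)], lipschitz_const=1.0
)
```

In this toy setup the interval widens as the query context moves away from the observed data, which is why the resulting bounds are informative only near the region with overlap. A smaller Lipschitz constant tightens them.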


research
10/24/2021

Off-Policy Evaluation in Partially Observed Markov Decision Processes

We consider off-policy evaluation of dynamic treatment rules under the a...
research
04/18/2021

Off-Policy Risk Assessment in Contextual Bandits

To evaluate prospective contextual bandit policies when experimentation ...
research
09/12/2023

Sensitivity Analysis for Linear Estimands

We propose a novel sensitivity analysis framework for linear estimands w...
research
12/19/2022

Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality

This paper studies offline policy learning, which aims at utilizing obse...
research
06/18/2020

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

We consider off-policy evaluation in the contextual bandit setting for t...
research
10/21/2019

Bounds in continuous instrumental variable models

Partial identification approaches have seen a sharp increase in interest...
research
01/19/2021

Can smooth graphons in several dimensions be represented by smooth graphons on [0,1]?

A graphon that is defined on [0,1]^d and is Hölder(α) continuous for som...