Policy Evaluation with Latent Confounders via Optimal Balance

08/06/2019
by Andrew Bennett, et al.

Evaluating novel contextual bandit policies using logged data is crucial in applications where exploration is costly, such as medicine. However, such evaluation usually relies on the assumption of no unobserved confounders, which is bound to fail in practice. We study the question of policy evaluation when we instead have proxies for the latent confounders, and we develop an importance-weighting method that avoids fitting a latent outcome regression model. We show that, unlike in the unconfounded case, no single set of weights can give unbiased evaluation for all outcome models, yet we propose a new algorithm that can still provably guarantee consistency by instead minimizing an adversarial balance objective. We further develop tractable algorithms for optimizing this objective and demonstrate empirically the power of our method when confounders are latent.
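The abstract describes the approach only at a high level. As a rough, hypothetical illustration of the general idea of balance-based weighting (not the paper's actual objective or its handling of proxy variables), the sketch below picks weights that minimize a kernelized worst-case (adversarial) imbalance between the weighted logged context-action pairs and the target policy's distribution, then averages the logged rewards with those weights. The Gaussian kernel, the ridge penalty, and all names (gaussian_kernel, balance_weights, pi_target) are illustrative assumptions.

```python
# A minimal, self-contained sketch of balance-based off-policy evaluation.
# NOT the paper's algorithm: it only illustrates choosing weights by
# minimizing a worst-case (adversarial) imbalance over an RKHS ball,
# using a Gaussian kernel and a ridge penalty. The proxies X stand in
# for the latent confounders; everything here is an illustrative choice.
import numpy as np

def gaussian_kernel(U, V, bandwidth=1.0):
    """Gaussian kernel between the rows of U and the rows of V."""
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def balance_weights(X, A, pi_target, n_actions, bandwidth=1.0, ridge=1e-3):
    """Weights minimizing a kernelized worst-case imbalance between the
    weighted logged (x, a) pairs and the target policy's (x, a) distribution."""
    n = X.shape[0]

    def feat(X_, A_):
        # Featurize (x, a) pairs by concatenating x with a one-hot action.
        return np.hstack([X_, np.eye(n_actions)[A_]])

    Z_logged = feat(X, A)
    # Gram matrix among logged (x, a) pairs.
    K = gaussian_kernel(Z_logged, Z_logged, bandwidth)
    # Cross term: expected kernel under the target policy pi(a | x).
    c = np.zeros(n)
    for a in range(n_actions):
        Z_a = feat(X, np.full(n, a))
        c += gaussian_kernel(Z_logged, Z_a, bandwidth) @ pi_target[:, a] / n
    # Minimize w'Kw - 2c'w + ridge*||w||^2  =>  w = (K + ridge*I)^{-1} c.
    return np.linalg.solve(K + ridge * np.eye(n), c)

# Toy usage: two actions, proxies X, logged actions A, rewards Y, and a
# target policy given as pi_target[i, a] = pi(a | x_i).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
A = rng.integers(0, 2, size=n)
Y = X[:, 0] * (A == 1) + rng.normal(scale=0.1, size=n)
p1 = 1.0 / (1.0 + np.exp(X[:, 0]))
pi_target = np.column_stack([p1, 1.0 - p1])
w = balance_weights(X, A, pi_target, n_actions=2)
print("estimated policy value:", float(w @ Y))
```

With an RKHS ball as the adversary's function class, the worst-case imbalance has a closed quadratic form in the weights, which is why the minimizer in this sketch reduces to a single ridge-regularized linear solve.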
