- Balanced Policy Evaluation and Learning
We present a new approach to the problems of evaluating and learning per...
- Binary Outcome Copula Regression Model with Sampling Gradient Fitting
Using a copula to model variable dependency extends the multivariate Gaussian...
- Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting
We consider off-policy evaluation in the contextual bandit setting for t...
- Estimation of Individual Treatment Effect in Latent Confounder Models via Adversarial Learning
Estimating the individual treatment effect (ITE) from observational data...
- Balanced Off-Policy Evaluation in General Action Spaces
In many practical applications of contextual bandits, online learning is...
- Off-policy Confidence Sequences
We develop confidence bounds that hold uniformly over time for off-polic...
- Representation Balancing MDPs for Off-Policy Policy Evaluation
We study the problem of off-policy policy evaluation (OPPE) in RL. In co...
Policy Evaluation with Latent Confounders via Optimal Balance
Evaluating novel contextual bandit policies using logged data is crucial in applications where exploration is costly, such as medicine. Such evaluation usually relies on the assumption of no unobserved confounders, however, which is bound to fail in practice. We study policy evaluation when we instead have proxies for the latent confounders, and we develop an importance weighting method that avoids fitting a latent outcome regression model. We show that, unlike in the unconfounded case, no single set of weights can give unbiased evaluation for all outcome models; nevertheless, we propose a new algorithm that provably guarantees consistency by instead minimizing an adversarial balance objective. We further develop tractable algorithms for optimizing this objective and demonstrate empirically the power of our method when confounders are latent.
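When the adversary's outcome models range over the unit ball of a reproducing-kernel Hilbert space, the worst case in a balance objective of this kind has a closed form, which makes the idea easy to prototype. Below is a minimal NumPy sketch of that kernel-balancing construction in the simpler unconfounded setting, with no proxy variables; the function names, kernel choice, and hyperparameters are illustrative assumptions, not the paper's algorithm, which additionally handles latent confounders through proxies.

import numpy as np

def gaussian_kernel(U, V, bandwidth=1.0):
    # Gram matrix K[i, j] = exp(-||U_i - V_j||^2 / (2 * bandwidth^2)).
    sq = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def balanced_weights(X, A, target_probs, n_actions, steps=500, lr=0.05):
    # For an RKHS adversary, the worst-case imbalance between the
    # w-weighted logged sample and the target policy's action mixture
    # has a closed form, reducing the objective to the quadratic
    #   w'Kw - 2 c'w, where
    #   K[i, j] = k((X_i, A_i), (X_j, A_j)) and
    #   c[i] = (1/n) * sum_j sum_a pi(a | X_j) * k((X_i, A_i), (X_j, a)).
    # We minimize it over the simplex by projected gradient descent.
    n = len(X)
    Z = np.column_stack([X, A])          # logged (context, action) pairs
    K = gaussian_kernel(Z, Z)
    c = np.zeros(n)
    for a in range(n_actions):           # cross-kernel vs. target policy
        Za = np.column_stack([X, np.full(n, a)])
        c += gaussian_kernel(Z, Za) @ target_probs[:, a] / n
    w = np.full(n, 1.0 / n)              # start at uniform weights
    for _ in range(steps):
        grad = 2.0 * (K @ w - c)
        w = np.clip(w - lr * grad, 0.0, None)   # keep weights nonnegative
        w /= w.sum()                            # renormalize to the simplex
    return w

# Toy usage: estimate a fixed target policy's value as a weighted
# average of logged outcomes. All data here is synthetic.
rng = np.random.default_rng(0)
n, n_actions = 200, 2
X = rng.normal(size=(n, 1))                      # observed contexts
A = rng.integers(0, n_actions, size=n)           # logged actions
Y = (X[:, 0] > 0).astype(float) * (A == 1) + rng.normal(scale=0.1, size=n)
target_probs = np.tile([0.1, 0.9], (n, 1))       # pi(a | x), here constant
w = balanced_weights(X, A, target_probs, n_actions)
print("balanced estimate of policy value:", float(w @ Y))

Because the weights are chosen against the worst case in the function class rather than by inverting estimated propensities, the estimate stays well behaved even when some logged actions are rare under the behavior policy.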