
Offline Policy Evaluation with Out-of-Sample Guarantees

by Sofia Ek et al., Uppsala universitet

We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss, or disutility (negative reward), and the problem is to draw valid inferences about the out-of-sample loss of a specified policy when the past data were observed under a possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees that cover the entire loss distribution. Importantly, the method accounts for model misspecification of the past policy, including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under an explicitly specified range of credible model assumptions.
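To make the setting concrete, the sketch below illustrates the basic ingredients the abstract refers to: logged data gathered under a behavior policy, importance weights relating it to a target policy, and a quantile of the reweighted loss distribution as an out-of-sample performance summary. This is a minimal illustration under simplifying assumptions (binary actions, a fully known uniform behavior policy, synthetic data), not the paper's sample-splitting method or its finite-sample guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_loss_quantile(losses, weights, alpha=0.1):
    """Approximate (1 - alpha)-quantile of the target-policy loss
    distribution via normalized importance weights (illustrative)."""
    order = np.argsort(losses)
    losses, weights = losses[order], weights[order]
    cdf = np.cumsum(weights) / np.sum(weights)  # weighted empirical CDF
    idx = np.searchsorted(cdf, 1 - alpha)
    return losses[min(idx, len(losses) - 1)]

# Synthetic logged data: contexts x, actions a drawn by the behavior policy.
n = 2000
x = rng.normal(size=n)
a = rng.integers(0, 2, size=n)                 # behavior: uniform over {0, 1}
loss = np.abs(x - a) + 0.1 * rng.normal(size=n)

# Hypothetical target policy: choose action 1 exactly when x > 0.
pi_target = (a == (x > 0).astype(int)).astype(float)
pi_behavior = 0.5                              # known uniform propensity
w = pi_target / pi_behavior                    # importance weights

bound = weighted_loss_quantile(loss, w, alpha=0.1)
print(f"Estimated 90% loss quantile under target policy: {bound:.3f}")
```

With unknown or misspecified behavior propensities, the weights themselves are uncertain; the paper's contribution is to deliver coverage guarantees that remain valid across an explicitly specified range of such model assumptions, which this naive plug-in sketch does not provide.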



