Towards Robust Off-Policy Evaluation via Human Inputs

09/18/2022
by   Harvineet Singh, et al.

Off-policy evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes domains such as healthcare, where direct deployment is often infeasible, unethical, or expensive. When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to evaluate policies robustly under such changes. Existing approaches consider robustness against a large class of shifts that can arbitrarily change any observable property of the environment. This often results in highly pessimistic utility estimates, thereby invalidating policies that might actually have been useful in deployment. In this work, we address this problem by investigating how domain knowledge can help provide more realistic estimates of policy utilities. We leverage human inputs on which aspects of the environment may plausibly change, and adapt OPE methods to consider shifts only on these aspects. Specifically, we propose a novel framework, Robust OPE (ROPE), which considers shifts on a subset of covariates in the data based on user inputs, and estimates worst-case utility under these shifts. We then develop computationally efficient algorithms for OPE that are robust to the aforementioned shifts for contextual bandits and Markov decision processes, and we theoretically analyze the sample complexity of these algorithms. Extensive experimentation with synthetic and real-world datasets from the healthcare domain demonstrates that our approach not only captures realistic dataset shifts accurately, but also results in less pessimistic policy evaluations.
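To make the idea concrete, the sketch below shows one way a worst-case estimate restricted to user-selected covariates could look in the contextual-bandit case. It is a minimal illustration, not the paper's ROPE algorithm: the bounded-likelihood-ratio shift model, the discrete-covariate assumption, and the function name `worst_case_value` are all assumptions made here for exposition.

```python
import numpy as np

def worst_case_value(x_s, rewards, pi_target, pi_behavior, rho=2.0):
    """Worst-case IPW estimate of a target policy's value when only the
    marginal distribution of user-selected covariates x_s may shift.

    Illustrative assumptions (not the paper's exact formulation): discrete
    shiftable covariates, and shifts modeled as group likelihood ratios
    bounded in [1/rho, rho] with rho >= 1.

    x_s         : (n,) discrete values of the covariates allowed to shift
    rewards     : (n,) logged rewards
    pi_target   : (n,) target-policy probabilities of the logged actions
    pi_behavior : (n,) behavior-policy probabilities of the logged actions
    """
    # Per-sample importance-weighted rewards for the target policy.
    ipw = rewards * pi_target / pi_behavior

    # Group samples by the value of the covariates allowed to shift.
    groups = np.unique(x_s)
    p = np.array([(x_s == g).mean() for g in groups])     # observed proportions
    v = np.array([ipw[x_s == g].mean() for g in groups])  # per-group value

    # The adversary picks new proportions q in [p/rho, p*rho] with sum(q) = 1
    # to minimize the policy value; a fractional-knapsack greedy solves this.
    lo, hi = p / rho, p * rho
    q = lo.copy()
    budget = 1.0 - lo.sum()
    for g in np.argsort(v):  # pour remaining mass into lowest-value groups first
        take = min(hi[g] - lo[g], budget)
        q[g] += take
        budget -= take
    return float(q @ v)
```

With `rho = 1.0` the adversary has no freedom and the estimate reduces to the standard IPW value; increasing `rho` widens the class of shifts considered on the chosen covariates and makes the evaluation correspondingly more conservative, while shifts on all other covariates are ruled out by construction.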


