General-purpose validation and model selection when estimating individual treatment effects
Practitioners in medicine, business, political science, and other fields are increasingly aware that decisions should be personalized to each patient, customer, or voter. A given treatment (e.g., a drug or advertisement) should be administered only to those who will respond most positively, and certainly not to those who will be harmed by it. Individual-level treatment effects (ITEs) can be estimated with tools adapted from machine learning, but different models can yield contradictory estimates. Unlike risk prediction models, however, treatment effect models cannot easily be compared using a held-out test set, because the true treatment effect itself is never directly observed. Beyond outcome prediction accuracy, several approaches that use held-out data to evaluate treatment effect models have been proposed, but they remain largely unknown or confined to individual disciplines. We review these approaches and demonstrate theoretical relationships among them. We then illustrate their behavior using simulations of both randomized and observational data. Based on our empirical and theoretical results, we advocate for the standardized use of estimated decision value for individual treatment effect model selection and validation.
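The abstract does not spell out how estimated decision value is computed. As a rough sketch only, assume it means an inverse-propensity-weighted estimate of the mean held-out outcome under the policy "treat whenever the model's estimated ITE is positive", with higher outcomes being better. The function name `estimated_decision_value`, the simulated trial, and the known treatment probability of 0.5 are all illustrative assumptions and do not come from the paper.

```python
import random


def estimated_decision_value(outcomes, treatments, ite_estimates, propensity=0.5):
    """Inverse-propensity-weighted estimate of the mean outcome if every unit
    were treated according to the sign of its estimated ITE.

    Assumes a known, constant probability of treatment (as in a simple
    randomized trial). This is an illustrative estimator, not necessarily
    the exact definition used in the paper.
    """
    total = 0.0
    for y, w, tau_hat in zip(outcomes, treatments, ite_estimates):
        recommended = 1 if tau_hat > 0 else 0
        # Only units whose observed treatment matches the recommendation
        # contribute, reweighted by the probability of that treatment.
        if w == recommended:
            p = propensity if w == 1 else 1.0 - propensity
            total += y / p
    return total / len(outcomes)


def simulate_trial(n=20000, seed=0):
    """Hypothetical randomized trial in which the true ITE of a unit equals x."""
    rng = random.Random(seed)
    xs, ws, ys = [], [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        w = 1 if rng.random() < 0.5 else 0
        y = x * w + rng.gauss(0.0, 0.1)  # outcome: treatment effect plus noise
        xs.append(x)
        ws.append(w)
        ys.append(y)
    return xs, ws, ys


xs, ws, ys = simulate_trial()

# Two candidate "models": one recovers the true effect, one flips its sign.
good_model = [x for x in xs]
bad_model = [-x for x in xs]

value_good = estimated_decision_value(ys, ws, good_model)
value_bad = estimated_decision_value(ys, ws, bad_model)
print(f"good model: {value_good:.3f}, bad model: {value_bad:.3f}")
```

In this toy setting the model whose recommendations match the true effect should score higher, which is the sense in which a held-out decision value can rank ITE models without ever observing an individual treatment effect. With observational data the constant `propensity` would need to be replaced by an estimated per-unit propensity score.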