Learning Personalized Policy with Strategic Agents

There is increasing interest in allocating treatments based on observed individual data. Examples include heterogeneous pricing, individualized credit offers, and targeted social programs. Personalized policies create incentives for individuals to modify their behavior in order to obtain a better treatment. We show that standard estimators based on risk minimization are sub-optimal when the observed covariates are endogenous to the treatment allocation rule. We propose a dynamic experiment that converges to the optimal treatment allocation function without parametric assumptions on individual strategic behavior, and we prove that its regret decays at a linear rate. We validate the method in simulations and in a small MTurk experiment.



