Improving Consequential Decision Making under Imperfect Predictions
Consequential decisions are increasingly informed by sophisticated data-driven predictive models. For accurate predictive models, deterministic threshold rules have been shown to be optimal in terms of utility, even under a variety of fairness constraints. However, consistently learning accurate models requires access to ground truth data. Unfortunately, in practice, some data can only be observed if a certain decision was taken. Thus, collected data always depends on potentially imperfect historical decision policies. As a result, learned deterministic threshold rules are often suboptimal. We address the above question from the perspective of sequential policy learning. We first show that, if decisions are taken by a faulty deterministic policy, the observed outcomes under this policy are insufficient to improve it. We then describe how this undesirable behavior can be avoided using stochastic policies. Finally, we introduce a practical gradient-based algorithm to learn stochastic policies that effectively leverage the outcomes of decisions to improve over time. Experiments on both synthetic and real-world data illustrate our theoretical results and show the efficacy of our proposed algorithm.
READ FULL TEXT