Constructing Effective Personalized Policies Using Counterfactual Inference from Biased Data Sets with Many Features

12/23/2016
by   Onur Atan, et al.
0

This paper proposes a novel approach for constructing effective personalized policies when the observed data lacks counter-factual information, is biased and possesses many features. The approach is applicable in a wide variety of settings from healthcare to advertising to education to finance. These settings have in common that the decision maker can observe, for each previous instance, an array of features of the instance, the action taken in that instance, and the reward realized -- but not the rewards of actions that were not taken: the counterfactual information. Learning in such settings is made even more difficult because the observed data is typically biased by the existing policy (that generated the data) and because the array of features that might affect the reward in a particular instance -- and hence should be taken into account in deciding on an action in each particular instance -- is often vast. The approach presented here estimates propensity scores for the observed data, infers counterfactuals, identifies a (relatively small) number of features that are (most) relevant for each possible action and instance, and prescribes a policy to be followed. Comparison of the proposed algorithm against the state-of-art algorithm on actual datasets demonstrates that the proposed algorithm achieves a significant improvement in performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/23/2011

Doubly Robust Policy Evaluation and Learning

We study decision making in environments where the reward is only partia...
research
03/10/2015

Doubly Robust Policy Evaluation and Optimization

We study sequential decision making in environments where rewards are on...
research
09/15/2022

Semi-Counterfactual Risk Minimization Via Neural Networks

Counterfactual risk minimization is a framework for offline policy optim...
research
10/04/2022

Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees

Inverse reinforcement learning (IRL) aims to recover the reward function...
research
05/04/2023

Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

We investigate an infinite-horizon average reward Markov Decision Proces...
research
03/15/2018

Temporal Human Action Segmentation via Dynamic Clustering

We present an effective dynamic clustering algorithm for the task of tem...
research
09/08/2021

Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values

In high-stakes applications of data-driven decision making like healthca...

Please sign up or login with your details

Forgot password? Click here to reset