A Policy Gradient Method with Variance Reduction for Uplift Modeling

11/26/2018
by   Chenchen Li, et al.
Uplift modeling aims to directly model the incremental impact of a treatment on an individual's response. It has been widely and successfully used in healthcare analytics and business operations, for example to measure the net effect of a new medicine on patients or to understand the impact of a marketing campaign on company revenue. In this work, we address the problem from a new angle and reformulate it as a Markov Decision Process (MDP). This formulation allows us to handle the lack of explicit labels, to deal with any number of actions (in contrast to the usual two-action uplift modeling), and to support responses of general types, which is challenging for previous methods. Furthermore, we design an unbiased metric for more accurate offline evaluation of uplift effects, set up a better reward function for the policy gradient method used to solve the problem, and adopt action-based baselines to reduce variance. We conducted extensive experiments on both a synthetic dataset and real-world scenarios, and showed that our method achieves significant improvement over previous methods.
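To illustrate the general idea of reducing policy gradient variance with a baseline, the following is a minimal sketch and not the method of the paper. It assumes a one-step problem with a softmax policy over three actions and uses a single constant baseline (the expected reward under the policy) rather than the action-based baselines mentioned in the abstract. The reward values, the sample size, and the names `softmax` and `reinforce_gradients` are hypothetical choices made for the example.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def reinforce_gradients(logits, actions, rewards, baseline=0.0):
    """Per-sample score-function (REINFORCE) gradient estimates.

    For a softmax policy, the gradient of log pi(a) with respect to the
    logits is one_hot(a) - pi. Each sample contributes
    (reward - baseline) * (one_hot(a) - pi).
    """
    pi = softmax(logits)
    one_hot = np.eye(len(logits))[actions]   # shape (n_samples, n_actions)
    score = one_hot - pi
    return (rewards - baseline)[:, None] * score


# Hypothetical setup: three actions with noisy rewards around large means.
rng = np.random.default_rng(0)
logits = np.zeros(3)
true_mean_reward = np.array([10.0, 11.0, 12.0])  # assumed for illustration
pi = softmax(logits)

n_samples = 5000
actions = rng.choice(3, size=n_samples, p=pi)
rewards = true_mean_reward[actions] + rng.normal(0.0, 1.0, size=n_samples)

# Constant baseline: expected reward under the current policy.
baseline = float(pi @ true_mean_reward)

g_plain = reinforce_gradients(logits, actions, rewards)
g_base = reinforce_gradients(logits, actions, rewards, baseline=baseline)

# Total variance of the per-sample gradient estimates, summed over logits.
var_plain = g_plain.var(axis=0).sum()
var_base = g_base.var(axis=0).sum()
print("variance without baseline:", var_plain)
print("variance with baseline:   ", var_base)
```

Because the baseline does not depend on the sampled action, subtracting it leaves the expected gradient unchanged while shrinking the magnitude of each sample's contribution, which is why the second variance comes out smaller in this setup.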

research
09/05/2023

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

In this paper, we consider an infinite horizon average reward Markov Dec...
research
03/20/2018

Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines

Policy gradient methods have enjoyed great success in deep reinforcement...
research
07/10/2018

Generalized deterministic policy gradient algorithms

We study a setting of reinforcement learning, where the state transition...
research
03/14/2018

Uplift Modeling from Separate Labels

Uplift modeling is aimed at estimating the incremental impact of an acti...
research
07/11/2021

Coordinate-wise Control Variates for Deep Policy Gradients

The control variates (CV) method is widely used in policy gradient estim...
research
02/13/2018

Rebalancing Dockless Bike Sharing Systems

Bike sharing provides an environment-friendly way for traveling and is b...
research
11/30/2017

Budget-Aware Activity Detection with A Recurrent Policy Network

In this paper, we address the challenging problem of efficient tempora...
