Cost-Effective Incentive Allocation via Structured Counterfactual Inference

02/07/2019
by Romain Lopez, et al.

We address a practical problem ubiquitous in modern industry, in which a mediator tries to learn a policy for allocating strategic financial incentives to customers in a marketing campaign and observes only bandit feedback. In contrast to traditional policy optimization frameworks, we rely on a specific assumption about the reward structure and we incorporate budget constraints. We develop a new two-step method for solving this constrained counterfactual policy optimization problem. First, we cast the reward estimation problem as a domain adaptation problem with supplementary structure. Second, the resulting estimators are used to optimize the policy under the budget constraints. We establish theoretical error bounds for our estimation procedure, and we show empirically that the approach leads to significant improvement on both synthetic and real datasets.
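The abstract does not spell out how the second (constrained optimization) step is solved. As a rough, hypothetical illustration only, the sketch below assumes the first step has already produced a matrix of estimated rewards `rewards[i, a]` for customer `i` and incentive level `a`, together with a cost for each level. It then picks one incentive per customer under a total budget using a generic Lagrangian relaxation with bisection on the multiplier. This is a standard technique swapped in for illustration, not the authors' algorithm. The function name `allocate_with_budget` and all inputs are hypothetical.

```python
import numpy as np


def allocate_with_budget(rewards, costs, budget, iters=60):
    """Hypothetical sketch (not the paper's method): choose one incentive level
    per customer to maximize total estimated reward while keeping total cost
    <= budget, via Lagrangian relaxation with bisection on the multiplier lam.

    rewards: array of shape (n_customers, n_actions), assumed output of step 1.
    costs:   array broadcastable to rewards.shape (e.g. one cost per action).
    """
    rewards = np.asarray(rewards, dtype=float)
    costs = np.broadcast_to(np.asarray(costs, dtype=float), rewards.shape)
    rows = np.arange(rewards.shape[0])

    # Infeasible if even the cheapest action for every customer exceeds the budget.
    if costs.min(axis=1).sum() > budget:
        raise ValueError("budget is below the minimum achievable total cost")

    def best_actions(lam):
        # With a price lam per unit of cost, each customer independently
        # picks the action maximizing reward - lam * cost.
        return np.argmax(rewards - lam * costs, axis=1)

    def total_cost(actions):
        return costs[rows, actions].sum()

    # Find an upper multiplier whose allocation is within budget; a large enough
    # lam always selects minimum-cost actions, so this loop terminates.
    lo, hi = 0.0, 1.0
    while total_cost(best_actions(hi)) > budget:
        hi *= 2.0

    # Bisection keeps the invariant that best_actions(hi) is feasible.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_cost(best_actions(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return best_actions(hi)
```

For example, `allocate_with_budget([[0.1, 0.5, 0.6], [0.2, 0.3, 0.9]], [0.0, 1.0, 2.0], budget=2.0)` returns one action index per customer whose summed cost stays within the budget. The relaxation always yields a feasible allocation, but for this discrete (multiple-choice knapsack) problem it can leave part of the budget unused, so it should be read as an approximation rather than an exact solver.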

research
09/10/2018

Efficient Counterfactual Learning from Bandit Feedback

What is the most statistically efficient way to do off-policy evaluation...
research
03/02/2023

Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing

Sequential incentive marketing is an important approach for online busin...
research
05/14/2019

Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models

We introduce an off-policy evaluation procedure for highlighting episode...
research
02/09/2015

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

We develop a learning principle and an efficient algorithm for batch lea...
research
10/19/2022

Anytime-valid off-policy inference for contextual bandits

Contextual bandit algorithms are ubiquitous tools for active sequential ...
research
03/03/2023

Eventual Discounting Temporal Logic Counterfactual Experience Replay

Linear temporal logic (LTL) offers a simplified way of specifying tasks ...
research
10/09/2019

Robust Dynamic Assortment Optimization in the Presence of Outlier Customers

We consider the dynamic assortment optimization problem under the multin...
