Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing

03/02/2023
by Shuai Xiao, et al.

Sequential incentive marketing is an important approach for online businesses to acquire customers, increase loyalty and boost sales. However, how to allocate incentives effectively so as to maximize the return (e.g., business objectives) under a budget constraint is less studied in the literature. The problem is technically challenging because 1) the allocation strategy has to be learned from historically logged data, which is counterfactual in nature, and 2) both optimality and feasibility (i.e., that cost does not exceed budget) need to be assessed before the strategy is deployed to online systems. In this paper, we formulate the problem as a constrained Markov decision process (CMDP). To solve the CMDP with logged counterfactual data, we propose an efficient learning algorithm that combines bisection search and model-based planning. First, the CMDP is converted into its dual using Lagrangian relaxation, and the dual is proved to be monotonic with respect to the dual variable. Furthermore, we show that the dual problem can be solved by policy learning, with the optimal dual variable found efficiently via bisection search (i.e., by taking advantage of the monotonicity). Lastly, we show that model-based planning can effectively accelerate the joint optimization without retraining the policy for every value of the dual variable. Empirical results on synthetic and real marketing datasets confirm the effectiveness of our methods.
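The abstract's combination of Lagrangian relaxation, bisection on the dual variable, and model-based planning can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy finite-horizon MDP (three engagement levels, three incentive levels) whose known transition table stands in for a dynamics model learned from logged data, and every name in it (transitions, plan, evaluate, solve, P_UP) is hypothetical. For a dual variable lam, planning on the relaxed reward r - lam * c yields a policy whose expected cost is non-increasing in lam, so bisection can find the smallest lam whose policy stays within budget.

```python
from typing import List, Tuple

# Hypothetical toy model, standing in for a model learned from logged data.
S, A, T = 3, 3, 5                 # engagement levels, incentive levels, horizon
P_UP = [0.1, 0.4, 0.7]            # prob. of moving up one level under incentive a

Policy = List[List[int]]          # policy[t][s] -> action


def transitions(s: int, a: int) -> List[Tuple[int, float]]:
    """Next-state distribution: move up with prob. P_UP[a], otherwise stay."""
    up = min(s + 1, S - 1)
    return [(up, P_UP[a]), (s, 1.0 - P_UP[a])]


def reward(s: int, a: int) -> float:
    return float(s)               # business value grows with engagement level


def cost(s: int, a: int) -> float:
    return float(a)               # incentive spend


def plan(lam: float) -> Policy:
    """Finite-horizon value iteration on the Lagrangian reward r - lam * c."""
    V = [0.0] * S
    policy: Policy = [[0] * S for _ in range(T)]
    for t in reversed(range(T)):
        new_V = [0.0] * S
        for s in range(S):
            q = [reward(s, a) - lam * cost(s, a)
                 + sum(p * V[s2] for s2, p in transitions(s, a))
                 for a in range(A)]
            best = max(range(A), key=lambda a: q[a])  # ties -> cheapest action
            policy[t][s] = best
            new_V[s] = q[best]
        V = new_V
    return policy


def evaluate(policy: Policy, s0: int = 0) -> Tuple[float, float]:
    """Expected total (reward, cost) of a policy under the model, from s0."""
    R, C = [0.0] * S, [0.0] * S
    for t in reversed(range(T)):
        new_R, new_C = [0.0] * S, [0.0] * S
        for s in range(S):
            a = policy[t][s]
            nxt = transitions(s, a)
            new_R[s] = reward(s, a) + sum(p * R[s2] for s2, p in nxt)
            new_C[s] = cost(s, a) + sum(p * C[s2] for s2, p in nxt)
        R, C = new_R, new_C
    return R[s0], C[s0]


def solve(budget: float, tol: float = 1e-6) -> Tuple[float, Policy]:
    """Bisection on the dual variable, relying on expected cost being
    non-increasing in lam for the lam-optimal policy."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if evaluate(plan(0.0))[1] <= budget:
        return 0.0, plan(0.0)                # constraint is inactive
    lo, hi = 0.0, 1.0
    while evaluate(plan(hi))[1] > budget:    # grow bracket until feasible
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if evaluate(plan(mid))[1] > budget:
            lo = mid
        else:
            hi = mid
    return hi, plan(hi)                      # the hi side is always feasible


if __name__ == "__main__":
    lam, pi = solve(budget=2.0)
    r, c = evaluate(pi)
    print(f"lambda={lam:.4f} reward={r:.3f} cost={c:.3f}")
```

The reason for planning against a model here is that each bisection step only re-runs cheap value iteration, rather than retraining a policy on logged data for every trial value of lam. One caveat of this sketch: the deterministic policy returned from the feasible end of the bracket respects the budget but may leave some of it unused; an exact CMDP optimum can require randomizing between the policies at the two ends of the bracket.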

research
02/07/2019

Cost-Effective Incentive Allocation via Structured Counterfactual Inference

We address a practical problem ubiquitous in modern industry, in which a...
research
02/23/2018

Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising

Real-time bidding (RTB) is almost the most important mechanism in online...
research
10/21/2021

Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes

We consider a discounted cost constrained Markov decision process (CMDP)...
research
05/26/2021

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

We propose a successive convex approximation based off-policy optimizati...
research
01/10/2023

Sequential Fair Resource Allocation under a Markov Decision Process Framework

We study the sequential decision-making problem of allocating a limited ...
research
05/18/2021

Markdowns in E-Commerce Fresh Retail: A Counterfactual Prediction and Multi-Period Optimization Approach

In this paper, by leveraging abundant observational transaction data, we...
research
02/04/2019

A Unified Framework for Marketing Budget Allocation

While marketing budget allocation has been studied for decades in tradit...