Personalized Policy Learning using Longitudinal Mobile Health Data

01/09/2020 ∙ by Xinyu Hu, et al. ∙ 0

We address the personalized policy learning problem using longitudinal mobile health application usage data. Personalized policy represents a paradigm shift from developing a single policy that may prescribe personalized decisions by tailoring. Specifically, we aim to develop the best policy, one per user, based on estimating random effects under a generalized linear mixed model. With many random effects, we consider a new estimation method and penalized objective to circumvent high-dimensional integrals for marginal likelihood approximation. We establish consistency and optimality of our method with endogenous app usage. We apply our method to develop personalized push ("prompt") schedules in 294 app users, with a goal to maximize the prompt response rate given past app usage and other contextual factors. We found that the best push schedule given the same covariates varied among the users, thus calling for personalized policies. Using the estimated personalized policies would have achieved a mean prompt response rate of 23%, an improvement on the observed rate (11%); by comparison, the literature reports 3% to 15% user engagement at 3 months after download. The proposed method compares favorably to existing estimation methods, including the R function "glmer", in a simulation study.




1 Introduction

Mobile technologies such as smartphones and wearables enable continuous monitoring of exposure to environmental stressors and ecological assessment of health-relevant data over an extended period of time, thereby facilitating the delivery of tailored interventions in an adaptive manner (riley2011health). Examples abound. heron2010ecological review the use of tailored interventions based on momentary assessments to support management of a variety of health behaviors and symptoms such as smoking, diabetes, and weight loss. depp2010mobile study the efficacy of personalized pushed engagement based on real-time data in mental illness patients. mohr2013continuous envision a continuous evaluation system of health apps based on evidence generated by routinely collected data. To illustrate, we consider a suite of smartphone apps (called IntelliCare) that serves users with anxiety or depression using different psychological treatment strategies including cognitive behavioral therapy, positive psychology, and physical activity-based interventions (mohr2017intellicare). The suite consists of a Hub app that helps users navigate apps within the IntelliCare ecosystem and coordinate their experience, with a specific function to provide links and recommendations for other IntelliCare apps so as to maximize user engagement based on a user’s app usage history (cheung2018). In this article, we are motivated by a sub-study of the IntelliCare suite, in which the Hub app would send pushed notifications to prompt a user to complete a short four-item patient health questionnaire repeatedly at 7-day intervals at a random time during the day. While the purpose of the prompts is to remind users to assess their depression and anxiety symptoms, the response rate was expected to be modest and to decline quickly over time, based on user engagement reported in the literature (cgf2009; helander2014).
Since time of day is a known factor of mobile application usage (angrybird), the objective of this study is to learn the best time period to push the prompt (policy) that maximizes response given other contextual factors a user experiences as well as the user’s past engagement. In addition, since there is often unobserved between-user heterogeneity due to a user’s own circumstances that is difficult to capture or measure (heter2017), the eventual goal is to develop policies, one for each user, that can provide personalized feedback through their interaction with the IntelliCare apps.

Numerous policy learning methods that support decision making using medical data and mobile health data have been proposed. For example, there is a large statistical literature on reinforcement learning algorithms that estimate optimal policies under a nomothetic model (murphy2003optimal; qian2011performance; zhang2012robust; laber2014iq; song2014penalized; zhao2012estimating; zhao2015new; ertefaie2018constructing; luckett2019estimating). A nomothetic approach assumes that a population model captures all between-subject heterogeneity and facilitates estimation by pooling data across participants. While this approach may address user heterogeneity and allow for the estimation of personalized policies by incorporating appropriate interactions with the actions, it often requires the untestable assumption of no unobserved confounders. Alternatively, an idiographic approach achieves personalization using an “N-of-1” approach whereby a person’s own decision model is estimated using the person’s own data only (nof1-1; duan2014; lei2014actor). Although this approach in principle allows for insights about individuals without assumptions about any reference population, its practicality relies on how long a user can be followed. In general, the efficiency of this approach may suffer, especially in situations where an action exhibits similar effects on all individuals.

In this article, we consider estimating personalized policies under the generalized linear mixed model (GLMM) framework with the outcome at each time point as the dependent variable and time-varying covariates, action and their interactions as the predictors. For instance, in the IntelliCare “Prompt” sub-study, the outcome of interest is a binary response and the action is the time period during a day when a prompt is pushed. The estimated policy aims to recommend an action that maximizes the predicted outcome based on the contextual factors experienced by a user and the user’s past engagement. In addition to tailoring, each user will have a personalized policy through the estimation of the random effects, which capture individual departure from the population model due to unobserved heterogeneity.

While GLMM is one of the most popular methods to handle longitudinal outcome data, GLMM-based estimation methods are largely designed for settings where the covariates are exogenous with respect to the outcome process. When the time-varying covariates are allowed to be endogenous, that is, letting them depend on the outcome process, previous treatment assignments, and possibly random effect parameters, estimation of the GLMM fixed effect coefficients (based on likelihood or generalized estimating equations) may lead to bias, because the estimate no longer corresponds to the conditional interpretation of the parameters; see sullivan1994cautionary and diggle2002analysis for examples. In the case of linear mixed models, when the conditional interpretation of fixed effects is consistent with the scientific interest in predicting person-specific effects, qian2019linear show that standard software can be used to obtain a valid estimate of the fixed effects if the time-varying covariates are independent of the random effects parameters conditional on past history. In this article, we examine the conditions under which the proposed estimation method works in the presence of endogeneity in GLMM. Furthermore, as will be shown in Section 3, our method does not require a full conditional distribution of outcome or random effects to be correctly specified, but relies on a much weaker assumption that the conditional mean outcome model is correctly specified.

We note some previous work on estimating personalized treatment using GLMM. For example, cho2017personalize use GLMM to predict individual outcome under each treatment arm with a random slope on the treatment indicator, and build a random forest model to predict the random slope using patients’ baseline covariates. Personalized treatment can then be implemented by selecting the treatment with the maximal estimated random effects. However, little if any of the previous work includes random effects for treatment-by-covariate interactions in the model, thus having no provision for tailoring. Allowing for random effects for treatment-by-covariate interactions presents a key computational challenge, as most methods rely on approximating the marginal likelihood of the outcomes by integrating out the random effects. When there is a moderate or large number of random effects terms, standard GLMM software fails to produce an accurate approximation of the integrals. To address the computational challenge, we propose a novel algorithm that estimates the fixed effects and random effects jointly with a ridge-type penalty on the latter. In addition, to avoid overfitting individual deviations from the population mean, we propose to apply a group lasso penalty on the random effects (yuan2006model). This penalized approach is critical in circumventing the large number of random effects for treatment-by-covariate interactions.

This article is organized as follows. In Section 2, we set up the formulation of the personalized policy learning problem, and present new policy estimation methods. We then study the theoretical properties of the proposed method in Section 3, and compare it with some existing approaches in Section 4. We will revisit the IntelliCare Prompt study in Section 5 and apply the proposed method to develop personalized policies in the study. We end this article with some concluding remarks in Section 6. Details of computational algorithms, technical derivations, and proofs are provided as separate Supplementary Materials.

2 Personalized Policy Learning

2.1 Notations and Problem Formulation

Suppose mobile application user is tracked longitudinally over time points. At time , an action taking values in a pre-specified finite discrete action space is randomized to the user, with a vector of covariates observed prior to the action. Let denote the outcome of interest observed after each action, with the convention that large values of are good. We note that the covariates may include endogenous variables that depend on previous outcomes and actions, as well as other exogenous and contextual factors. In summary, the trajectory of each user is denoted by the triplets . We further denote the entire history up to by and .

Our objective is to estimate for a given user a personalized policy , which when implemented will result in the maximal conditional expected outcome, , where the expectation is taken with respect to the conditional distribution of given the history and action is consistent with . We further make the commonly used assumption that the conditional distribution of given is Markovian, so that . We note that can include lagged variables at previous time points (e.g., ). Further let so as to make explicit the conditional expectation is user-specific. Then . Once is estimated by (say), the estimated policy will be used to guide decision making for the user in the future time points. While this formulation of the problem assumes a stationary policy in that the function is time-invariant, the policy decisions can be time-dependent by including time in the covariate state . In our application, this assumption is aligned with the fact that mobile application usage is habitual given other contextual factors.

We facilitate the learning problem under the GLMM framework, and postulate


for and , where is a known strictly monotone increasing link function. For example, the canonical forms of are respectively the identity function for continuous outcome, logit for binary outcome, and logarithmic for counts. Here is a -dimensional vector of unknown parameters, and is a pre-specified vector function of so that is the fixed effects component; for example, . The random effects are denoted by , an () matrix with the -th row, , denoting the random effects parameters for the -th user, and is a sub-vector of chosen so that models subject-specific deviations from the mean model. Under model (1) and a monotone increasing , the optimal policy can be expressed as


Note that play dual roles in our proposed method. On one hand, it defines the individual deviation from the mean model of the -th user, and can be viewed as a fixed parameter to be estimated and to be acted upon. This role operationalizes the personalized policy decisions (2). On the other hand, can be viewed as a random sample of the population. This viewpoint motivates some degree of “smoothness” in the estimation of ’s, which is described next.
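The decision rule described above can be illustrated with a minimal Python sketch. Because the link function is monotone increasing, maximizing the conditional mean is the same as maximizing the linear predictor, so the link never has to be evaluated. The feature map `features`, the integer coding of actions, and the choice to give every fixed-effect term its own random effect are assumptions made for illustration only; they are not taken from the article.

```python
import numpy as np

def features(s, a, n_actions):
    """Hypothetical design vector: intercept, state, action dummies
    (action 0 is the reference), and state-by-action interactions."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    dummies = np.zeros(n_actions - 1)
    if a > 0:
        dummies[a - 1] = 1.0
    return np.concatenate(([1.0], s, dummies, np.kron(dummies, s)))

def personalized_policy(s, beta, b_i, n_actions):
    """Pick the action with the largest linear predictor for one user.

    beta holds the population (fixed) effects and b_i the user's own
    deviations; here both have the same length (an assumption).
    """
    scores = [features(s, a, n_actions) @ (beta + b_i)
              for a in range(n_actions)]
    return int(np.argmax(scores))
```

With `b_i` set to zero the rule reduces to a single tailored population policy; a nonzero `b_i` can change the recommended action for the same covariate value, which is what distinguishes a personalized policy from a personalized decision.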

2.2 Policy Estimation

Let denote the working conditional log-likelihood of under a fully specified GLMM with the systematic component (1). For example, with a continuous , we may set to be the Gaussian log-likelihood with mean , variance , and an identity link. When is binary, we may choose to be the Bernoulli log-likelihood with probability and a logit link. However, the theoretical results described in Section 3 will hold for any choice of that satisfies

and (3)

where is the -th row of , is a nuisance parameter in the working log-likelihood, and and denote the partial derivatives of with respect to and , respectively. It is easy to verify that the Gaussian and the Bernoulli log-likelihoods satisfy (3); and since they are often the practical choices for continuous and binary outcomes, they may be used as pseudo-log-likelihood in many applications. Correspondingly, we define the penalized pseudo-log-likelihood


where is a symmetric positive semi-definite matrix, is the Moore-Penrose generalized inverse of , and is a tuning parameter.

We propose to estimate and by maximizing (4). The maximum penalized-pseudo-likelihood estimator is denoted by


and the corresponding personalized policy for user is estimated by

analogously to in (2).

The second term on the right hand side of (4) puts a ridge-type penalty to shrink and stabilize the estimation of the random effects . Under the viewpoint that is a random sample of a population, it is natural to choose to reflect the variance-covariance matrix of , although it is not required for the asymptotic properties to hold (see Section 3). The third term in (4) is the group lasso penalty, where each group contains the random effects parameter of the -th term in for all users. Under a similar viewpoint, it is intuitive to set the group-specific weight to be inversely proportional to the variance of .
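The three-term structure just described (working log-likelihood, ridge-type penalty, group lasso penalty) can be sketched in Python for a binary outcome. The formula for (4) is not reproduced in this text, so the exact form below is an assumption: the factor 1/2 on the ridge term, the argument names, and the use of one group per column of the random effects matrix are illustrative choices, not the authors' implementation.

```python
import numpy as np

def penalized_pseudo_loglik(beta, B, F, H, y, user, G, lam, w):
    """Bernoulli working log-likelihood minus ridge and group lasso penalties.

    beta : (p,)   fixed effects
    B    : (n, q) random effects, one row per user
    F    : (N, p) fixed-effects design, one row per observation
    H    : (N, q) random-effects design
    y    : (N,)   binary outcomes
    user : (N,)   integer user index of each observation
    G    : (q, q) symmetric positive semi-definite matrix
    lam  : group lasso tuning parameter
    w    : (q,)   group-specific weights
    """
    eta = F @ beta + np.sum(H * B[user], axis=1)
    # Bernoulli log-likelihood under a logit link, in a stable form.
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    # Ridge-type penalty using the Moore-Penrose inverse of G
    # (the 1/2 factor is an assumed convention).
    ridge = 0.5 * np.einsum("iq,qr,ir->", B, np.linalg.pinv(G), B)
    # Group lasso: one group per random-effects term, pooled over users.
    group = lam * np.sum(w * np.linalg.norm(B, axis=0))
    return loglik - ridge - group
```

Grouping by column means the penalty can zero out a random-effects term for all users at once, so that term is then governed by its fixed effect alone.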

In practice, we propose to update , and ’s iteratively, in conjunction with the trust region Newton (TRON) algorithm in the estimation of and . Briefly, the TRON algorithm combines the trust region method (steihaug1983conjugate) and the truncated Newton method (nash2000survey) to solve an unconstrained convex optimization problem. At each iteration, TRON defines a trust region and approximates the objective function using a quadratic model within the region. If a pre-specified change of the objective function is achieved in the current iteration, the updated direction is accepted and the region is expanded; otherwise the region is shrunk. The approximation sub-problem is solved via the conjugate gradient method. Since TRON handles the inverse of a potentially large Hessian matrix iteratively rather than computing it explicitly, convergence can be achieved quickly even with a large and dense Hessian. Overall, the computational cost per iteration is of the order of the number of nonzero elements in the design matrix. In addition, we propose to choose the tuning parameter for the group lasso penalty using an AIC-type criterion. The details are given in Section S1 of the Supplementary Material.
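To make the trust region and conjugate gradient mechanics concrete, the following sketch fits a ridge-penalized logistic regression with SciPy's `trust-ncg` solver, which also solves each trust-region sub-problem by a Steihaug-type conjugate gradient iteration using only Hessian-vector products. It is a stand-in for illustration: it is not the TRON implementation used in the article, it omits the non-smooth group lasso term and the iterative weight updates, and the function name `fit_ridge_logistic` is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_ridge_logistic(X, y, ridge=1.0):
    """Minimize negative Bernoulli log-likelihood + (ridge/2)*||theta||^2
    with a trust-region Newton conjugate gradient method."""

    def fun(theta):
        eta = X @ theta
        return np.sum(np.logaddexp(0.0, eta) - y * eta) + 0.5 * ridge * theta @ theta

    def jac(theta):
        return X.T @ (expit(X @ theta) - y) + ridge * theta

    def hessp(theta, v):
        # Hessian-vector product; the Hessian itself is never formed or inverted.
        p = expit(X @ theta)
        return X.T @ (p * (1.0 - p) * (X @ v)) + ridge * v

    res = minimize(fun, np.zeros(X.shape[1]), jac=jac, hessp=hessp,
                   method="trust-ncg")
    return res.x
```

Each Hessian-vector product costs two passes over the design matrix, which is consistent with the per-iteration cost noted above being driven by the number of nonzero elements in that matrix.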

3 Theoretical Remarks

In this section, we study the asymptotic behavior of and , and conditional and marginal performance of estimated policies ’s under the following assumptions. All proofs are given in Section S2 of the Supplementary Material.

  (a) There exists a positive constant , such that the treatment randomization probability for all possible values of at any time point .

  (b) The random vectors and and outcome are square integrable under the data generative distribution for and .

  (c) The latent random effects , are independent and identically distributed with mean and finite variance .

  (d) There exists such that (1) holds, and is -almost surely an interior point of a compact set .

  (e) The pseudo-log-likelihood is concave in , satisfies condition (3), and its expected second order derivative is continuous in .

  (f) Denote . We need the following regularity conditions:

    (i) As , satisfies
      -almost surely, where denotes the Frobenius norm;

      is positive definite with all eigenvalues greater than


    (ii) .

    (iii) For , as , satisfies
      -almost surely;
      and is positive definite with all eigenvalues greater than .

  (g) The weights satisfy , where is the -th diagonal element of , the variance-covariance matrix of .

  (h) The tuning parameter satisfies .

Theorem 1

Suppose Assumptions a-h hold. As , satisfies and , .

Remarks. Condition f is similar to the regularity conditions required in maximum likelihood estimation. In particular, when the covariates ’s are exogenous, it is easy to verify that f holds under the regularity conditions used in GLMM. Interestingly, Condition f will hold under many situations when ’s are endogenous; and importantly, these situations can be verified. For illustration purposes, we verify this condition in the Appendix in two quite common scenarios: (i) when is binary and the distribution of directly depends on the latent random effects ; (ii) when

follows Gaussian distribution and


Theorem 1 characterizes the asymptotic behavior of every under the condition that . This condition, however, can be relaxed if we are only interested in the asymptotic behavior of on average. Specifically, we only require that the proportion of ’s that do not go to infinity goes to zero. Without loss of generality, suppose . Let be the index so that is bounded, and .

Corollary 1

Suppose f(iii) holds for , and the remaining assumptions in a-h continue to hold. As , suppose . Then, and

Next, we present the properties of the estimated personalized policies . Specifically we consider both the conditional expected outcome under at each time point given , and the marginal expected outcome assuming is used to make decision for user from the beginning to time point . The results are stated in the theorem below.

Theorem 2

Assume all conditions in Corollary 1 hold. Suppose the inverse link function of the corresponding exponential family distribution, , is Hölder continuous. That is, for any in the domain, , where is a positive constant and . Then for any , as ,


In addition, assume


where is the potential outcome of that would have been observed were used to make decision up to time point . Then we have,



  1. The personalized policy is optimal in the conditional sense, in that it yields the maximal expected outcome if treatment assignment is consistent with given . As such, Equation (6) describes the conditional optimality of estimated policies ’s given the current information. We note that may not necessarily be optimal in a marginal sense after integrating out , because the distribution of depends on previous treatment assignment. Therefore, Equation (8) in the above theorem implies consistency rather than optimality.

  2. Condition (7) implies that the optimal decision at time is unique almost surely, given that were used to make decision at previous time points. This assumption is not needed to show consistency when .

4 Simulation study

4.1 Setup

In this section, we examine the estimation properties of the maximum penalized-pseudo-likelihood estimator in (5) and the performance of the personalized policy using simulation.

In a simulated trial, each user would be followed for time points for training purposes, with 10 additional subsequent testing time points. At time point , user would receive one of three possible actions with equal probability, that is, the actions were generated randomly with probabilities ; the actions

’s were then coded using two dummy variables and were centered. The covariate process

included a binary endogenous variable , which would depend on the previous outcome , the previous action and the random effects . Specifically, we set , and

for , where expit is the expit function, , and is the -th component of for . We considered both binary and continuous outcomes. The conditional mean of the outcome was defined according to (1) where and

denotes the Kronecker product, with logit and identity links respectively for the binary and continuous outcomes. The continuous outcomes were generated with an independent Gaussian noise with standard deviation 1.5. The true fixed effects were specified by

We considered two scenarios for the random effects , which were generated from mean zero Gaussian: we set variance-covariance matrix to be to represent a scenario with non-sparse random effects, and with sparse random effects. We generated 200 simulated trials, each having users. Once the random effects were sampled, they were treated as fixed parameters in the 50 users.

The estimation properties of the policy parameters based on the training data were evaluated using mean squared error (MSE), defined as , where is the sub-vector of involved in policy (i.e. coefficients of and ). The quality of decisions at the testing time points by the estimated policies was evaluated in terms of the expected conditional outcome under at each testing time point :

. To facilitate comparison across scenarios, we standardized the expected outcome against the optimal policy and the worst policy and obtained the value ratio (VR) for the estimated policy :
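Since the value ratio formula itself is not reproduced in this text, the sketch below shows one natural reading of "standardized against the optimal policy and the worst policy": a min-max rescaling so that the worst policy scores 0 and the optimal policy scores 1. This form and the function name `value_ratio` are assumptions, not a formula quoted from the article.

```python
import numpy as np

def value_ratio(v_est, v_opt, v_worst):
    """Rescale the expected outcome of an estimated policy so that the
    worst policy maps to 0 and the optimal policy maps to 1 (assumed form)."""
    v_est, v_opt, v_worst = (np.asarray(v, dtype=float)
                             for v in (v_est, v_opt, v_worst))
    return (v_est - v_worst) / (v_opt - v_worst)
```

Under this reading, a value ratio is comparable between the binary and continuous outcome scenarios even though the raw expected outcomes are on different scales.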

4.2 Comparison Methods

In the simulation, we considered some existing methods as alternatives to the proposed personalized policy learning method, which shall be denoted as PPL in what follows.

Under the GLMM framework, instead of using the proposed algorithm described in Section 2.2, we used the “glmer” function in the lme4 package in R (bates2014fitting). This method shall be denoted as glmer. The function “glmer” would involve approximating the marginal likelihood by integrating over the random effects. This could be problematic in situations with a large number of random effects (thus involving high-dimensional integrals) and endogenous covariates.

In addition, we considered the regularized penalized quasi-likelihood (rPQL) approach developed by hui2017joint for exogenous covariates as yet another alternative to estimating under the GLMM framework. While rPQL also imposed a group lasso penalty, our proposed algorithm took a different computational approach: First, we adopted the novel trust region method to solve the optimization problem; second, we updated the weights ’s iteratively whereas rPQL would keep the weights at their initial values throughout the computation.

While the methods above would prescribe personalized policies, we also considered using generalized estimating equations (GEE) to estimate a population-level effect, and developed a non-personalized policy by choosing actions maximizing the estimated population mean. We used an independence working correlation structure, so as to avoid bias under linear models with endogenous variables; see boruvka2018assessing.

Finally, we examined the performance of an “N-of-1” approach whereby each user’s personalized policy was estimated by fitting a generalized linear model to the user’s own data only. That is, this method, denoted as multiple generalized linear models (MGLM), borrowed no information across users. We anticipated that MGLM would have difficulties when was small, especially with Bernoulli outcomes.

4.3 Simulation Results

Table 1 compares the MSE of the policy parameters in the simulation scenario with non-sparse random effects. Overall, the proposed PPL has the smallest MSE when . Its superior performance to the other two GLMM-based methods (glmer and rPQL) indicates the computational advantages of using the trust region algorithm with iterated weights. These three methods, as expected, improve with large , that is, having more data points.

The “N-of-1” MGLM performs poorly with binary outcome and when with continuous outcome. Even with a moderate-to-large , the method remains inferior to the other methods. This signifies the importance of borrowing information from across users, even though our goal is to produce different policies for different users.

Interestingly, GEE has the smallest MSE when and performs relatively well with the larger ’s. While it is somewhat surprising at first glance, we note that by avoiding estimating the random effects ( is estimated with ), GEE will induce the least variability and hence the smallest MSE. It is illuminating that the method’s MSE does not improve as increases, when bias becomes dominating in the bias-variance tradeoff.

         Binary                                        Continuous
Method   T=10           T=20          T=30            T=10            T=20          T=30
PPL      8.22(3.31)     5.41(0.61)    4.70(0.58)      8.99(3.67)      3.69(0.47)    2.37(0.22)
glmer    43.39(34.16)   9.65(2.72)    6.39(1.17)      13.94(4.79)     4.87(0.65)    3.03(0.38)
GEE      7.87(2.92)     6.08(0.57)    6.17(0.52)      8.38(2.83)      5.94(0.38)    5.74(0.21)
MGLM     >1E10          >1E10         >1E10           272.35(93.00)   36.24(35.00)  8.10(4.33)
rPQL     8.71(3.9)      5.87(0.73)    5.23(0.65)      7.73(2.32)      5.31(0.30)    4.44(0.26)
Table 1: Estimation properties under scenario with non-sparse random effects, by number of training time points T (Average MSE (SD) over simulation trials).

Table 2 compares the methods under the scenario with sparse random effects. The relative performance of the methods is similar to that in Table 1, although the bias induced by GEE becomes more apparent as the variability in the data is smaller in this scenario. In particular, PPL and glmer have substantially smaller MSE in this scenario than when random effects are not as sparse.

         Binary                                        Continuous
Method   T=10           T=20          T=30            T=10             T=20          T=30
PPL      7.75(3.57)     4.41(0.80)    3.69(0.69)      7.17(3.09)       2.44(0.46)    1.33(0.20)
glmer    44.56(38.85)   8.80(2.50)    5.73(1.50)      11.75(4.08)      3.48(0.64)    1.86(0.33)
GEE      7.24(3.06)     5.04(0.71)    5.11(0.64)      7.15(2.98)       4.85(0.37)    4.67(0.23)
MGLM     >1E10          >1E10         >1E10           274.92(105.00)   34.73(31.90)  7.19(1.78)
rPQL     8.04(3.86)     4.89(0.92)    4.22(0.82)      6.31(1.96)       4.42(0.40)    3.08(0.32)
Table 2: Estimation properties under scenario with sparse random effects, by number of training time points T (Average MSE (SD) over simulation trials).

To compare the decision quality of the five methods, Figures 1 and 2 plot the simulated mean value ratio at the testing time points following training time points from each user, respectively under non-sparse random effects and sparse random effects.

The proposed PPL has the largest value ratio for each possible state for both binary and continuous outcomes. That GEE produces the smallest MSE when does not translate into good decision quality, as the method has the smallest value ratio uniformly in our simulation when compared to all other personalized policy methods. This serves as an important illustration of how simply considering personalized policy, as opposed to personalized decisions (which GEE also prescribes), could lead to potentially radical gain. It is interesting to note that methods that induce large variability in estimation can be quite competitive; for example, MGLM and glmer for continuous outcome when . This is due to the fact that the decision quality largely relies on correctly estimating the sign of the random effects, not the magnitude. Therefore, one ought to examine both the estimation properties and decision quality in the comparison of methods. Overall, our simulation results indicate that the proposed PPL wins on both counts. The relative performance of the methods is similar when , and the results are presented in Section S3 of the Supplementary Material.

Figure 1: Value ratio at each testing time point in the simulation with under scenario with non-sparse random effects by different .
Figure 2: Value ratio at each testing time point in the simulation with under scenario with sparse random effects by different .

5 Application

We apply the proposed PPL to estimate the best personalized push schedule in 294 users, who have received at least 20 prompts to complete the patient-health questionnaire since they downloaded the Hub app. Since the prompts were scheduled on 7-day intervals, this would represent a subsample of users with at least 20 weeks of app use. The distribution of the number of prompts in these users is shown in Figure 3. In the data, we tracked the timestamp of when a prompt was sent. For the purpose of this analysis, we grouped the time of prompt into four periods: Night (): from midnight to 6:00am; Morning (): from 6:00am to noon; Afternoon (): from noon to 6:00pm; Evening (): from 6:00pm to midnight. The observed proportions of the four periods were respectively 0.10, 0.23, 0.35, and 0.32. Using as the reference group, we used three dummy variables, centered by the observed proportions, to code the actions , and in model fitting.
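The action coding described above can be sketched as follows. The period boundaries and observed proportions are taken from the text; treating Night as the reference group is inferred from Table 3 (which reports effects for Morning, Afternoon, and Evening only), and the function names are hypothetical.

```python
import numpy as np

PERIODS = ["night", "morning", "afternoon", "evening"]
# Observed proportions of prompts in each period, as reported in the text.
OBSERVED_PROPORTIONS = {"night": 0.10, "morning": 0.23,
                        "afternoon": 0.35, "evening": 0.32}

def period_of(hour):
    """Map an hour of day (0-23) to one of the four six-hour periods."""
    return PERIODS[int(hour) // 6]

def centered_dummies(period):
    """Three dummies (morning, afternoon, evening), night as the assumed
    reference, each centered at its observed proportion."""
    return np.array([float(period == p) - OBSERVED_PROPORTIONS[p]
                     for p in PERIODS[1:]])
```

Centering makes each dummy average to zero over the observed prompts, so the coefficients of the other covariates describe their effect at the observed mix of periods rather than at the reference period.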

Figure 3: The distribution of the number of prompts in 294 IntelliCare users.

The state at each time point consisted of three variables. First, the number of times the Hub was launched (launches) in the week prior to the prompt was recorded. Second, the timestamp indicated whether a prompt was sent on a weekday (weekday). Third, the time point of the prompt was included as a predictor in the covariate process . With a binary response outcome, we estimated under model (1) with a logit link, and using the first 80% of the time points of each user as training data. Since each user had at least 20 prompts, we had in the training data for all 294 users.

Table 3 summarizes the results of the model fit. The positive fixed effects for suggest prompts in the morning, afternoon, and evening tend to induce a better response rate than those sent during the night (midnight to 6:00am). The effects associated with these non-night periods are even greater on weekdays, indicated by the positive (fixed) interaction between weekday and these periods. While this result is not surprising, we also note substantial heterogeneity of the period effects and the :period interactions, whose SDs have comparable magnitude to . This supports the need for personalizing the push schedule in our application.

In contrast, for the :period interactions and the :period interactions, the fixed effects () dominate the random effects; heterogeneity of the random effects coefficients is measured by SD. Based on the fixed effects, the response rate decreases over time, by 0.20 in log-odds over 5 time points. This is in line with findings in the literature; see helander2014 for example. In addition, every five additional launches of the Hub in the prior week improves the log-odds of response to a night prompt by . Based on the negative coefficients of launches:period interactions, a large number of launches also seems to attenuate or even negate the effects of the time of prompts. This suggests that for active users who engage the Hub often, their response pattern is less sensitive to the time of the prompt.
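For readers more used to the odds scale, the time trend can be converted with a short calculation. It uses only the fixed effect of -0.20 per five time points reported in Table 3, and it assumes all other covariates are held fixed and sets aside the time-by-period interactions.

```latex
\exp(-0.20) \approx 0.82,
\qquad
\exp(-0.20/5) = \exp(-0.04) \approx 0.96 .
```

Under these assumptions, the odds of responding fall by roughly 18% over five prompts, or about 4% per prompt.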

Variables               Fixed effect   SD
                        -1.80          1.31
(per 5 times)            1.52
(per 5 time points)     -0.20
Morning ()               1.65          1.13
Afternoon ()             1.57          0.95
Evening ()               1.06          0.78
                         0.73          0.34
                         0.16          0.62
                         0.66          0.52
                        -2.46          0.39
                        -1.40          0.21
                        -1.15          0.43
                        -1.25          0.66
                        -0.96          0.47
                        -0.93          0.44
Table 3: Model fit using the training data: the first numeric column gives the coefficients of the fixed effects, and SD is the standard deviation of the fitted individual random effects coefficients.

The quality of these personalized policies in the testing data is evaluated by the mean response rate under the policies, estimated via the inverse probability of treatment weighted method and averaged over all test time points. The mean response rate according to PPL would have been 23%, which compares favorably to other studies in light of the fact that all testing points are at least 16 weeks from first download. It has been reported that user engagement is in the range of 3% to 15% in the third month after download (helander2014). As a reference point, the observed response rate in the testing data is 11%. In addition, we analyzed the prompt response data using the other methods with the same 80%-20% split of training and testing data, and obtained the mean response rate 14%, 17%, 14%, and 8% respectively for glmer, GEE, MGLM, and rPQL.
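The inverse probability weighting idea behind this evaluation can be sketched as follows. The text does not state which variant was used, so the normalized (Hajek-type) estimator below, the argument names, and the assumption that the randomization probability of each observed prompt period is known are all illustrative.

```python
import numpy as np

def ipw_value(y, a, recommended, propensity):
    """Normalized inverse-probability-weighted estimate of the mean outcome
    had every prompt followed the policy.

    y           : observed outcomes (1 = responded, 0 = not)
    a           : actions actually taken
    recommended : actions the policy would have chosen
    propensity  : probability with which each observed action was assigned
    """
    y, a, recommended, propensity = (np.asarray(v) for v in
                                     (y, a, recommended, propensity))
    # Only prompts whose observed action agrees with the policy contribute,
    # up-weighted by how unlikely that action was to be assigned.
    w = (a == recommended) / propensity
    return np.sum(w * y) / np.sum(w)
```

Because only test prompts that happen to match the recommended period carry weight, the estimate can be noisy when a policy favors rarely assigned periods.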

6 Discussion

This article makes several contributions. First, we have shown that personalized policies lead to higher value than a non-personalized policy (i.e., GEE) in our simulation study, and have clearly demonstrated substantial heterogeneity of the action effects in the prompt response data. These results imply a paradigm shift and call for personalized policies, which fundamentally differ from a single policy that may allow personalized decisions by tailoring. Second, we propose a novel computational algorithm for the estimation of model parameters under GLMM and for developing personalized policies. We have demonstrated, by simulation and in our data application, that the algorithm leads to better estimation properties and decision quality when compared to some existing methods, namely glmer and rPQL. Third, we have provided theoretical justifications of the proposed PPL by examining its asymptotic properties under a fairly general set of assumptions. In particular, we have established consistency and optimality in the presence of an endogenous covariate process, where the covariates may depend on previous outcomes, actions, and even the latent random effects. As endogeneity is ubiquitous in longitudinal mobile application usage (how many times a user launched the Hub app would likely depend on how he/she had interacted with the Hub in the past), these theoretical results have broadened the applicability of PPL to many practical situations.

Appendix: Examples of endogenous covariates

In this section, we verify condition f in two examples with endogenous covariates. In the first example, is binary , and the distribution of directly depends on the random effects parameters . In the second example, is Gaussian, and . For simplicity, we verify the condition with (since individuals are i.i.d.), and omit subscript from the notations. In both examples, we consider a scalar mean zero random effects parameter , and the treatment is randomly assigned with for .

Example 1. For binary outcome , suppose

where is the logit link. Conditioning on , , are i.i.d. .

Let be the log-likelihood of the Bernoulli distribution with mean . Then satisfies

It is easy to verify that


Finally, conditioning on , are i.i.d., where

. By the uniform law of large numbers theorem,

Example 2. Suppose

where , and is a constant. We consider to be the log-likelihood of Gaussian distribution. Below we show that condition f holds when


Note that condition (9) is a sufficient condition for an AR(1) process to be stationary.

We can verify that

Noticing that