Regression Trees for Longitudinal Data

by   Madan Gopal Kundu, et al.

While studying response trajectory, often the population of interest may be diverse enough to exist distinct subgroups within it and the longitudinal change in response may not be uniform in these subgroups. That is, the timeslope and/or influence of covariates in longitudinal profile may vary among these different subgroups. For example, Raudenbush (2001) used depression as an example to argue that it is incorrect to assume that all the people in a given population would be experiencing either increasing or decreasing levels of depression. In such cases, traditional linear mixed effects model (assuming common parametric form for covariates and time) is not directly applicable for the entire population as a group-averaged trajectory can mask important subgroup differences. Our aim is to identify and characterize longitudinally homogeneous subgroups based on the combination of baseline covariates in the most parsimonious way. This goal can be achieved via constructing regression tree for longitudinal data using baseline covariates as partitioning variables. We have proposed LongCART algorithm to construct regression tree for the longitudinal data. In each node, the proposed LongCART algorithm determines the need for further splitting (i.e. whether parameter(s) of longitudinal profile is influenced by any baseline attributes) via parameter instability tests and thus the decision of further splitting is type-I error controlled. We have obtained the asymptotic results for the proposed instability test and examined finite sample behavior of the whole algorithm through simulation studies. Finally, we have applied the LongCART algorithm to study the longitudinal changes in choline level among HIV patients.



There are no comments yet.


page 1

page 2

page 3

page 4


Regression trees for longitudinal and multiresponse data

Previous algorithms for constructing regression tree models for longitud...

Bayesian profile regression for clustering analysis involving a longitudinal response and explanatory variables

The identification of sets of co-regulated genes that share a common fun...

Weighted asymmetric least squares regression for longitudinal data using GEE

The well-known generalized estimating equations (GEE) is widely used to ...

Multi-Kink Quantile Regression for Longitudinal Data with Application to the Progesterone Data Analysis

Motivated by investigating the relationship between progesterone and the...

The Effect of Heteroscedasticity on Regression Trees

Regression trees are becoming increasingly popular as omnibus predicting...

On the Construction of Knockoffs in Case-Control Studies

Consider a case-control study in which we have a random sample, construc...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In longitudinal studies, repeated measurements of the outcome variable are often collected at irregular and possibly subject-specific time points. Parametric regression methods for analyzing such data have been developed by

Laird and Ware [15] and Liang and Zeger [16] among others, and have been summarized by Diggle et al. [9]. If the population under consideration is diverse and there exists several distinct subgroups within it, the true parameter value(s) for longitudinal mixed effects model may vary between these subgroups. In such cases, the traditional mixed effects models, for example linear mixed effects model, which assume a common parametric form for the covariates and time are not appropriate. For example, Raudenbush [21]

used a longitudinal depression study as an example to argue that it is incorrect to assume that all the people in a given population will be experiencing either increasing or decreasing levels of depression. In such instances, an assumption of a common parametric form will mask important subgroup differences and will lead to erroneous conclusions. In our work, our interest is to identify meaningful and interpretable subgroups with differential longitudinal trajectories and/or differential covariate effects’ on the response variable from such a heterogeneous population. We propose a regression tree construction technique for the longitudinal data using baseline characteristics as partitioning variables, LongCART algorithm, that (1) takes the decision about further splitting at each node controlling type I error, and (2) is applicable in cases when measurements are taken at subject specific time-points.

Figure 1: Sample longitudinal tree. The population consists of 3 subgroups and they differ in their longitudinal profiles. These subgroups are defined by the partitioning variables gender and age.

When the longitudinal profile in a population depends on the baseline covariates, the most common strategy is to include these covariates and their interactions with the time-varying covariate in the model. However, this strategy has some inherent drawbacks such as, over-fitting due to inclusion of all possible interaction terms, requires to specify functional form of the association with baseline covariates, and cannot capture nonlinear effect of baseline covariates. Our goal is to determine the most parsimonious model consisting of a number of homogeneous subgroups from a heterogeneous population profile without strict parametric restrictions or prior information. One of the popular technique to construct homogeneous subgroups is latent class modeling (LCM) [19]. An alternative approach is to construct a regression tree with longitudinal data [22]. Advantages of regression tree technique over LCM are: (1) it characterizes the subgroups in terms of partitioning variables and (2) the number of subgroups need not to be known a-priori. In general, the thrust of any tree technique is the extraction of meaningful subgroups characterized by common covariate values and homogeneous outcome. For longitudinal data, this homogeneity can pertain to the mean and/or covariance structure [22].

Throughout this article, we refer to the regression tree with longitudinal data as ‘longitudinal tree’. Figure 1 displays a toy example of longitudinal tree. This longitudinal tree represents a heterogeneous population with three distinct subgroups in terms of their longitudinal profiles. These subgroups can be characterized by gender and age. Here, gender and age are baseline attributes. In each of the three subgroups, the longitudinal trajectory depends on the covariates , but these subgroups are heterogeneous in terms of the true coefficients associated with their longitudinal profiles. Consider the following form of a linear longitudinal mixed effects model


where is the subject index and , and

denote the outcome variable, time and the vector of measurements of scalar covariates

, respectively. Let include all potential baseline attributes (with cut-off points , respectively) that might influence the longitudinal trajectory in (1.1). The superscript is added to the coefficients and to reflect their possible dependence on these baseline attributes. Let . With such a model, ‘homogeneity’ refers to the situation when the true value of remains same for all the individuals in the entire population, i.e. . When the longitudinal changes in the population of interest are heterogeneous there exists distinct subgroups differing in terms of the true values of the coefficients, i.e. . To model influence of on the longitudinal trajectory of non-parametrically, we have used these baseline attributes as the partitioning variables for construction of longitudinal tree.

In constructing a longitudinal tree through binary partitioning, one way to choose a partition is via maximizing improvement in goodness of fit criterion. For example, Abdolell et al. [1] chose deviance as a goodness of fit criterion. They evaluated deviance at each split of a given partitioning variable and selected the partition with maximum reduction in deviance for the binary splitting. However, this procedure requires a large number of statistical tests. With partitioning variables: , with cut-off points , respectively, the total number of tests is . Clearly, this approach leads to the multiple testing problem. To minimize the problem of multiplicity, we propose the LongCART algorithm for construction of regression tree that involves only single test for each partitioning variable. We call such a test: “test for parameter instability”. Hence, with partitioning variables, we need to perform only tests for parameter instability. The number of tests with the proposed approach is much smaller than in presence of continuous partitioning variables and/or categorical partitioning variables with more than two levels. Consequently, LongCART algorithm puts a better check on the type I error.

blackWe want to construct the regression with a certain level of confidence. In general, controlling type I error rate for the entire tree construction process is very difficult. First, the number of branches is unknown a-priori, and second, there is a large number of possibilities of choosing a split. To address the issue of type I error, at each node, we divide the task of finding the best split at a given node into two sub-tasks: (a) first, we identify if there is a need for further splitting and (b) second, given there is need for further splitting, we choose the optimum splitting point. Our proposed LongCART algorithm controls the type I error while performing the first task. Once this decision is made, the optimum split is chosen without additional testing. In order to get a better intuition for the LongCART algorithm, let us assume there is only a single partitioning variable, say , with cut-off points. In such case, LongCART algorithm identifies the best split at a given node in a two-step process as follows:

  • Step 1. Perform an overall parameter instability test to detect any evidence of heterogeneity of longitudinal model parameters across cut-off points of .

  • Step 2. Given that there is a ‘significant’ evidence for heterogeneity, the split that provides maximal improvement in goodness of fit criterion is chosen as a cut-off point for the tree construction.

blackWe adapt the LongCART algorithm in situations with multiple partitioning variables via repeating the parameter instability test for each partitioning variable controlling type I error at a given level. We continue to the second step using the ‘most significant’ partitioning variable. Details of this algorithm are presented in Section 4. The key idea here is that we are combining the multiple testing procedure (step 1) with model selection (step 2) in order to control the type I error while taking the decision on splitting at each node.

In order to construct a test for parameter instability, we borrow an idea from the time-series literature. In time-series context often the goal is to evaluate whether the parameter of a regression model is stable across different time points. This is often known as a test for structural change or constancy of parameters [e.g., 5, 20, 12]. We apply a similar idea to evaluate whether the true values of the parameter remains the same across the cut-off values of a partitioning variable in a mixed effects longitudinal model of interest.

In this paper we utilize the parameter instability test in multiple ways. First, in the case of continuous partitioning variables, the proposed test uses the results on score process derived by Hjort and Koning [12] in conjunction with the properties of Brownian motion and Brownian Bridge. Second, for categorical partitioning variables with a small number of cut-off points, a test for parameter instability is derived in a straightforward way by employing asymptotic normality of the score functions. We derive the asymptotic properties of the instability test and explore its size and power through an extensive simulation study. Finally, we use these instability tests to construct an algorithm for regression trees with longitudinal data.

Among the tree based methods, classification and regression tree (CART) methods [4]

is probably the most popular one.

Zeileis et al. [25] have extended the concept of CART methodology in the context of fitting cross-sectional regression models. Binary partitioning for longitudinal data has been proposed first by Segal [22]. However, Segal’s implementation is restricted to longitudinal data with a regular structure, that is all the subjects have an equal number of repeated observations at the same time points [27]. Zhang [26] proposes multivariate adaptive splines to analyze longitudinal data. Their method, multivariate adaptive splines for the analysis of longitudinal data (MASAL), can be used to generate regression trees for longitudinal data. Abdolell et al. [1] used deviance as a goodness-of-fit criterion for binary partitioning. They controlled the level of Type I error via permutation test taking into account testing multiplicity. However, permutation tests are computer intensive and the time taken to fit the models is intimidatingly high even for medium-sized data. Sela and Simonoff [23] as well as Galimberti and Montanari [10]

merged the subgroup differences with the random individual differences. They constructed the regression tree through an iterative two-step process. In the first step, they obtained the random effects’ estimates and in the second step, they constructed the regression tree ignoring the longitudinal structure. They repeat these two steps until the estimates of the random effect converge in the first step. blackThe LongCART algorithm provides an improvement over the existing methods in the following aspects: (1) the test for the decision about further splitting at each node is type I error controlled, (2) it is applicable when the measurements are taken at the subject-specific time points, (3) it does not merge the group differences with the random subject effect components and (4) it reduces computational time.

The remainder of this paper is organized as follows. In Section 2 the longitudinal mixed effects models of interest are summarized. Tests for parameter instability for continuous and categorical partitioning variable cases are discussed separately in Section 3. Algorithm for constructing regression trees along with measures of improvement and a pruning technique are discussed in Section 4. Results from the simulation studies examining the performance of the instability test and the regression tree as a whole are reported in Section 5. An application of the longitudinal regression tree method is illustrated on the metabolite data collected from the chronically HIV-infected patients in Section 6.

2 Notation and statistical model

Let be a set of measurements recorded on the subject () at time , where is a continuous scalar outcome; and is the vector of measurements on scalar covariates . We assume that these covariates are linearly associated with . In addition, for each individual, we observe a vector of attributes measured at baseline. We assume that includes all potential baseline attributes that can influence the longitudinal trajectory of and its association with covariates . Further, we do not assume the strict functional form of these baseline attributes’ influence. We use the variables as the candidate partitioning variables to construct a longitudinal regression tree to discover meaningful and interpretable subgroups with differential changes in characterized by the .

When the longitudinal profile is homogeneous in the entire population, we can fit the following traditional linear mixed effects model for all individuals [15]


where and is the vector of random effects pertaining to subject and distributed as . By ‘homogeneity’ we mean that the true value of remains the same for all the individuals in the population. In fact, (2.1) is the simplified version of model in (1.1) under homogeneity.

We follow the common assumptions made in longitudinal modeling that is a subset of ; and are independent; and are independent whenever or or both, and and are independent if . Here, is the fixed effect term and is the standard random effects term. For the subject, we rewrite the Eq. (2.1) as follows


where , is the design matrix consisting of the intercept, time () and covariates (). is the number of observations obtained from the individual. The score function for estimating under (2.2) is [see e.g., 8]

where and

. Further, its variance is

Likelihood estimate of obtained using all the observation from subjects is valid only if the entire population under consideration is homogeneous. If the entire population is not homogeneous in terms of then the likelihood estimate obtained considering all the subjects together are misleading; the extent and direction of ambiguity in the estimate will depend on the nature and proportion of heterogeneity in the sampled individuals. Therefore, under the assumption that are the only attributes that influences the longitudinal profiles of , it is important to decide first whether the true value of remains the same for all the all the subgroups defined by or not. In the next section, we describe statistical tests to assess whether the true value of remains the same across all the values of a given partitioning variable.

3 Test for parameter instability

The purpose of parameter instability test is to test whether the true value of remains the same across all distinct values of baseline attributes (i.e. partitioning variables). Let be any partitioning variable with ordered cut-off points: and be the true value of when . Assume that there are subject with . We denote the cumulative number of subjects with by . That is, and . We want to conduct an omnibus test,

Here, indicates the situation when parameter remains constant (that is, homogeneity) and corresponds to the situation of parameter instability (that is, heterogeneity) . The parameter instability tests utilize the following properties of score function under :

  • A1: ;

  • A2: ;

  • A3: ,

where is the maximum likelihood estimate of and . We discuss the instability test separately depending on whether the partitioning variable is categorical or continuous.

3.1 Instability test with a categorical partitioning variable

It is straightforward to obtain a test for parameter instability using the properties A1–A3 when the partitioning variable, , is categorical with a small number of categories (that is, ). Since the score functions are independent, we have under , the following quantity

is asymptotically distributed as with degrees of freedom where is the dimension of . Here, is the indicator function. The reduction in degrees of freedom is due to the estimation of dimensional from the data.

3.2 Instability test with continuous partitioning variable

Our proposed instability test for continuous partitioning variable is based on score process. We begin by defining the following score process

where . Under , using multivariate version of Donsker’s theorem and Cramér-Wold theorem [see e.g. 2], it can be shown that

where is the zero-mean Gaussian process with . Since is unknown in practice, we replace by in score process

Hjort and Koning [12] has shown that the above estimated score process converges to Brownian Bridge process. We present this result as following theorem and the proof of the theorem is outlined in Appendix.

Theorem 3.1.

Let’s define the standardized estimated score process as

Then under ,

where is a vector with independent standard Brownian Bridges as component processes.

Since the limiting distribution is the vector of independent Brownian Bridge process, individual components of is distributed as a standard Brownian Bridge, . That is,

The above weak convergence continues to hold for any ‘reasonable’ functionals (including supremum) of [see e.g. 7, pp 509, Theorem 1]. Therefore,


has known distribution function [2]

Although this expression involves an infinite series, this series converges very rapidly. Usually a few terms suffice for very high accuracy. This result can be used to formulate a test for instability of parameters at level of significance as follows: (1) Calculate the value of the process for each parameter and obtain the raw p-values. (2) Adjust the p-values according to a chosen multiple testing procedure. (3) Reject if the adjusted p-value for any of the processes, , is less than .

3.3 Instability test for multiple partitioning variables

In practice, we expect to have more than one partitioning variable. Let there be partitioning variables: . In that case we need to perform the instability test for each of the partitioning variables subject to adjustment for multiplicity of type I errors. Let the p-values after multiplicity adjustment be , respectively and . Candidate partitioning variable with the smallest p-value () is chosen as a partitioning variable if is smaller than the nominal significance level. For further discussion please see Section 4.

3.4 Power under the alternative hypothesis

We consider the following form of Pitman’s local alternatives in the vicinity of



is the vector containing degrees of departure from the null hypothesis and

is the vector containing magnitudes of departure. The operation denotes the point-wise multiplication, i.e.,

Theorem 3.2.

Under (3.2), the limiting distribution for the

is a non-central chi-square distribution


Theorem 3.3.

Under (3.2), the limiting distribution for the canonical monitoring process is as follows

where, and

Proofs of these theorems are provided in the Appendix.

4 Longitudinal Regression Tree

4.1 LongCART Algorithm

Smaller p-values from the instability test indicate greater evidence towards instability. Intuitively, splits in the tree should be based on the partitioning variable that shows higher evidence towards instability of the parameters. Therefore, we propose the following algorithm in order to construct a regression tree for longitudinal data.

Step 1. Perform the instability test for each partitioning variable separately at a prespecified level of significance . The level of significance for performing instability test is subject to adjustment for multiple comparisons in order to maintain the level of type I error.

Step 2. Stop if no partitioning variable is significant at level . Otherwise, choose the partitioning variable with the smallest p-value and proceed to step 3.

Step 3. Consider all cut-off points of the chosen partitioning variable. At each cut-off point, calculate the improvement in the goodness of fit criterion (e.g., deviance). With as the chosen partitioning variable, the improvement in goodness of fit criterion can be obtained at the cut-off point in the following steps:

a. Split the data in two parts. One group will include the observations from subjects with and the other group will have the observations from subjects with .

b. Fit the longitudinal model on (i) all the individuals in the node, (ii) the individuals with and (iii) the individuals with . Calculate the goodness of fit criterion from each of these three models. Call them as , and , respectively.

c. Calculate the improvement in goodness of fit criterion as .

Step 4. Choose the cut-off value that provides the maximum improvement in goodness of fit criterion and use this cut-off for binary splitting.

Step 5. Follow the Steps 1-4 for each non-terminal node.

The above strategy for construction of regression tree with longitudinal data has two major advantages over the currently existing algorithms. First, the decision about further splitting at each node is taken controlling type I error. Second, there are huge savings in computation time as we are evaluating the improvement in selected goodness of fit criterion at the cut-off points of the chosen partitioning variable only.

4.2 Improvement

A measure of improvement due to regression tree can be provided in terms of likelihood function based criterion. For example, Akaike Information criterion (AIC) for a tree can be obtained as

where denotes the number of terminal nodes in , is the log-likelihood in th terminal node and is the number of estimated parameters in each node. If we denote the AIC obtained from the traditional linear mixed effects model at root node (that is, common parametric form for covariates and time for the entire population) by , the improvement due to regression tree can be measured as

Since the overall model fitted to all the data is nested within the regression tree based model, a likelihood ratio test or test for deviance can be constructed as well to evaluate the overall significance of a given regression tree.

4.3 Pruning

The improvement in regression tree comes at a cost of adding complexity to the model. If we can summarize complexity of a tree by number of terminal nodes, the cost adjusted AIC of a regression tree can be defined as follows

where be the average complexity for each terminal node. Hence, the tree will be selected if


That is, the tree will be chosen as long as does not exceed some pre-set level of average complexity, ; otherwise, we have to prune the tree to bring below .

5 Simulation

We have explored the performance of instability test for continuous partitioning variables and the performance of proposed LongCART algorithm as a whole through simulation studies.

5.1 Performance of instability test with continuous partitioning variable

Let be continuous partitioning variable with ordered cut-off points as . We first investigated the size of the test and then obtained the size-corrected power.

5.1.1 Size of the test

(a) Percentage of rejection

1.25 0.54 0.56 0.89 1.02 0.95
1.67 0.75 0.85 1.10 1.33 1.29
2.50 1.20 1.46 1.77 2.04 1.94
5.00 2.78 3.35 3.48 4.07 4.19
10.00 5.66 7.14 7.19 8.37 8.53
20.00 13.05 14.73 15.83 16.97 17.14

(b) Observed percentile of

1.25 1.5930 1.4760 1.4938 1.5366 1.5643 1.5504
1.67 1.5472 1.4447 1.4532 1.4891 1.5147 1.4986
2.50 1.4802 1.3722 1.3998 1.4180 1.4392 1.4412
5.00 1.3581 1.2497 1.2924 1.2934 1.3154 1.3287
10.00 1.2238 1.1236 1.1585 1.1629 1.1901 1.1857
20.00 1.0728 0.9859 1.0045 1.0194 1.0350 1.0373
Table 1: Size of proposed parameter instability test for continuous partitioning variable via simulation as discussed in Section 5.1.1. The results were summarized based on simulations for various nominal levels of type I error () and sample size (). The critical values (

) from the true limiting distribution of test statistic

(see Eq. 3.1) is also provided for each . For each and , the simulation results have been summarized by (a) percentage of rejection (to be compared with ) and (b) observed percentile of (to be compared with ). The propose parameter instability test seems to be conservative; however, the size of the test approaches to nominal level with the increase in .

In order to examine the size of the test we have considered a longitudinal model with single mean parameter. We generated observations for subjects at from the following model


with , and . The observations for were generated for each simulation separately from uniform(0,300). For each , Monte-Carlo samples were generated and the test statistic (see Eq. (3.1)) was calculated for each sample separately. The null hypothesis of parameter stability is rejected at % level of significance when exceeds the th percentile of the limiting distribution.

The observed percentiles and the percentage of rejected null hypotheses are summarized in Table 1. We can make following observations: 1) the type I error of test does not exceed the nominal level, 2) the size of the test approaches to the desired significance level with the increase in the sample size , and 3) the test is under-sized for smaller sample sizes. The severe problem with the size of the test for smaller sample size can be explained as follows. Calculation of test statistic, , involves and . However, in practice, the true values of and are unknown and we replace them by their estimates. A consistent estimator (e.g. ML- or REML-based) approaches the true value with an increasing sample size. However, the estimates might be biased for smaller sample sizes. To be precise, for smaller sample size, and may remain underestimated and this leads to smaller value of which in turn results in a smaller size of the test. However, bias in estimation of and fades away with the increase in and this increases the size of the test. We observe this trend in Table 1 as the size of test approaches the nominal level of type I error with the increase in sample size. However, the size of test remains smaller than nominal level even for the reasonably large . The reduced size has been also reported in other tests based on the Brownian Bridge process. For example, Kolmogorov Smirnov test for normality (which also uses the Brownian Bridge as limiting distribution) is conservative [17, 18, 3]. As exceeds 500, the size of the test is close to the nominal level of significance. As a remedy for smaller sample sizes, one might consider using a liberal level or small sample distribution for obtained through simulation.

5.1.2 Power

We generated observations for subjects at from the following model to evaluate performance of instability test for

We set and . , and were generated similarly as before in Section 5.1.1. In this simulation, the parameter is not stable unless . We dealt with two parameters: and , thus we will have two Brownian bridge processes. We adjusted the p-values according to the Hochberg’s step-up procedure [13]. We chose Hochberg’s step-up procedure because it is relatively less conservative than the Bonferroni procedure [14]. However, in principle, any multiple comparison procedure can be applied here.

% of rejection
N instability test
1.4 1.4(1.4) 1.6(1.6) 1.9(1.9) 2.3(2.3) 2.4(2.3)
50 1.6 4.4(4.3) 16.9(16.6) 41.9(42.0) 70.2(70.6) 86.9(87.0)
Overall 2.9 5.6(5.5) 17.9(17.6) 42.6(42.5) 70.5(70.8) 87.0(87.1)
1.5 1.6(1.6) 2.0(2.1) 2.5(2.6) 3.0(3.0) 3.2(3.2)
100 1.7 5.2(5.3) 18.7(19.7) 44.4(46.0) 72.9(73.9) 88.9(89.0)
Overall 3.1 6.6(6.7) 19.8(20.8) 45.0(46.6) 73.1(74.2) 89.0(89.1)
1.8 1.9(1.8) 2.2(2.2) 2.7(2.7) 3.3(3.3) 3.5(3.4)
200 1.9 5.6(5.3) 20.7(19.8) 47.5(46.8) 75.7(75.2) 90.1(89.8)
Overall 3.6 7.4(6.8) 21.9(21.0) 48.2(47.4) 76.0(75.4) 90.6(89.9)
2.1 2.1(2.2) 2.7(2.5) 3.2(3.2) 3.6(3.7) 3.9(4.0)
500 1.8 6.1(6.0) 21.4(20.1) 48.1(48.2) 76.6(76.6) 91.1(91.1)
Overall 3.7 7.8(7.8) 22.8(22.2) 48.8(49.1) 77.0(77.0) 91.3(91.2)
Table 2: Power (%) of parameter instability test with continuous partitioning variable obtained in the simulation described in Section 5.1. Numbers corresponding to and represent the percentages of rejection associated with parameter instability for and , respectively. The ‘Overall’ figures represent the percentage of at least one rejection out of the two.

The results based on 10,000 simulation are displayed in Table 2. As the absolute value of deviates from zero, the power increases. The power of test is close to 80% and approaching the 90% mark as . The sign of does not influence the power of the test. Sizes of the test are very much in agreement with the first simulation study. As observed previously, the test is mildly conservative in the current simulation scenario as the observed level of type I error is consistently slightly below the nominal value .

5.2 Performance of regression tree for longitudinal data

Figure 2: True tree structure for the simulation described in section 5.2. In subgroup, observations were generated according to Eq. (5.2) with specified and .

In this simulation, our goal is to assess the improvement in estimation due to LongCART algorithm when the population under consideration is truly heterogeneous. We have simulated observations for subjects and these subjects come from one of the four different subgroups. Description of these subgroups is displayed in the form of a tree structure in Figure 2. The subgroups can be defined in terms of the partitioning variables , and . In th subgroup (), the values for continuous response variable were generated at according to following model:


where and . As displayed in Figure 2, the true values of were set at , , and and for , the true values were set at , , and , for the four subgroups, respectively. Further, observations were generated for individuals in subgroup 1, individuals in subgroup 2, individuals in subgroup 3, and individuals in subgroup 4. In order to study the performance of our algorithm constructing the longitudinal regression tree, we calculated the mean absolute deviation (MAD) in and in th subgroup for each simulation as defined below

where and are the true values of and in the th subgroup and and are the corresponding estimates for the th individual applyig longitudinal tree and then fitting mixed model in each subgroup. is the set of indices for all individuals in the th subgroup while denotes their number.

Model 2
Model 3 , , ,
Model 4 , , , , , ,
Model 5 , , , , , , ,
Model 6 , , , , , ,
Model 7 , , , , , , , , , ,
, ,
Model 8 , , , , , , , , , , ,
, , ,
Table 3: Description of the mixed models used in section 5.2 for the comparison with LongCART algorithm (Model 1). All models include random intercepts to account for the subject-specific effects.
Subpop 1 Subpop 2 Subpop 3 Subpop 4 Overall
Par. mad0 mad1 mad0 mad1 mad0 mad1 mad0 mad1 mad0 mad1
Model 1 8 0.291 0.083 0.309 0.093 0.315 0.088 0.159 0.033 0.241 0.064
Model 2 2 1.802 0.899 0.802 0.399 0.204 0.101 1.198 0.601 1.107 0.554
Model 3 5 1.439 0.899 0.615 0.399 0.320 0.101 0.893 0.601 0.879 0.554
Model 4 8 1.345 0.899 0.641 0.399 0.277 0.101 0.895 0.601 0.855 0.554
Model 5 9 1.345 0.899 0.628 0.399 0.281 0.101 0.896 0.601 0.854 0.554
Model 6 8 0.404 0.185 0.202 0.061 0.585 0.292 0.432 0.200 0.413 0.189
Model 7 14 0.286 0.070 0.306 0.114 0.300 0.105 0.291 0.074 0.294 0.085
Model 8 16 0.294 1.822 0.309 1.225 0.299 2.181 7.089 2.256 3.242 1.970

mad0= Average , mad1= Average , Par.: No.of parameters.
Model 1: Subgroups are extracted using LongCART algorithm and mixed model with time
slope and random intercept fitted separately in each subgroup.
Model 2 - 8: Description is given in Table 3
- In Model 1, of time regression tree with 4 subgroups were extracted.
Table 4: Summary of the results for the simulation described in section 5.2

The simulation results are summarized in Table 4 based on 1000 simulations in each case. In each simulation, regression tree was constructed with the following specifications: (1) the overall significance level of instability test was set at 5%, (2) minimum node size for further split was set at 40, and (3) minimum terminal node size was set at 20. Recall that we are considering four subgroups in the current simulation. The LongCART algorithm extracted exactly four subgroups in 81% of the cases. blackFive subgroups were extracted in 16% of the cases and in these trees we observed a split in subgroup 4 which was not present in the true tree (see Figure 2). There were only 1.3% instances when 3 subgroups and 1.6% instances when 6 subgroups were extracted.

For the comparison purposes, we considered seven linear mixed models (Model 2 – Model 8). These models are described in Table

3. The application of the LongCART algorithm (Model 1) shows comparatively larger improvements in the estimation of the coefficients in all four subgroups. Both the and were considerably smaller in Model 1 compared to the Models 2—8. The improvement in estimation of coefficients in regression tree was attributed to its ability to extract homogeneous subgroups and then fitting mixed model separately within each group. On the contrary, Models 2—8 assume either additive (Models 2—3) or an interaction (Models 4—8) mixed effects model for the entire population assuming parametric form for both covariates and time. These models do not capture the complexity for the heterogeneous subgroups and overestimate it for the homogeneous subgroups.

Inclusion of the interaction terms in the model does not necessarily take into account subgroup heterogeneity in the presence of continuous partitioning variable. For example, in Models 4 and 5 common slope is assumed for the entire population, but they include interaction terms for the baseline effects; still, the absolute deviation in estimating is almost times higher compared to that of a longitudinal tree. Similarly, Models 6 – 8 include interaction terms for both baseline and longitudinal effects, but again the absolute deviations in estimating and are higher compared to what we have obtained with the longitudinal regression tree.

Model 6 including the interaction terms with and the partitioning variables is probably the most commonly used model in practice. However, the application of the LongCART algorithm offers a considerable improvement in the estimation compared to Model 6. Models 6 – 7 provide some improvement over regression tree in some of the subgroups. However, these improvements are comparatively rare and largely influenced by the fact how the subgroups are defined. We would close this section pointing out, apart from providing improvement in estimation, the LongCART algorithm also identifies the meaningful subgroups defined by the partitioning variables which would remain unidentified otherwise.

6 Application

blackWe applied the LongCART algorithm to study the changes in concentration of a metabolite choline in gray matter region of the brain among HIV patients enrolled in the HIV Neuroimaging Consortium (HIVNC) study [11]. At the time of enrollment, patients were on stable FDA approved antiretroviral therapy. Concentrations of choline were obtained via magnetic resonance spectroscopy (MRS). Choline is considered to be a marker of brain inflammation. It has been found in previous studies that the concentrations of choline were elevated in all three brain regions among HIV patients [6]. We considered a total of observations from subjects. All the observations were within 3 years from baseline. The number of observations per subject ranged from 2 to 6 with median equal to 3. We estimated the overall significant decrease of 0.077 AU per year (p-value=0.003) in choline concentration suggesting overall beneficial effect of antiretroviral therapy.

For the construction of regression tree we used baseline measurements of several clinical and demographic variables including sex, race, education, age, current CD4 count, nadir CD4 count, duration of HIV infection, duration of antiretroviral (ARV) treatment, duration of highly active antiretroviral therapy (HAART), plasma HIV RNA count, antiretroviral CNS penetration-effectiveness (CPE) score and AIDS dementia complex (ADC) stage as partitioning variables. In each node we consider fitting the following model separately

Figure 3: Left panel. Longitudinal regression tree obtained via LongCART algorithm for longitudinal change in choline concentration as discussed in Section 6. The p-value in each node corresponds to the estimate of the slope . Right panel. Estimated linear trajectory for longitudinal change within each subgroup obtained via fitting mixed effect model of the form Eq. (6.1). This regression tree suggests duration of ARV treatment and HAART are significant determinants for longitudinal change of choline.

blackwhere indicates the measurement of the concentration of choline from the th individual at time (in years) and is the subject specific intercept. It was assumed that and

are independently and normally distributed with mean equal to zero. LongCART algorithm was applied with the following specifications: (1) the significance level for individual instability test was set to 5%, (2) the minimum node size for further split was set to 50, and (3) the minimum terminal node size was set to 25.

blackFigure 3 displays the estimated longitudinal regression tree with the estimates of and for each terminal node or subgroup and the plot of estimated linear trajectories within each subgroup. blackDuration of ARV treatment (p-value=0.004) and HAART (p-value=0.004) seem to influence the change in concentration of choline over time. Improvement in deviance due to application of LongCART algorithm was 519 (log-likelihoods were vs. ; with 4 degrees of freedom). ARV treatment for over 7.5 years not only helped to reduce baseline concentration of choline, but also resulted in a significant decrease of 0.094 per year (p-value=0.015). A higher baseline value of choline concentration was observed among those who received ARV treatment for at most 7.5 years; however, a longer period of HAART therapy in them led to significant decrease of 0.196 per year (p-value=0.041) in concentration over time. We did not observe any decrease among those who received ARV treatment for less than 7.5 years and HAART therapy for 2.64 years.

blackIn summary, both the longer duration of ARV treatment and HAART resulted in reduction of choline concentration. However, the rate of reduction is almost double (4.14% vs 2.06%) when patients were on HAART compared to only ARV treatment (see Figure 3). This suggests that both ARV treatment and HAART are effective in controlling brain inflammation via reducing choline concentration. Finally, all these interpretable subgroups along with a significant improvement in overall model fit suggests underlying heterogeneity in the population in terms of longitudinal change in choline concentration. Thus considering a traditional linear mixed effects model for the entire population is not defensible.

7 Discussion

blackThe longitudinal profile in a population may be influenced by several baseline characteristics. This may be true both in observational studies and clinical trials. The most common strategy to incorporate the effect of baseline variables in a traditional linear mixed effects model is to include these baseline characteristics and their interactions with the time-varying variables as covariates in the model. However, this approach has its own limitations as discussed in Section 1. Longitudinal trees, i.e. regression trees for longitudinal data, are extremely useful to identify the heterogeneity in longitudinal trajectories in a given population in a nonparametric way. We have proposed LongCART algorithm for the construction of longitudinal tree which firstly, controls type I error at the time of taking decision about splitting at each node. Secondly, LongCART algorithm reduces the computation time substantially as we first choose the partitioning variable and then evaluate the goodness of fit criterion at all cut-off points of the selected partitioning variable only. Both the instability test and the LongCART algorithm discussed in this paper are based on the score process. Therefore, we can extend the scope of LongCART algorithm to other scenarios as long as we can obtain (or approximate) an expression for the score function and the Hessian matrix in a tractable form including the generalized linear mixed effects model (GLMM) and multiple response variables setting.

Appendix A Proofs

a.1 Proof of Theorem 3.1


Under , by applying Taylor series expansion

where means that tends to zero in probability. In the case of linear mixed effects models, this relationship is exact as the second derivative of the score function is equal to 0. That is, . Consequently,

The limit process is a -dimensional mean zero Brownian Bridge process with covariance function for . Therefore, under

where is a vector with independent standard Brownian Bridges as component processes. ∎

a.2 Proof of Theorem 3.2


Using Taylor series expansion we can write



It can be shown that


Proof of Theorem 3.2 follows from the definition of non-central chi-square distribution. ∎

a.3 Proof of Theorem 3.3


Using (A.1) and (A.2),

This time using the FCLT along with Cramer-Wold device we can show that

Therefore, for ,

Thus under ,


  • Abdolell et al. [2002] Abdolell, M., LeBlanc, M., Stephens, D. and Harrison, R. (2002). Binary partitioning for continuous longitudinal data: categorizing a prognostic variable. Statistics in medicine, 21(22), 3395–3409.
  • Billingsley [2009] Billingsley, P. (2009) Convergence of probability measures, volume 493. Wiley-Interscience.
  • Birnbaum [1952] Birnbaum, Z. (1952). Numerical tabulation of the distribution of Kolmogorov’s statistic for finite sample size. Journal of the American Statistical Association, 47(259), 425–441.
  • Breiman et al. [1984] Breiman, L., Friedman, J., Stone, C. and Olshen, R. (1984) Classification and regression trees, Chapman & Hall/CRC.
  • Brown et al. [1975] Brown, R., Durbin, J. and Evans, J. (1975). Techniques for testing the constancy of regression relationships over time. Journal of the Royal Statistical Society. Series B, 149–192.
  • Chang et al. [2002] Chang, L. and Ernst, T. and Witt, M. and Ames, N. and Gaiefsky, M. and Miller, E. (2002). Relationships among brain metabolites, cognitive function, and viral loads in antiretroviral-naıve HIV patients. Neuroimage, 17(3), 1638–1648.
  • Csõrgõ [2002] Csõrgõ, M. (2002). A glimpse of the impact of pál erd# ous on probability and statistics. Canadian Journal of Statistics, 30(4), 493–556.
  • Demidenko [2004] Demidenko, E. (2004) Mixed models: theory and applications, volume 518, Wiley-Interscience.
  • Diggle et al. [2002] Diggle, P., Heagerty, P., Liang, K. and Zeger, S. (2002) Analysis of longitudinal data, volume 25, Oxford University Press, USA.
  • Galimberti and Montanari [2002] Galimberti, G. and Montanari, A. (2002). Regression trees for longitudinal data with time-dependent covariates. Classification, clustering and data analysis, 391–398.
  • Gongvatana et al. [2013] Gongvatana, A. and Harezlak, J. and Buchthal, S. and Daar, E. and Schifitto, G. and Campbell, T. and Taylor, M. and Singer, E. and Algers, J. and Zhong, J. and others (2013). Progressive cerebral injury in the setting of chronic HIV infection and antiretroviral therapy. Journal of neurovirology, 19(3), 209–218.
  • Hjort and Koning [2002] Hjort, N. and Koning, A. (2002). Tests for constancy of model parameters over time. Journal of Nonparametric Statistics, 14(1-2), 113–132.
  • Hochberg [1988] Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75(4), 800–802.
  • Hochberg and Tamhane [1987] Hochberg, Y. and Tamhane, A. (1987). Multiple comparison procedures, John Wiley & Sons.
  • Laird and Ware [1982] Laird, N. and Ware, J. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963–974.
  • Liang and Zeger [1986] Liang, K.Y. and Zeger, S. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13–22.
  • Lilliefors [1967] Lilliefors, H. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399–402.
  • Masse [1951] Massey Jr, F. (1951). The Kolmogorov-Smirnov test for goodness of fit. Journal of the American statistical Association, 46(253), 68–78.
  • Muthén and Shedden [1999] Muthén, B. and Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55(2), 463–469.
  • Nyblom [1989] Nyblom, J. (1989). Testing for the constancy of parameters over time. Journal of the American Statistical Association, 84(405), 223–230.
  • Raudenbush [2001] Raudenbush, S. (2001). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual review of psychology, 52(1), 501–525.
  • Segal [1992] Segal, M. (1992). Tree-structured methods for longitudinal data. Journal of the American Statistical Association, 87(418), 407–418.
  • Sela and Simonoff [2012] Sela, R. and Simonoff, J. (2012). RE-EM trees: a data mining approach for longitudinal and clustered data. Machine learning, 86(2), 169–207.
  • Zeileis et al. [2010] Zeileis, A., Hothorn, T. and Hornik, K. (2010). party with the mob: Model-based Recursive Partitioning in R. Relation, 10(1.15), 4593.
  • Zeileis et al. [2008] Zeileis, A., Hothorn, T. and Hornik, K. (2008). Numerical tabulation of the distribution of Kolmogorov’s statistic for finite sample size. Journal of Computational and Graphical Statistics, 17(2), 492–514.
  • Zhang [1997] Zhang, H. (1997). Multivariate adaptive splines for analysis of longitudinal data. Journal of Computational and Graphical Statistics, 87(418), 74–91.
  • Zhang and Singer [1999] Zhang, H. and Singer, B. (1999). Recursive partitioning in the health sciences, Springer Verlag.