1 Introduction
Intensive longitudinal data are becoming progressively more prevalent across many areas of social science, especially psychology, catalysed by technological advances (e.g., Chapter 1, Bolger & Laurenceau, 2013). Such data usually involve many repeated measurements that capture individual-specific change processes in high resolution, enabling researchers to address deeper research questions about human behavioral patterns. Due to the complex structure of intensive longitudinal data, statistical models play an important role in their analysis.
In an intensive longitudinal study, repeated measurements are made intensively over time. Such data may involve (1) a large number of time points, (2) individually varying numbers of observations, (3) unequally spaced observation time points, and (4) response data of various types (e.g., continuous, ordinal, etc.). For example, consider intensive longitudinal data from ecological momentary assessment (EMA) under a signal-contingent sampling scheme (see Chapter 5, Conner & Lehman, 2012), which repeatedly measures individuals’ current behaviors and experiences in real time, in the individuals’ natural environments. Under this sampling scheme, participants are “beeped” at several (random) times a day to complete an electronic diary record on psychological variables, such as symptoms or well-being. The assessments can last for many days (e.g., a month). Such a design has been used to study, for example, borderline personality disorder (Trull et al., 2008), adolescent smoking (Hedeker et al., 2012), and other phenomena. We visualize this design in Figure 1, where the measurements happen at time points marked by “x”. Under such a design, each individual may receive hundreds of repeated measurements at irregularly spaced time points. Depending on the measurement scale, one or multiple indicators may be recorded at each observation time point, and the indicators can be either continuous or categorical.
Latent curve models (e.g., Bollen & Curran, 2006; Duncan et al., 2013; Ram & Grimm, 2015), also known as latent growth models or growth curve models, are an important family of psychometric models for the analysis of longitudinal measurements. These models characterize the growth or change in an individual through the modeling of an individual-specific time-varying latent trait, where the latent trait often has a substantive interpretation, such as a cognitive ability, a psychopathological trait, or subjective well-being. Such models are typically formulated under the structural equation modeling framework. In these models, each individual is represented by a latent curve, i.e., a time-varying latent trait. At a given observation time, the individual’s response to a single item or multiple items is assumed to be driven by his/her current latent trait level.
The classical latent curve models were developed for non-intensive longitudinal data (typically fewer than 10 measurement occasions). Therefore, they often make strong assumptions on the functional form of the latent curve. For example, a linear latent curve model assumes the latent curve is a linear function of time, whose intercept and slope are treated as individual-specific latent variables. In other words, in this linear curve model, the latent curve is a random function, characterized by two random effects that are often assumed to follow a bivariate normal distribution. Although the latent curve can take slightly more complex forms (e.g., polynomial), its functional form in the classical models is usually simple, which may not be suitable for analyzing individual change processes revealed by intensive longitudinal data, where the number of measurements may vary across individuals.

To better capture the temporal pattern in intensive longitudinal data, more flexible latent curve models have been proposed under the structural equation modeling framework. Depending on whether time is treated as discrete or continuous, these models can be classified into two categories. The discrete-time models are typically a hybrid of time series analysis models and the structural equation modeling framework. Specifically, the individual-specific dynamic latent traits are modeled by a time series model, such as the autoregressive (AR) or vector autoregressive (VAR) models. Such models are usually known as the latent variable-autoregressive latent trajectory models (Bianconcini & Bollen, 2018) or dynamic structural equation models (Asparouhov et al., 2018). The continuous-time models typically assume that the dynamic latent traits follow a stochastic differential equation (SDE; Oud & Jansen, 2000; Voelkle et al., 2012; Lu et al., 2015). For example, Lu et al. (2015) assume the dynamic latent trait follows the Ornstein-Uhlenbeck Gaussian process (Uhlenbeck & Ornstein, 1930), whose distribution is given by an SDE.

The above models have limitations. Discrete-time models may be oversimplified for intensive longitudinal data, for which measurement occurs in continuous time. In particular, when the time points of measurement are irregularly spaced and different individuals have different numbers of measurements, it is difficult to organize intensive longitudinal data into the format of multivariate time-series data and then analyze them using a discrete-time model. Arbitrarily transforming data into a multivariate time-series format is likely to introduce bias into the analysis, as time lags between measurements, which may vary substantially among individuals, are ignored in the discrete-time formatting. In theory, these issues with discrete-time models can be addressed by adopting a continuous-time model. However, existing continuous-time models are typically not straightforward to specify, estimate, and make inference upon, as latent stochastic differential equations are not straightforward to deal with either analytically or numerically. Moreover, limited by the form of stochastic differential equations, the existing continuous-time models for intensive longitudinal data may not be rich enough.
In this paper, we propose new continuous-time latent curve models for the analysis of intensive longitudinal data that do not suffer from the issues with the existing models and better capture the unique features of intensive longitudinal data mentioned previously. By imposing Gaussian process models (Rasmussen & Williams, 2005) on the latent curves, a general framework for latent curve modeling is developed. We call these models Latent Gaussian Process (LGP) models. In contrast to discrete-time models, the proposed models retain the flexibility of continuous-time models in dealing with observations in a continuous time domain. In addition, this general framework contains models that are easier to specify and analyze than SDE-based models.
Technically, the proposed modeling framework can be viewed as a hybrid of the latent Gaussian process model for functional data analysis (Hall et al., 2008) and the generalized multilevel structural equation modeling framework for longitudinal measurement (e.g., Chapter 4, Skrondal & Rabe-Hesketh, 2004). As will be shown in the sequel, many existing latent curve models, whether time is treated as continuous or discrete, can be viewed as special cases under the proposed general framework. By making use of mathematical characterizations of Gaussian processes, methods for the parametrization of LGP models are provided. In addition, parameter estimation and statistical inference are carried out under an empirical Bayes framework, using a stochastic expectation-maximization (StEM) algorithm (Celeux & Diebolt, 1985; Nielsen, 2000; Zhang et al., 2018).

The rest of the paper is organized as follows. In Section 2, the classical latent curve models are reviewed under a unified framework of structural equation modeling, and then a new latent Gaussian process modeling framework is introduced that substantially generalizes the traditional models. The parametrization of latent Gaussian process models is discussed. Estimation and statistical inference are discussed in Section 3, followed by the computational details in Section 4. The extension to incorporating covariates is discussed in Section 5. The proposed model is evaluated in Section 6 through simulation studies and further illustrated in Section 7 via a real data example. We end with concluding remarks in Section 8.
2 Latent Gaussian Process Model
2.1 A Unified Framework for Latent Curve Analysis
We first provide a unified framework for latent curve analysis. We consider participants measured longitudinally within a time interval, where time is treated as continuous. For each individual, measurements occur at individual-specific observation times, and the total number of measurements may differ across individuals. At each observation time, we observe a random vector of responses, each of which can be either continuous or categorical, depending on the data type of the corresponding indicator. In particular, the corresponding latent curve model is called a single-indicator model when there is one indicator and a multiple-indicator model when there are several. We denote the observed data as realizations of these random vectors. Moreover, each individual is associated with a latent curve, which can be regarded as a time-varying latent trait. Note that the above setting is quite general and includes discrete-time longitudinal data as a special case, for which the observation times take values in a discrete set.
The latent curve model consists of two components: (1) a measurement model that specifies the conditional distribution of the responses given the latent curve, and (2) a structural model that specifies the distribution of the latent curve as a random function.
Measurement model.
The measurement model assumes that the distribution of the responses at an observation time depends only on the latent trait level at that same time point, and not on the latent trait levels or responses at any other time points. More precisely, it is assumed that
(1) 
where the left-hand side denotes the probability density/mass function of the conditional distribution of the responses given the entire latent process, and the right-hand side denotes the probability density/mass function of the conditional distribution of the responses given the latent trait levels at the observation time points only. Equation (1) means that, given the latent trait levels at the time points of observation, the latent trait levels at all other time points are conditionally independent of the observed responses. As visualized in Figure 2, it is further assumed that the conditional distribution (1) has the following decomposition,

(2)
where the factors on the right-hand side are the conditional probability density/mass functions of the responses at each observation time given the latent trait level at that time. The assumption in (2) is conceptually similar to the widely used local independence assumption in latent variable models (see Chapter 4, Skrondal & Rabe-Hesketh, 2004). Finally, we assume local independence among the multiple indicators at each observation time, i.e., the indicators are conditionally independent given the current latent trait level. That is
(3) 
where each factor specifies the conditional distribution of one indicator given the current latent trait level; the choice of this distribution depends on the type of the indicator. It is worth noting that the conditional distribution does not depend on time, implying that the measurement is assumed to be time-invariant. Although commonly adopted in latent curve models (e.g., Chapter 2, Bollen & Curran, 2006), this assumption is quite strong and needs to be checked when applying such models to real data.
We provide several measurement model examples.

Linear factor model for continuous response:
(4) where , , and are model parameters.

Probit model for ordinal response:
(5) where
where the remaining quantities are model parameters. When the response has only two categories, it degenerates to a binary response variable and model (5) becomes the well-known two-parameter normal-ogive model in item response theory (Chapter 4, Embretson & Reise, 2000). Model (5) can alternatively be specified through the introduction of latent responses. That is, define a latent response
where the noise term follows a standard normal distribution. Then the observable response can be viewed as a truncated version of the latent response, obtained by
When the multiple indicators contain a mixture of ordinal and continuous variables, the above models can be combined, since the measurement models for different items can be specified independently given the local independence assumption.
Structural model.
The structural model specifies the distribution of the latent curve as a random function. We list a few examples below and refer the readers to Bollen & Curran (2006) for a comprehensive review.

Linear trajectory model:
(6) where the intercept and slope are individual-specific random effects, following a bivariate normal distribution.

Quadratic trajectory model:
(7) where the polynomial coefficients are individual-specific random effects, following a trivariate normal distribution.

Exponential trajectory model:
(8) where two of the coefficients are individual-specific random effects, following a bivariate normal distribution, and the rate parameter is a fixed effect.
These models assume a simple functional form for the latent curve. In particular, the realizations of the latent curve are restricted to linear, quadratic, and exponential functions for models (6)–(8), respectively. Such models tend to be effective for non-intensive longitudinal data (typically fewer than 10 measurement occasions), but may not be flexible enough for intensive longitudinal measurements, which provide information at a high temporal resolution. In the rest of the paper, a general modeling framework is proposed, based on which more flexible structural models can be constructed.
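To make the structural-model examples concrete, the following sketch simulates latent curves from the linear trajectory model (6); the random-effect mean vector and covariance matrix below are hypothetical values chosen purely for illustration.

```python
import numpy as np

def simulate_linear_trajectories(n, t_grid, mean=(0.0, 0.5),
                                 cov=((1.0, 0.2), (0.2, 0.3)), seed=0):
    """Draw n latent curves theta_i(t) = b0_i + b1_i * t, where the
    random effects (b0_i, b1_i) follow a bivariate normal distribution
    (hypothetical parameter values)."""
    rng = np.random.default_rng(seed)
    effects = rng.multivariate_normal(mean, cov, size=n)  # shape (n, 2)
    # Each row is one individual's latent curve evaluated on t_grid.
    return effects[:, [0]] + effects[:, [1]] * np.asarray(t_grid)[None, :]

t_grid = np.linspace(0.0, 1.0, 11)
curves = simulate_linear_trajectories(200, t_grid)
```

Each simulated curve is exactly linear in time; richer structural models replace this two-dimensional random-effect representation with a random function.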
2.2 Gaussian Process Structural Model
In what follows, we introduce a new framework that models the latent curve as a continuous-time stochastic process. A key component of this framework is the Gaussian process model.
Definition 1 (Gaussian Process)
A continuous-time stochastic process on a time interval is a Gaussian process if and only if, for every finite set of time points, the values of the process at those time points are jointly multivariate normal.
We remark that a Gaussian process can be defined more generally on the real line. In this paper, we focus on Gaussian processes on a bounded interval, since real longitudinal data are collected within a certain time window. Many widely used stochastic processes, including the Brownian motion, the Brownian bridge, and the Ornstein-Uhlenbeck process, are special cases of the Gaussian process. Thanks to their flexibility, nonlinearity, and inherent nonparametric structure, Gaussian processes have been widely used as models for random functions in regression, classification, and dimension reduction problems (Chapter 4, Rasmussen & Williams, 2005).
Thanks to normality, a Gaussian process is completely characterized by two components: (1) a mean function and (2) a kernel function describing the covariance structure. We provide a definition of a kernel function below.
Definition 2 (Kernel Function)
A bivariate function is called a kernel function if, for every finite set of points, the corresponding matrix of pairwise kernel evaluations is positive semidefinite.
Note that the covariance function of a Gaussian process must be a kernel function, because every matrix of its pairwise evaluations is the covariance matrix of a multivariate normal vector and is therefore positive semidefinite. Conversely, it can be shown that for any kernel function, there exists a Gaussian process whose covariance structure is given by that kernel (Chapter 4, Rasmussen & Williams, 2005). As an illustrative example, Figure 3 shows three independent realizations from a Gaussian process with a given mean function and a squared exponential kernel function.
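Realizations like those in Figure 3 can be drawn by evaluating the mean and kernel functions on a fine grid and multiplying standard normal draws by a Cholesky factor of the kernel matrix; the mean function and kernel parameter values below are hypothetical stand-ins, since the exact values used for the figure are not reproduced here.

```python
import numpy as np

def se_kernel(s, t, scale=1.0, length=0.2):
    """Squared exponential kernel (hypothetical parameter values)."""
    return scale**2 * np.exp(-(s - t)**2 / (2.0 * length**2))

def sample_gp(t_grid, mean_fun, kernel, n_paths=3, seed=1):
    """Sample GP paths on t_grid via the Cholesky factor of the kernel
    matrix; a small diagonal jitter keeps the factorization stable."""
    t = np.asarray(t_grid)
    K = kernel(t[:, None], t[None, :]) + 1e-8 * np.eye(len(t))
    L = np.linalg.cholesky(K)
    z = np.random.default_rng(seed).standard_normal((len(t), n_paths))
    return mean_fun(t)[:, None] + L @ z  # shape (len(t), n_paths)

t_grid = np.linspace(0.0, 1.0, 50)
paths = sample_gp(t_grid, lambda t: np.sin(2 * np.pi * t), se_kernel)
```

Each column of `paths` is one realization; the finite-dimensional draws are multivariate normal, which is exactly the defining property in Definition 1.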
Definition 3 (Gaussian Process Structural Model)
We say the structural component of a latent curve model follows a Gaussian process structural model if the individual-specific latent curves are independent and identically distributed (i.i.d.) Gaussian processes.
We remark that the Gaussian process structural model assumption in Definition 3 can be viewed as an extension of a commonly adopted assumption in unidimensional or multidimensional item response theory models, where the individual-specific latent trait or traits are assumed to be i.i.d. univariate or multivariate normal. The difference is that, rather than having a random variable or random vector for each individual, each individual in the proposed model is characterized by a random function, whose distribution is less straightforward to parameterize.
Combining a Gaussian process structural model and a measurement model as defined in Section 2.1, we obtain an LGP model. We point out that the examples (6)–(8) are all special cases of the LGP model. This is because, due to the multivariate normality of the random effects, the latent trait levels at every finite set of time points are multivariate normal. In addition, all the SDE-based continuous-time latent curve models also fall into this framework when the noise component of the SDE is assumed to be Gaussian. For example, Lu et al. (2015) assume the dynamic latent trait follows the Ornstein-Uhlenbeck process (Uhlenbeck & Ornstein, 1930). This process is a Gaussian process described by a stochastic differential equation with Gaussian noise. Furthermore, when the latent variables are assumed to be jointly normal, the latent variable-autoregressive latent trajectory models (see Bianconcini & Bollen, 2018), which are discrete-time models, can also be viewed as special cases under the current framework.
A Gaussian process is specified by a mean function and a kernel function, whose choices should be problem-specific. In what follows, we discuss the parametrization of Gaussian process structural models.
2.3 Parametrization of Gaussian Process Structural Model
Following the above discussion, the latent curve can be decomposed into a mean function plus a mean zero Gaussian process whose covariance structure is given by a kernel. This allows us to discuss the modeling of the mean and the kernel separately, whereas in the classical latent curve models (e.g., (6)–(8)) the mean and kernel are modeled simultaneously. In particular, the mean function can be viewed as the mean of the latent curves for individuals from a population of interest. Therefore, the mean function captures the mean level of the time-varying latent trait, possibly reflecting the trend and the periodicity of the dynamic latent trait at the population level. In addition, the mean zero Gaussian process can be viewed as the deviation from the mean process that is specific to each individual.
Mean function.
We consider the parametrization of the mean function, which is typically assumed to have a certain level of smoothness. For the linear, quadratic, and exponential trajectory models mentioned in Section 2.1, the mean functions take linear, quadratic, and exponential forms, respectively.

Under the current framework, the mean function can be parameterized more flexibly. Specifically, we adopt a parametrization using basis functions. That is,
(9) 
where the basis functions are prespecified on the time interval. For example, polynomial basis functions of a chosen degree may be used, or cubic spline basis functions with prespecified knots, where each truncated-power term equals zero below its knot. Alternative basis functions may also be used, such as the Fourier basis, wavelets, and other spline basis functions. We refer the readers to Chapter 3, Ramsay & Silverman (1997) for a review of different basis functions. We remark that the number and choice of basis functions may be determined by the data through model comparison.
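A sketch of the basis-function parametrization of the mean function in (9), using polynomial and truncated-power cubic spline bases; the knot placement and coefficient values are hypothetical illustrations.

```python
import numpy as np

def poly_basis(t, degree=3):
    """Polynomial basis: 1, t, ..., t^degree."""
    t = np.asarray(t)
    return np.stack([t**p for p in range(degree + 1)], axis=-1)

def cubic_spline_basis(t, knots):
    """Truncated-power cubic spline basis: 1, t, t^2, t^3, and
    (t - k)^3_+ for each prespecified knot k (zero below the knot)."""
    t = np.asarray(t)
    cols = [t**p for p in range(4)]
    cols += [np.where(t > k, (t - k)**3, 0.0) for k in knots]
    return np.stack(cols, axis=-1)

def mean_function(t, coefs, basis, **kw):
    """mu(t) = sum_j coefs[j] * b_j(t), as in equation (9)."""
    return basis(t, **kw) @ np.asarray(coefs)

t = np.linspace(0.0, 1.0, 5)
B = cubic_spline_basis(t, knots=[0.5])  # 4 polynomial + 1 knot column
```

Swapping the `basis` argument changes the shape family of the mean function without touching the rest of the model, which is what makes model comparison across bases straightforward.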
We remark that if the dynamic trait is assumed to be a stationary process (i.e., its joint distribution does not change when the process is shifted in time), then the mean function does not depend on time. In that case, the mean function can only have an intercept parameter.

Parameterizing the kernel function.
One way to model the mean zero Gaussian process is by directly parameterizing the kernel function. Different parametric kernel functions are available in the literature; we refer the readers to Chapter 4 of Rasmussen & Williams (2005) for a review. In what follows, we provide a few examples of kernel functions, with a focus on kernels that lead to stationary mean zero Gaussian processes. For such a kernel function, the kernel value depends only on the time lag between two time points, not on their specific values. A stationary kernel should be used if the distribution of the process is believed to be invariant when the process is shifted in time.

Squared exponential (SE) kernel:
(10) where the two model parameters are known as the scale and the length-scale parameters, respectively.

Exponential kernel:
(11) where the two model parameters play roles similar to those in the SE kernel above.

Periodic kernel (MacKay, 1998):
(12) where two of the model parameters play roles similar to those in the two kernels above, and the third is known as the period parameter, which determines the periodicity of the kernel function.
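The three stationary kernels can be sketched as follows, using their standard forms from the Gaussian process literature (Rasmussen & Williams, 2005; MacKay, 1998); the exact parametrizations in (10)–(12) may differ slightly, and the parameter values are illustrative. The positive semidefiniteness required by Definition 2 can be checked numerically.

```python
import numpy as np

def se_kernel(lag, scale=1.0, length=0.5):
    """Squared exponential kernel as a function of the time lag."""
    return scale**2 * np.exp(-lag**2 / (2.0 * length**2))

def exp_kernel(lag, scale=1.0, length=0.5):
    """Exponential kernel; yields rougher paths than the SE kernel."""
    return scale**2 * np.exp(-np.abs(lag) / length)

def periodic_kernel(lag, scale=1.0, length=0.5, period=1.0):
    """Periodic kernel (MacKay, 1998 form): correlation recurs every
    `period` units of time."""
    return scale**2 * np.exp(
        -2.0 * np.sin(np.pi * np.abs(lag) / period)**2 / length**2)

def kernel_matrix(kernel, t_grid, **kw):
    """Matrix of pairwise kernel evaluations on a time grid."""
    t = np.asarray(t_grid)
    return kernel(t[:, None] - t[None, :], **kw)

K = kernel_matrix(se_kernel, np.linspace(0.0, 1.0, 20))
eigmin = float(np.linalg.eigvalsh(K).min())  # should be >= 0 (up to rounding)
```

At zero lag each kernel returns the squared scale, and the periodic kernel repeats its values at lags one period apart, matching the qualitative descriptions above.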
Mean zero Gaussian processes with different kernel functions have different properties. For example, mean zero Gaussian processes with an SE kernel tend to have smooth paths. In fact, a mean zero Gaussian process with the SE kernel is among the smoothest stochastic processes according to the notion of mean square differentiability (Chapter 1, Adler, 1981), a classical quantification of the smoothness of stochastic processes. This kernel function is widely used in statistical applications of Gaussian processes. It will be further discussed in the sequel and used in the data analysis.
An alternative way of parameterizing the kernel is by directly modeling the mean zero Gaussian process, which can be done by using a linear basis function model. Specifically, let the basis functions be prespecified on the time interval, such as spline, Fourier, or wavelet basis functions. The theory of functional principal component analysis provides guidance on choosing better basis functions (e.g., Hall et al., 2008). Given the basis functions, the linear basis function model assumes that

(13)

where the basis coefficients are model parameters that scale i.i.d. standard normal random variables. The model (13) yields a corresponding induced kernel function. For a finite number of basis functions, this parametrization approach typically leads to a nonstationary kernel function. Making use of the theory of reproducing kernel Hilbert spaces, essentially any mean zero Gaussian process can be approximated by the form of (13) for a sufficiently large number of basis functions.
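For model (13), with each basis function scaled by a weight and multiplied by an i.i.d. standard normal variable, the induced kernel is the weighted sum of products of basis functions at the two time points. The sketch below verifies this identity by Monte Carlo, using hypothetical basis functions and weights.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical basis functions b_j and weights a_j for illustration.
basis = [lambda t: np.ones_like(t), np.sin, np.cos]
a = np.array([1.0, 0.7, 0.4])

def induced_kernel(s, t):
    """K(s, t) = sum_j a_j^2 b_j(s) b_j(t) for the linear basis model."""
    return sum(w**2 * b(np.asarray(s, float)) * b(np.asarray(t, float))
               for w, b in zip(a, basis))

def sample_paths(t_grid, n):
    """X(t) = sum_j a_j b_j(t) xi_j with xi_j i.i.d. standard normal."""
    t = np.asarray(t_grid, float)
    xi = rng.standard_normal((n, len(a)))
    B = np.stack([b(t) for b in basis], axis=0)  # shape (J, len(t))
    return (xi * a) @ B

t_pair = np.array([0.2, 1.5])
X = sample_paths(t_pair, 200_000)
mc_cov = float((X[:, 0] * X[:, 1]).mean())  # Monte Carlo E[X(s) X(t)]
exact = float(induced_kernel(0.2, 1.5))
```

Because the induced kernel depends on the two time points through the basis values rather than only their lag, the resulting process is generally nonstationary, as noted above.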
Squared exponential kernel.
We further discuss the properties of the SE kernel. According to (10), the kernel value at zero lag equals the squared scale parameter, so the scale parameter captures the overall variation of the Gaussian process in the long run. Moreover, the length-scale parameter captures the short-term temporal dependence. More precisely, the correlation between the process values at two time points is given by

As shown in Figure 4, for each value of the length-scale parameter, the correlation decays towards zero as the time lag increases, with a decay rate determined by the length scale; once the time lag is large relative to the length scale, the correlation becomes negligible. Moreover, for a given time lag, a smaller length scale implies a smaller correlation. Figure 5 shows sample paths from three mean zero Gaussian processes with SE kernels. Panels (a) and (b) differ only in the length-scale parameter, with panel (a) having the smaller value; the paths in panel (a) are more wiggly (i.e., have more short-term variation) than those in panel (b), since the Gaussian process in panel (a) has less temporal dependence. Panels (b) and (c) differ only in the scale parameter, due to which the paths in panel (c) have more variation in the long run.
Identifiability of the model parameters.
Like many other structural equation models, constraints are needed to ensure model identifiability. In particular, two constraints are needed: one to fix the scale of the latent process and the other to avoid mean shift. For instance, consider a model combining the mean function (9), the measurement model (4), and the SE kernel (10). To fix the scale in this model, we can fix either the scale parameter in (10) or the first loading parameter in (3). In addition, to avoid mean shift, we can fix either the intercept parameter in (9) or the one in (4).
3 Inference under LGP Model
The statistical inference under the proposed model can be classified into two levels: the population level and the individual level. Both levels of inference may be of interest in latent curve analysis. Population level inference considers the estimation of the parameters in both the measurement and structural models. Individual level inference focuses on the posterior distribution of an individual's latent curve given that individual's data, when the measurement and structural models are known (e.g., obtained from the population level inference).
Population level inference.
We collect all the model parameters, including those from both the measurement and structural models, into a parameter vector. As mentioned above, constraints may be imposed to ensure model identifiability. The likelihood function can be written as
(14) 
where the integrand involves the density function of a multivariate normal distribution whose mean and covariance matrix are given by the mean and kernel functions evaluated at the observation time points. Note that this likelihood function is the marginal likelihood of the data, in which the latent curves are integrated out. The maximum likelihood estimator is defined as the maximizer of this likelihood, whose computation is discussed in Section 4. We then obtain the estimated mean and kernel functions by plugging in the parameter estimates.
Individual level inference.
Similar to the classical latent curve analysis, the current modeling framework also allows for statistical inference on the latent curve of each individual. For ease of exposition, we assume both the measurement and the structural models are known when making individual level inference. In practice, we can first estimate the model parameters and then treat the estimated model as the true one when making individual level inference. For each individual, whether or not a measurement occurs at a given time, one can infer the latent trait level at that time based on its posterior distribution given the individual's observed responses. By sweeping over the entire time interval, one obtains the posterior mean of the latent trait as a function of time, which serves as a point estimate of the individual's latent curve. When calculated under the estimated model, we call this posterior mean the Expected A Posteriori (EAP) estimate of the individual's latent curve. It mimics the EAP estimate of an individual's latent trait level in item response theory (e.g., Embretson & Reise, 2000).
4 Computation
In this section, we elaborate on the computational details.
4.1 Individual Level Inference
We first discuss computing the posterior distribution of the latent trait level at an arbitrary time point given the observed responses, when both the measurement and the structural models are given. Following equation (1) of the measurement model, the latent trait level at that time point and the observed responses are conditionally independent given the latent trait levels at the observation time points. Consequently,
(15)  
where the first factor denotes the conditional distribution of the latent trait level at the new time point given the latent trait levels at the observation time points, and the second denotes the posterior distribution of the latter given the observed responses. Since the latent trait levels at any finite set of time points follow a multivariate normal distribution, the first conditional distribution is still normal, with a mean and a variance in analytic form. The posterior mean of the latent trait level at the new time point is then given by
(16) 
In addition, the quantiles of the posterior distribution are given by

(17)

where the quantile of a standard normal distribution at the corresponding level is used.
Under the linear factor model (4), the responses and latent trait levels are jointly normal. Consequently, (15)–(17) have analytical forms. Under other measurement models, (16) and (17) can be approximated using Monte Carlo samples from the posterior distribution. Specifically, given such Monte Carlo samples, we approximate the mean and the quantiles of the posterior distribution by
(18)  
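For the fully Gaussian case, the analytic posterior coincides with standard Gaussian process regression formulas. The sketch below assumes a single indicator with identity loading and Gaussian noise, a simplification of the general measurement model; the kernel and parameter values are hypothetical.

```python
import numpy as np

def se_kernel(s, t, scale=1.0, length=0.3):
    """Squared exponential kernel (hypothetical parameter values)."""
    return scale**2 * np.exp(-(s - t)**2 / (2.0 * length**2))

def gp_posterior(t_obs, y_obs, t_new, mean_fun, kernel, noise_sd):
    """Posterior mean and sd of theta(t_new) given noisy observations
    y = theta(t_obs) + e (single indicator, identity loading)."""
    t_obs, t_new = np.asarray(t_obs), np.asarray(t_new)
    K = kernel(t_obs[:, None], t_obs[None, :]) \
        + noise_sd**2 * np.eye(len(t_obs))
    k_star = kernel(t_new[:, None], t_obs[None, :])          # (m, n)
    alpha = np.linalg.solve(K, y_obs - mean_fun(t_obs))
    post_mean = mean_fun(t_new) + k_star @ alpha
    v = np.linalg.solve(K, k_star.T)                          # (n, m)
    post_var = kernel(t_new, t_new) - np.einsum('mn,nm->m', k_star, v)
    return post_mean, np.sqrt(np.maximum(post_var, 0.0))

t_obs = np.array([0.1, 0.4, 0.8])
y_obs = np.array([0.2, 0.9, 0.4])
mean0 = lambda t: np.zeros_like(np.asarray(t, float))
m, s = gp_posterior(t_obs, y_obs, np.array([0.1, 0.5]), mean0,
                    se_kernel, noise_sd=0.1)
```

Sweeping `t_new` over a fine grid yields the EAP curve and pointwise posterior bands described above; the posterior sd shrinks near observed time points and grows between them.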
Markov chain Monte Carlo (MCMC) methods can be used to obtain Monte Carlo samples from the posterior distribution. For example, a Gibbs sampler can be developed that efficiently samples from this posterior distribution under the probit model (5) for ordinal response data. This sampler, described below, makes use of the latent response formulation of the probit model (5).

Step 1: Given the current values of the latent trait levels, sample each latent response from a normal distribution truncated to the interval determined by the corresponding observed ordinal response.

Step 2: Given the sampled latent responses, update the latent trait levels by sampling from their conditional distribution given the latent responses. It is worth noting that this conditional distribution is multivariate normal, because the latent trait levels and the latent responses are jointly normal. The observed ordinal data are not conditioned upon in this step, because the latent trait levels are conditionally independent of the observed data given the latent responses.
We point out that both steps can be computed efficiently, because step 1 only involves sampling from univariate truncated normal distributions and step 2 only involves sampling from multivariate normal distributions. Well-developed samplers exist for both.
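A minimal sketch of the two Gibbs steps, assuming a binary probit measurement with identity loading and a mean zero GP prior with unit-variance latent responses; the kernel and threshold conventions are illustrative simplifications of the general ordinal model.

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()

def sample_truncnorm(mean, lower, upper, rng):
    """Inverse-CDF sampling from N(mean, 1) truncated to (lower, upper)."""
    a = nd.cdf(lower - mean) if np.isfinite(lower) else 0.0
    b = nd.cdf(upper - mean) if np.isfinite(upper) else 1.0
    u = rng.uniform(a, b)
    return mean + nd.inv_cdf(min(max(u, 1e-12), 1 - 1e-12))

def gibbs_probit_gp(y, t_obs, kernel, n_iter=200, seed=0):
    """Two-step Gibbs sketch for binary probit data with a GP prior:
      Step 1: z_j | theta, y_j ~ truncated N(theta_j, 1)
      Step 2: theta | z ~ MVN (the conjugate GP-regression update)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = kernel(t_obs[:, None], t_obs[None, :]) + 1e-8 * np.eye(n)
    # Posterior covariance of theta given z (unit noise variance):
    # Sigma = (K^{-1} + I)^{-1} = K (K + I)^{-1}.
    Sigma = K @ np.linalg.inv(K + np.eye(n))
    Sigma = (Sigma + Sigma.T) / 2.0  # symmetrize for Cholesky
    L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n))
    theta = np.zeros(n)
    draws = []
    for _ in range(n_iter):
        # Step 1: z_j > 0 if y_j = 1, z_j <= 0 if y_j = 0.
        z = np.array([sample_truncnorm(theta[j], 0.0, np.inf, rng) if y[j]
                      else sample_truncnorm(theta[j], -np.inf, 0.0, rng)
                      for j in range(n)])
        # Step 2: theta | z ~ N(Sigma z, Sigma).
        theta = Sigma @ z + L @ rng.standard_normal(n)
        draws.append(theta)
    return np.array(draws)

kern = lambda s, t: np.exp(-(s - t)**2 / 0.5)  # hypothetical SE kernel
t_obs = np.linspace(0.0, 1.0, 6)
y = np.array([1, 1, 1, 0, 0, 0])
draws = gibbs_probit_gp(y, t_obs, kern)
```

With more ordinal categories, step 1 truncates to the interval between the thresholds bracketing the observed category; step 2 is unchanged, which is why the sampler remains efficient.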
4.2 Population Level Inference
We now discuss the computation for maximizing the likelihood function (14). Under the linear factor model (4), the expectation-maximization (EM) algorithm (Dempster et al., 1977) is used to optimize (14), where the E-step is in closed form due to the joint normality of the data and latent variables. The implementation of this EM algorithm is standard, and we thus omit the details here.
Under other measurement models, the classical EM algorithm is typically computationally infeasible when the number of time points is large, in which case the E-step of the algorithm involves a high-dimensional integral without an analytical form. We adopt a stochastic EM (StEM) algorithm (Celeux & Diebolt, 1985; Diebolt & Ip, 1996; Zhang et al., 2018), which replaces the intractable numerical integration in the E-step of the standard EM algorithm (Dempster et al., 1977; Bock & Aitkin, 1981) with Monte Carlo simulation. The convergence properties of the StEM algorithm are established in Nielsen (2000). Similar to the EM algorithm, the StEM algorithm iterates between two steps, the StE step and the M step. Starting from initial parameter values and initial values of the person parameters, the following StE step and M step are performed in each iteration.
The final estimate of the parameter vector is given by the average of the parameter values from the last iterations, i.e.,
(21) 
As shown in Nielsen (2000), this average can approximate the maximum likelihood estimator to arbitrary accuracy when the total number of iterations and the number of iterations used for averaging are large enough.
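The parameter-averaging step in (21) can be illustrated with a minimal StEM loop for a toy normal model, in which the exact MLE is the sample mean; the model and values are hypothetical and far simpler than the LGP setting, but the StE/M/average structure is the same.

```python
import numpy as np

def stem_normal_mean(y, n_iter=500, burn_in=250, seed=0):
    """Minimal StEM sketch for the toy model y_i = theta_i + e_i with
    theta_i ~ N(mu, 1) and e_i ~ N(0, 1); the MLE of mu is mean(y).
    StE step: sample theta_i | y_i, mu ~ N((y_i + mu)/2, 1/2).
    M step:   set mu to the mean of the sampled theta_i.
    The final estimate averages mu over the post-burn-in iterations,
    as in (21)."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    trace = []
    for _ in range(n_iter):
        theta = rng.normal((y + mu) / 2.0, np.sqrt(0.5))  # StE step
        mu = theta.mean()                                  # M step
        trace.append(mu)
    return float(np.mean(trace[burn_in:]))

y = np.random.default_rng(1).normal(2.0, np.sqrt(2.0), size=400)
mu_hat = stem_normal_mean(y)
```

The parameter chain fluctuates around the MLE rather than converging to a point, which is why the average over post-burn-in iterations, rather than the final iterate, is reported.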
5 Incorporation of Covariates
In practice, individual-specific covariates are often collected and incorporated into the latent curve analysis. As visualized in the path diagram in Figure 6, covariates can be added to the structural model to explain how the distribution of the latent curves depends on them. A specific type of covariate of interest is group membership, such as experimental versus control or female versus male. Latent curve analysis that incorporates discrete group membership as a covariate in the structural model is referred to as the analysis of groups (Chapter 6, Bollen & Curran, 2006).
Covariates can be easily handled under the proposed framework. For example, when discrete group membership may affect the mean function of the latent curve, we let the parameters of the mean function be group-specific. Similarly, we may also allow the kernel parameters to depend on the group membership. Quantitative covariates, such as age, can also be incorporated into the current model, with the mean and kernel functions then depending on the covariates. The tools for inference and computation discussed above generalize easily.
6 Simulation
The proposed modeling framework and the estimation procedures are further evaluated by simulation studies.
6.1 Study I
We first evaluate the parameter recovery using the EM algorithm, under a setting similar to the real data example in Section 7, except that a single group is considered in this study. In particular, it is assumed that each participant is measured for 25 consecutive days, with four measurements per day, resulting in 100 measurement occasions. The time points of the four daily measurements are randomly sampled within each day, and we consider a measurement model with a single indicator. More precisely, given the observation times, the model is specified as follows.

The true model parameters are specified in Table 1. Two sample sizes are considered. The simulation under each sample size is repeated 100 times, based on which the mean squared error (MSE) of the parameter estimates is calculated. According to the MSEs presented in Table 1, the parameter estimation is very accurate under the current simulation settings, and the estimation accuracy improves as the sample size increases.
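The measurement design of Study I, 25 consecutive days with four randomly timed measurements per day, can be simulated as follows; measuring time in days is an assumed convention for illustration.

```python
import numpy as np

def simulate_design(n_days=25, per_day=4, seed=0):
    """Signal-contingent design: `per_day` uniformly random measurement
    times within each of `n_days` days, yielding n_days * per_day
    irregularly spaced occasions in total."""
    rng = np.random.default_rng(seed)
    times = np.sort(np.concatenate(
        [day + np.sort(rng.uniform(0.0, 1.0, per_day))
         for day in range(n_days)]))
    return times

times = simulate_design()  # 100 occasions on [0, 25)
```

The resulting time points are unequally spaced both within and across days, which is exactly the irregular-sampling feature the continuous-time model is designed to accommodate.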
We further illustrate the performance of the individual level inference based on the distance between each individual's true latent curve and its EAP estimate. This distance quantifies the inaccuracy of estimating the latent curve by the EAP estimate. As a reference, the distance between the true latent curve and the estimated population mean function is also computed; it quantifies the inaccuracy of estimating the latent curve by the estimated population mean. The ratio of the two distances serves as a measure of inaccuracy in estimating the latent curve of an individual, in which the difficulty of estimating the curve is taken into account by the denominator. The smaller the ratio, the more accurately the latent curve is estimated in a relative sense (relative to the overall difficulty of estimation, as measured by the reference distance).
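The distance formula is not reproduced above; assuming an L2 distance between curves (a common choice for comparing functions), the ratio can be computed as follows, with hypothetical curves standing in for the true latent curve, its EAP estimate, and the estimated population mean.

```python
import numpy as np

def l2_distance(f, g, t_grid):
    """L2 distance between two curves on a grid, via the trapezoidal rule."""
    d2 = (f(t_grid) - g(t_grid))**2
    return float(np.sqrt(np.sum((d2[1:] + d2[:-1]) / 2.0 * np.diff(t_grid))))

def relative_inaccuracy(theta, theta_hat, mu_hat, t_grid):
    """Ratio d(theta, theta_hat) / d(theta, mu_hat); values below 1 mean
    the EAP curve improves on the estimated population mean."""
    return (l2_distance(theta, theta_hat, t_grid)
            / l2_distance(theta, mu_hat, t_grid))

# Hypothetical curves for illustration.
t_grid = np.linspace(0.0, 1.0, 1001)
theta = lambda t: np.sin(2 * np.pi * t)            # true latent curve
theta_hat = lambda t: 0.9 * np.sin(2 * np.pi * t)  # EAP-style estimate
mu_hat = lambda t: np.zeros_like(t)                # estimated population mean
ratio = relative_inaccuracy(theta, theta_hat, mu_hat, t_grid)
```

In this toy configuration the EAP-style estimate recovers most of the individual deviation from the population mean, so the ratio is well below one.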