# A Latent Gaussian Process Model for Analyzing Intensive Longitudinal Data

Intensive longitudinal studies are becoming progressively more prevalent across many social science areas, especially in psychology. New technologies like smart-phones, fitness trackers, and the Internet of Things make it much easier than in the past for data collection in intensive longitudinal studies, providing an opportunity to look deep into the underlying characteristics of individuals under a high temporal resolution. In this paper, we introduce a new modeling framework for latent curve analysis that is more suitable for the analysis of intensive longitudinal data than existing latent curve models. Specifically, through the modeling of an individual-specific continuous-time latent process, some unique features of intensive longitudinal data are better captured, including intensive measurements in time and unequally spaced time points of observations. Technically, the continuous-time latent process is modeled by a Gaussian process model. This model can be regarded as a semi-parametric extension of the classical latent curve models and falls under the framework of structural equation modeling. Procedures for parameter estimation and statistical inference are provided under an empirical Bayes framework and evaluated by simulation studies. We illustrate the use of the proposed model through the analysis of an ecological momentary assessment dataset.

## Authors

• 16 publications
• 6 publications
• ### The Dynamical Gaussian Process Latent Variable Model in the Longitudinal Scenario

The Dynamical Gaussian Process Latent Variable Models provide an elegant...
09/25/2019 ∙ by Thanh Le, et al. ∙ 10

• ### Clustering of longitudinal data: A tutorial on a variety of approaches

During the past two decades, methods for identifying groups with differe...
11/10/2021 ∙ by Niek Den Teuling, et al. ∙ 0

• ### Longitudinal Mediation Analysis with Latent Growth Curves

The paper considers mediation analysis with longitudinal data under late...
03/09/2021 ∙ by Adam J. Sullivan, et al. ∙ 0

• ### Maximum approximate likelihood estimation of general continuous-time state-space models

Continuous-time state-space models (SSMs) are flexible tools for analysi...
10/28/2020 ∙ by Sina Mews, et al. ∙ 0

• ### Forecasting intra-individual changes of affective states taking into account inter-individual differences using intensive longitudinal data from a university student drop out s

The longitudinal process that leads to university student drop out in ST...
04/06/2021 ∙ by Augustin Kelava, et al. ∙ 0

• ### The Autoregressive Structural Model for analyzing longitudinal health data of an aging population in China

We seek to elucidate the impact of social activity, physical activity an...
12/05/2019 ∙ by Yazhuo Deng, et al. ∙ 0

• ### Extending Latent Basis Growth Model to Explore Joint Development in the Framework of Individual Measurement Occasions

Longitudinal processes in multiple domains are often theorized to be non...
07/05/2021 ∙ by Jin Liu, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Intensive longitudinal data are becoming progressively more prevalent across many social science areas, especially in psychology, catalysed by technological advances (e.g., Chapter 1, Bolger  Laurenceau, 2013). Such data usually involve many repeated measurements that reflect individual-specific change process in high resolution, enabling researchers to answer deeper research questions of human behavioral patterns. Due to the complex structure of intensive longitudinal data, statistical models play an important role in the analysis of such data.

In an intensive longitudinal study, repeated measurements are made intensively over time. Such data may involve (1) a large number of time points, (2) individually-varying numbers of observations, (3) unequally spaced time points of observations, and (4) response data of various types (e.g., continuous, ordinal, etc.). For example, consider intensive longitudinal data from ecological momentary assessment (EMA) under a signal-contingent sampling scheme (see Chapter 5, Conner  Lehman, 2012), which repeatedly measures individuals’ current behaviors and experiences in real time, in the individuals’ natural environments. Under this sampling scheme, participants are “beeped” at several (random) times a day to complete an electronic diary record on psychological variables, such as symptoms or well-being. The assessments can last for many days (e.g. a month). Such a design has been used to study, for example, borderline personality disorder (Trull ., 2008), adolescent smoking (Hedeker ., 2012), and others. We visualize this design in Figure 1, where the measurements happen at time points marked by “x”. Under such a design, each individual may receive hundreds of repeated measurements at irregularly spaced time points. Depending on the measurement scale, one or multiple indicators may be recorded at each observation time point and the indicators can be either continuous or categorical.

Latent curve models (e.g. Bollen  Curran, 2006; Duncan ., 2013; Ram  Grimm, 2015), also known as latent growth models or growth curve models, are an important family of psychometric models for the analysis of longitudinal measurements. These models characterize the growth or change in an individual through the modeling of an individual-specific time-varying latent trait, where the latent trait often has a substantive interpretation, such as a cognitive ability, a psychopathological trait, or subjective well-being. Such models are typically formulated under the structural equation modeling framework. In these models, each individual is represented by a latent curve , which represents a time-varying latent trait. At a given observation time , the individual’s response to a single or multiple items is assumed to be driven by his/her current latent trait level .

The classical latent curve models are developed for non-intensive longitudinal data (typically less than 10 times of measurement). Therefore, they often make strong assumptions on the functional form of . For example, a linear latent curve model assumes that , where and are the intercept and the slope of the curve, treated as individual specific latent variables. In other words, in this linear curve model, the latent curve is a random function, characterized by two random effects and

that are often assumed to follow a bivariate normal distribution. Although

can take slightly more complex forms (e.g., polynomial), the functional form of in the classical models is usually simple, which may not be suitable for analyzing individual change processes revealed by intensive longitudinal data, where the number of measurements may vary across different individuals.

To better capture the temporal pattern in intensive longitudinal data, more flexible latent curve models have been proposed under the structural equation modeling framework. Depending on whether time is treated as discrete or continuous, these models can be classified into two categories. The discrete-time models are typically a hybrid of time series analysis models and the structural equation modeling framework. Specifically, the individual specific dynamic latent traits are modeled by a time series model, such as the autoregressive (AR) or vector autoregressive (VAR) models. Such models are usually known as the latent variable-autoregressive latent trajectory models

(Bianconcini  Bollen, 2018) or dynamic structural equation models (Asparouhov ., 2018). The continuous-time models typically assume that the dynamic latent traits follow a stochastic differential equation (SDE; Oud  Jansen, 2000; Voelkle ., 2012; Lu ., 2015). For example, Lu . (2015) assume the dynamic latent trait to follow the Ornstein-Uhlenbeck Gaussian process (Uhlenbeck  Ornstein, 1930), whose distribution is given by an SDE.

The above models have limitations. Discrete-time models may be over-simplified for intensive longitudinal data, for which measurement occurs in continuous time. In particular, when time points of measurements are irregularly spaced and different individuals have different numbers of measurements, it is difficult to organize intensive longitudinal data into the format of multivariate time-series data and then analyze using a discrete-time model. Arbitrarily transforming data into a multivariate time-series format is likely to introduce bias into the analysis, as time lags between measurements, which may vary substantially among individuals, are ignored in the discrete-time formatting. In theory, these issues with discrete-time models can be addressed by taking a continuous-time model. However, existing continuous-time models are typically not straightforward to specify, estimate, and make inference upon, as latent stochastic differential equations are not straightforward to deal with either analytically or numerically. Moreover, limited by the form of stochastic differential equations, the existing continuous-time models for insensitive longitudinal data may not be rich enough.

In this paper, we propose new continuous-time latent curve models for the analysis of intensive longitudinal data that do not suffer from the issues with the existing models and better capture the unique features of intensive longitudinal data mentioned previously. By imposing Gaussian process models (Rasmussen  Williams, 2005) on the latent curves , a general framework for latent curve modeling is developed. We call it the Latent Gaussian Process (LGP) models. In contrast to discrete-time models, the proposed models retain the flexibility of continuous-time models in dealing with observations in a continuous time domain. In addition, this general framework contains models that are easier to specify and analyze than SDE-based models.

Technically, the proposed modeling framework can be viewed as a hybrid of the latent Gaussian process model for functional data analysis (Hall ., 2008) and the generalized multilevel structural equation modeling framework for longitudinal measurement (e.g., Chapter 4, Skrondal  Rabe-Hesketh, 2004)

. As will be shown in the sequel, many existing latent curve models, whether time is treated as continuous or discrete, can be viewed as special cases under the proposed general framework. By making use of mathematical characterizations of Gaussian processes, methods for the parametrization of LGP models are provided. In addition, parameter estimation and statistical inference are carried out under an empirical Bayes framework, using a Stochastic Expectation-Maximization (StEM) algorithm

(Celeux  Diebolt, 1985; Nielsen, 2000; Zhang ., 2018).

The rest of the paper is organized as follows. In Section 2, the classical latent curve models are reviewed under a unified framework of structural equation modeling and then a new latent Gaussian process modeling framework is introduced that substantially generalizes the traditional models. The parametrization of latent Gaussian process models is discussed. Estimation and statistical inference are discussed in Section 3, followed by the computational details in Section 4. Extension to the incorporation of covariates is discussed in Section 5. The proposed model is evaluated in Section 6 through simulation studies and further illustrated in Section 7 via a real data example. We end with concluding remarks in Section 8.

## 2 Latent Gaussian Process Model

### 2.1 A Unified Framework for Latent Curve Analysis

We first provide a unified framework for latent curve analysis. We consider participants being measured longitudinally within a time interval , where time is treated as continuous. For individual , let be the time that the th measurement occurs and be the total number of measurements received by individual . At each time , we observe a random vector , where can be either continuous or categorical, depending on the data type of the th indicator. In particular, the corresponding latent curve model is called a single-indicator model when and a multiple-indicator model when . We denote as a realization of . Moreover, each individual is associated with a latent curve , which can be regarded as a time-varying latent trait. Note that the above setting is quite general that includes discrete-time longitudinal data as a special case, for which the observation time takes value in .

The latent curve model consists of two components: (1) a measurement model that specifies the conditional distribution of given , and (2) a structural model that specifies the distribution of the random function .

#### Measurement model.

The measurement model assumes that the distribution of only depends on , the latent trait level at the same time point, but does not depend on the latent trait levels or responses at any other time points. More precisely, it is assumed that

 f(yi(ti1),...,yi(tiSi)|θi(t),t∈[0,T])=f(yi(ti1),...,yi(tiSi)|θi(ti1),...,θi(tiSi)), (1)

where

denotes the probability density/mass function of the conditional distribution of

, …, given the entire latent process and denotes the probability density/mass function of the conditional distribution of , …, given . Equation (1) means that the latent trait level at any other time point is conditionally independent of the observed responses, given the latent trait levels at the corresponding time points of observation. As visualized in Figure 2, it is further assumed that the conditional distribution (1) has the following decomposition,

 f(yi(ti1),...,yi(tiSi)|θi(ti1),...,θi(tiSi))=Si∏s=1g(yi(tis)|θi(tis)), (2)

where is the conditional probability density/mass function of given . The assumption in (2) is conceptually similar to the widely used local independence assumption in latent variable models (see Chapter 4, Skrondal  Rabe-Hesketh, 2004). Finally, we assume local independence among multiple indicators at each time , i.e., , …, are conditionally independent given . That is

 g(yi(t)|θi(t))=J∏j=1gj(yij(t)|θi(t)), (3)

where specifies the conditional distribution of the th indicator given . The choice of depends on the type of the th indicator. It is worth noting that the conditional distribution does not depend on time , implying that the measurement is assumed to be time-invariant. Although commonly adopted in latent curve models (e.g., Chapter 2, Bollen  Curran, 2006), this assumption is quite strong and needs to be checked when applying such models to real data.

We provide several measurement model examples.

1. Linear factor model for continuous response:

 Yij(t)|θi(t)∼N(ajθi(t)+bj,σ2j), (4)

where , , and are model parameters.

2. Probit model for ordinal response ():

 P(Yij(t)=l|θi(t))=Φ(bj,l+1+ajθi(t))−Φ(bj,l+ajθi(t)), (5)

where

 −∞=bj,0

and are model parameters, and . When ,

degenerates to a binary response variable and the model (

5) becomes the well-known two-parameter normal-ogive model in item response theory (Chapter 4, Embretson  Reise, 2000).

Model (5) can be specified alternatively through the introduction of latent responses. That is, define latent response

 Y∗ij(t)=−ajθi(t)+ϵij(t)

where is a noise term following a standard normal distribution. Then the observable response can be viewed as a truncated version of , obtained by

 Yij(t)=l if bj,l≤Y∗ij(t)

When the multiple indicators contain a mixture of ordinal and continuous variables, the above models can be combined to model , since the measurement models for different items can be specified independently given the local independence assumption.

#### Structural model.

The structural model specifies the distribution of the random function . We list a few examples below and refer the readers to Bollen  Curran (2006) for a comprehensive review.

1. Linear trajectory model:

 θi(t)=βi0+βi1t, (6)

where are individual specific random effects, following a bivariate normal distribution.

 θi(t)=βi0+βi1t+βi2t2, (7)

where are individual specific random effects, following a trivariate normal distribution.

3. Exponential trajectory model:

 θi(t)=βi0+βi1exp(γt), (8)

where are individual specific random effects, following a bivariate normal distribution and is a fixed effect parameter.

These models assume a simple functional form for . In particular, the realizations of are restricted to linear, quadratic, and exponential functions for models (6)-(8), respectively. Such models tend to be effective for non-intensive longitudinal data (typically less than 10 measurements), but may not be flexible enough when having intensive longitudinal measurements which provide information in a high temporal resolution. In the rest of the paper, a general modeling framework is proposed, based on which more flexible structural models can be constructed.

### 2.2 Gaussian Process Structural Model

In what follows, we introduce a new framework for modeling as a continuous-time stochastic process. A key component of this framework is the Gaussian process model.

###### Definition 1 (Gaussian Process)

A time continuous stochastic process on time interval is a Gaussian process if and only if for every finite set of time points , is multivariate normal.

We remark that a Gaussian process can be defined more generally on a real line. In this paper, we focus on Gaussian process on a bounded interval , since real longitudinal data are collected within a certain time window. Many widely used stochastic processes, including the Brownian motion, the Brownian bridge, and the Ornstein-Uhlenbeck process, are special cases of Gaussian process. Thanks to the flexibility, nonlinearity, and inherent nonparametric structure, Gaussian processes have been widely used as a model for random functions for solving regression, classification, and dimension reduction problems (Chapter 4, Rasmussen  Williams, 2005).

Thanks to the normality, a Gaussian process is completely characterized by two components: (1) a mean function , and (2) a kernel function for the covariance structure, where . We provide a definition of a kernel function below.

###### Definition 2 (Kernel Function)

A bivariate function is called a kernel function if for every finite set of points , the matrix is positive semidefinite.

Note that since , the matrix has to be positive semidefinite, because it is the covariance matrix of . On the other hand, it can be shown that for any kernel function , there exists a Gaussian process whose covariance structure is given by the kernel (Chapter 4, Rasmussen  Williams, 2005). As an illustrative example, Figure 3 shows three independent realizations from a Gaussian process, with a mean function and a squared exponential kernel function .

###### Definition 3 (Gaussian Process Structural Model)

We say the structural component of a latent curve model follows a Gaussian process structural model, if are independent and identically distributed (i.i.d.) Gaussian processes for .

We remark that the Gaussian process structural model assumption in Definition 3

can be viewed as an extension of a commonly adopted assumption in unidimensional or multidimensional item response theory models where individual-specific latent trait or traits are assumed to be i.i.d. univariate or multivariate normal. The difference is that, rather than having a random variable or random vector for each individual, each individual in the proposed model is characterized by a random function, whose distribution is less straightforward to parameterize.

Combining a Gaussian process structural model and a measurement model as defined in Section 2.1, we obtain an LGP model. We point out that the examples (6)-(8) are all special cases of the LGP model. This is because, due to the multivariate normality of the random effects, for every finite set of time points , is multivariate normal. In addition, all the SDE based continuous-time latent curve models also fall into this framework, when the noise component of the SDE is assumed to be Gaussian. For example, Lu . (2015) assume the dynamic latent trait to follow the Ornstein-Uhlenbeck process (Uhlenbeck  Ornstein, 1930). This process is a Gaussian process described by a stochastic differential equation with Gaussian noise. Furthermore, when the latent variables are assumed to be jointly normal, the latent variable-autoregressive latent trajectory models (see Bianconcini  Bollen, 2018), which are discrete-time models, can also be viewed as special cases under the current framework.

A Gaussian process is specified by a mean function and a kernel function , whose choices should be problem specific. We denote the distribution of such a stochastic process by . In what follows, we discuss the parametrization of Gaussian process structural models.

### 2.3 Parametrization of Gaussian Process Structural Model

Following the above discussion, we see that , where is Gaussian process with mean 0 and kernel . This allows us to discuss the modeling of and separately, while in the classical latent curve models (e.g., (6)-(8)) the mean and kernel are modeled simultaneously. In particular, the mean process can be viewed as the mean of , for individuals from a population of interest. Therefore, the mean function captures the mean level of the time-varying latent trait, possibly reflecting the trend and the periodicity of the dynamic latent trait at the population level. In addition, the mean zero Gaussian process can be viewed as the deviation from the mean process that is specific to individual .

#### Mean function.

We consider the parametrization of the mean function , which is typically assumed to have certain level of smoothness. Specifically, for the linear, the quadratic, and the exponential trajectory models mentioned in Section 2.1, the mean functions take linear, quadratic, and exponential forms.

Under the current framework, can be parameterized more flexibly. Specifically, we adopt a parametrization of using basis functions. That is,

 m(t)=α0+α1b1(t)+⋯+αDbD(t), (9)

where , …, are pre-specified basis functions on . For example, when polynomial basis functions are used, , , where is the degree of the polynomial function. When cubic spline basis functions are used, , , , and , where , is the th spline knot that is pre-specified on , and when and 0 otherwise. Alternative basis functions may also be used, such as Fourier basis, wavelets, and other spline basis functions. We refer the readers to Chapter 3, Ramsay  Silverman (1997) for a review of different basis functions. We remark that the number of basis functions and the choices of basis function may be determined by data through model comparison.

We remark that if the dynamic trait

is assume to be a stationary process (i.e., the joint distribution of

does not change when the process is shifted in time), then the mean function does not depend on time . In that case, the mean function can only have an intercept parameter, .

#### Parameterizing kernel function.

One way to model the mean zero Gaussian process is by directly parameterizing the kernel function. In fact, different parametric kernel functions are available in the literature. We refer the readers to (Chapter 4, Rasmussen  Williams, 2005) for a review. In what follows, we provide a few examples of kernel functions, with a focus on kernels that lead to stationary mean zero Gaussian processes. For such a kernel function , the value of only depends on the time lag , not the specific values of and . A stationary kernel should be used if the distribution of is believed to be invariant when the process is shifted in time.

1. Squared exponential (SE) kernel:

 K(t,t′)=c2exp(−(t−t′)22κ2), (10)

where and are two model parameters, known as the scale and the length scale parameters, respectively.

2. Exponential kernel:

 K(t,t′)=c2exp(−|t−t′|2κ2), (11)

where and are two model parameters that play similar roles as the ones in the SE kernel above.

3. Periodic kernel (MacKay, 1998):

 K(t,t′)=c2exp(−2sin2(π|t−t′|/p)κ2), (12)

where and are two model parameters that play similar roles as the ones in the two kernels above and is known as the period parameter which determines the periodicity of the kernel function.

Mean zero Gaussian processes with different kernel functions have different properties. For example, the mean zero Gaussian processes with an SE kernel tend to have smooth pathes. In fact, a mean zero Gaussian process with the SE kernel is classified as one of the most smooth stochastic processes, according to the notion of mean square differentiability (Chapter 1, Adler, 1981), a classical quantification of the smoothness of stochastic processes. This kernel function is widely used in statistical applications of Gaussian process. It will be further discussed in the sequel and be used in the data analysis.

An alternative way of parameterizing the kernel is by directly modeling the mean zero Gaussian process, which can be done by using a linear basis function model. Specifically, let , …, be pre-specified basis functions on

, such as spline basis, Fourier basis, or wavelet basis functions. The theory of functional principal component analysis provides an idea on choosing better basis functions

(e.g., Hall ., 2008). Given the basis functions, the linear basis function model assumes that

 ¯θi(t)=H∑h=1ωhZihϕh(t), (13)

where , , are model parameters and , , are i.i.d. standard normal random variables. The model (13) yields

 K(t,t′)=H∑h=1ω2hϕh(t)ϕh(t′).

For finite , this parametrization approach typically leads to a non-stationary kernel function. Making use of the theory of reproducing kernel Hilbert space, essentially any mean zero Gaussian process can be approximated by the form of (13) for sufficiently large .

#### Squared exponential kernel.

We further discuss on the properties of the SE kernel. According to (10), . The scale parameter thus captures the overall variation of the Gaussian process in the long run. Moreover, the length-scale parameter captures the short-term temporal dependence. More precisely, the correlation between and is given by

 Cor(θi(t),θi(t′))=Cov(θi(t),θi(t′))√(Var(θi(t))×Var(θi(t′)))=exp(−(t−t′)22κ2).

As shown in Figure 4, for each value of , the correlation decays towards zero as the time lag increases. The decaying rate is determined by the value of . In particular, when the time lag , the correlation is smaller than . Moreover, for a given time lag, a smaller value of implies a smaller correlation. Figure 5 shows sample paths from three Gaussian processes with mean zero and SE kernels. Specifically, in panel (a), , in (b), , and in (c), . Panels (a) and (b) only differ by the values of the parameter and the paths in panel (a) are from a Gaussian process with a smaller value of . The paths in panel (a) are more wiggly (i.e., have more short-term variation) than those in panel (b), since the Gaussian process in panel (a) has less temporal dependence. Panels (b) and (c) only differ by the values of , due to which the paths in panel (c) have more variation in the long run.

#### Identifiability of the model parameters

Like many other structural equation models, constraints are needed to ensure model identifiability. In particular, two constraints are needed, one to fix the scale of the latent process and the other to avoid mean shift. For instance, we consider a model combining the mean function (9), the measurement model (4), and the SE kernel (10). To fix the scale in this model, we can either fix the scale parameter in (10) or the first loading parameter in (3). In addition, to avoid mean shift, we can set either in (9) or in (4).

## 3 Inference under LGP Model

The statistical inference under the proposed model can be classified into two levels, the population level and individual level. Both levels of inference may be of interest in the latent curve analysis. The population level inference considers the estimation of the parameters in both the measurement and structural models. The individual level inference focuses on the posterior distribution of given data from each individual when the measurement and the structural models are known (e.g. obtained from the population level inference).

#### Population level inference.

We use to denote all the model parameters, including parameters from both the measurement and structural models. As mentioned above, constraints may be imposed on to ensure model identifiability. Our likelihood function can be written as

 L(Ψ)=N∏i=1∫Si∏s=1J∏j=1gj(yij(tis)|θis)fi(θi1,...,θiSi)dθi1...dθiSi, (14)

where is the density function of an -variate normal distribution with mean and covariance matrix . Note that this likelihood function is the marginal likelihood of data in which the latent curves are integrated out. The maximum likelihood estimator of is defined as , whose computation is discussed in Section 4. We then obtain the estimated mean and kernel functions by plugging in .

#### Individual level inference.

Similar to the classical latent curve analysis, the current modeling framework also allows for statistical inference on the latent curve of each individual. For ease of exposition, we assume both the measurement and the structural models are known when making individual level inference. In practice, we can first estimate the model parameters and then treat the estimated model as the true one in making the individual level inference. For individual , whether or not measurement occurs at time , one can infer on based on the posterior distribution of given . By sweeping over the entire interval , one obtains the posterior mean of as a function of , which serves as a point estimate of individual ’s latent curve. When calculated under the estimated model, we call the posterior mean of the Expected A Posteriori (EAP) estimate of individual ’s latent curve and denote it by . It mimics the EAP estimate of an individual’s latent trait level in item response theory (e.g. Embretson  Reise, 2000).

## 4 Computation

In this section, we elaborate on the computational details.

### 4.1 Individual Level Inference

We first discuss computing the posterior distribution of given , for any time , when both the measurement and the structural models are given. We denote the density of this posterior distribution by . Following equation (1) of the measurement model, and are conditionally independent given . Consequently,

 h(θ|yi(ti1),...,yi(tiSi)) (15) = ∫h1(θ|θ1,...,θSi)h2(θ1,...,θSi|yi(ti1),...,yi(tiSi))dθ1...dθSi,

where denotes the conditional distribution of given and denotes the posterior distribution of given the observed responses. Specifically, since follows a multivariate normal distribution with mean and covariance matrix , is still normal, for which the mean

and variance

have analytic forms. Specifically,

 μ(θ1,...,θSi)=m(t∗)+Σ12Σ−122(θ−μ)~{}~{}and~{}~{}σ2(θ1,...,θSi)=K(t∗,t∗)−Σ12Σ−122Σ21,

where , , , , and . Then the posterior mean of is given by

 ∫μ(θ1,...,θSi)h2(θ1,...,θSi|yi(ti1),...,yi(tiSi))dθ1...dθSi. (16)

-level quantile of the posterior distribution is given by

 ∫(μ(θ1,...,θSi)+zασ(θ1,...,θSi))h2(θ1,...,θSi|yi(ti1),...,yi(tiSi))dθ1...dθSi, (17)

where is the -level quantile of a standard normal distribution.

Under the linear factor model (6), are jointly normal. Consequently, (15)-(17) have analytical forms. Under other measurement models, (16) and (17) can be approximated by using Monte Carlo samples from the posterior distribution . Specifically, let be Monte Carlo samples. Then we approximate the mean and -level quantile of the posterior distribution of by

 1LL∑l=1μ(θ(l)1,...,θ(l)Si), (18) 1LL∑l=1μ(θ(l)1,...,θ(l)Si)+zασ(θ(l)1,...,θ(l)Si).

Markov chain Monte Carlo (MCMC) methods can be used to obain Monte Carlo samples from the posterior distribution . For example, a Gibbs sampler is developed that efficiently samples from this posterior distribution under the probit model (5) for ordinal response data. This sampler, described as follows, makes use of the latent response formulation of the probit model (5).

• Step 1: For , , sample from a truncated normal distribution that truncates a normal distribution by interval , where is some initial value of .

• Step 2: For , given s, we update , by sampling from

 h3(θ1,...,θSi|y∗i(ti1),...,y∗i(tiSi)),

where and denotes the conditional distribution of given the ideal responses . It is worth noting that this conditional distribution is multivariate normal, because , …, , , …, are jointly normal. The observed data , …, are not conditioned upon, because , …, are conditionally independent of the observed data when given the latent responses , …, .

We point out that both steps can be efficiently computed, because step 1 only involves sampling from univariate truncated normal distributions and step 2 only involves sampling from multivariate normal distributions. Well-developed samplers exist for both steps.

### 4.2 Population Level Inference

We now discuss the computation for maximizing the likelihood function (14). Under the linear factor model (6), the Expectation-Maximization (EM) algorithm (Dempster ., 1977) is used to optimize (14), where the E-step is in a closed form due to the joint normality of data and latent variables. The implementation of this EM algorithm is standard and thus we omit the details here.

Under other measurement models, the classical EM algorithm is typically computationally infeasible when the number of time points is large, in which case the E-step of the algorithm involves a high-dimensional integral that does not have an analytical form. We adopt a stochastic EM (StEM) algorithm (Celeux  Diebolt, 1985; Diebolt  Ip, 1996; Zhang ., 2018) which avoids the numerical integration in the E-step of the standard EM algorithm (Dempster ., 1977; Bock  Aitkin, 1981) by Monte Carlo simulations. The convergence properties of the StEM algorithm are established in Nielsen (2000). Similar to the EM algorithm, the StEM algorithm iterates between two steps, the StE step and the M step. Let be the initial parameter values and be the initial values of person parameters. In each step (), the following StE step and M step are performed.

1. StE step: For , sample from

 h2(θ1,...,θSi|yi(ti1),...,yi(tiSi);Ψ(l−1)),

the conditional distribution of given under parameters . For the probit model (5), we use the Gibbs sampler described in Section 4.1 to sample from .

2. M step: Obtain parameter estimate

 Ψ(l)=\operatornamewithlimitsargmaxΨN∑i=1l(yi(ti1),...,yi(tiSi),~θ(l)i1,⋯,~θ(l)iSi;Ψ), (19)

where

 l(yi(ti1),...,yi(tiSi),~θ(l)i1,⋯,~θ(l)iSi;Ψ) (20) = Si∑s=1[J∑j=1loggj(yij(tis)|~θ(l)is)]+logfi(~θ(l)i1,...,~θ(l)iSi)

is the complete data log-likelihood of a single observation. Note that and are defined in (3) and (14), respectively, containing model parameters. In our implementation, the optimization is done using the L-BFGS-B algorithm (Liu  Nocedal, 1989).

The final estimate of is given by the average of s from the last iterations, i.e.,

 ^Ψ=1mm0+m∑l=m0+1Ψ(l). (21)

As shown in Nielsen (2000), can approximate the maximum likelihood estimator sufficiently accurately, when and are large enough.

## 5 Incorporation of Covariates

In practice, individual specific covariates are often collected and incorporated into the latent curve analysis. As visualized in the path diagram in Figure 6, covariates can be further added to the structural model to explain how the distribution of the latent curves depends on the covariates. A specific type of covariates of interest is group membership, such as experimental versus control and female versus male. Latent curve analysis that incorporates discrete group membership as covariates in the structural model is referred to as the analysis of groups (Chapter 6, Bollen  Curran, 2006).

Covariates can be easily handled under the proposed framework. For example, when discrete group membership may affect the mean function of the latent curve, we let parameters in to be group-specific. Similarly, we may also allow parameters in to depend on the group membership. Quantitative covariates, such as age, can also be incorporated into the current model. The mean and kernel functions are denoted by and when they depend on the covariates. The tools for the inference and computation discussed above can be easily generalized.

## 6 Simulation

The proposed modeling framework and the estimation procedures are further evaluated by simulation studies.

### 6.1 Study I

We first evaluate the parameter recovery using the EM algorithm, under a setting similar to the real data example in Section 7, except that a single group is considered in this study. In particular, it is assumed that each participant is measured for 25 consecutive days, with four measurements per day. Such a design results in 100 times of measurement. The time points of the four measurements are randomly sampled within a day. And we consider a measurement model with a single indicator. More precisely, given the observation time, the model is specified as follows.

 Yi1(t)|θi(t) ∼N(θi(t),σ2), θi(⋅) ∼GP(m,K),

where and . The true model parameters are specified in Table 1. Two sample sizes are considered, including and . The simulation under each sample size is repeated for 100 times, based on which the mean squared error (MSE) for parameter estimation is calculated. According to the MSE for parameter estimation presented Table 1, the parameter estimation is very accurate under the current simulation settings and the estimation accuracy improves as the sample size increases.

We further illustrate the performance of the individual level inference based on the distance between and its EAP estimate , where the distance is defined as

 di=√∫T0(θi(t)−^θi(t))2dt.

In particular, quantifies the inaccuracy of estimating the latent curve by . The distance between and ,

 ei=√∫T0(θi(t)−^α)2dt,

is used as a reference for that quantifies the inaccuracy of estimating by the estimate of the population mean . The ratio serves as a measure of inaccuracy in estimating the latent curve of individual , in which the difficulty in estimating the curve has been taken into account by the denominator . The smaller the ratio is, the more accurate the latent curve is estimated in a relative sense (relative to the overall difficulty in estimating measured by ).

In panel (a) of Figure 7, we show the histogram of the ratios for all individuals from a randomly selected dataset among all replications when the sample size . As we can see, is much smaller than , implying that estimates very accurately. Panels (b)-(d) of Figure 7 show ,