Dementia is a general syndrome characterized by a long term and gradual decrease in the ability to think and remember with consequences on the person’s daily functioning. It represents a pressing public health problem and major research priority. Alzheimer disease (AD) is the most common form of dementia, representing 60% to 80% of the cases (Reitz and Mayeux, 2014). AD gradually affects multiple components long before any clinical diagnosis including brain atrophies, cognitive decline in various functions (memory, language, orientation in space and time, etc.) and loss of autonomy in the daily living activities. There is currently no cure for AD although many novel compounds are under development. Natural history of AD and its progression are still misunderstood; yet their understanding constitutes a crucial step for the assessment of compounds, and more generally AD research. Theoretical schemes have been proposed to understand the disease taken as a whole, highlighting the expected dynamic and multidimensional aspects (Jack et al., 2013). Jack’s scheme, which is now central in AD research, hypothesizes a sequence of alterations: mainly the accumulation of proteins in the brain (amyloid- and tau proteins), the atrophy of brain regions (e.g., hippocampus) and finally clinical manifestations with cognitive and functional declines on which the diagnosis of dementia currently relies. However, because of its complexity, such theoretical scheme is very difficult to translate into a statistical model as it necessitates to combine the dynamic and multidimensional aspects, and to properly explore the temporal relationships between dimensions.
The dynamic aspect of AD has been apprehended mostly using the linear mixed model theory with models that analyzed one dimension according to time and covariates (Hall et al., 2000; Amieva et al., 2008; Donohue et al., 2014). One difficulty is that the dimension of interest is usually not directly observed but measured by a series of outcomes; for instance, cognition is usually measured by a battery of neuropsychological tests. Proust-Lima et al. (2006, 2013) considered latent process models which distinguish a structural model for analyzing the unobserved quantity over time and equations of observations for linking the unobserved quantity to the observed outcomes.
To account for the multidimensional aspect of AD and understand how dimensions are inter-related, many contributions explored pre-determined relationships by examining change over time of one biomarker (e.g., cognitive measure) according to another observed biomarker (e.g., hippocampal initial volume or change) and assumed the latter was observed without measurement error (e.g. Landau et al. (2011); Han et al. (2012)). This approach quantifies temporal relationships but it relies on a specific a priori determined sequence and does not consider all biomarkers as stochastic processes, which limits the interpretation and may induce biases. Some authors also proposed to use bivariate mixed models to dynamically model two dimensions and account for their correlation through correlated random effects (e.g., (Mungas et al., 2005; Robitaille et al., 2012)). However such models remain descriptive; they do not rely on asymmetric temporal relationships between processes and thus do no allow causal interpretations.
Temporal asymmetric relationships between processes have usually been apprehended with Dynamic Bayesian Networks (DBN)(Song et al., 2009). This approach extends the concept of Directed Acyclic Graphs (DAG) (Greenland, 2000) to the dynamic case by modelling constant or time-varying temporal relationships between successive states of a network of processes. However, firstly time is usually too grossly discretized so that spurious temporal associations might appear as showed by Aalen et al. (2016). Second, the trajectories of processes are not modelled over time. Finally, DBN quantify the association between successive levels of processes while a dynamic view of causality would seek local dependence structures linking the network of processes to its infinitesimal change over time (Aalen and Frigessi, 2007).
Local dependence structures can be naturally investigated with mechanistic models which are dynamic models relating a system of processes over time using differential equations. Proposed in HIV studies (Prague et al., 2017) or cancer studies (Desmée et al., 2017), they allowed retrieving causal associations between disease components. Yet mechanistic models are numerically very demanding so that their application to complex diseases such as AD is compromised. In addition they require precise biological knowledge which lacks for AD.
The objective of this work is to propose a statistical model that simultaneously describes the dynamics of multiple dimensions involved in Alzheimer’s disease and assess their temporal relationships similarly as in a mechanistic model but with much less numerical complexity. We consider a system of latent processes, each one representing a latent dimension possibly observed through one or several longitudinal markers. In contrast with mechanistic models, we rely on a discrete-time framework and define a set of difference equations to model the change over time of the system according to the previous state of the system. In addition, we account for individual differences through subject-specific covariates and random effects. As discretization can distort the causal interpretations of temporal relationships compared to a model in continuous time, we specifically evaluate the impact of discretization on the temporal influence structure in a simulation study. The methodology is applied to the Alzheimer’s Disease Neuroimaging Initiative database to explore the temporal structure between cerebral, cognitive and functional dimensions at different stages of AD.
2 Motivating Data
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see www.adni-info.org. We focused in this work on the ADNI 1 phase which is a longitudinal multisite observational study that included approximately 800 individuals aged between 55-90 enrolled in three different stages of progression to AD: normal aging (CN, N 200), mild cognitive impairment (MCI, N 400), and diagnosed Alzheimer’s Disease (dAD, N 200). The individuals were followed-up every 6 months up to 3 years for CN and MCI groups and up to 2 years for the dAD group. At each visit, a lot of information was collected, including for the present work, volumes of brain regions measured by MRI, a battery of 19 cognitive tests and a functional assessment with the Functional Activities Questionnaire(FAQ) (Pfeffer et al., 1982). Further details on the ADNI 1 study can be found in Mueller et al. (2005).
3.1 Structural model for the system of latent processes
Consider latent processes (with ) representing a system of dimensions (e.g. cerebral anatomy, cognitive ability and functional autonomy dimensions for Alzheimer’s disease) for individual with . We assume is defined at discrete times with , and a constant discretization step. Let us denote the change of the system between two successive times and the rate of change of the system. The model for the trajectories of the processes is split in two parts: (i) the level of the processes at baseline is modelled with a multivariate linear mixed model, and (ii) the rate of change of the system over time is modelled by difference equations combined with a multivariate linear mixed model:
where is the -matrix of covariates associated with the
-vector of fixed effects, and is the -vector of individual random intercepts in the initial system . The -matrix and the -matrix include time-dependent covariates associated with the -vector of fixed effects and the -vector of individual random effects , respectively. is the -matrix of transition intensities.
For each dimension , the vector of individual random effects
where and are unstructured. The variances of the s are constrained to 1 so that the D processes have the same magnitude at baseline. Random effects are assumed independent between dimensions so that the entire -vector of individual random effects has a multivariate normal distribution,
with the identity matrix, the D-block diagonal matrix with block , and the matrix with row where is the -row vector of zeros. In the estimation process, is replaced by its Cholesky decomposition: , where is a lower triangular matrix. Independence between random effects across dimensions is handled by fixing corresponding parameters to zero in , and unit-variances of random intercepts are handled by fixing the corresponding parameters to 1 in . In addition to the variances of random intercepts fixed at one, and without loss of generality, the matrix of covariates excludes intercepts for each process so that the D processes are standardized at baseline. These two constraints are compensated by parameters in the measurement models.
The temporal influences between processes are modelled through the -matrix of time-dependent transition intensities :
This matrix captures the directed temporal influences between latent processes at time and subsequent rates of change of latent processes between times and . Specifically, coefficient quantifies the temporal effect of process at time on process
. Each effect can be modelled according to time/covariates through a linear regressionwhere is a r-vector of time-dependent covariates associated with the r-vector of regression coefficients .
When the discretization step is not too large, the temporal influences intend to have the same causal interpretations as those of a model in continuous time (see Simulation Study 2 for the demonstration).
3.2 Measurement Models of the longitudinal markers
Consider K () continuous longitudinal markers that have been measured for subject at successive discrete times with . Following Proust-Lima et al. (2006, 2013), we assume that each latent process is the underlying common factor of markers () and we note the set of marker subscripts associated with latent process . We assume that a marker measures only one latent process.
The link between a marker and its underlying latent process is defined by a marker-specific measurement model. If marker is Gaussian, the measurement model is a linear equation:
where the vector of transformation parameters is used to get the standardized form of the marker and are independent Gaussian errors with variance .
In the more general case of a continuous marker (possibly non Gaussian), one may consider a nonlinear equation of observation:
where the link transformation comes from a family of monotonic increasing and continuous functions parameterized with . Again is the transformed marker and are independent Gaussian errors with variance . Following Proust-Lima et al. (2013), the link transformation can be defined from a basis of I-splines (which are integrated M-splines (Ramsay, 1988)) in association with positive coefficients, thus providing an increasing bijective flexible transformation. We used here a quadratic I-splines basis with internal knots, , so that
with the vector of parameters of the transformation.
In the following, we denote the diagonal variance matrix of the vector of errors and , the total vector of transformation parameters for the markers. The vector of transformed markers is mapped to the system of latent processes through a matrix with element equal to 1 if marker measures latent process :
In practice the observation process may include intermittent missing observations for a subset of markers or for all the markers at a given occasion , so that markers are actually observed at occasion for subject . We assume that observations are missing at random and note the transformations of the actual -vector of observed markers at occasion .
The model linking the system of processes to the transformed markers can be easily adapted to the presence of intermittent missing data by considering a observation matrix (for ) where element equals 1 if marker is the observed marker at occasion and 0 if not for and :
with the vector of independent Gaussian errors with covariance matrix .
The parameters are estimated in the maximum likelihood framework.
3.3.1 Distribution of the Latent Processes and transformed observations
We first derive the marginal distributions of the latent processes and of the transformed observations. By recurrence, the structural model (1) can be rewritten:
where for and .
By introducing for and so that
Equation (6) can be rewritten
As a result, the vector has a multivariate normal distribution with expectation and variance covariance matrix and the vector has a multivariate normal distribution with expectation and variance-covariance matrix where
It can be easily deduced that the vector of incomplete and transformed data at occasion is multivariate Gaussian with expectation and variance-covariance matrix , and the total vector of incomplete and transformed data is multivariate Gaussian with expectation and variance-covariance matrix , a block matrix with the block.
As the subjects of the sample are independent, the log-likelihood of the model is with the individual contribution to the likelihood. Here, is the whole vector of parameters. Using the Jacobian of the link functions (), the individual contribution is:
where denotes the density function of a multivariate Gaussian vector with expectation and variance-covariance matrix , and denotes the Jacobian of the link function used to transform , the observed marker at occasion for subject . For instance, with a linear link function, .
3.3.3 Optimisation Algorithm and Implementation
The maximum likelihood estimates are obtained using an extended Levenberg-Marquardt algorithm (Marquardt, 1963) because of its robustness and good convergence rate. At each iteration , if necessary, the Hessian matrix is diagonal-inflated to obtain a positive definite matrix which is used to update the parameters , with the gradient at iteration and the improvement control parameter. Convergence is reached when , and , with the total number of parameters. The latter criterion is by far the most stringent one and specifically targets maximum search. The variances of the estimators are obtained from the inverse of .
Given the possibly high number of parameters, we first estimate the parameters for each process taken separately, then we start the maximization of the likelihood of the multivariate model from these simple estimates, setting initial values of the inter-dimension parameters to zero. The model estimation is implemented in R (program available on request); it combines R and languages, and includes parallel computations.
3.4 Marginal and subject-specific predictions
The goodness-of-fit of the model can be assessed by comparing predictions with observations of the markers in their transformed scales. From notations defined in section 3.3.1, marginal and conditional distributions of the markers are:
where is the block-diagonal matrix constituted of blocks.
The marginal () and subject-specific () predictions in the transformed scales are then respectively obtained by taking the expectations of the marginal and conditional distributions of the transformed markers at the parameter estimates and at the predicted latent processes of given the observations : where is the covariance matrix between and .
Using these individual predictions, one can graphically compare either the marginal predictions or subject-specific predictions averaged within time intervals to the observations averaged within the same time intervals. Marginal and subject-specific predictions in the natural scale of the markers can also be derived from the marginal and conditional distributions by using a Monte-Carlo approximation (Proust-Lima et al., 2013).
We performed two series of simulations to evaluate the estimation program and the impact of time discretization on the interpretation of matrix.
4.1 Simulation Study 1: Validation of the estimation program
To evaluate the estimation program, we generated a system of two Gaussian processes ( and ), each one measured by one longitudinal marker ( and ) and two covariates, one binary and one continuous . We considered two scenarios: a covariate-specific temporal influence structure (Scenario 1) and a time-dependent temporal influence structure (Scenario 2). In Scenario 1, we assumed linear trajectories over time for the system of latent processes (random intercepts and simple effects of both covariates in the sub-models for the initial level and the change over time) and a temporal influence structure different in each level of :
where , and with such that the random effects are independent between dimensions, and ), .
In scenario 2, we assumed linear trajectories for the latent processes adjusted for and a transition matrix that evolved with time:
where , and with such that the random effects are independent between dimensions, and ), . Each element of the transition matrix is defined from a basis of quadratic B-splines with one internal knot at the median so that , .
The design of the simulations and the parameters were chosen to mimic the ADNI data. Dimensions 1 and 2 were cerebral anatomy and cognitive ability, respectively measured by a specific composite score. Covariate. Covariate
corresponded to the standardized age at baseline and was generated according to a standard Gaussian distribution. We considered a discretization stepof 6 months. Markers observations were generated every 6 months up to 3 years. Thus a subject had 7 repeated measures at occasions . We also considered a design in which scheduled visits could be missed completely at random with a probability of 0.15, and when a visit was not missed, a marker could be missing with a probability of 0.07. For each design and scenario, we generated 100 samples of 512 subjects.
Table 4 and Table 1 provide the results of the simulations for scenarios 1 and 2 respectively. In both settings, all the parameters were correctly estimated with satisfying coverage rates in the absence of missing data (left part) and in the presence of missing data (right part).
|without missing values (conv=100%)||with missing values (conv=98%)|
Rate of convergence,
(15% missing occasions, 7% missing outcomes),
4.2 Simulation Study 2: Impact of the discretization step on the temporal influence structure between processes
To formally assess whether the interpretation under the discretized time was the same as the one obtained under continuous time, we assessed the type-I error rate associated with each non diagonal element of the transition matrixunder three discretization steps , and when data were actually generated under continuous time (approximated by a step =0.001). We considered for this a system of three latent processes (, , and ), each one measured by one Gaussian repeated marker (, , ). The trajectories were linear, with no adjustment for covariates and a transition matrix constant over time:
The simulation model mimicked the ADNI data with cerebral anatomy, cognitive ability and functional autonomy as dimensions, respectively measured by a specific composite score. We estimated the model on the ADNI data with =1 (6 months) and transformed the parameters to the scale =0.001 to generate the data in almost continuous time. We provide in the Appendix A: the formulas that we used to relate model components defined under =1 and =0.001. Elements of the transition matrix were set one by one to 0 in scale to evaluate the associated type-I error rate. Latent processes were generated with solver from dsolve package (Soetaert et al., 2010) and observations were derived every 6 months up to 3 years. For each design, we considered 200 samples of 512 subjects.
Table 2 displays the type-I error rates (in percentage) associated with each non diagonal element of the transition matrix when estimated with discretization steps =1/3, = and =1. All the type-I error rates are close to the nominal 5% rate (expected rates in the 95% interval [1.98 , 8.02] with 200 replicates). This shows that with such discretization steps, the causal interpretation expected in continuous time is not altered.
The application aimed to describe the decline over time of cerebral anatomy, cognitive ability and functional autonomy in three clinical stages of AD (normal aging (CN), Mild Cognitive Impairment (MCI) and diagnosed with Alzheimer’s Disease (dAD)) and to quantify the temporal influences between these dimensions by assessing especially whether the relationships differed according to the clinical stage.
5.1 The sample
We considered the nineteen available cognitive tests as outcomes of global cognitive ability (Crane et al., 2012), the volume of hippocampus relative to total intracranial volume and the cortical thickness of nine regions (Freesurfer version 4.4.0 for longitudinal data) as outcomes for cerebral anatomy following regions of interest identified by Dickerson et al. (2008), and the sumscore of the FAQ composed of 30 items (Pfeffer et al., 1982) to assess functional autonomy. For numerical simplicity, we considered a unique marker for each dimension by summarizing the cerebral and cognitive markers into two composite scores (Proust-Lima et al., 2017). Covariates were the age at entry (centered around 75.4 and indicated in decades), gender, educational level (low level if 12 years vs high level if 12 years), the APOE genotype (4 carrier vs 4 non-carrier ) and the 3 clinical stages (CN, MCI, dAD). We selected in the sample all the subjects who had no missing data for the covariates and had at least one measure for each dimension during the follow-up. The main analysis included 656 subjects (82% of the initial sample). The sample consisted of 190 (29%) subjects at healthy stage (CN), 322 (49%) subjects at mild cognitive impairment stage (MCI), and 144 (22%) subjects diagnosed with Alzheimer’s disease (dAD); 43% were females, 83% had a high educational level and 49% carried APOE 4 allele. The mean age at entry was 75.4 years old (sd=6.6). The mean number of visits was 5, 6 and 4 for CN, MCI and dAD subjects, respectively.
5.2 The dynamic multivariate model
The dynamic model applied to ADNI is summarized in Figure 1. The three latent processes (cerebral anatomy (), cognitive ability (), functional autonomy ()) were repeatedly measured by the three composite scores , and