Using deep learning for comprehensive, personalized forecasting of Alzheimer's Disease progression

07/10/2018 · Charles K. Fisher, et al.

A patient is more than one number, yet most approaches to machine learning from electronic health data can only predict a single endpoint. Here, we present an alternative -- using unsupervised deep learning to simulate detailed patient trajectories. We use data comprising 18-month longitudinal trajectories of 42 clinical variables from 1908 patients with Mild Cognitive Impairment (MCI) or Alzheimer's Disease (AD) to train a model for personalized forecasting of disease progression. Our model simulates the evolution of each sub-component of cognitive exams, laboratory tests, and their associations with baseline clinical characteristics, generating both predictions and their confidence intervals. Even though it is not trained to predict changes in disease severity, our unsupervised model predicts changes in total ADAS-Cog scores with the same accuracy as specifically trained supervised models. We show how simulations can be used to interpret our model and demonstrate how to create synthetic control arm data for AD clinical trials. Our model's ability to simultaneously predict dozens of characteristics of a patient at any point in the future is a crucial step forward in computational precision medicine.


I Introduction

Two patients with the same disease may present with different symptoms, progress at different rates, and respond differently to the same therapy. Understanding how to predict and manage differences between patients is the primary goal of precision medicine Collins and Varmus (2015). Computational models of disease progression developed using machine learning approaches provide an attractive tool to combat such patient heterogeneity. One day these computational models may be used to guide clinical decisions; however, current applications are limited both by the availability of data and by the ability of algorithms to extract insights from those data.

Most applications of machine learning to electronic health data have used techniques from supervised learning to predict specific endpoints Rajkomar et al. (2018); Miotto et al. (2016); Choi et al. (2016); Lasko et al. (2013); Lipton et al. (2015); Myers et al. (2017). An alternative to developing separate supervised models to predict each characteristic is to build a single model that simultaneously predicts the evolution of many characteristics. Statistical models based on artificial neural networks (Deep Learning) provide one avenue for developing tools that can simulate patient progression in detail Choi et al. (2017); Esteban et al. (2017); Beaulieu-Jones et al. (2017).

Clinical data present a number of challenges that are not easily overcome with current approaches to deep learning Goldstein et al. (2017). For example, most clinical datasets contain multiple types of data (i.e., they are “multimodal”), have a relatively small number of samples, and contain many missing observations. Dealing with these issues typically requires extensive preprocessing Miotto et al. (2016) or simply discarding variables that are too difficult to model. For example, one recent study focused on only four variables that were frequently measured across all 200,000 patients in an electronic health dataset from an intensive care unit Esteban et al. (2017). Developing methods that can overcome these limitations is a key step towards broader applications of machine learning in precision medicine.

Precision medicine is especially important for complex disorders where patients exhibit different patterns of disease progression and therapeutic responses. Alzheimer’s Disease (AD) and Mild Cognitive Impairment (MCI) are complex neurodegenerative diseases with multiple cognitive and behavioral symptoms Kumar et al. (2015). The severity of these symptoms is usually assessed through exams such as the Alzheimer’s Disease Assessment Scale (ADAS) Rosen et al. (1984) or Mini Mental State Exam (MMSE) Folstein et al. (1975). The heterogeneity of AD and related dementias makes these diseases difficult to diagnose, manage, and treat, leading to calls for better methods to forecast and monitor disease progression and to improve the design of AD clinical trials Cummings et al. (2016).

A variety of disease progression models have been developed for MCI and AD using clinical data Rogers et al. (2012); Ito et al. (2013); Kennedy et al. (2016); Tishchenko et al. (2016); Szalkai et al. (2017) or imaging studies Mueller et al. (2005); Risacher et al. (2009); Hinrichs et al. (2011); Ito et al. (2011); Suk and Shen (2013); Suk et al. (2014); Liu et al. (2014); Ortiz et al. (2016); Samper-Gonzalez et al. (2017). Although previous approaches to forecasting disease progression have proven useful Corrigan et al. (2014); Romero et al. (2015), they have focused on predicting a single endpoint, such as the change in the ADAS Cognitive (ADAS-Cog) score from baseline. Given that AD is heterogeneous and multifactorial, we set out to model the progression of more than just the ADAS-Cog score. We accomplished this by simulating the progression of entire patient profiles, describing the evolution of each sub-component of the ADAS-Cog and MMSE scores, laboratory tests, and their associations with baseline clinical characteristics.

The manuscript is structured as follows. Section II.1 describes our data processing steps and Section II.2 describes our machine learning model. Section III.1 assesses the goodness-of-fit of our machine learning model. Predictions for individual components are discussed in Section III.2. Section III.3 assesses the accuracy of our approach, which simulates each sub-component of the cognitive scores, at predicting changes in overall disease activity measured by the ADAS-Cog exam. Finally, Section IV discusses implications.

II Methods

II.1 Data Processing

Our statistical model was trained and tested on data extracted from the Coalition Against Major Diseases (CAMD) Online Data Repository for AD (CODR-AD) Romero et al. (2009); Neville et al. (2015). The development and composition of this database have been previously described in detail Neville et al. (2015). The CAMD database contains 6500 patients from the placebo arms of 24 clinical trials on MCI and AD. These trials have varying durations, visit frequencies, and inclusion criteria, and nearly all patients have no data beyond approximately 18 months. We chose a 3-month spacing between time points, based on the typical visit frequency among patients with long follow-up, to ensure that most patients had no gaps in their data. The falloff in patient data after the 18-month time point led us to select that as the final time point. Therefore, patient trajectories are represented by 7 time points (0, 3, 6, 9, 12, 15, and 18 months).
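The visit-bucketing step described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code; in particular, the rule of assigning each observation to the nearest 90-day window (and discarding observations more than 45 days from every time point) is our assumption about how centered windows would be implemented.

```python
# Sketch: bucket time-dependent observations into 90-day windows
# centered on the 7 time points (0, 3, ..., 18 months), averaging
# multiple entries per window. Names and window logic are illustrative.

N_TIME_POINTS = 7  # days 0, 90, ..., 540

def bucket_observations(observations):
    """Map (study_day, value) pairs to the nearest 90-day window.

    Returns {time_point_index: mean value}; observations farther than
    45 days from every window center are discarded.
    """
    buckets = {}
    for day, value in observations:
        idx = round(day / 90)
        if 0 <= idx < N_TIME_POINTS and abs(day - 90 * idx) <= 45:
            buckets.setdefault(idx, []).append(value)
    return {i: sum(v) / len(v) for i, v in buckets.items()}
```

For example, observations recorded on days 0 and 10 would both fall into the baseline window and be averaged, while an observation on day 95 would be assigned to the 3-month window.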

Data in the CAMD database are stored in the CDISC format Kubick et al. (2007); Hume et al. (2016). The covariates used in our statistical model of AD progression originate from tables in the database on demographics, disposition events, laboratory results, medical histories, questionnaires, subject characteristics, subject visits, and vital signs. We designated some variables, such as height, as static. Multiple values for any of the static variables were averaged to produce a single estimate. Time-dependent variables were bucketed into 90-day windows centered on each time point. Multiple entries in any window were averaged, or extremal values were taken, as appropriate. Any data with units (such as laboratory tests) were converted to a common unit for each test for all patients (e.g., g/L for triglycerides). Results for both the ADAS-Cog and MMSE tests were available for many patients at the level of individual components. Individual question data were available for some patients, which we aggregated into component scores. A final processing step converted the data into numerical values more suitable for statistical modeling: categorical variables were one-hot encoded, and positive continuous variables were log-transformed and standardized. All variables were transformed back to canonical form before analysis.
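The two numerical transforms named in the final processing step can be sketched directly; this is a minimal illustration of one-hot encoding and log-standardization, not the authors' pipeline.

```python
import math

def log_standardize(values):
    """Log-transform positive continuous values, then standardize to
    zero mean and unit variance (as described for laboratory tests)."""
    logged = [math.log(v) for v in values]
    mean = sum(logged) / len(logged)
    std = math.sqrt(sum((x - mean) ** 2 for x in logged) / len(logged))
    return [(x - mean) / std for x in logged]

def one_hot(value, categories):
    """One-hot encode a categorical value over a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]
```

Both transforms are invertible, which matches the statement that all variables were transformed back to canonical form before analysis.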

Our statistical model can perform imputation of missing data during training. However, using covariates that are missing in a large fraction of patients would lead to poor performance. Therefore, we chose 44 variables that were observed in a reasonably large fraction of patients. Tables 1 and 2 describe each of the variables included in our analysis. Because we are interested in modeling AD progression, we focused on patients in the CAMD database with long trajectories. This led us to select the 1908 patients from CAMD that have a valid ADAS-Cog score (i.e., data are not missing for any of the 11 components) for either the 15-month or the 18-month time point.

To summarize, we extracted 18-month longitudinal trajectories of 1908 patients with MCI or AD covering 44 variables, including the individual components of the ADAS-Cog and MMSE scores, laboratory tests, and background information. The patients were randomly divided into a training group of 1335 patients, a validation group of 95 patients, and a testing group of 478 patients. The training group was used only for learning the parameters of the model, the validation group was used only to monitor metrics during training, and the testing group was used only to evaluate the performance of our model. Each patient profile consisted of 44 covariates (Tables 1 and 2) that were classified as binary, ordinal, categorical, or continuous. Patient trajectories described the time evolution of all 44 variables in 3-month intervals.

Category Name Type Temporal
ADAS Commands Ordinal Yes
ADAS Comprehension Ordinal Yes
ADAS Construction Ordinal Yes
ADAS Delayed Word Recall Ordinal Yes
ADAS Ideational Ordinal Yes
ADAS Instructions Ordinal Yes
ADAS Naming Ordinal Yes
ADAS Orientation Ordinal Yes
ADAS Spoken Language Ordinal Yes
ADAS Word Finding Ordinal Yes
ADAS Word Recall Ordinal Yes
ADAS Word Recognition Ordinal Yes
MMSE Attention and Calculation Ordinal Yes
MMSE Language Ordinal Yes
MMSE Orientation Ordinal Yes
MMSE Recall Ordinal Yes
MMSE Registration Ordinal Yes
Table 1: Cognitive variables included in the model.
Category Name Type Temporal
Laboratory Alanine aminotransferase   Continuous Yes
Laboratory Alkaline phosphatase Continuous Yes
Laboratory Aspartate aminotransferase Continuous Yes
Laboratory Cholesterol Continuous Yes
Laboratory Creatine kinase Continuous Yes
Laboratory Creatinine Continuous Yes
Laboratory Gamma glutamyl transferase Continuous Yes
Laboratory Hematocrit Continuous Yes
Laboratory Hemoglobin Continuous Yes
Laboratory Hemoglobin a1c Continuous Yes
Laboratory Indirect bilirubin Continuous Yes
Laboratory Potassium Continuous Yes
Laboratory Sodium Continuous Yes
Laboratory Triglycerides Continuous Yes
Clinical Blood pressure (diastolic) Continuous Yes
Clinical Blood pressure (systolic) Continuous Yes
Clinical Heart rate Continuous Yes
Clinical Weight Continuous Yes
Clinical Dropout Continuous Yes
Background Age at baseline Continuous No
Background Geographic region Categorical No
Background Initial diagnosis (AD or MCI) Binary No
Background Past cardiovascular event Binary No
Background ApoE 4 allele count Ordinal No
Background Race Categorical No
Background Sex Binary No
Background Height Continuous No
Table 2: Laboratory, clinical, and background variables included in the model.

II.2 Machine Learning

Figure 1: Overview of the data and model. A) Study data built from the CAMD database consist of 18-month longitudinal trajectories of 1908 patients with MCI or AD. Our model uses 44 variables, including the individual components of the ADAS-Cog and MMSE scores, laboratory tests, and background information. B) To capture time dependence, we model the joint distribution of the data at time t and the data at time t + 1 (neighboring 3-month visits) using a Conditional Restricted Boltzmann Machine (CRBM) with ReLU hidden units. Multimodal observations are modeled with different types of units in the visible layer, and missing observations are automatically imputed.

A statistical model is generative if it can be used to draw new samples from an inferred probability distribution. Generative modeling of clinical data involves two tasks: i) randomly generating patient profiles with the same statistical properties as real patient profiles and ii) simulating the evolution of these patient profiles through time. Each of these tasks is complicated by common properties of clinical data, namely that they are typically multimodal and have many missing observations. Moreover, patient progression is best regarded as a stochastic process and it is important to capture the inherent randomness of the underlying processes in order to make accurate forecasts.

Let v_i(t) be the vector of covariates measured in patient i at time t. Creating a generative model to solve (i) involves finding a probability distribution p(v) such that we can randomly draw v_i(0) ∼ p(v). Solving problem (ii) involves finding a conditional probability distribution p(v(t+1) | v(t)) so that we can iteratively draw v_i(t+1) ∼ p(v(t+1) | v_i(t)) to generate a patient trajectory.

Our statistical model for patient progression is a latent variable model called a Conditional Restricted Boltzmann Machine (CRBM) Ackley et al. (1985); Hinton (2010); Taylor et al. (2007); Mnih et al. (2011). To construct the model, the covariates were divided into two mutually exclusive subsets: static covariates x that were determined solely from measurements at the beginning of the study, and dynamic covariates v(t) that changed during the study. To train the model, we defined vectors by concatenating neighboring time points with the static covariates. All neighboring time points were combined into a single dataset used to train a single statistical model that applies to every pair of neighboring time points. Rather than directly modeling the correlations between these covariates, a CRBM models them indirectly through a vector of latent variables h. These latent variables can be interpreted in much the same way as the directions identified through principal components analysis.

The CRBM is a parametric statistical model in which the probability density of the current visible units v (the current time point concatenated with the static covariates) and the latent variables h, conditioned on the previous time point v', is defined as

p(v, h | v') = Z(v')^{-1} exp[ −∑_i a_i(v_i) − ∑_μ b_μ(h_μ) + ∑_{i,μ} (v_i/σ_i) W_{iμ} (h_μ/ε_μ) + ∑_{j,μ} (v'_j/σ_j) U_{jμ} (h_μ/ε_μ) ]   (1)

and Z(v') is a normalization constant that ensures the total probability integrates to one. Here, a_i and b_μ are functions that characterize the data types of covariate i and latent variable μ, respectively, while W and U are weight matrices coupling the current and previous visible units to the latent variables. The parameters σ_i and ε_μ set the scales of v_i and h_μ, respectively. We used 50 normally distributed latent variables that were lower truncated at zero, which is known as a rectified linear (ReLU) activation function in the machine learning literature Tubiana and Monasson (2017). To deal with missing data, we divide the visible vector into mutually exclusive observed and missing groups, v_obs and v_mis, and impute the missing values by drawing from the conditional distribution p(v_mis | v_obs, v').
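The clamped-sampling mechanics behind this imputation can be illustrated with a toy model. The sketch below uses a plain binary RBM rather than the paper's multimodal, ReLU-hidden CRBM, so it only demonstrates the conditioning idea: observed visible units stay fixed while missing ones are redrawn by alternating Gibbs sampling. All names and the random initialization are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyBinaryRBM:
    """Toy *binary* RBM illustrating block Gibbs imputation; the
    paper's model instead uses multimodal visible units and
    truncated-normal (ReLU) hidden units."""

    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0.0, 0.1, (n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def sample_h(self, v):
        """Draw hidden units from p(h | v)."""
        return (rng.random(self.c.shape) < sigmoid(v @ self.W + self.c)).astype(float)

    def sample_v(self, h):
        """Draw visible units from p(v | h)."""
        return (rng.random(self.b.shape) < sigmoid(h @ self.W.T + self.b)).astype(float)

    def impute(self, v, observed_mask, n_steps=50):
        """Draw missing entries of v from p(v_mis | v_obs) by Gibbs
        sampling, clamping the observed units at every step."""
        v = v.copy()
        for _ in range(n_steps):
            h = self.sample_h(v)
            v_new = self.sample_v(h)
            v = np.where(observed_mask, v, v_new)  # keep observed values fixed
        return v
```

Clamping the observed units at every step is what turns ordinary Gibbs sampling into a draw from the conditional distribution.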

Traditionally, CRBMs are trained to maximize the likelihood of the data under the model using stochastic maximum likelihood Tieleman (2008). Recent results have shown that one can improve on maximum likelihood training of RBMs by adding a term to the loss function that measures how easily patient profiles generated by the statistical model can be distinguished from real patient profiles Fisher et al. (2018). Therefore, we used a combined maximum likelihood and adversarial training method to fit the CRBM; more details of the machine learning methods are described in the Supporting Information. An overview of our statistical model is depicted in Figure 1.

We generated two types of synthetic patient trajectories with a CRBM: i) synthetic trajectories starting from baseline values for real patients, and ii) entirely synthetic patients. The first type is useful for many tasks in precision medicine and clinical trial simulation, while the second type has interesting applications for maintaining the privacy of clinical data Dankar and El Emam (2013). To generate trajectories of type (i), an initial population of patients was selected and then the model was used to predict their future state. To accomplish this, we started with baseline data and used the CRBM to iteratively add new time points. To generate trajectories of type (ii), entirely synthetic patients were generated by first simulating the baseline data, then iteratively adding new time points so that the patient data was entirely simulated.
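The type (i) generation loop, starting from real baseline data and iteratively appending 3-month time points, can be sketched as follows. The conditional sampler here is a stand-in (a toy autoregressive perturbation), since the real model draws from the CRBM's conditional distribution; only the iteration structure reflects the procedure described above.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_next(v_t):
    """Stand-in for the CRBM's conditional sampler p(v(t+1) | v(t)).
    A toy AR(1)-style perturbation, purely illustrative."""
    return 0.9 * v_t + rng.normal(0.0, 0.1, size=v_t.shape)

def simulate_trajectory(v_baseline, n_steps=6):
    """Type (i) simulation: start from a patient's baseline covariates
    and iteratively append 3-month time points (7 points total)."""
    traj = [v_baseline]
    for _ in range(n_steps):
        traj.append(sample_next(traj[-1]))
    return np.stack(traj)
```

A type (ii) simulation would differ only in its first step: the baseline vector would itself be sampled from the model rather than taken from a real patient.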

III Results

Figure 2: The model has good generative performance. A) Correlations between variables as predicted by the model (below the diagonal) and calculated from the data (above the diagonal). Components of the cognitive scores are strongly correlated with each other, but not with other clinical data. B) Scatterplot of observed vs predicted correlations for each time point, over all times. C) Scatterplot of observed vs predicted autocorrelations with time lag of 3 months. D) Scatterplot of observed vs predicted autocorrelations with time lag of 6 months. The color gradient in B – D represents the fraction of observations where the variables used to compute the correlation were present; lighter colors mean more of the data was missing. The values shown are from a least squares fit weighted by this fraction (of data present when computing the correlations).

III.1 General model performance

As an initial measure of performance, we assessed the ability of the CRBM to reproduce the marginal distributions of each variable using entirely synthetic patients. The CRBM generated time series that accurately capture the marginal distributions of cognitive exam scores, laboratory tests, and clinical data of real AD patients (Supporting Information). Beyond marginal distributions, equal-time correlations and lagged autocorrelations are more important for forecasting disease progression, so we assessed the CRBM’s ability to model these as well. Entirely synthetic patient trajectories have correlations that match the data well (Figure 2A). The variables can be reasonably grouped into three categories: cognitive scores, laboratory and clinical tests, and background information. There are strong correlations between variables belonging to the same category but only weak inter-category correlations (Figure 2A). Unfortunately, this also implies that neither the laboratory tests nor the background variables are strongly correlated with the primary clinical endpoints captured by cognitive assessments for AD. Even though the CRBM only incorporates a direct connection between neighboring time points, this is sufficient to reproduce equal-time correlations and lagged autocorrelations between variables (Figure 2B-D), even for time lags greater than 3 months. Collectively, these results suggest the model has excellent generative performance.

III.2 Simulating conditional patient trajectories

Figure 3: The model accurately forecasts across variables. Relative errors of the model (CRBM) and of a random forest (RF) specifically trained to predict the value of a single variable at a single time point. The root mean square (RMS) errors are scaled by the standard deviation of the data to be predicted. Predictions are shown for every time-dependent variable except dropout. At each time point and for each variable, the better of the random forest and CRBM predictions is shown in bold.

Predictions for any unobserved characteristics of a patient can be computed from our model by generating samples from the model distribution conditioned on the values of all observed variables. Sampling from the conditional distributions can be used to fill in any missing observations (i.e., imputation) or to forecast a patient’s future state. The ability to sample from any conditional distribution is one advantage that a modeling framework based on CRBMs has over alternative generative models based on directed neural networks.

For each patient in the test set, we computed a forecast for their entire trajectory conditioned on their baseline covariates. That is, we used the CRBM to numerically compute the conditional expected value E[v(t) | v(0)] at each time point t. Next, we evaluated the root mean square (RMS) error of the CRBM predictions for each variable at each time point past baseline. For comparison, we trained a series of Random Forest (RF) models that use the baseline data to predict each of the 35 time-dependent variables at all 6 follow-up time points. Note that there is a separate RF model for each variable at each time point – a total of 210 different RF models. We also trained an ensemble of 6 multivariate RFs – each one predicting all 35 covariates for a given time point – but were unable to obtain reasonable accuracies (see Supporting Information). The RMS error of the random forest prediction sets a benchmark for a predictive model that is specially trained for an individual problem. By contrast, a single CRBM model is used to predict all variables at all time points. Figure 3 presents a detailed comparison between the single CRBM and the ensemble of 210 RF models. The accuracy of the CRBM is close to that of the specialized RF model for each variable and time point, with the CRBM generally outperforming the RF on the components of ADAS-Cog.
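The relative error metric used in this comparison can be written out explicitly: the RMS prediction error divided by the standard deviation of the observed values, so that a score of 1 corresponds to a model no better than predicting the population mean. This is a direct sketch of the metric, with population (not sample) standard deviation as an assumption.

```python
import math

def relative_rms_error(y_true, y_pred):
    """RMS prediction error scaled by the standard deviation of the
    observed values (as in Figure 3). 0 = perfect prediction;
    1 = no better than predicting the mean."""
    n = len(y_true)
    rms = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mean = sum(y_true) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in y_true) / n)
    return rms / std
```

Predicting the mean of the observations yields exactly 1.0, which is why values below 1 in Figure 3 indicate genuine predictive signal.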

One can think of the ensemble of RF models as approximating the predictions of a factorized generative model. That is, one could construct a simpler probabilistic model by assuming that the variables are independent when conditioned on the baseline values, i.e., p(v(t) | v(0)) = ∏_i p_i(v_i(t) | v(0)). While this factorized model can make accurate predictions for individual variables in isolation, it cannot generate realistic trajectories that capture the correlations between the covariates (which are zero by construction). By contrast, the CRBM achieves equivalent accuracy on individual prediction problems while also correctly modeling the correlation structure. More details on the comparison between RFs and the CRBM are provided in the Supporting Information.

In summary, stochastic simulations of disease progression have two main advantages compared to supervised machine learning models that aim to predict a single, predefined endpoint. The first is the simultaneous modeling of entire patient profiles in a way that correctly captures correlations between covariates. This allows for the quantitative exploration of alternative endpoints and different patient subgroups. The second is that stochastic simulations provide in-depth estimates of risk for individual patients that can be aggregated to estimate risks in larger patient populations. Moreover, our model provides accurate estimates of its uncertainty in addition to forecasts for expected progression of individual patients (Figures S1 and S8). Patient heterogeneity manifests in more complex ways than just a shift in the mean outcome – there are changes in the variance, skew, and shape of the distribution of model predictions for each patient. Personalized approaches to AD therapy will have to predict and address these different types of risk. The ability to simultaneously compute predictions and confidence intervals for multiple characteristics of a patient is a key feature of our approach and an important step towards comprehensive simulations of disease progression.

III.3 Forecasting and interpreting disease progression

Figure 4: The model accurately forecasts progression and allows for interpretation. A) Box plot of the ADAS-Cog score over time computed from the data and the model. The line shows the mean, and the whiskers show the 10th and 90th percentiles. B) Out-of-sample predictive accuracy for the change in ADAS-Cog score from baseline (i.e., ADAS-Cog(t) − ADAS-Cog(0)) for different study durations. Separate neural network, random forest, and linear regression models were trained to predict the change in ADAS-Cog score from baseline for each study duration. The blue band shows the uncertainty on the CRBM prediction. C) We created a simulated patient population with MCI and an initial ADAS-Cog score of 10, and simulated the evolution of each synthetic patient for 18 months. The 5% of synthetic patients with the largest ADAS-Cog score increase were designated “fast progressors” and the 5% with the smallest ADAS-Cog score increase were designated “slow progressors”. Differences between the fast and slow progressors (the “absolute effect size”) were quantified using the absolute value of Cohen’s d-statistic, which measures the mean difference divided by a pooled standard deviation Cohen (1988).

We now turn to disease progression to evaluate the CRBM and understand how it provides interpretative power for MCI and AD. Our model is trained to simulate the evolution of the individual components of the cognitive exams, laboratory tests, and clinical data. As a result, it is also possible to simulate the evolution of any combination of these variables, such as the 11-component ADAS-Cog score that is commonly used as a measure of overall disease activity. Note that the ADAS delayed word recall component, which is present in the dataset, is not part of the 11-component ADAS-Cog score but can be used as an additional probe of disease severity, especially for MCI Sano et al. (2011). Figure 4A shows a box plot describing the evolution of the ADAS-Cog score distribution within the population. The data and model show the same trend – an increase in the mean ADAS-Cog score with time along with a widening right tail of the distribution. This implies that much of the trend of increasing ADAS-Cog scores in the population is driven by a subset of patients.

Simulations from the model can be run for each individual patient in order to forecast their disease progression. Despite only being trained on data with a 3-month time lag, the model makes accurate predictions out to at least 18 months (Figure 4B). In Figure 4B, we compare the accuracy of the CRBM with that of a variety of supervised models (a linear regression, a random forest, and a deep neural network) at predicting the change in ADAS-Cog score from baseline to each possible endpoint, in 3-month steps through 18 months. Each of the supervised models was trained to predict a specific endpoint (e.g., the change in ADAS-Cog score after 6 months). The CRBM is the best performing model, though the accuracies of the four types of predictors converge for time periods of 15 months or longer. The strong relative performance of the CRBM on this task is remarkable given that (i) it was only trained to perform 3-month-ahead simulations and (ii) it was not directly trained to predict the aggregate ADAS-Cog score. More details on the comparison are provided in the Supporting Information.

To gain more insight into the origin of fast and slow progressing patients, we simulated 18-month patient trajectories conditioned on a baseline ADAS-Cog score of 10 and an initial diagnosis of MCI. This initial ADAS-Cog score was chosen because it is representative of a typical patient with MCI. The 5% of synthetic patients with the largest ADAS-Cog score increase were designated “fast progressors” and the 5% of synthetic patients with the smallest ADAS-Cog score increase were designated “slow progressors”. Differences between the fast and slow progressors (the “absolute effect size”) were quantified using the absolute value of Cohen’s d-statistic Cohen (1988), as shown in Figure 4C. The majority of baseline variables are not associated with disease progression; however, there are strong associations with cognitive tests based on recall (i.e., MMSE recall, ADAS word recall, and ADAS delayed word recall) and word recognition. That is, patients with poor performance on the ADAS delayed word recall test tend to progress more rapidly – even after controlling for the total ADAS-Cog score. Variables associated with progression in patients who already have AD are described in the Supporting Information.
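The effect-size calculation used here, Cohen's d (mean difference over pooled standard deviation), is standard and can be sketched in a few lines; the sample-variance (n − 1) convention is an assumption, as the text does not specify it.

```python
import math

def cohens_d(group_a, group_b):
    """Absolute effect size: mean difference divided by the pooled
    standard deviation (Cohen, 1988). Uses n-1 sample variances."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return abs(ma - mb) / pooled
```

Applied per baseline variable to the fast- and slow-progressor groups, this yields the "absolute effect size" plotted in Figure 4C.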

IV Discussion

The ability to simulate the stochastic disease progression of individual patients in high resolution could have a transformative impact on patient care by enabling personalized data-driven medicine. Each patient with a given diagnosis has unique risks and a unique response to therapy. Due to this heterogeneity, predictive models cannot currently make individual-level forecasts with a high degree of confidence. Therefore, it is critical that data-driven approaches to personalized medicine and clinical decision support provide estimates of their uncertainty in addition to expected outcomes.

Previous efforts for modeling disease progression in AD have focused on predicting changes in predefined outcomes such as the ADAS-Cog score or the probability of conversion from MCI to AD  Rogers et al. (2012); Ito et al. (2013); Kennedy et al. (2016); Tishchenko et al. (2016); Szalkai et al. (2017); Mueller et al. (2005); Risacher et al. (2009); Hinrichs et al. (2011); Ito et al. (2011); Suk and Shen (2013); Suk et al. (2014); Liu et al. (2014); Ortiz et al. (2016); Samper-Gonzalez et al. (2017). Here, we have demonstrated that an approach based on unsupervised deep learning can create stochastic simulations of entire patient trajectories that achieve the same level of performance on individual prediction tasks as specific models while also accurately capturing correlations between variables. Deep learning-based generative models provide much more information than specific models, thereby enabling a simultaneous and detailed assessment of different risks.

Our approach to modeling patient trajectories in AD overcomes many of the limitations of previous applications of deep learning to clinical data Goldstein et al. (2017); Miotto et al. (2016); Esteban et al. (2017); Choi et al. (2017). CRBMs can directly integrate multimodal data with both continuous and discrete variables, and time-dependent and static variables, within a single model. In addition, bidirectional models like CRBMs can easily handle missing observations in the training set by performing automated imputation during training. Combined, these factors dramatically reduce the number of data preprocessing steps needed to train a generative model to produce synthetic clinical data. We found that a single time-lagged connection was sufficient for explaining temporal correlations in AD; additional connections may be required for diseases with more complex temporal evolution.

The utility of cognitive scores as a measure of disease activity for patients with AD has been called into question numerous times Benge et al. (2009). Here, we found that the components of the ADAS-Cog and MMSE scores were only weakly correlated with other clinical variables. One possible explanation is that the observed stochasticity may simply reflect heterogeneity in performance on the cognitive exam that cannot be predicted from any baseline measurements. However, we did find that some of the individual components of the baseline cognitive scores are predictive of progression. Specifically, patients with poor performance on word recall tests tend to progress more rapidly than other patients, even after controlling for the ADAS-Cog score.

V Conclusions

This work provides a proof of concept that patient-level simulations are technologically feasible with the right tools and data. Nevertheless, there are a number of improvements to our dataset and methodology that are important steps for future research. Here, we limited ourselves to modeling 44 variables that are commonly measured in AD clinical trials. We excluded some interesting covariates, such as leukocyte populations, because they were not measured in the majority of patients in our dataset constructed from the CAMD database. We also lack data from neuroimaging studies and tests for levels of amyloid-β. Incorporating additional data into our model development will be a crucial next step, especially as surrogate biomarkers become a standard part of clinical trials.

The approach to simulating disease progression that we describe here can be easily extended to other diseases. Widespread application of deep generative models to clinical data could produce synthetic datasets with lower privacy concerns than real medical data Beaulieu-Jones et al. (2017), or could be used to run simulated clinical trials to optimize study design or as synthetic control arms. In certain disease areas, tools that use simulations to forecast risks for specific individuals could help doctors choose the right treatments for their patients. Currently, progress towards these goals is slowed by the limited availability of high quality longitudinal health datasets and the limited ability of current machine learning methods to produce insights from these datasets.

VI Acknowledgements

We would like to thank Yannick Pouliot, Pankaj Mehta, and Diane Dickel for helpful comments while preparing the manuscript. Data used in the preparation of this article were obtained from the Coalition Against Major Diseases (CAMD) database. In 2008, Critical Path Institute, in collaboration with the Engelberg Center for Health Care Reform at the Brookings Institution, formed the Coalition Against Major Diseases (CAMD). The Coalition brings together patient groups, biopharmaceutical companies, and scientists from academia, the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA), the National Institute of Neurological Disorders and Stroke (NINDS), and the National Institute on Aging (NIA). The Coalition Against Major Diseases (CAMD) includes over 200 scientists from member and non-member organizations. The data available in the CAMD database has been volunteered by CAMD member companies and non-member organizations.

References

  • Collins and Varmus (2015) F. S. Collins and H. Varmus, New England Journal of Medicine 372, 793 (2015).
  • Rajkomar et al. (2018) A. Rajkomar, E. Oren, K. Chen, A. M. Dai, N. Hajaj, M. Hardt, P. J. Liu, X. Liu, J. Marcus, M. Sun, et al., npj Digital Medicine 1, 18 (2018).
  • Miotto et al. (2016) R. Miotto, L. Li, B. A. Kidd,  and J. T. Dudley, Scientific reports 6, 26094 (2016).
  • Choi et al. (2016) E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart,  and J. Sun, in Machine Learning for Healthcare Conference (2016) pp. 301–318.
  • Lasko et al. (2013) T. A. Lasko, J. C. Denny,  and M. A. Levy, PloS one 8, e66341 (2013).
  • Lipton et al. (2015) Z. C. Lipton, D. C. Kale, C. Elkan,  and R. Wetzel, arXiv preprint arXiv:1511.03677  (2015).
  • Myers et al. (2017) P. D. Myers, B. M. Scirica,  and C. M. Stultz, Scientific reports 7, 12692 (2017).
  • Choi et al. (2017) E. Choi, S. Biswal, B. Malin, J. Duke, W. F. Stewart,  and J. Sun, arXiv preprint arXiv:1703.06490  (2017).
  • Esteban et al. (2017) C. Esteban, S. L. Hyland,  and G. Rätsch, arXiv preprint arXiv:1706.02633  (2017).
  • Beaulieu-Jones et al. (2017) B. K. Beaulieu-Jones, Z. S. Wu, C. Williams,  and C. S. Greene, bioRxiv , 159756 (2017).
  • Goldstein et al. (2017) B. A. Goldstein, A. M. Navar, M. J. Pencina,  and J. Ioannidis, Journal of the American Medical Informatics Association 24, 198 (2017).
  • Kumar et al. (2015) A. Kumar, A. Singh, et al., Pharmacological Reports 67, 195 (2015).
  • Rosen et al. (1984) W. G. Rosen, R. C. Mohs,  and K. L. Davis, The American journal of psychiatry  (1984).
  • Folstein et al. (1975) M. F. Folstein, S. E. Folstein,  and P. R. McHugh, Journal of psychiatric research 12, 189 (1975).
  • Cummings et al. (2016) J. Cummings, P. S. Aisen, B. DuBois, L. Frölich, C. R. Jack, R. W. Jones, J. C. Morris, J. Raskin, S. A. Dowsett,  and P. Scheltens, Alzheimer’s research & therapy 8, 39 (2016).
  • Rogers et al. (2012) J. A. Rogers, D. Polhamus, W. R. Gillespie, K. Ito, K. Romero, R. Qiu, D. Stephenson, M. R. Gastonguay,  and B. Corrigan, Journal of pharmacokinetics and pharmacodynamics 39, 479 (2012).
  • Ito et al. (2013) K. Ito, B. Corrigan, K. Romero, R. Anziano, J. Neville, D. Stephenson,  and R. Lalonde, Journal of Alzheimer’s Disease 37, 173 (2013).
  • Kennedy et al. (2016) R. E. Kennedy, G. R. Cutter, G. Wang,  and L. S. Schneider, Journal of Alzheimer’s Disease 50, 1205 (2016).
  • Tishchenko et al. (2016) I. Tishchenko, C. Riveros, P. Moscato,  and C. A. M. Diseases, Future science OA 2, FSO140 (2016).
  • Szalkai et al. (2017) B. Szalkai, V. K. Grolmusz, V. I. Grolmusz, et al., Archives of gerontology and geriatrics 73, 300 (2017).
  • Mueller et al. (2005) S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. R. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga,  and L. Beckett, Alzheimer’s & dementia: the journal of the Alzheimer’s Association 1, 55 (2005).
  • Risacher et al. (2009) S. L. Risacher, A. J. Saykin, J. D. Wes, L. Shen, H. A. Firpi,  and B. C. McDonald, Current Alzheimer Research 6, 347 (2009).
  • Hinrichs et al. (2011) C. Hinrichs, V. Singh, G. Xu, S. C. Johnson, A. D. N. Initiative, et al., Neuroimage 55, 574 (2011).
  • Ito et al. (2011) K. Ito, B. Corrigan, Q. Zhao, J. French, R. Miller, H. Soares, E. Katz, T. Nicholas, B. Billing, R. Anziano, et al., Alzheimer’s & dementia: the journal of the Alzheimer’s Association 7, 151 (2011).
  • Suk and Shen (2013) H.-I. Suk and D. Shen, in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2013) pp. 583–590.
  • Suk et al. (2014) H.-I. Suk, S.-W. Lee, D. Shen, A. D. N. Initiative, et al., NeuroImage 101, 569 (2014).
  • Liu et al. (2014) S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis,  and D. Feng, in Biomedical Imaging (ISBI), 2014 IEEE 11th International Symposium on (IEEE, 2014) pp. 1015–1018.
  • Ortiz et al. (2016) A. Ortiz, J. Munilla, J. M. Gorriz,  and J. Ramirez, International journal of neural systems 26, 1650025 (2016).
  • Samper-Gonzalez et al. (2017) J. Samper-Gonzalez, N. Burgos, S. Fontanella, H. Bertin, M.-O. Habert, S. Durrleman, T. Evgeniou, O. Colliot, A. D. N. Initiative, et al., in International Workshop on Machine Learning in Medical Imaging (Springer, 2017) pp. 53–60.
  • Corrigan et al. (2014) B. Corrigan, K. Ito, J. Rogers, D. Polhamus, D. Stephenson,  and K. Romero, in Applied Pharmacometrics (Springer, 2014) pp. 451–476.
  • Romero et al. (2015) K. Romero, K. Ito, J. Rogers, D. Polhamus, R. Qiu, D. Stephenson, R. Mohs, R. Lalonde, V. Sinha, Y. Wang, et al., Clinical Pharmacology & Therapeutics 97, 210 (2015).
  • Romero et al. (2009) K. Romero, M. De Mars, D. Frank, M. Anthony, J. Neville, L. Kirby, K. Smith,  and R. Woosley, Clinical Pharmacology & Therapeutics 86, 365 (2009).
  • Neville et al. (2015) J. Neville, S. Kopko, S. Broadbent, E. Avilés, R. Stafford, C. M. Solinsky, L. J. Bain, M. Cisneroz, K. Romero,  and D. Stephenson, Alzheimer’s & dementia: the journal of the Alzheimer’s Association 11, 1212 (2015).
  • Kubick et al. (2007) W. R. Kubick, S. Ruberg,  and E. Helton, Drug information journal 41, 373 (2007).
  • Hume et al. (2016) S. Hume, J. Aerts, S. Sarnikar,  and V. Huser, Journal of biomedical informatics 60, 352 (2016).
  • Ackley et al. (1985) D. H. Ackley, G. E. Hinton,  and T. J. Sejnowski, Cognitive science 9, 147 (1985).
  • Hinton (2010) G. Hinton, Momentum 9, 926 (2010).
  • Taylor et al. (2007) G. W. Taylor, G. E. Hinton,  and S. T. Roweis, in Advances in neural information processing systems (2007) pp. 1345–1352.
  • Mnih et al. (2011) V. Mnih, H. Larochelle,  and G. E. Hinton, in Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (AUAI Press, 2011) pp. 514–522.
  • Tubiana and Monasson (2017) J. Tubiana and R. Monasson, Physical review letters 118, 138301 (2017).
  • Tieleman (2008) T. Tieleman, in Proceedings of the 25th international conference on Machine learning (ACM, 2008) pp. 1064–1071.
  • Fisher et al. (2018) C. K. Fisher, A. M. Smith,  and J. R. Walsh, arXiv preprint arXiv:1804.08682  (2018).
  • Dankar and El Emam (2013) F. K. Dankar and K. El Emam, Transactions on Data Privacy 6, 35 (2013).
  • Sano et al. (2011) M. Sano, R. Raman, J. Emond, R. G. Thomas, R. Petersen, L. S. Schneider,  and P. S. Aisen, Alzheimer Dis Assoc Disord 25, 122 (2011).
  • Cohen (1988) J. Cohen, Statistical power analysis for the behavioral sciences (Lawrence Erlbaum Associates, 1988).
  • Benge et al. (2009) J. F. Benge, S. Balsis, L. Geraci, P. J. Massman,  and R. S. Doody, Dementia and geriatric cognitive disorders 28, 63 (2009).

VII Supporting Information

VII.1 Data Processing

The CAMD database stores data using CDISC standards, specifically the Study Data Tabulation Model (SDTM), which defines a common schema for clinical trial data and is the required standard for clinical data submissions to the United States Food and Drug Administration (FDA). In this format the data is already highly structured; therefore it is possible to develop data processing pipelines that can apply to SDTM data in general and not simply the particular database used here. We describe the general architecture of our data processing pipeline and the CAMD-specific processing used.

The goal of our processing pipeline is to arrive at data that may be directly used by machine learning algorithms to build patient-level models. This means:

  • Data must be numerically formatted, such as numeric values, ordinal values for scores, and one-hot encoding for categorical variables. For text or image data, this may involve feature extraction, e.g. through a word2vec model or an autoencoder.

  • Data must be patient-specific, and may extend over time at regular intervals. For example, if we have cholesterol measurements for a given patient at 1, 2, 5, and 12 months, but are modeling the population at 3-month intervals, then we may average the 1- and 2-month time point values and will have a missing entry between the 5- and 12-month values.
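The time-alignment in the second bullet can be sketched as follows. This is a minimal illustration; the exact rule assigning an observation to a visit window is our assumption, not a documented part of the pipeline:

```python
import math

def resample_to_grid(measurements, interval=3, n_points=7):
    """Average irregularly timed (month, value) pairs onto a regular grid.

    Each observation is assigned to the visit window ending at the next
    grid point (an assumed rule); grid points with no observations are
    left as None, i.e. missing.
    """
    bins = [[] for _ in range(n_points)]
    for month, value in measurements:
        idx = math.ceil(month / interval)
        if 0 <= idx < n_points:
            bins[idx].append(value)
    return [sum(b) / len(b) if b else None for b in bins]

# Cholesterol measured at months 1, 2, 5, and 12, on a 3-month grid:
# months 1 and 2 are averaged at the 3-month point; month 9 is missing.
grid = resample_to_grid([(1, 180.0), (2, 190.0), (5, 200.0), (12, 210.0)])
```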

Data arrives in Comma Separated Value (CSV) formatted text files, with abbreviation encodings for file and variable names. Many of these abbreviations are generic to SDTM, and some apply specifically to particular disease areas. A translation table, such as one provided by CAMD, may be used to automatically convert abbreviations to human-readable names. Using this translation and simple type inference on variables, the data is ingested into a SQL database via a simple script. We refer to the data in this form as the raw database.

The main component of the processing pipeline extracts data appropriate for training and evaluating machine learning algorithms. This is done on a per-variable basis, meaning the primary functions in the pipeline produce data for only a single variable; this processing is then repeated over all variables of interest. Processed data is stored in the processed database and may directly be used to construct datasets for machine learning. The steps in the processing are:

  • Declare which columns and tables from the raw database will be used to produce the data for a given variable.

  • Declare a processing function to convert this data into the appropriate form.

  • Declare a location in the processed database where the data will be stored.

  • Query the raw database for the data, apply the processing function, and store the result in the processed database.

The processing functions may be common, such as one-hot encoding categorical labels, or they may be custom, such as standardizing units for a particular laboratory measurement. Such custom functions form the bulk of database-specific code that must be written. All of the above processing steps can easily be encoded in configuration files, meaning the process of preparing data for machine learning is simple and repeatable.
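To make the declaration-driven design concrete, a minimal sketch follows. The configuration schema, function names, and the cholesterol unit conversion (dividing mg/dl by 38.67 to obtain mmol/l) are illustrative assumptions, not the actual library; the SDTM names LB, USUBJID, and LBSTRESN are standard identifiers, but the specific columns chosen here are ours:

```python
# Illustrative per-variable configuration: source columns in the raw
# database, a named processing function, and a destination table.
CONFIG = {
    "cholesterol": {
        "source": ("LB", ["USUBJID", "LBDT", "LBSTRESN"]),
        "transform": "standardize_units",
        "dest": "processed.cholesterol",
    },
}

TRANSFORMS = {
    # Convert cholesterol from mg/dl to mmol/l (divide by 38.67).
    "standardize_units": lambda rows: [(pid, day, val / 38.67)
                                       for pid, day, val in rows],
}

def process_variable(name, query_fn, store_fn, config=CONFIG):
    """Query the raw database, apply the declared transform, and store
    the result in the processed database."""
    entry = config[name]
    table, columns = entry["source"]
    rows = query_fn(table, columns)
    processed = TRANSFORMS[entry["transform"]](rows)
    store_fn(entry["dest"], processed)
    return processed
```

Because each step is declared rather than hard-coded, the same driver can be reused across variables and databases, with only the transform functions being database-specific.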

Finally, datasets may be constructed from the processed database by merely specifying which variables are to be used. This step is also performed via a configuration file. Additional filtering of patients, e.g. by requiring they have data present for a certain number of time points, is straightforward to apply.

We have developed a Python library to process data as described above. This library fully handles the interface with the SQL database, includes common data conversion functions, and provides utilities for summary statistics and type inference on the variables in a dataset. For any specific project, such as the CAMD database, most of the processing is set up by writing YAML configuration files that are simple, human-readable, and easily verified. The remainder involves writing custom processing functions in Python for specific variables. This setup makes it straightforward to apply our machine learning models to other clinical data modeling problems.

VII.1.1 Variables Used in Training

Variables relevant to modeling AD progression were extracted from the CAMD database using the method described above, and 44 variables without substantial missing data were identified and used for the model. Tables 3 and 4 list all variables, their units, and specific processing considerations for each. Each laboratory test variable is converted to the units given, and all transformations applied to train the models are inverted for analysis.


Category Name Units Notes
ADAS Commands counts ordinal range
ADAS Comprehension counts ordinal range
ADAS Construction counts ordinal range
ADAS Delayed Word Recall counts ordinal range
ADAS Ideational counts ordinal range
ADAS Instructions counts ordinal range
ADAS Naming counts ordinal range
ADAS Orientation counts ordinal range
ADAS Spoken Language counts ordinal range
ADAS Word Finding counts ordinal range
ADAS Word Recall counts ordinal range
ADAS Word Recognition counts ordinal range
MMSE Attention and Calculation counts ordinal range
MMSE Language counts ordinal range
MMSE Orientation counts ordinal range
MMSE Recall counts ordinal range
MMSE Registration counts ordinal range
Table 3: Cognitive variables included in the model.
Category Name Units Notes
Laboratory Alanine aminotransferase µkat/l log-standardized for training
Laboratory Alkaline phosphatase µkat/l log-standardized for training
Laboratory Aspartate aminotransferase µkat/l log-standardized for training
Laboratory Cholesterol mmol/l log-standardized for training
Laboratory Creatine kinase iu/cl log-standardized for training
Laboratory Creatinine mg/dl log-standardized for training
Laboratory Gamma glutamyl transferase iu/dl log-standardized for training
Laboratory Hematocrit counts log-standardized for training
Laboratory Hemoglobin g/dl log-standardized for training
Laboratory Hemoglobin A1c % log-standardized for training
Laboratory Indirect bilirubin mg/dl log-standardized for training
Laboratory Potassium mmol/l log-standardized for training
Laboratory Sodium mmol/cl log-standardized for training
Laboratory Triglycerides g/l log-standardized for training
Clinical Blood pressure (diastolic) mmHg log-standardized for training
Clinical Blood pressure (systolic) mmHg log-standardized for training
Clinical Heart rate bpm log-standardized for training
Clinical Weight kg log-standardized for training
Clinical Dropout - 1 for dropout before the next time point
Background Age at baseline Years ‘>89’, log-standardized for training
Background Geographic region - 1-hot, 7 labels built from country
Background Initial diagnosis (AD or MCI) - Bernoulli
Background Past cardiovascular event - Bernoulli
Background ApoE ε4 allele count counts 0, 1, or 2
Background Race - 1-hot, 6 labels
Background Sex - Bernoulli, 1 if female
Background Height cm log-standardized for training
Table 4: Laboratory, clinical, and background variables included in the model.

Of the 6500 patients in the CAMD database, very few have data after approximately 18 months from baseline. A 3-month (90-day) interval proved suitable in that most patients have data at every time point; shorter intervals yielded groups of patients without data at some time points. Therefore, we chose to represent all temporal variables in 3-month intervals from 0 (baseline) to 18 months, giving 7 available time points for each patient.

VII.1.2 Patients Used in Training

To model progression, we are most interested in patients with longer trajectories. Therefore, we selected the 1908 patients that have data at the 15- or 18-month time points. These patients were randomly divided into three groups: training (1335 patients, or 70%), validation (95 patients, or 5%), and testing (478 patients, or 25%). The CRBM is trained only on the training group, with the validation group used to evaluate the model’s performance during training. The supervised models predicting progression are trained on the combination of the training and validation groups using 5-fold nested cross validation. Analysis is performed only on the testing group.
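A minimal sketch of such a split follows; the seed and the rounding rule are arbitrary choices of ours, but with 1908 patients and 70%/5% fractions this reproduces the 1335/95/478 group sizes:

```python
import random

def split_patients(patient_ids, fractions=(0.70, 0.05), seed=0):
    """Randomly partition patients into training, validation, and testing
    groups; the testing group receives all remaining patients."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(fractions[0] * len(ids))
    n_valid = int(fractions[1] * len(ids))
    return (ids[:n_train],
            ids[n_train:n_train + n_valid],
            ids[n_train + n_valid:])

train, valid, test = split_patients(range(1908))
```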

VII.2 Motivation for CRBMs

Boltzmann machines are a well-known, standard machine learning algorithm for modeling relationships within data. They provide several features critical to modeling clinical data that are not found in most machine learning models:

  • They can easily model multimodal data. Different neuron types may be used to model continuous numeric, ordinal, Bernoulli, or categorical data.

  • They allow for conditional and generative sampling. If some clinical data is known for a patient, it can be used to predict unknown data for that patient. For example, an initial population of an AD clinical trial may be defined in terms of standard inclusion criteria, such as age, sex ratio, and ADAS-Cog scores, and the remaining baseline data and any future data may be predicted.

  • As a consequence, they naturally handle missing data. The model itself may be used to impute missing values from the learned joint probability distribution of the data. This may be done during training, meaning missing data can be directly fed into the model.

  • They are stochastic, meaning data may be sampled. Stochastic models naturally provide an estimate of their uncertainty through this sampling. For clinical data, the consequence is that the model returns both a prediction and an uncertainty for any clinical variable being predicted.

Conditional Restricted Boltzmann Machines Ackley et al. (1985); Hinton (2010); Taylor et al. (2007); Mnih et al. (2011) provide a way to model time series data using the natural capabilities of Boltzmann machines. Our CRBM contains the visible units for multiple time points, with a standard hidden layer. The visible units are organized as:

v = (x_t, x_{t+1}, …, x_{t+τ}, s)     (2)

where τ is the time lag of the model. In our model, we use τ = 1, so that two time points are learned simultaneously. The static units s are only used once, as they are constant over all times. The model learns the complete joint probability distribution between all adjacent time points simultaneously. That means that any conditional sampling of the data may be performed, such as predicting the data for a time point given the previous time points. A baseline cohort may be simulated by sampling from the model and using the first time point. This treatment of the data to allow for learning inter-dependence between time points is the only distinction of a CRBM over a standard RBM.

VII.3 Details of Training

The CRBM is trained on the data from adjacent pairs of time points. If x_t is the vector of time-dependent variables for a patient at time t and s is the vector of static variables for the same patient, then the visible units used to train the CRBM are v = (x_t, x_{t+1}, s), a concatenation of the data from the adjacent time points t and t+1 with the static variables represented only once.

When training the CRBM, the data for each patient in the training and validation groups are reorganized into all adjacent pairs of frames. Since each patient has data for 7 time points, each contributes 6 pairs of time points (which we call samples). Within each group, samples are shuffled so that minibatches contain a mixture of patients and times.
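This reorganization can be sketched as follows. The data layout (per-visit vectors plus one static vector per patient) and the fixed shuffle seed are illustrative assumptions:

```python
import random

def make_training_samples(trajectories, statics, seed=0):
    """Build CRBM visible vectors v = (x_t, x_{t+1}, s).

    A patient with T time points contributes T - 1 adjacent-pair samples;
    samples are shuffled so that minibatches mix patients and times.
    """
    samples = []
    for pid, frames in trajectories.items():
        s = statics[pid]
        for t in range(len(frames) - 1):
            # Concatenate the two adjacent frames with the static vector.
            samples.append(frames[t] + frames[t + 1] + s)
    random.Random(seed).shuffle(samples)
    return samples
```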

The CRBM has a single hidden layer of 50 ReLU units, and is trained using the methods described in Fisher et al. (2018). The objective function is a linear combination of log-likelihood and adversarial objectives,

C = γ L_likelihood + (1 − γ) L_adversarial     (3)

where γ is a parameter weighing the relative size of the two objectives. Parameters used to train the CRBM are listed in Table 5. The training setup is similar to that used in Fisher et al. (2018); here we use a random forest classifier for the adversary. Temperature-driven sampling, where the temperature is sampled from an autocorrelated Gamma distribution, is not used, though the model performance is not especially sensitive to this choice.

Notable dynamics were observed during the training process. Within 100 epochs, metrics monitored during training such as KL divergence and reverse KL divergence achieve values close to their final values. Sampling from the model at this stage, the model performs relatively poorly for patients with extremal ADAS-Cog scores compared to those near the mode. Most importantly, during the early stages of training the model exhibits a strong regression-to-the-mean effect for ADAS-Cog score outliers, predicting that patients with a low score progress rapidly and that those with a high score improve – the opposite of what is observed in the data. Continuing training allows the model to unlearn this behavior and correctly learn a progression from the mean, where higher scoring individuals progress more rapidly. We expect that this feature of training, where it takes longer to effectively learn the behavior of outliers, is common.

Hyperparameter Value / Notes
number of epochs 2000
batch size 100
training/validation fractions
learning rate   0.005 initial; 0 final; linear decay
optimizer ADAM, beta
Monte Carlo steps (sampling) 50
Monte Carlo steps (imputation) 2
driven sampling 0
  likelihood weight (Equation 3) 0.3
adversary random forest, 5 trees with max depth 5
Table 5: Hyperparameters used to train the CRBM.

VII.3.1 Details of Progression Predictions

This subsection gives details on the modeling for Figure 4B. A linear regression, random forest regression, and neural network were trained to predict the ADAS-Cog score change from the baseline patient data at a single given readout time. The CRBM is also used to predict the same score change. The performance of all the algorithms is very similar; it is likely that additional patients, or additional data predictive of patient progression, would be needed to substantially improve the performance of the models. For example, we found that the addition of the ApoE ε4 allele count to the baseline variables decreases the RMS error by 5–10%. This section gives details on the training of the supervised models and the evaluation of all models.

The supervised models are trained from the baseline time point data to predict the ADAS-Cog score change from baseline to readout. All possible time points (3, 6, 9, 12, 15, and 18 months) are used as readout times, and separate supervised models must be trained for each readout time. The same CRBM model may be used for all readout times.

However, there is missing data for many patients. We exclude any patients that have any missing ADAS-Cog components at baseline or readout, ensuring that valid labels can be defined. We mean-impute other missing baseline variables using means computed from the training data. After this screening, the number of training patients for the supervised algorithms and testing patients for all algorithms as a function of readout time is given in Table 6.
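A sketch of this screening and imputation step follows; the record layout and field names are illustrative, not the actual pipeline:

```python
def screen_and_impute(patients, baseline_means):
    """Exclude patients with missing ADAS-Cog components at baseline or
    readout; mean-impute any other missing baseline variables.

    `patients` maps id -> {"adas_baseline": [...], "adas_readout": [...],
    "baseline": {...}}, with None marking a missing value;
    `baseline_means` holds per-variable means from the training data.
    """
    kept = {}
    for pid, rec in patients.items():
        if None in rec["adas_baseline"] or None in rec["adas_readout"]:
            continue  # no valid label can be defined for this patient
        imputed = {k: (baseline_means[k] if v is None else v)
                   for k, v in rec["baseline"].items()}
        kept[pid] = {**rec, "baseline": imputed}
    return kept
```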

Readout Time [months] Training Patients Testing Patients
3 1404 468
6 1401 467
9 1392 461
12 1392 461
15 1402 468
18 1285 439
Table 6: Number of training and testing patients used to predict ADAS-Cog score progression as a function of readout time.

The supervised algorithms are trained using 5-fold nested cross validation, drawing samples from the same dataset as the training and validation samples for the CRBM. For the neural network, only one set of hyperparameters was chosen and so there was only a single cross validation loop. All algorithms are evaluated on the same test set. Table 7 gives the architectures and hyperparameters for each of the supervised algorithms. Note that for the supervised algorithms, a new model was trained for every timepoint, while the CRBM is the same model over all timepoints.

Model Architecture Hyperparameters
Linear Regression ridge (ℓ2 regularization)
Random Forest 100 trees max depth
Neural Network
  2 hidden layers (30, 10) units
ReLU activations
  ADAM learning rate = 0.02
batch size = 25
20 epochs
Table 7: Supervised models used to predict ADAS-Cog score progression.

Once trained, the supervised algorithms are used to predict the ADAS-Cog score change for the test data, and a root mean square (RMS) error over the test set is computed. This is done for each readout time. The predictions for the CRBM (over all readout times) are obtained by repeatedly simulating patient trajectories from the baseline time point. For each simulation, the ADAS-Cog score change is recorded, yielding a distribution of score changes for each patient that represents the probabilistic distribution of predictions made by the CRBM. The mean of this distribution is the prediction of the CRBM for the given patient. The RMS error of these predictions is computed, as well as its standard error. These results make up the data shown in Figure 4B.
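A sketch of this evaluation, assuming per-patient lists of simulated score changes; the delta-method form of the standard error is our assumption about how the error bar is computed:

```python
import math

def rms_error_of_mean_predictions(simulated, observed):
    """Per-patient predictions are the means of simulated score-change
    distributions; returns the RMS error over patients and a standard
    error estimated from the spread of the squared errors."""
    preds = [sum(draws) / len(draws) for draws in simulated]
    sq = [(p - o) ** 2 for p, o in zip(preds, observed)]
    n = len(sq)
    mse = sum(sq) / n
    rmse = math.sqrt(mse)
    # Delta-method standard error of sqrt(MSE) (an assumed choice).
    var_sq = sum((e - mse) ** 2 for e in sq) / (n - 1)
    se = math.sqrt(var_sq / n) / (2 * rmse)
    return rmse, se
```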

VII.3.2 Details of Trajectory Progression Predictions

This subsection gives details on the modeling for Figure 3. The approach is the natural extension of the methodology described in the previous subsection.

A random forest is trained to predict a single time-dependent variable at a single readout time, meaning over all 35 time-dependent variables and all 6 possible readout times, 210 different random forest models are trained. The input data are the baseline variables, with mean imputation used in the case of missing data. Samples where either the baseline value or the readout value (the label) is missing are excluded. The root mean square (RMS) error is computed over the test data, again only using samples where the variable being modeled is present at both baseline and the readout time. For the CRBM, predictions are made by repeatedly simulating patients conditioned on their baseline data, and taking the mean for each patient as the CRBM prediction. The RMS error can then be computed, using the same test samples on which the random forest models were evaluated.
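The per-(variable, readout) training loop can be sketched as follows; `fit_fn` and `data_fn` are hypothetical callables standing in for the model fit and the dataset construction described above:

```python
def train_per_variable_models(variables, readout_times, fit_fn, data_fn):
    """Train one supervised model per (variable, readout time) pair.

    `data_fn(var, t)` returns (X_baseline, y_readout) with exclusion and
    imputation already applied; `fit_fn(X, y)` returns a fitted model.
    With 35 time-dependent variables and 6 readout times, this yields
    the 210 models described above.
    """
    return {(var, t): fit_fn(*data_fn(var, t))
            for var in variables
            for t in readout_times}
```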

It is helpful to normalize these errors by the standard deviation of the value to be predicted. An error ratio of 1 implies that the prediction is no better than predicting the mean of the test data, and an error ratio well below 1 implies that the prediction is highly precise at a per-patient level.
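This normalization can be sketched as follows; whether the sample or population standard deviation is used is our choice here:

```python
import statistics

def error_ratio(rmse, test_values):
    """Scale an RMS error by the standard deviation of the values being
    predicted; a ratio of 1 is no better than predicting the test mean."""
    return rmse / statistics.stdev(test_values)
```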

VII.4 Additional Results

There are many ways to study the performance of an unsupervised model, so we take the opportunity to present additional results that provide more insight into the CRBM.

Figure 5: Stochastic simulations enable individual assessments of risk. Violin plots display the stochastic evolution of an MCI patient whose ADAS score change over 18 months suggests a conversion to AD. The width of the blue bars represents the probability computed using simulations from the CRBM, and the mean CRBM prediction is shown as the black line. The red dots show the actual observed values from the chosen patient. The CRBM was initialized with the observed values at baseline (t = 0). Then, we repeatedly simulated 18-month trajectories and created histograms of each variable at every time point. The model predicts trends and imputes values when observations are missing for the patient (e.g., lab scores such as cholesterol and hematocrit). Units for these data are given in Table 4.
Figure 6: Marginal distributions in the generative mode for all variables. The CRBM is used to generate 18-month patient trajectories, with the same number of virtual patients as the number of patients in the test group. The marginal distributions for the patients and the CRBM are shown for all variables. Note that some variables are constant over the virtual and real patients given the relatively small sample size (478 patients).
Figure 7: Marginal distributions conditioned on baseline for cognitive variables. The CRBM is conditioned on the patient data at baseline, and one trajectory is simulated for each patient. The marginal distributions for the patients and the CRBM are shown for temporal variables. Because the CRBM is conditioned on the data at baseline, the 0-month distributions always match except when the data is missing values and the model performs imputation.
Figure 8: Marginal distributions conditioned on baseline for clinical variables. The CRBM is conditioned on the patient data at baseline, and one trajectory is simulated for each patient. The marginal distributions for the patients and the CRBM are shown for temporal variables. Because the CRBM is conditioned on the data at baseline, the 0-month distributions always match except when the data is missing values and the model performs imputation.
Figure 9: Goodness-of-fit. A) Correlations between variables as predicted by the model (below the diagonal) conditioned on the data at baseline (t = 0) and calculated from the data (above the diagonal). Components of the cognitive scores are strongly correlated with each other, but not with other clinical data. B) Scatterplot of observed vs predicted correlations for each time point, over all times. C) Scatterplot of observed vs predicted autocorrelations with time lag of 3 months. D) Scatterplot of observed vs predicted autocorrelations with time lag of 6 months. Color gradient in B-D represents the fraction of observations where the variables used to compute the correlation were present; lighter colors mean more of the data was missing. This figure is a complement to Figure 2.
Figure 10: Using simulations to interpret prognostic signals for AD progression. We created a simulated patient population with AD and an initial ADAS score of 20 (typical for AD), and simulated the evolution of each virtual patient for 18 months. The 5% of virtual patients with the largest ADAS score increase were designated “fast progressors” and the bottom 5% of patients with the smallest ADAS score increase were designated “slow progressors”. Differences between the fast and slow progressors (the “absolute effect size”) were quantified using the absolute value of Cohen’s d-statistic. This figure is a complement to Figure 4C.
Figure 11: Confidence in ADAS score progression by the CRBM. For each patient with a valid ADAS score at baseline and 18 months, the CRBM is used to repeatedly simulate 18-month trajectories. The mean of the ADAS score changes from these trajectories is the CRBM prediction, and the standard deviation is a measure of the CRBM confidence. These values are used to compute a standard score for each patient by taking the difference between the CRBM prediction and the true ADAS score change and dividing by the standard deviation of predictions. These scores are 0-centered, tend to be fairly normally distributed (A), and do not correlate with the CRBM confidence (B).
Figure 12: Predictions of ADAS score progression for example patients. Starting with baseline data for 3 example patients, the CRBM was used to predict the change in ADAS score over 18 months. By repeatedly simulating trajectories for each patient, the CRBM provides a set of predictions per patient that forms a probability distribution. The mean of this distribution is the CRBM prediction, which is compared with the true value of the ADAS score change for the patient. The width of the distribution is a measure of the confidence of the CRBM prediction.
Figure 13: Expected evolution of ADAS score components for example patients. Starting with baseline data for 3 example patients, the CRBM was used to repeatedly simulate 18-month trajectories for each patient. The mean value of each of the ADAS-Cog score components for each time point is shown as a blue bar, demonstrating the ability of the CRBM to simulate the granular ADAS score components. The total mean ADAS score for each time point is shown at the top.
Figure 14: The model accurately forecasts across variables. Relative errors of the model (CRBM) and a “global” random forest (RF(g)) trained to predict the values of all variables at a single time point. The root mean square (RMS) errors are scaled by the standard deviation of the data to be predicted. Predictions are shown for every time-dependent variable except dropout. At each time point and for each variable, the better of the random forest and CRBM predictions is shown in bold. The CRBM strongly outperforms the global random forest. This figure is a complement to Figure 3; here, a single random forest is trained to predict all variables at a time point, rather than a separate random forest for each variable.

References

  • Collins and Varmus (2015) F. S. Collins and H. Varmus, New England Journal of Medicine 372, 793 (2015).
  • Rajkomar et al. (2018) A. Rajkomar, E. Oren, K. Chen, A. M. Dai, N. Hajaj, M. Hardt, P. J. Liu, X. Liu, J. Marcus, M. Sun, et al., npj Digital Medicine 1, 18 (2018).
  • Miotto et al. (2016) R. Miotto, L. Li, B. A. Kidd, and J. T. Dudley, Scientific Reports 6, 26094 (2016).
  • Choi et al. (2016) E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, in Machine Learning for Healthcare Conference (2016) pp. 301–318.
  • Lasko et al. (2013) T. A. Lasko, J. C. Denny, and M. A. Levy, PLoS ONE 8, e66341 (2013).
  • Lipton et al. (2015) Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, arXiv preprint arXiv:1511.03677 (2015).
  • Myers et al. (2017) P. D. Myers, B. M. Scirica, and C. M. Stultz, Scientific Reports 7, 12692 (2017).
  • Choi et al. (2017) E. Choi, S. Biswal, B. Malin, J. Duke, W. F. Stewart, and J. Sun, arXiv preprint arXiv:1703.06490 (2017).
  • Esteban et al. (2017) C. Esteban, S. L. Hyland, and G. Rätsch, arXiv preprint arXiv:1706.02633 (2017).
  • Beaulieu-Jones et al. (2017) B. K. Beaulieu-Jones, Z. S. Wu, C. Williams, and C. S. Greene, bioRxiv, 159756 (2017).
  • Goldstein et al. (2017) B. A. Goldstein, A. M. Navar, M. J. Pencina, and J. Ioannidis, Journal of the American Medical Informatics Association 24, 198 (2017).
  • Kumar et al. (2015) A. Kumar, A. Singh, et al., Pharmacological Reports 67, 195 (2015).
  • Rosen et al. (1984) W. G. Rosen, R. C. Mohs, and K. L. Davis, The American Journal of Psychiatry (1984).
  • Folstein et al. (1975) M. F. Folstein, S. E. Folstein, and P. R. McHugh, Journal of Psychiatric Research 12, 189 (1975).
  • Cummings et al. (2016) J. Cummings, P. S. Aisen, B. DuBois, L. Frölich, C. R. Jack, R. W. Jones, J. C. Morris, J. Raskin, S. A. Dowsett, and P. Scheltens, Alzheimer’s Research & Therapy 8, 39 (2016).
  • Rogers et al. (2012) J. A. Rogers, D. Polhamus, W. R. Gillespie, K. Ito, K. Romero, R. Qiu, D. Stephenson, M. R. Gastonguay, and B. Corrigan, Journal of Pharmacokinetics and Pharmacodynamics 39, 479 (2012).
  • Ito et al. (2013) K. Ito, B. Corrigan, K. Romero, R. Anziano, J. Neville, D. Stephenson, and R. Lalonde, Journal of Alzheimer’s Disease 37, 173 (2013).
  • Kennedy et al. (2016) R. E. Kennedy, G. R. Cutter, G. Wang, and L. S. Schneider, Journal of Alzheimer’s Disease 50, 1205 (2016).
  • Tishchenko et al. (2016) I. Tishchenko, C. Riveros, P. Moscato, and C. A. M. Diseases, Future Science OA 2, FSO140 (2016).
  • Szalkai et al. (2017) B. Szalkai, V. K. Grolmusz, V. I. Grolmusz, et al., Archives of Gerontology and Geriatrics 73, 300 (2017).
  • Mueller et al. (2005) S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. R. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, and L. Beckett, Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association 1, 55 (2005).
  • Risacher et al. (2009) S. L. Risacher, A. J. Saykin, J. D. Wes, L. Shen, H. A. Firpi, and B. C. McDonald, Current Alzheimer Research 6, 347 (2009).
  • Hinrichs et al. (2011) C. Hinrichs, V. Singh, G. Xu, S. C. Johnson, A. D. N. Initiative, et al., NeuroImage 55, 574 (2011).
  • Ito et al. (2011) K. Ito, B. Corrigan, Q. Zhao, J. French, R. Miller, H. Soares, E. Katz, T. Nicholas, B. Billing, R. Anziano, et al., Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association 7, 151 (2011).
  • Suk and Shen (2013) H.-I. Suk and D. Shen, in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2013) pp. 583–590.
  • Suk et al. (2014) H.-I. Suk, S.-W. Lee, D. Shen, A. D. N. Initiative, et al., NeuroImage 101, 569 (2014).
  • Liu et al. (2014) S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis, and D. Feng, in 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI) (IEEE, 2014) pp. 1015–1018.
  • Ortiz et al. (2016) A. Ortiz, J. Munilla, J. M. Gorriz, and J. Ramirez, International Journal of Neural Systems 26, 1650025 (2016).
  • Samper-Gonzalez et al. (2017) J. Samper-Gonzalez, N. Burgos, S. Fontanella, H. Bertin, M.-O. Habert, S. Durrleman, T. Evgeniou, O. Colliot, A. D. N. Initiative, et al., in International Workshop on Machine Learning in Medical Imaging (Springer, 2017) pp. 53–60.
  • Corrigan et al. (2014) B. Corrigan, K. Ito, J. Rogers, D. Polhamus, D. Stephenson, and K. Romero, in Applied Pharmacometrics (Springer, 2014) pp. 451–476.
  • Romero et al. (2015) K. Romero, K. Ito, J. Rogers, D. Polhamus, R. Qiu, D. Stephenson, R. Mohs, R. Lalonde, V. Sinha, Y. Wang, et al., Clinical Pharmacology & Therapeutics 97, 210 (2015).
  • Romero et al. (2009) K. Romero, M. De Mars, D. Frank, M. Anthony, J. Neville, L. Kirby, K. Smith, and R. Woosley, Clinical Pharmacology & Therapeutics 86, 365 (2009).
  • Neville et al. (2015) J. Neville, S. Kopko, S. Broadbent, E. Avilés, R. Stafford, C. M. Solinsky, L. J. Bain, M. Cisneroz, K. Romero, and D. Stephenson, Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association 11, 1212 (2015).
  • Kubick et al. (2007) W. R. Kubick, S. Ruberg, and E. Helton, Drug Information Journal 41, 373 (2007).
  • Hume et al. (2016) S. Hume, J. Aerts, S. Sarnikar, and V. Huser, Journal of Biomedical Informatics 60, 352 (2016).
  • Ackley et al. (1985) D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, Cognitive Science 9, 147 (1985).
  • Hinton (2010) G. Hinton, Momentum 9, 926 (2010).
  • Taylor et al. (2007) G. W. Taylor, G. E. Hinton, and S. T. Roweis, in Advances in Neural Information Processing Systems (2007) pp. 1345–1352.
  • Mnih et al. (2011) V. Mnih, H. Larochelle, and G. E. Hinton, in Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (AUAI Press, 2011) pp. 514–522.
  • Tubiana and Monasson (2017) J. Tubiana and R. Monasson, Physical Review Letters 118, 138301 (2017).
  • Tieleman (2008) T. Tieleman, in Proceedings of the 25th International Conference on Machine Learning (ACM, 2008) pp. 1064–1071.
  • Fisher et al. (2018) C. K. Fisher, A. M. Smith, and J. R. Walsh, arXiv preprint arXiv:1804.08682 (2018).
  • Dankar and El Emam (2013) F. K. Dankar and K. El Emam, Transactions on Data Privacy 6, 35 (2013).
  • Sano et al. (2011) M. Sano, R. Raman, J. Emond, R. G. Thomas, R. Petersen, L. S. Schneider, and P. S. Aisen, Alzheimer Disease and Associated Disorders 25, 122 (2011).
  • Cohen (1988) J. Cohen, Statistical Power Analysis for the Behavioral Sciences (Lawrence Erlbaum Associates, 1988).
  • Benge et al. (2009) J. F. Benge, S. Balsis, L. Geraci, P. J. Massman, and R. S. Doody, Dementia and Geriatric Cognitive Disorders 28, 63 (2009).