Identifying Interpretable Clinical Subtypes within Heterogeneous Dementia Clinic Population

Dementia is a highly heterogeneous neurodegenerative disorder. Differences in brain pathologies lead to significant variations in the clinical presentation and progression course of patients, increasing the need for individual progression predictions. Unsupervised cluster analysis on a dementia clinic population using the Clinical Dementia Rating (CDR) component scores uncovered subtypes with different risks of dementia progression. The distribution of the CDR components provides validation and interpretability regarding the cognitive characteristics of the identified subtypes.


I Introduction

Dementia is a set of progressive neurodegenerative disorders associated with memory loss, cognitive impairment, and general disability [2]. Dementia is highly heterogeneous: the presence of different brain pathologies and variation in genetic background lead to significant variations in clinical presentation and disease course. Hence, a heterogeneous group of cognitively impaired patients is composed of different subpopulations, each with its own disease course and characteristics [1]. In this research, we apply an unsupervised, data-driven clustering approach to patient visit information to identify subgroups within the dementia cohort. Our aim is to analyze the dimensionality of the heterogeneous dementia patient cohort and to understand how the identified subtypes relate to one another, in particular how individual patients progress through the subtypes over time. Analyzing the cognitive profile of the subtypes can support effective clinical decision-making and precision diagnostics tailored to each subtype.

II Methods

Clinical data corresponding to office visits were extracted from the Electronic Health Records (EHR) of patients treated between June 2012 and May 2018 at the Memory Diagnostic Center (MDC) at the Washington University School of Medicine in St. Louis, a large, academic, tertiary-care referral center. Longitudinal data from 1,845 patients with 2,747 visits were eligible for inclusion, where each visit recorded a Global Clinical Dementia Rating (CDR) score. Global CDR is a 5-point scale (0, 0.5, 1, 2, 3) used to characterize six domains of cognitive and functional performance [4], with higher scores indicating more severe impairment. Unlike expensive and/or invasive procedures such as neuroimaging, the CDR is a standard metric in dementia research and is recorded for all cognitively impaired patients in the MDC.

The six components of Global CDR (Memory, Orientation, Judgment and Problem Solving, Community Affairs, Home and Hobbies, and Personal Care) were used as input to the unsupervised K-Means clustering algorithm to generate the subtypes (clusters) [3]. Clustering analysis was performed on the 2,747 visits. Visits, rather than patients, were clustered to enable downstream longitudinal analysis and to track the rate of symptom progression among patients: at any given point in time a patient belongs to a single cluster (subtype), but can transition between clusters over time. The gap statistic was used to select the optimal number of clusters [5].
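
The clustering step can be illustrated with a minimal sketch. It assumes plain Lloyd's K-Means with a farthest-point initialisation (a simplification, not the global k-means variant of [3]) and a gap statistic computed against uniform reference data drawn over each feature's observed range. The function names, the number of reference sets, and the synthetic `visits` data are all assumptions made for illustration; they are not taken from the study.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, n_iter=100):
    """Lloyd's K-Means with farthest-point initialisation.

    Returns (labels, within-cluster sum of squares)."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wss

def gap_statistic(points, k_values, n_refs=5, seed=0):
    """Return {k: (gap, s_k)} using uniform reference sets over the data range."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    eps = 1e-12  # guards log(0): discrete scores can give a zero sum of squares
    out = {}
    for k in k_values:
        log_w = math.log(kmeans(points, k, rng)[1] + eps)
        ref_logs = []
        for _ in range(n_refs):
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(kmeans(ref, k, rng)[1] + eps))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        out[k] = (mean_ref - log_w, sd * math.sqrt(1 + 1 / n_refs))
    return out

def choose_k(gaps):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); else the largest k tested."""
    ks = sorted(gaps)
    for k, k_next in zip(ks, ks[1:]):
        if gaps[k][0] >= gaps[k_next][0] - gaps[k_next][1]:
            return k
    return ks[-1]

# Synthetic stand-in for visit-level CDR component scores (NOT the study data):
# three well-separated groups of 20 "visits" in six dimensions.
demo_rng = random.Random(1)
visits = [tuple(centre + demo_rng.uniform(-0.1, 0.1) for _ in range(6))
          for centre in (0.5, 1.5, 2.5) for _ in range(20)]
gaps = gap_statistic(visits, k_values=range(1, 6))
best_k = choose_k(gaps)
labels, _ = kmeans(visits, 3, random.Random(0))
```

Because real CDR component scores are discrete, many visits share identical score vectors; the small `eps` term is one simple way to keep the logarithm defined in that case, and other choices are possible.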

To gain insight into which patients have a higher probability of progressing to a more severe stage on the CDR spectrum, patient transitions between subtypes were analyzed across multiple visits. Patients with only a single visit were excluded from this analysis. Differences in progression rate were measured both within and between Global CDR score categories. The distribution of the six CDR components within each subtype provides validation and interpretability regarding the cognitive characteristics of the identified subtypes.
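
As a hedged illustration of the transition analysis, the sketch below counts moves between subtypes over consecutive visits of the same patient and skips patients with a single visit. The record layout `(patient_id, visit_date, subtype)`, the function name, and the example records are hypothetical. Treating "same subtype at the next visit" as a self-transition is also an assumption, since the text does not say how such cases were handled.

```python
from collections import Counter, defaultdict

def subtype_transitions(visits):
    """Count subtype-to-subtype moves between consecutive visits of each patient.

    `visits` is an iterable of (patient_id, visit_date, subtype) tuples.
    Patients with a single visit contribute no transitions and are skipped.
    """
    by_patient = defaultdict(list)
    for patient_id, visit_date, subtype in visits:
        by_patient[patient_id].append((visit_date, subtype))

    counts = Counter()
    for history in by_patient.values():
        if len(history) < 2:
            continue  # single-visit patient: nothing to compare
        history.sort()  # chronological order by visit date
        for (_, src), (_, dst) in zip(history, history[1:]):
            counts[(src, dst)] += 1
    return counts

# Hypothetical records; ISO date strings sort chronologically.
records = [
    ("p1", "2013-01-10", "C7"), ("p1", "2014-02-01", "C2"),
    ("p2", "2015-05-20", "C4"),
    ("p3", "2016-03-03", "C7"), ("p3", "2012-09-09", "C7"),
    ("p3", "2017-08-15", "C5"),
]
transitions = subtype_transitions(records)
```

Sorting each patient's visits before pairing them means the input does not need to arrive in chronological order.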

III Results

Figure 1 shows the t-distributed stochastic neighbor embedding (t-SNE) representations of the CDR components across all visits, colored first by Global CDR score category and then by subtype (cluster). Figure 2 shows the number of visits and the CDR composition of each subtype, ordered by increasing CDR score (more severe dementia). Subtypes are either homogeneous (containing a single Global CDR score) or composite (containing two Global CDR scores). There is greater variability in early dementia (CDR of 0.5 or 1), leading to more subtypes with a lower CDR score than in the later stages of the disease (CDR of 2 or 3).

Fig. 1: t-SNE plot (left) showing the 2D representations of the CDR components across all visits. The different colors show the Global CDR category of each data point. The clustering results (right) show the same representations distributed into subtypes (clusters) with the cluster centroids marked in cyan. The x-axis and y-axis of both plots represent the two dimensions of the t-SNE visualization.
Fig. 2: Stacked bar plot showing the CDR composition of each subtype. The x-axis represents the subtypes arranged in increasing order of Global CDR. The y-axis represents the number of visits in each subtype for each CDR category present.
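
The composition summarized in Fig. 2 can be sketched as a simple tally. The code below counts Global CDR scores per subtype and labels each subtype homogeneous or composite according to how many distinct scores it contains. Ordering subtypes by their mean Global CDR is an assumption, since the exact ordering rule for composite subtypes is not stated; the subtype labels and counts are hypothetical.

```python
from collections import Counter, defaultdict

def cdr_composition(visits):
    """Map each subtype to a Counter of Global CDR scores over its visits."""
    comp = defaultdict(Counter)
    for subtype, global_cdr in visits:
        comp[subtype][global_cdr] += 1
    return comp

def describe(comp):
    """Return (subtype, kind, mean CDR, counts) rows sorted by mean Global CDR."""
    rows = []
    for subtype, counts in comp.items():
        total = sum(counts.values())
        mean_cdr = sum(score * n for score, n in counts.items()) / total
        kind = "homogeneous" if len(counts) == 1 else "composite"
        rows.append((subtype, kind, mean_cdr, dict(counts)))
    return sorted(rows, key=lambda row: row[2])

# Hypothetical (subtype, Global CDR) pairs, one per visit.
visit_scores = (
    [("A", 0.5)] * 3
    + [("B", 1.0), ("B", 2.0), ("B", 1.0)]
    + [("C", 0.0)] * 2
)
rows = describe(cdr_composition(visit_scores))
for subtype, kind, mean_cdr, counts in rows:
    print(subtype, kind, round(mean_cdr, 2), counts)
```

The per-subtype counts are the values a stacked bar plot would draw, one bar per subtype and one segment per Global CDR category.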

The association between the six CDR components and the subtypes provides an intuitive interpretation of the six cognitive characteristics of each subtype (Figure 3). The early-stage subtypes (CDR = 0.5: C4, C7, C11, C14, C15 and C16) show both intra- and inter-subtype variability. For example, C4 and C14 have healthy orientation but slightly impaired memory. Patients with more severe dementia (CDR of 2 or 3) have less cognitive variability.

Figure 4 shows the transitions from CDR 0.5 to CDR 1 between different subtypes. Of the six CDR = 0.5 subtypes, C7, C11 and C15 are more likely to progress to subtypes with CDR = 1 than C4, C14 and C16, as estimated by the total number of outward transitions from each subtype. We hypothesize that these differences reflect different underlying etiologies of dementia across subtypes. The distributions of the Memory and Orientation components in the two groups (subtypes C7, C11 and C15 versus subtypes C4, C14 and C16) in Figure 3 suggest that the six components of Global CDR differ in how they predict the risk of dementia progression.

Fig. 3: Violin plot showing Memory and Orientation scores for each subtype. Higher scores indicate more impairment. There is both intra- and inter-subtype variability in the early stages of dementia (CDR = 0.5: C4, C7, C11, C14, C15 and C16). For example, C4 and C14 have healthy orientation but slightly impaired memory.
Fig. 4: Transitions between subtypes from Global CDR 0.5 to 1. The edge weights represent the number of transitions from source to target. The values in parentheses show the proportion of transitions moving to the target out of all the transitions moving out from the source. The homogeneous subtypes include only the respective CDR = 0.5 or 1. For the composite subtypes C2, C5 and C10, only visits with CDR = 1 are included. Subtypes C7, C11 and C15 (CDR = 0.5) are more likely to progress to subtypes with CDR = 1 than subtypes C4, C14 and C16, as estimated by the total number of outward transitions from each subtype.
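
The quantities annotated in Fig. 4 can be derived from raw transition counts as in the sketch below: each edge's proportion is its count divided by the total outward transitions of its source, and sources are ranked by that total. The edge counts are invented for illustration and do not reproduce the figure. Whether the denominator covers only CDR 0.5 to CDR 1 transitions or all outward transitions is not specified in the caption; this sketch uses whatever edges it is given.

```python
from collections import defaultdict

def outward_proportions(edge_counts):
    """Normalize transition counts by each source subtype's outward total.

    `edge_counts` maps (source, target) to a number of transitions.
    Returns ({source: total_outward}, {(source, target): proportion}).
    """
    totals = defaultdict(int)
    for (src, _), n in edge_counts.items():
        totals[src] += n
    props = {(src, dst): n / totals[src] for (src, dst), n in edge_counts.items()}
    return dict(totals), props

# Hypothetical counts of CDR 0.5 -> CDR 1 transitions between subtypes.
edges = {("C7", "C2"): 6, ("C7", "C5"): 2, ("C4", "C2"): 1, ("C4", "C10"): 3}
totals, props = outward_proportions(edges)

# Sources ordered by total outward transitions, largest first.
ranked = sorted(totals, key=totals.get, reverse=True)
```

Comparing raw outward totals, as the text does, does not account for how many visits each source subtype contains; dividing by subtype size would be a natural extension but is not something the source describes.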

IV Discussion and Conclusion

In this work, we aimed to parse heterogeneity within a dementia clinic population by applying unsupervised clustering to patient visit information to identify subtypes. Even subtypes with the same Global CDR score differ in their risk of progressing to a more severe stage of dementia. Analyzing the CDR component scores of the individual subtypes enables straightforward interpretation of their cognitive characteristics. Subtypes with early dementia (CDR = 0.5) have more cognitive variability than those in the later stages of dementia (CDR of 2 or 3). Future steps include developing a machine learning model based on the subtypes to predict personalized rates of progression for dementia patients, and further analysis of how the subtypes are associated with dementia biomarkers, neuroimaging features and cognitive disorders.


  • [1] A. Dong, J. B. Toledo, N. Honnorat, J. Doshi, E. Varol, A. Sotiras, D. Wolk, J. Q. Trojanowski, C. Davatzikos, and the Alzheimer's Disease Neuroimaging Initiative (2017) Heterogeneity of neuroanatomical patterns in prodromal Alzheimer's disease: links to cognition, progression and biomarkers. Brain 140 (3), pp. 735–747.
  • [2] C. P. Ferri, M. Prince, C. Brayne, H. Brodaty, L. Fratiglioni, M. Ganguli, K. Hall, K. Hasegawa, H. Hendrie, Y. Huang, et al. (2005) Global prevalence of dementia: a Delphi consensus study. The Lancet 366 (9503), pp. 2112–2117.
  • [3] A. Likas, N. Vlassis, and J. J. Verbeek (2003) The global k-means clustering algorithm. Pattern Recognition 36 (2), pp. 451–461.
  • [4] J. C. Morris (1993) The Clinical Dementia Rating (CDR): current version and scoring rules. Neurology 43 (11), pp. 2412–2414.
  • [5] R. Tibshirani, G. Walther, and T. Hastie (2001) Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2), pp. 411–423.