I Introduction
Dementia is a set of progressive neurodegenerative disorders associated with memory loss, cognitive impairment, and general disability [2]. Dementia is highly heterogeneous: differing brain pathologies and genetic backgrounds lead to substantial variation in clinical presentation and disease course. A heterogeneous group of cognitively impaired patients is therefore composed of distinct subpopulations, each with its own characteristics and disease course [1]. In this research, we apply an unsupervised, data-driven clustering approach to patient visit information to identify subtypes within a dementia cohort. Our aim is to analyze the dimensionality of this heterogeneous cohort and to better understand the relationships between the identified subtypes, in particular how individual patients move between subtypes over time. Characterizing the cognitive profile of each subtype can support more effective clinical decision-making and precision diagnostics tailored to each subtype.
II Methods
Clinical data from office visits were extracted from the Electronic Health Records (EHR) of patients treated between June 2012 and May 2018 at the Memory Diagnostic Center (MDC) at the Washington University School of Medicine in St. Louis, a large academic tertiary-care referral center. Longitudinal data from 1,845 patients with 2,747 visits were eligible for inclusion; each visit recorded a Global Clinical Dementia Rating (CDR) score. Global CDR is a 5-point scale (0, 0.5, 1, 2, 3) that characterizes six domains of cognitive and functional performance [4], with higher scores indicating more severe impairment. Unlike expensive and/or invasive procedures such as neuroimaging biomarkers, the CDR score is a standard metric in dementia research and is recorded for all cognitively impaired patients seen at the MDC.
The six components of Global CDR (Memory, Orientation, Judgment and Problem Solving, Community Affairs, Home and Hobbies, and Personal Care) were used as input to the unsupervised K-Means clustering algorithm to generate the subtypes (clusters) [3]. Clustering was performed on all 2,747 visits. This visit-level approach, as opposed to a patient-level one, was chosen to enable downstream longitudinal analysis and to track the rate of symptom progression: at any given visit a patient belongs to a single cluster (subtype), but may transition between clusters over time. The gap statistic was used to determine the optimal number of clusters [5]. To identify which patients have a higher probability of progressing to a more severe stage of the CDR spectrum, patient transitions between subtypes were analyzed across multiple visits; patients with only a single visit were excluded from this analysis. Differences in progression rate were measured both within and between Global CDR score categories. The distribution of the six CDR components within each subtype provides validation and interpretability of the cognitive characteristics of the identified subtypes.
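The clustering and model-selection steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses standard Lloyd's k-means with k-means++ seeding (a swapped-in variant; reference [3] describes the global k-means algorithm, which is not reproduced here) together with the gap statistic of [5], drawing uniform reference data over the bounding box of the features. The function names, the number of reference sets (`n_refs`), the restart count (`n_init`), and the synthetic six-dimensional data are all hypothetical; in the study, each point would be one visit's six CDR component scores.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ seeding (an assumption, not the global k-means of [3])."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # k-means++: pick the next center with probability proportional to D(x)^2.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        pick = rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points)
        centers.append(list(pick))
    labels = None
    for _ in range(n_iter):
        # Assignment step: every point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def log_w(points, k, rng, n_init=5):
    """log W_k: pooled within-cluster sum of squares, best of n_init restarts."""
    best = math.inf
    for _ in range(n_init):
        labels, centers = kmeans(points, k, rng)
        best = min(best, sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels)))
    return math.log(max(best, 1e-12))  # guard against log(0) for identical points


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (k_hat, gaps) using uniform reference sets over the data's bounding box."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        ref_logs = []
        for _ in range(n_refs):
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
            ref_logs.append(log_w(ref, k, rng))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w(points, k, rng))  # Gap(k)
        s.append(sd * math.sqrt(1 + 1 / n_refs))  # s_k
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; the lists are 0-indexed.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps


# Hypothetical synthetic data: three tight groups of six-dimensional "visits".
_rng = random.Random(1)
visits = [[c + _rng.uniform(-0.05, 0.05) for _ in range(6)]
          for c in (0.0, 1.5, 3.0) for _ in range(20)]
k_hat, gaps = gap_statistic(visits, k_max=3)
print(k_hat, [round(g, 2) for g in gaps])
```

Choosing the smallest k that passes the one-standard-error comparison, rather than the k with the largest gap, follows the selection rule proposed with the gap statistic. On real CDR data, where many visits share identical component vectors, more restarts and reference sets may be advisable.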
III Results
Figure 1 shows t-distributed stochastic neighbor embedding (t-SNE) representations of the CDR components of all visits, grouped by Global CDR score category and, in a second view, by subtype (cluster). Figure 2 shows the number of visits and the Global CDR composition of each subtype, ordered by increasing CDR score (more severe dementia). A subtype is either homogeneous (a single Global CDR score) or composite (two Global CDR scores). Early dementia (CDR of 0.5 or 1) shows greater variability than the later stages of the disease (CDR = 2 and 3), which results in more subtypes at lower CDR scores.
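The homogeneous/composite distinction and the ordering used in Figure 2 can be expressed as a tally of Global CDR scores per subtype. The sketch below is illustrative only: it treats a subtype as homogeneous when all of its visits share one Global CDR score and orders subtypes by mean Global CDR; the function name `subtype_composition`, the `(subtype, global_cdr)` input format, and the subtype labels in the example are hypothetical.

```python
from collections import Counter, defaultdict


def subtype_composition(visits):
    """Tally Global CDR scores per subtype.

    visits: iterable of (subtype, global_cdr) pairs, one per visit (hypothetical format).
    Returns rows (subtype, n_visits, {cdr: count}, kind, mean_cdr) ordered by mean_cdr.
    """
    counts = defaultdict(Counter)
    for subtype, cdr in visits:
        counts[subtype][cdr] += 1
    rows = []
    for subtype, tally in counts.items():
        n = sum(tally.values())
        mean_cdr = sum(cdr * m for cdr, m in tally.items()) / n
        # Assumption: one distinct Global CDR score => homogeneous, otherwise composite.
        kind = "homogeneous" if len(tally) == 1 else "composite"
        rows.append((subtype, n, dict(sorted(tally.items())), kind, mean_cdr))
    # Order subtypes by increasing mean Global CDR (more severe dementia last).
    rows.sort(key=lambda row: row[4])
    return rows


# Hypothetical visits with made-up subtype labels "A", "B", "C".
example = [("A", 0.5), ("A", 0.5), ("C", 2), ("C", 3), ("B", 1), ("B", 0.5)]
for row in subtype_composition(example):
    print(row)
```

With real cluster output, a minimum-share threshold could keep a handful of stray visits from making a subtype composite; the text does not state how such cases were handled.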


The association between the six CDR components and the subtypes gives an intuitive interpretation of the cognitive characteristics of each subtype (Figure 3). The early-stage subtypes (CDR = 0.5: C4, C7, C11, C14, C15, and C16) show both intra- and inter-subtype variability. For example, C4 and C14 have healthy orientation but slightly impaired memory. Subtypes with more severe dementia (CDR of 2 or 3) show less cognitive variability.
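The per-subtype component profiles behind Figure 3 amount to tabulating, for each subtype, how often each score occurs in each of the six CDR components. A minimal sketch follows; the input layout (one six-tuple per visit in the component order listed in the Methods, plus a parallel list of subtype labels), the function name, and the example visits are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Component order as listed in the Methods; the tuple layout itself is an assumption.
COMPONENTS = ("Memory", "Orientation", "Judgment and Problem Solving",
              "Community Affairs", "Home and Hobbies", "Personal Care")


def component_distributions(vectors, labels):
    """Share of visits at each score, per CDR component, within each subtype.

    vectors: one 6-tuple of component scores per visit; labels: subtype per visit.
    Returns {subtype: {component: {score: share}}}.
    """
    counts = defaultdict(lambda: [Counter() for _ in COMPONENTS])
    for vec, lab in zip(vectors, labels):
        for i, score in enumerate(vec):
            counts[lab][i][score] += 1
    result = {}
    for lab, per_component in counts.items():
        result[lab] = {}
        for name, tally in zip(COMPONENTS, per_component):
            n = sum(tally.values())
            result[lab][name] = {score: m / n for score, m in sorted(tally.items())}
    return result


# Hypothetical visits: subtype "A" has unimpaired orientation but impaired memory.
vecs = [(0.5, 0, 0, 0, 0, 0), (1, 0, 0.5, 0, 0, 0), (2, 2, 2, 1, 2, 1)]
labs = ["A", "A", "B"]
profiles = component_distributions(vecs, labs)
print(profiles["A"]["Memory"], profiles["A"]["Orientation"])
```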
Figure 4 shows the transitions between subtypes from CDR 0.5 to CDR 1. Of the six CDR = 0.5 subtypes, C7, C11, and C15 are more likely to progress to a CDR = 1 subtype than C4, C14, and C16, as estimated by the total number of outward transitions from each subtype. We hypothesize that these differences reflect different underlying etiologies of dementia across the subtypes. The distributions of the Memory and Orientation components in the two groups (subtypes C7, C11, and C15 versus subtypes C4, C14, and C16) in Figure 3 suggest that the six components of Global CDR differ in how well they predict the risk of dementia progression.
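Counting subtype transitions reduces to ordering each patient's visits in time and tallying consecutive pairs of subtype labels, skipping patients with a single visit as described in the Methods. The sketch below uses a hypothetical input format (`(patient_id, visit_index, subtype)` triples) and hypothetical function names. For each source subtype it reports the number of transitions into a chosen set of target subtypes alongside all transitions from that subtype (including staying in the same subtype), so raw counts can be compared with a normalized share; that normalization is an assumption, since the text estimates progression from the total number of outward transitions.

```python
from collections import Counter, defaultdict


def subtype_transitions(visits):
    """Count consecutive subtype transitions per patient.

    visits: iterable of (patient_id, visit_index, subtype) triples (hypothetical format),
    where visit_index orders a patient's visits in time.
    """
    by_patient = defaultdict(list)
    for patient_id, visit_index, subtype in visits:
        by_patient[patient_id].append((visit_index, subtype))
    transitions = Counter()
    for seq in by_patient.values():
        if len(seq) < 2:
            continue  # single-visit patients contribute no transitions
        seq.sort(key=lambda item: item[0])
        for (_, src), (_, dst) in zip(seq, seq[1:]):
            transitions[(src, dst)] += 1
    return transitions


def progression_summary(transitions, sources, targets):
    """For each source subtype: (transitions into `targets`, all transitions from it)."""
    summary = {}
    for s in sources:
        total = sum(n for (src, _), n in transitions.items() if src == s)
        into = sum(n for (src, dst), n in transitions.items() if src == s and dst in targets)
        summary[s] = (into, total)
    return summary


# Hypothetical records: "X" stands in for a CDR = 1 subtype; p3 has a single visit.
records = [("p1", 1, "C4"), ("p1", 3, "X"), ("p1", 2, "C4"),
           ("p2", 1, "C7"), ("p2", 2, "X"), ("p3", 1, "C7")]
t = subtype_transitions(records)
print(progression_summary(t, ["C4", "C7"], {"X"}))
```

Dividing `into` by `total` gives a per-subtype share that accounts for differing numbers of follow-up visits; whether the authors normalized in this way is not stated.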


IV Discussion and Conclusion
In this work, we aimed to parse heterogeneity within a dementia clinic population by identifying subtypes from patient visit information with an unsupervised clustering approach. Subtypes with the same Global CDR score can still differ in their risk of progressing to a more severe stage of dementia. Analyzing the CDR component scores of individual subtypes allows a straightforward interpretation of each subtype's cognitive characteristics. Subtypes in early dementia (CDR = 0.5) show more cognitive variability than those in the later stages of dementia (CDR = 2 or 3). Future work includes developing a machine learning model based on the subtypes to predict each patient's rate of dementia progression, and further analyzing how the subtypes are associated with dementia biomarkers, neuroimaging features, and cognitive disorders.
References
- [1] (2017) Heterogeneity of neuroanatomical patterns in prodromal Alzheimer’s disease: links to cognition, progression and biomarkers. Brain 140 (3), pp. 735–747.
- [2] (2005) Global prevalence of dementia: a Delphi consensus study. The Lancet 366 (9503), pp. 2112–2117.
- [3] (2003) The global k-means clustering algorithm. Pattern Recognition 36 (2), pp. 451–461.
- [4] (1991) The clinical dementia rating (CDR): current version and. Young 41, pp. 1588–1592.
- [5] (2001) Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2), pp. 411–423.