OpenClinicalAI: enabling AI to diagnose diseases in real-world clinical settings

This paper quantitatively shows that state-of-the-art and state-of-the-practice AI systems achieve acceptable performance only under the stringent condition that all categories of subjects are known, which we call closed clinical settings, and fail to work in real-world clinical settings. Compared to the diagnosis task in the closed setting, real-world clinical settings pose severe challenges, and we must treat them differently. We build a clinical AI benchmark named Clinical AIBench that sets up real-world clinical settings to facilitate research. We propose an open, dynamic machine learning framework and develop an AI system named OpenClinicalAI to diagnose diseases in real-world clinical settings. The first versions of Clinical AIBench and OpenClinicalAI target Alzheimer's disease. In the real-world clinical setting, OpenClinicalAI significantly outperforms the state-of-the-art AI system. In addition, OpenClinicalAI develops personalized diagnosis strategies to avoid unnecessary testing and collaborates seamlessly with clinicians. It is a promising candidate for embedding in current medical systems to improve medical services.

Introduction

Owing to the successive successes of AI in clinical research, AI is considered a promising technology for providing high-quality, low-cost diagnostic services [1, 2, 3, 4, 5, 6, 7]. However, there is little evidence that this research can be implemented in real-world clinical settings (in short, real-world settings) and improve medical services [8, 9, 10]. Figs. 1, 2, and 3 show qualitatively and quantitatively that state-of-the-art and state-of-the-practice AI systems achieve acceptable performance only under stringent conditions. We call those stringent conditions closed clinical settings (in short, closed settings). Closed settings rest on the following primary assumptions: all categories of subjects are known a priori [11]; the same diagnostic strategy is applied to all subjects, e.g., every subject requires a nuclear magnetic resonance scan (MRI) [12]; and the AI system can only be deployed at medical institutions that are able to execute the pre-prescribed diagnostic strategy [4, 13, 14]. Conversely, if a medical institution cannot meet the prerequisites for completing the pre-prescribed diagnostic strategy, the corresponding AI system cannot be deployed there. In this context, the diagnosis problem is an artificially simplified closed set recognition problem [3, 4, 14, 5, 15].

Closed settings are too idealized for real-world settings. The real-world setting is open, with uncertainty and complexity. Subjects in real-world settings do not all belong to pre-known categories; many belong to unknown and unfamiliar categories. Every subject is different, and there is no one-size-fits-all diagnosis strategy. The conditions of medical institutions differ and are not known in advance: for example, some hospitals have positron emission tomography (PET), whereas most hospitals in underdeveloped areas are not equipped with PET. The diagnosis problem in real-world settings is an open set recognition problem [16].

Essentially, the diagnosis task in the closed setting is to find the optimal solution for classifying different categories of subjects in a limited space (a so-called supervised task) with the help of the ground truth of every subject. However, the real-world setting is open and puts the diagnosis task into an unlimited space. Compared to the limited space of closed settings, the infinite space of real-world settings vastly expands the scale of the problem. Moreover, supervised learning loses efficacy because some categories of subjects and their ground truth are unknown during the development of the AI model. Hence, the main problem of the diagnosis task in the real-world setting becomes efficiently locating the known subjects within the uncertain and complex real-world setting. Moreover, as shown in Fig. 2a and Fig. 3b,c, solving the diagnosis task well in the closed setting is of little help in solving it in the real-world setting. Compared to the diagnosis task in the closed setting, the diagnosis task in the real-world setting is a new and challenging task that we must treat differently.

This paper calls for turning the attention of medical AI from algorithmic research in closed settings to systematic study in real-world settings. Specifically, we construct a clinical AI benchmark named Clinical AIBench, which contains real-world and closed settings, to promote the deployment of AI in real-world settings. To tackle the uncertainty and complexity of real-world settings, we propose an open, dynamic machine learning framework (Fig. S1) and a diagnostic system named OpenClinicalAI that can be embedded in current healthcare systems, as shown in Fig. 1b.

The first versions of Clinical AIBench and OpenClinicalAI target Alzheimer’s disease (AD), as AD is an incurable disease that places a heavy burden on society (the total payment for individuals with AD or other dementias is estimated at billion) [17, 18, 19, 20]. Early and accurate AD diagnosis will result in the correct management of AD or other dementias, saving up to trillion in medical and care costs [19]. However, it is estimated that million of the world’s million people with dementia do not receive a diagnosis because of limited medical resources, experts, etc. [21].

The current version of Clinical AIBench includes two clinical settings, a closed setting and a real-world setting, both curated from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a large enriched dataset [22]. OpenClinicalAI is composed of multiple independent parts that cooperate to handle unknown subjects in real-world settings and dynamically adjust diagnosis strategies according to the specific subject and medical institution. OpenClinicalAI provides an opportunity to embed an AI-based diagnostic system into current healthcare systems, where it can cooperate with clinicians to improve healthcare services.

In the real-world setting of Clinical AIBench, we evaluate the performance of OpenClinicalAI against the state-of-the-art AI diagnosis system. Our evaluations show that the performance of OpenClinicalAI exceeds that of the state-of-the-art AI diagnosis system in the real-world setting. Additionally, OpenClinicalAI can develop personalized diagnosis strategies for every subject in the real-world setting, maximizing the patient benefit.

Results

Clinical AIBench

Clinical AIBench contains real-world and closed settings to develop and evaluate the AI system designed for real-world settings. The first version targets Alzheimer’s disease. In this section, we focus on real-world settings.

The diagnosis in a real-world setting requires clinicians to use both individual clinical expertise and the best available external evidence, which is usually obtained by clinical examination, to make a clinical decision for every specific subject [23]. It means that at least two main factors must be considered in the diagnosis task in real-world settings: the subject and the available clinical examination in the medical institution.

As shown in Fig. S2, the real-world setting is open with uncertainty and complexity. The primary characteristics of the real-world setting are as follows:

  • Real-world settings are open, and clinicians or AI systems often encounter unknown and unfamiliar categories. Thus, the subjects’ categories are not all pre-known and familiar. Clinicians differ in expertise and may be unfamiliar with some diseases. In the real-world setting of Clinical AIBench, an unknown subject category is one that is not familiar to the clinician or AI system. Thus, we mark both unknown categories and unfamiliar categories as unknown. In this work, Clinical AIBench places all mild cognitive impairment (MCI) and significant memory concern (SMC) subjects in the test set; these are unknown categories during the development of the AI system.

  • Subjects in real-world settings are in different situations. In this work, subjects with varying conditions come from 67 sites in two countries (Table S1). For every subject, data from all visits are included in Clinical AIBench (Table S2). The interval between two consecutive visits of a subject is usually more than six months.

  • Medical institutions in real-world settings differ widely in their ability to carry out examinations. Not all medical institutions and their specific examination abilities are known in advance. In this work, missing data for subjects are not filled in for the real-world setting of Clinical AIBench. In the real world, most subjects do not have data for every examination category. Leaving these gaps in place preserves the varied examination abilities of different medical institutions. That is to say, in the real-world setting of Clinical AIBench, the absence of data for a specific examination category indicates that the medical institution lacks the ability to perform that examination.

Specifically, in this work, the examination data in ADNI is divided into 13 categories: base information (Base), cognition information (Cog), cognition testing (CE), neuropsychiatric information (Neur), function and behavior information (FB), physical neurological examination (PE), blood testing (Blood), urine testing (Urine), nuclear magnetic resonance scan (MRI), positron emission computed tomography scan with 18-FDG (FDG), positron emission computed tomography scan with AV45 (AV45), gene analysis (Gene), and cerebral spinal fluid analysis (CSF).

Details of the dataset in the real-world setting are as follows.

  • All subjects with labels in ADNI are included.

  • 85% of AD and cognitively normal (CN) subjects form the training set. 5% of AD and CN subjects form the validation set. 20% of AD and CN subjects, 100% of MCI subjects, and 100% of SMC subjects form the test set.

  • For every subject, different diagnosis strategies are combined according to the presence of different examination data, and the data of each diagnosis strategy forms a sample.

The test set is not accessible during the training of the AI system. In addition, since each subject may have multiple visits (each visit of the subject is treated as an independent subject), we stipulate that all of a subject’s visit data can appear in only one of the training set, validation set, and test set.
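The stipulation above amounts to splitting at the subject level rather than the visit level. A minimal sketch follows, assuming visits are given as (subject_id, record) pairs; the function name, the data layout, and the default fractions are illustrative placeholders rather than the benchmark's actual implementation.

```python
import random
from collections import defaultdict


def split_by_subject(visits, fractions=(0.80, 0.05, 0.15), seed=0):
    """Assign every visit of a subject to exactly one partition.

    visits: list of (subject_id, record) pairs (hypothetical layout).
    fractions: (train, validation, test) shares of subjects; placeholder values.
    """
    # Group visit records by the subject they belong to.
    by_subject = defaultdict(list)
    for subject_id, record in visits:
        by_subject[subject_id].append(record)

    # Shuffle subjects, not visits, so no subject straddles two partitions.
    subject_ids = sorted(by_subject)
    random.Random(seed).shuffle(subject_ids)

    n = len(subject_ids)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    groups = {
        "train": subject_ids[:n_train],
        "validation": subject_ids[n_train:n_train + n_val],
        "test": subject_ids[n_train + n_val:],  # remainder goes to the test set
    }
    return {
        name: [(sid, rec) for sid in ids for rec in by_subject[sid]]
        for name, ids in groups.items()
    }
```

Shuffling subject identifiers instead of individual visits is what prevents a later visit of a training subject from leaking into the validation or test set.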

Since previous AD diagnosis research was developed in closed settings, the closed setting in Clinical AIBench is similar to that of previous research [24, 25, 26, 27, 28, 12, 29, 30, 31]. Only AD and CN subjects are included in the closed setting, and only the nuclear magnetic resonance instrument and historical medical records are available. 80% of subjects form the training set, 5% form the validation set, and 15% form the test set.

The performance of OpenClinicalAI on Alzheimer’s disease diagnosis

Ebrahimighahnavieh et al. and Tanveer et al. review many important works on Alzheimer’s disease diagnosis [27, 28]. Most of these works are based on MRI data, and those using transfer learning obtain the best results. In addition, among recent AI diagnosis research, the transfer learning framework of a pre-trained model followed by a classifier achieves state-of-the-art performance in many diagnosis tasks based on medical images [14, 1, 32, 33, 3]. Thus, based on the state-of-the-art transfer learning framework and MRI data, we use a pre-trained model named DenseNet201 [34] and a classifier called XGBoost [35] to develop an Alzheimer’s disease diagnosis AI system, which we treat as the baseline system for comparison with OpenClinicalAI in the rest of this paper.

We validate the effectiveness of OpenClinicalAI in two ways. First, we compare OpenClinicalAI to the baseline system in the closed setting. Second, we compare OpenClinicalAI to the baseline system in the real-world setting. Our comparison metrics are the area under the receiver operating characteristic (ROC) curve (AUC) and sensitivity. The larger the AUC and sensitivity, the better the AI system.
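For readers unfamiliar with the two metrics, the following sketch computes them from scratch. It assumes binary labels (1 for the category of interest, 0 otherwise) for the AUC and string category labels for sensitivity; the function names are illustrative, and the pairwise AUC formula is the standard rank-based equivalent of the area under the ROC curve, not necessarily the routine used in this work.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive subject receives
    a higher score than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative subject")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(predicted, actual, category):
    """Share of subjects whose true category is `category` and who were
    also predicted as `category`."""
    relevant = [p for p, a in zip(predicted, actual) if a == category]
    if not relevant:
        raise ValueError(f"no subject with true category {category!r}")
    return sum(p == category for p in relevant) / len(relevant)
```

Sensitivity is computed per category (AD, CN, or unknown), which is why one minus the sensitivity of a category gives the share of its subjects that are not correctly identified.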

The performance of OpenClinicalAI against the baseline system in the closed setting.

To the best of our knowledge, all state-of-the-art and state-of-the-practice Alzheimer’s disease diagnosis AI research is developed and evaluated in closed settings [27, 28, 12, 29, 30, 31]. We first assess the baseline AI system in the closed setting, and then evaluate OpenClinicalAI in the same closed setting without the restriction that only the nuclear magnetic resonance instrument and historical medical records are available.

As shown in Fig. 2a, the baseline system obtains a high AUC score of (95% confidence interval (CI) 0.9722-0.9827), and there is not much room for improvement. OpenClinicalAI achieves an AUC score of (95% CI 0.9907-0.9945) and obtains state-of-the-art performance. However, the essential improvement from the baseline system to OpenClinicalAI is that the latter can dynamically develop personalized diagnosis strategies according to specific subjects and medical institutions. As shown in Fig. 2b, less than 10% of the subjects require a nuclear magnetic resonance scan, and most of the subjects only require harmless examinations such as cognitive examination. We conclude that OpenClinicalAI can avoid unnecessary examinations for subjects and suits medical institutions with different examination abilities. (Different hospitals have various clinical settings, such as community hospitals without nuclear magnetic resonance machines and big hospitals with multiple facilities.)

The performance of OpenClinicalAI against the baseline system in the real-world setting.

Our goal is to develop an AI diagnosis system that can be embedded in the current medical system and cooperate with clinicians. In this work, if the predicted probability of AD or CN is smaller than the probability threshold (0.95), the subject is marked as unknown and referred to the clinician. For comparison, we use the same baseline system discussed above. In addition, we also consider OpenClinicalAI without an OpenMax mechanism (Algorithms S2 and S3) as a comparison system [11].
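The referral rule described above can be sketched as a simple threshold on the predicted probabilities. This is a minimal illustration only: the function name and the dictionary input are assumptions, and the sketch covers just the thresholding step, not the OpenMax mechanism [11].

```python
def decide(probabilities, threshold=0.95):
    """Return the most probable known category, or "unknown" for referral.

    probabilities: mapping from known category to predicted probability,
    e.g. {"AD": 0.97, "CN": 0.03} (hypothetical input layout).
    threshold: probability threshold; 0.95 is the value stated in the text.
    """
    label, p = max(probabilities.items(), key=lambda kv: kv[1])
    # A prediction below the threshold is not trusted: the subject is
    # marked as unknown and referred to the clinician.
    return label if p >= threshold else "unknown"
```

For example, `decide({"AD": 0.97, "CN": 0.03})` yields `"AD"`, while `decide({"AD": 0.60, "CN": 0.40})` yields `"unknown"`.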

As shown in Fig. 3a, b, and c, compared to the baseline system, OpenClinicalAI demonstrates a significant improvement in the AUC for identifying AD subjects (+0.1102) and the AUC for identifying CN subjects (+0.1148). It is worth noting that OpenClinicalAI also shows a large improvement in the sensitivity for AD, CN, and unknown subjects at the operating point.

For the baseline system, the sensitivity for known (AD and CN) subjects is low. The sensitivity for AD is just 0.5483 (95% CI 0.4604-0.6301), and the sensitivity for CN is just 0.3333 (95% CI 0.2663-0.3979). This indicates that most known subjects will be marked as unknown and sent to the clinician for diagnosis. Moreover, the sensitivity for unknown subjects is 0.8888 (95% CI 0.8753-0.9018), meaning that 11.12% of unknown subjects will be misdiagnosed. In addition, the baseline system requires that every subject has a nuclear magnetic resonance scan, and every medical institution that deploys the baseline system must be equipped with a nuclear magnetic resonance apparatus.

For OpenClinicalAI without an OpenMax mechanism, the sensitivity for known (AD and CN) subjects is as good as that of OpenClinicalAI with an OpenMax mechanism. In contrast, the sensitivity for unknown subjects is much worse than that of OpenClinicalAI with an OpenMax mechanism. This means that most unknown subjects will be misdiagnosed, which is unacceptable in real-world settings.

OpenClinicalAI diagnoses most of the known (AD and CN) subjects correctly, marks most of the rest as unknown, and sends them to the clinician for further diagnosis. Besides, most unknown subjects are correctly identified, and the misdiagnosis of unknown subjects is only . This means that OpenClinicalAI has enormous potential application value in real-world settings. In addition, as shown in Fig. 3d, similar to its behavior in the closed setting, OpenClinicalAI can develop and adjust diagnosis strategies for every subject dynamically in the real-world setting. Only a small share of subjects require a nuclear magnetic resonance scan or other examinations that are more costly (in expense and harm).

Development of diagnosis strategies

For every subject, OpenClinicalAI first acquires the base information of the subject. Second, it either gives a final diagnosis or requests further examination information according to the current data of the subject. Third, it repeats the previous step until the diagnosis is finalized or no further examination is available.
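The three steps above form a loop that alternates between prediction and data acquisition. The sketch below illustrates that control flow under stated assumptions: `model`, `run_exam`, and `available` are hypothetical stand-ins for the trained system, the examination procedure, and the examination abilities of the medical institution, and the examination threshold of 0.5 is a placeholder, not a value from this work.

```python
def diagnose(base_info, model, run_exam, available,
             state_threshold=0.95, exam_threshold=0.5):
    """Iteratively gather examinations until a diagnosis can be finalized.

    model(data) -> (state_probs, exam_probs): two dicts giving the probability
        of each clinical state and of each candidate examination (assumed API).
    run_exam(name) -> examination data for the subject (assumed API).
    available: set of examinations the medical institution can execute.
    Returns (diagnosis, list of examinations performed in order).
    """
    data = {"Base": base_info}  # step 1: start from the base information
    performed = []
    while True:
        state_probs, exam_probs = model(data)
        state, p = max(state_probs.items(), key=lambda kv: kv[1])
        if p >= state_threshold:
            return state, performed  # step 2: confident final diagnosis
        # Otherwise pick the most strongly recommended examination that the
        # institution can execute and that has not been performed yet.
        candidates = [e for e, q in exam_probs.items()
                      if q >= exam_threshold and e in available and e not in data]
        if not candidates:
            return "unknown", performed  # nothing left to try: refer onward
        exam = max(candidates, key=lambda e: exam_probs[e])
        data[exam] = run_exam(exam)
        performed.append(exam)  # step 3: repeat with the enlarged input
```

Because every pass through the loop either returns or adds a new examination to the input, the loop terminates after at most as many rounds as there are available examinations. The list of performed examinations is the subject's personalized diagnosis strategy.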

As shown in Fig. 4a, the diagnosis strategies of subjects are not the same (Table S3). OpenClinicalAI dynamically develops 35 diagnosis strategies according to different subject situations and all 40 examination abilities in the test set (Table S4). For the known (AD and CN) subjects, as shown in Fig. 4b and c, most of the subjects require low-cost examinations (such as cognition examination (CE)). A small share of subjects require high-cost examinations (such as cerebral spinal fluid analysis (CSF)). For unknown subjects, as shown in Fig. 4d, identification is more complex than the diagnosis of known (AD and CN) subjects and depends more on high-cost examinations. The reason is that, by design, OpenClinicalAI does its best to determine whether the subject belongs to the known categories. When it fails, OpenClinicalAI marks the subject as unknown. This means that an unknown subject will undergo more examinations than a known subject. The details of the high-cost examination requirements are as follows.

  • 33.94% of unknown subjects require a nuclear magnetic resonance scan (that of the known subject is 12.43%).

  • 13.95% of unknown subjects require a positron emission computed tomography scan with 18-FDG (that of the known subject is 4.75%).

  • 8.67% of unknown subjects require a positron emission computed tomography scan with AV45 (that of the known subject is 5.87%).

  • 9.38% of unknown subjects require a gene analysis (that of the known subject is 1.96%).

  • 5.13% of unknown subjects require a cerebral spinal fluid analysis (that of the known subject is 0.28%).

Potential clinical applications

OpenClinicalAI enables an AD diagnosis system to be implemented in uncertain and complex clinical settings to reduce the workload of AD diagnosis and minimize the cost to subjects.

To identify known (AD and CN) subjects with high confidence, the operating point of OpenClinicalAI uses a high decision threshold (0.95). For the test set, OpenClinicalAI achieves an accuracy of 92.47% (95% CI 91.36%-93.44%), an AD sensitivity of 84.92% (95% CI 78.91%-90.51%), and a CN sensitivity of 81.27% (95% CI 75.51%-86.67%), while retaining an unknown sensitivity of 93.96% (95% CI 92.90%-94.92%). In addition, OpenClinicalAI can cooperate with senior clinicians to identify known subjects. In this work, 15.08% (95% CI 9.49%-21.09%) of AD subjects and 18.73% (95% CI 13.33%-24.49%) of CN subjects are marked as unknown and sent to senior clinicians for diagnosis. This work pattern is significant for underdeveloped areas: it is a promising way to connect developed and underdeveloped areas to reduce the workload, improve overall medical services, and promote medical equity. To minimize the subject's cost and maximize the subject's benefit, OpenClinicalAI dynamically develops personalized diagnosis strategies according to the subject’s situation and the existing medical conditions.

For the subject, OpenClinicalAI judges whether it can finalize the diagnosis according to the information currently obtained. If the current data of the subject are not enough for OpenClinicalAI to make a diagnosis, it recommends the most suitable further examination for the subject. This mitigates over-testing, minimizes the subject's cost, and maximizes the subject's benefit. For the test set, OpenClinicalAI applies different diagnosis strategies to different subjects (Table S3). The details of the high-cost examinations are as follows.

  • 31.07% of subjects require a nuclear magnetic resonance scan.

  • 12.72% of subjects require a positron emission computed tomography scan with 18-FDG.

  • 8.29% of subjects require a positron emission computed tomography scan with AV45.

  • 8.39% of subjects require a gene analysis.

  • 4.48% of subjects require a cerebral spinal fluid analysis.

For the medical institution, before the system recommends an examination for a subject, OpenClinicalAI inquires whether the medical institution can execute the examination. If the medical institution cannot perform the examination, OpenClinicalAI recommends other examinations until the current information of the subject is enough to support a diagnosis, or until all common examinations have been suggested and the subject is marked as unknown. This allows OpenClinicalAI to be deployed in different medical institutions with various examination abilities. In this work, OpenClinicalAI diagnoses subjects under 40 conditions of medical institutions (Table S4). In addition, for the subjects of the test set, because information from recommended examinations is lacking (which may be equivalent to the medical institution not having the ability to execute the recommended examination), OpenClinicalAI adjusts the diagnostic strategies times.

Discussion

Currently, the media overhype AI-assisted diagnosis systems. However, such systems are far from mature enough to be implemented in real-world clinical settings. Many clinicians are gradually losing faith in medical AI [36, 37, 9, 38, 39, 40]. As in the first trough of AI, the high expectations and unsatisfactory practical implementation of medical AI may severely hinder its development. In addition, comparing the performance of state-of-the-art AI systems under stringent conditions and in real-world settings shows that solving the diagnosis task well under stringent conditions is of little help in solving it in the real-world setting. It is time to shift attention from pure algorithm research in closed settings to systematic study in real-world settings, focusing on the challenge of tackling the uncertainty and complexity of real-world settings. In this work, we propose an open, dynamic machine learning framework that enables an AI diagnosis system to deal directly with the uncertainty and complexity of the real-world setting. Based on our framework, an AD diagnostic system demonstrates great potential for implementation in real-world settings with different medical environments to reduce the workload of AD diagnosis and minimize the cost to the subject.

Although many AI diagnostic systems have been proposed, how to embed these systems into the current health care system to improve medical services remains an open issue [2, 41, 42, 43]. OpenClinicalAI provides a reasonable way to embed an AI system into the current health care system. OpenClinicalAI can collaborate with clinicians to improve the quality of clinical services, especially in underdeveloped areas. On the one hand, OpenClinicalAI can deal directly with the diagnosis task in the uncertain and complex real-world setting. On the other hand, OpenClinicalAI can diagnose typical patients of known categories while sending challenging or atypical patients of known categories to clinicians for diagnosis. Although AI technology is different from traditional statistics, the model of the AI system still learns patterns from training data. For typical patients, the model learns patterns easily, while learning patterns for atypical patients is challenging. Thus, every atypical or unknown patient needs to be handled by clinicians. In this work, most of the known subjects are diagnosed by OpenClinicalAI, and the rest are marked as unknown and sent to the senior clinician.

Over-testing has always been a concern and has been exacerbated in current AI-based diagnostic systems [44, 45]. For example, the systems proposed by Lu et al., Ding et al., and Liu et al. achieved state-of-the-art performance, but they required every subject to have a positron emission computed tomography scan, which is unnecessary for most subjects in real-world settings [46, 31, 47]. In contrast, OpenClinicalAI enables AI systems to develop personalized diagnosis strategies that avoid unnecessary testing. OpenClinicalAI thus offers a possible way to effectively reduce over-testing under strict supervision.

Notably, the experiments in this work do not include a comparison with clinicians. There are two main reasons. First, OpenClinicalAI obtains an AUC value of 0.9927 (95% CI 0.9854-0.9981) in the closed setting. This is very close to the ground truth, so a comparison with clinicians is unnecessary. Second, the diagnosis pattern in real-world settings aims to diagnose typical patients of known categories (who are usually easier to diagnose) and to send atypical patients of known categories (who are generally difficult to diagnose) and unknown subjects to clinicians. The task of OpenClinicalAI is quite different from that of clinicians. Unlike current AI-based diagnostic systems, OpenClinicalAI acts as a new part of the whole healthcare system instead of replacing the role of clinicians. Therefore, it is not necessary to compare OpenClinicalAI to clinicians.

Although OpenClinicalAI is likely to influence future research on diagnosis systems, several limitations remain. First, prospective clinical studies of Alzheimer’s disease diagnosis will be required to prove the effectiveness of our system. Second, data collection and processing are required to follow the standards of ADNI.

References

  • [1] A. Esteva, et al., Nature 542, 115 (2017).
  • [2] S. M. McKinney, et al., Nature 577, 89 (2020).
  • [3] D. S. Kermany, et al., Cell 172, 1122 (2018).
  • [4] J. De Fauw, et al., Nature medicine 24, 1342 (2018).
  • [5] K. Ning, et al., Neurobiology of aging 68, 151 (2018).
  • [6] Z. Tang, et al., Nature communications 10, 1 (2019).
  • [7] C. Lian, M. Liu, Y. Pan, D. Shen, IEEE Transactions on Cybernetics (2020).
  • [8] J. He, et al., Nature medicine 25, 30 (2019).
  • [9] P. Brocklehurst, et al., The Lancet 389, 1719 (2017).
  • [10] M. Roberts, et al., Nature Machine Intelligence 3, 199 (2021).
  • [11] A. Bendale, T. Boult, Proceedings of the IEEE conference on computer vision and pattern recognition (2015), pp. 1893–1902.
  • [12] S. Qiu, et al., Brain 143, 1920 (2020).
  • [13] J. J. Titano, et al., Nature medicine 24, 1337 (2018).
  • [14] H. Lee, et al., Nature biomedical engineering 3, 173 (2019).
  • [15] X. Mei, et al., Nature medicine 26, 1224 (2020).
  • [16] C. Geng, S.-j. Huang, S. Chen, IEEE transactions on pattern analysis and machine intelligence (2020).
  • [17] L. E. Hebert, L. A. Beckett, P. A. Scherr, D. A. Evans, Alzheimer Disease & Associated Disorders 15, 169 (2001).
  • [18] L. E. Hebert, J. Weuve, P. A. Scherr, D. A. Evans, Neurology 80, 1778 (2013).
  • [19] A. Association, et al., Alzheimer’s & Dementia 14, 367 (2018).
  • [20] C. S. Frigerio, et al., Cell reports 27, 1293 (2019).
  • [21] M. Prince, R. Bryce, C. Ferri (2018).
  • [22] S. G. Mueller, et al., Neuroimaging Clinics 15, 869 (2005).
  • [23] D. L. Sackett, W. M. Rosenberg, J. M. Gray, R. B. Haynes, W. S. Richardson, BMJ 312, 71 (1996).
  • [24] H. Li, et al., Alzheimer’s & Dementia (2019).
  • [25] H. Choi, et al., EBioMedicine 43, 447 (2019).
  • [26] T. Zhou, M. Liu, K.-H. Thung, D. Shen, IEEE transactions on medical imaging (2019).
  • [27] M. A. Ebrahimighahnavieh, S. Luo, R. Chiong, Computer methods and programs in biomedicine 187, 105242 (2020).
  • [28] M. Tanveer, et al., ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 16, 1 (2020).
  • [29] R. Sharma, T. Goel, M. Tanveer, S. Dwivedi, R. Murugan, Applied Soft Computing 106, 107371 (2021).
  • [30] M. Tanveer, et al., IEEE Journal of Biomedical and Health Informatics (2021).
  • [31] Y. Ding, et al., Radiology 290, 456 (2019).
  • [32] P. Tschandl, et al., Nature Medicine 26, 1229 (2020).
  • [33] R. Poplin, et al., Nature Biomedical Engineering 2, 158 (2018).
  • [34] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 4700–4708.
  • [35] T. Chen, C. Guestrin, Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining (2016), pp. 785–794.
  • [36] M. van Assen, L. J. Cornelissen, Jacc-Cardiovascular imaging 13, 1172 (2020).
  • [37] C. G. Weaver, F. A. McAlister, Canadian Journal of Cardiology (2021).
  • [38] J. H. Chen, S. M. Asch, The New England journal of medicine 376, 2507 (2017).
  • [39] T. M. Maddox, J. S. Rumsfeld, P. R. Payne, Jama 321, 31 (2019).
  • [40] H. T. Head, Bmj p. 363 (2018).
  • [41] C.-Y. Kuo, H.-M. Chiu, Journal of Gastroenterology and Hepatology 36, 267 (2021).
  • [42] J. Schneider, M. Agus, arXiv preprint arXiv:2103.01149 (2021).
  • [43] J. Bullock, A. Luccioni, K. H. Pham, C. S. N. Lam, M. Luengo-Oroz, Journal of Artificial Intelligence Research 69, 807 (2020).
  • [44] M. O’Keeffe, et al., JAMA Internal Medicine 181, 865 (2021).
  • [45] J. W. O’Sullivan, et al., BMJ open 8, e018557 (2018).
  • [46] D. Lu, et al., Medical image analysis 46, 26 (2018).
  • [47] M. Liu, D. Cheng, W. Yan, A. D. N. Initiative, et al., Frontiers in neuroinformatics 12, 35 (2018).

Acknowledgments

We thank Weibo Pan and Fang Li for downloading the raw data sets from Alzheimer’s Disease Neuroimaging Initiative. Funding: This work is supported by the Project of Guangxi Science and Technology (No. GuiKeAD20297004 to Y. H.) and the National Natural Science Foundation of China (No.61967002 to S. T.). Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.;Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.;Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (https://www.fnih.org/). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. 
Author contributions: Y.H. conceptualized the study, designed the models, wrote the code, collected and analyzed the data, and wrote the manuscript. N.W., S.T., L.M., T.H., and Z.J. conceptualized the study and revised the manuscript. F.Z., G.K., X.M., X.G., and R.Z. collected and analyzed the data. Z.Z. and J.Z. directed the project and revised the manuscript. Competing interests: The authors declare no competing financial interest. Data and materials availability: The data from the Alzheimer’s Disease Neuroimaging Initiative were used under license for the current study. Applications for access to the dataset can be made at http://adni.loni.usc.edu/data-samples/access-data/. All original code has been deposited at the BenchCouncil website and is publicly available as of the date of publication.

Supplementary materials

Materials and Methods
Figs. S1 to S3
Tables S1 to S6
Algorithms S1 to S4
References (48-70)

Figures

Fig. 1: The workflow of the baseline clinical AI system and OpenClinicalAI. a, The workflow of mainstream AI-based diagnostic systems for closed settings. The system only accepts subjects with pre-specified clinical states. First, a medical institution with the pre-specified examination ability executes the same pre-specified medical examinations for every subject. Then, the system calculates the probability of each pre-defined clinical state for the subject according to the examination data. Finally, the system takes the clinical state with the maximum probability as the output and makes the final diagnosis. b, The workflow of OpenClinicalAI. It can deal with different categories of subjects, including categories that were unfamiliar or unknown during the development of the system. It can be deployed in medical institutions with different examination abilities, from small-scale country clinics to large-scale hospitals. First, OpenClinicalAI obtains the basic information of the subject and combines it with the subject’s historical clinical information as input. Second, according to the input, OpenClinicalAI calculates the probability of each disease-related examination and each pre-defined clinical state, including the unknown clinical state. Third, if the probability of a pre-defined clinical state is greater than the specified threshold, that clinical state is the final diagnosis of OpenClinicalAI, which is sent to clinicians for review. Otherwise, go to the next step. Fourth, for each examination, if the probability of the examination is greater than the specified threshold and the medical institution can execute it, obtain the examination data, add the data to the input of OpenClinicalAI, and go to step two. Otherwise, go to the next step. Fifth, for a medical institution with a specific examination ability, select the executable routine examination with the least cost that has not yet been executed for the subject, add the examination data to the input of OpenClinicalAI, and go to step two. Otherwise, go to the next step. Finally, mark the subject without a diagnosis as unknown and send the subject to clinicians for diagnosis. Notably, atypical subjects with specified clinical states, as well as unfamiliar and unknown subjects, are marked as unknown and sent to clinicians for diagnosis.
Fig. 2: The performance of OpenClinicalAI with personalized strategies against the baseline system on Alzheimer’s disease diagnosis task in the closed setting. a, The ROC curves of two systems in the closed dataset. The red curve is the ROC curve of the baseline system, and it obtains an AUC score of (95% CI 0.9722-0.9827). The black curve is the ROC curve of OpenClinicalAI with various examination data, and it obtains an AUC score of (95% CI 0.9907-0.9945). b, The examination used during the AD diagnosis process. The baseline system consistently uses MRI data and historical data as the system input. In other words, every subject must have a nuclear magnetic resonance scan. OpenClinicalAI is able to develop and adjust the diagnosis strategies according to individual conditions and existing medical conditions during the diagnosis process, and only subjects in the test set should have a nuclear magnetic resonance scan. Most subjects only need to have two or several simple examinations during the diagnosis process.
Fig. 3: The performance of OpenClinicalAI with personalized strategies against the baseline system in the real-world setting. a, The ROC curves of OpenClinicalAI. It obtains two high AUC scores of 0.9502 (95% CI 0.9304-0.9662) and 0.9927 (95% CI 0.9854-0.9981) for AD and CN detection. The operating point of AI system is a group of score thresholds that separates positive and negative decisions of every category of the subject (0.95 for AD, 0.95 for CN, and 0.8 for unknown). On the operating point, OpenClinicalAI obtains the sensitivity of AD, CN, and unknown are (95% CI 0.7891-0.9051), (95% CI 0.7551-0.8667), and (95% CI 0.9290-0.9492) respectively. b, The ROC curves of OpenClinicalAI without an OpenMax mechanism. It obtains two AUC scores of 0.8795 (95% CI 0.8540-0.9038) and 0.8571 (95% CI 0.8331-0.8797) for AD and CN detection. On the operating point, the sensitivity of AD and CN are (95% CI 0.8282-0.9324) and (95% CI 0.8100-0.9056), respectively. However, the sensitivity of the unknown is only (95% CI 0.3133-0.3528). c, The ROC curves of the baseline system. The baseline system obtains two AUC scores of 0.8400 (95% CI 0.8055-0.8728) and 0.8779 (95% CI 0.8506-0.9025) for AD and CN detection. On the operating point, the unknown’s sensitivity is (95% CI 0.8753-0.9018). However, the sensitivity of AD and CN are only (95% CI 0.4604-0.6301) and (95% CI 0.2663-0.3979), respectively. d, The examination used during the AD diagnosis process. All subjects diagnosed by the baseline system require the nuclear magnetic resonance scan. The subject diagnosed by OpenClinicalAI without an OpenMax mechanism is similar to the subject diagnosed by OpenClinicalAI with an OpenMax mechanism. The selection of examination depends on the situation of the subject and existing medical conditions. Thus the examination number is not fixed.
Fig. 4: Diagnosis strategies for subjects. a, Diagnosis strategies for all subjects. Because OpenClinicalAI develops and adjusts the examinations for each subject, the selection of examinations differs between subjects. b, Diagnosis strategies for AD subjects. For AD subjects, OpenClinicalAI pays more attention to the subject’s basic information and to cognitive, mental, behavioral, and physical examination information than to high-cost examinations. In contrast, biochemical testing, imaging, and genetic data are less considered. c, Diagnosis strategies for CN subjects. The behavior of OpenClinicalAI for CN recognition is similar to that for AD diagnosis; the difference is that more examinations are required to identify CN subjects. d, Diagnosis strategies for unknown (MCI and SMC) subjects. Compared with recognizing known subjects, identifying unknown subjects is more complicated and requires more examinations.

Supplementary Materials for

OpenClinicalAI: enabling AI to diagnose diseases in real-world clinical settings

Materials and Methods

Human subjects

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. For up-to-date information, see http://www.adni-info.org.

The data were collected from 67 sites in the United States and Canada [48, 49, 50, 51]. The subject in the dataset aged between and at the first visit. The interval of the subject follow-up is usually greater than months. Generally, the longer the follow-up time is, the longer the interval is. The first visit is marked as bl, and each later visit is marked as mxx according to the elapsed time (for example, a visit that takes place six months after the first visit is marked as m06). Detailed characteristics of the subjects are shown in Tables S1 and S2.
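The visit-marking convention described above can be sketched as a small helper. This is only an illustration: the function name `visit_code` is hypothetical, and the two-digit zero padding is inferred from the codes listed in Table S2 (m06, m12, m108).

```python
def visit_code(months_since_first_visit: int) -> str:
    """Return 'bl' for the first visit, otherwise 'mxx' for later visits."""
    if months_since_first_visit < 0:
        raise ValueError("months_since_first_visit must be non-negative")
    if months_since_first_visit == 0:
        return "bl"
    # Zero-pad to at least two digits, so 6 becomes 'm06' and 108 becomes 'm108'.
    return f"m{months_since_first_visit:02d}"
```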

Dataset

The data contain study data, image data, and genetic data compiled by ADNI between 2005 and 2019. Considering the examinations commonly used in AD diagnosis and those of particular concern to clinicians, 13 categories of data are selected.

  • Base information, usually obtained through consultation, includes demographics, family history, medical history, and symptoms.

  • Cognition information, usually obtained through consultation and testing, includes the Alzheimer’s Disease Assessment Scale, Mini-Mental State Exam, Montreal Cognitive Assessment, Clinical Dementia Rating, and Cognitive Change Index.

  • Cognition testing, usually obtained through testing, includes ANART, the Boston Naming Test, Category Fluency-Animals, the Clock Drawing Test, Logical Memory-Immediate Recall, Logical Memory-Delayed Recall, the Rey Auditory Verbal Learning Test, and the Trail Making Test.

  • Neuropsychiatric information, usually obtained through consultation, includes the Geriatric Depression Scale, Neuropsychiatric Inventory, and Neuropsychiatric Inventory Questionnaire.

  • Function and behavior information, usually obtained through consultation, includes the Function Assessment Question, Everyday Cognition Participant Self Report, and Everyday Cognition Study Partner Report.

  • Physical and neurological examination, usually obtained through testing, includes physical characteristics, vitals, and neurological examination.

The rest of the examinations include blood testing, urine testing, nuclear magnetic resonance scan, positron emission computed tomography scan with 18-FDG, positron emission computed tomography scan with AV45, gene analysis, and cerebral spinal fluid analysis. It is worth noting that not all categories of information are obtained for a subject’s visit, and the information on each type is often incomplete.

All labeled subjects with at least one of the above categories of information are considered in this study. In total, 2127 subjects with 9593 visits are included in our work. A subject in a visit may require different categories of examination. Every combination of those examinations represents a diagnosis strategy. Thus, for the subject, strategies are generated. The AD and CN subjects are randomly assigned to the training, validation, and test sets. The training set contains 1025 subjects with 3986 visits and generates 180682 strategies. In the training set, 587 subjects with 1781 visits are AD and generate 80022 strategies, and 466 subjects with 2205 visits are CN and generate 100660 strategies. The validation set contains 73 subjects with 254 visits and generates 11898 strategies. In the validation set, 44 subjects with 127 visits are AD and generate 6008 strategies, and 31 subjects with 127 visits are CN and generate 5890 strategies. The test set contains 1460 subjects with 5353 visits. In the test set, 109 subjects with 305 visits are AD, 92 subjects with 411 visits are CN, 1082 subjects with 4357 visits are MCI, and 280 subjects with 280 visits are SMC. Notably, the label of a subject may differ between visits, so a subject can be counted under more than one label.
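As a rough illustration of how one visit can give rise to many diagnosis strategies, the sketch below enumerates every non-empty combination of the examination categories recorded for a visit. The exact enumeration rule used in the study is not recoverable from the text (for instance, some categories may always be included), so the function `enumerate_strategies` and its behavior are assumptions made for illustration only.

```python
from itertools import combinations


def enumerate_strategies(available_exams):
    """Yield each non-empty combination of examination categories as a frozenset."""
    exams = sorted(available_exams)
    for size in range(1, len(exams) + 1):
        for combo in combinations(exams, size):
            yield frozenset(combo)


# A visit with three recorded categories (category names are illustrative).
strategies = list(enumerate_strategies({"Base", "Cog", "MRI"}))
```

With n recorded categories this rule yields 2^n - 1 candidate strategies, which is why the number of strategies grows much faster than the number of visits.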

Randomization and blinding.

AD and CN subjects, as known categories of subjects, are randomized into the training, validation, and test sets by applying a random function provided by Python 3. The assignment is determined by a float value generated by the random function. We assign subjects whose values are in [0, 0.8) to the training set, subjects whose values are in [0.8, 0.85) to the validation set, and subjects whose values are in [0.85, 1] to the test set. Visits that belong to the same subject are only allowed to appear in the same set. MCI and SMC subjects, as unknown categories of subjects, are placed directly into the test set. During the development of the AI system, the test set is inaccessible.
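A minimal sketch of this subject-level split follows. The function names, the seed, and the use of `random.Random` are assumptions; the source only states that a Python 3 random function is used. Note that `random.random()` draws from [0, 1), so the closed upper end of the test interval is never reached in practice.

```python
import random


def assign_split(value: float) -> str:
    """Map a random float to a split using the thresholds stated in the text."""
    if value < 0.8:
        return "train"
    if value < 0.85:
        return "validation"
    return "test"


def split_subjects(known_subject_ids, unknown_subject_ids, seed=0):
    """Assign each subject (not each visit) to exactly one split."""
    rng = random.Random(seed)  # seed is a hypothetical choice for reproducibility
    assignment = {sid: assign_split(rng.random()) for sid in sorted(known_subject_ids)}
    # Unknown categories (MCI and SMC) go directly into the test set.
    for sid in unknown_subject_ids:
        assignment[sid] = "test"
    return assignment
```

Because the assignment is keyed by subject ID, every visit of a subject inherits that subject's split, which keeps visits of one subject from leaking across sets.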

Data preparation.

For each category of study data, if it contains more than one sub-category of data, we concatenate all of the sub-category data by RID (the ID of the subject) and VISCODE (the mark of the subject’s visit). For the medical images, we first convert the data from the DICOM format to the NIfTI format with the dcm2nii library. Second, we register the images with the ant library [52, 53, 54]. Third, we convert each 3D image to 2D slices and convert the slices from gray to RGB. Finally, a trained model named DenseNet201 is used to extract the features of the 2D slices [34]. For the genetic data, we extract 70 single nucleotide polymorphisms (SNPs) that are strongly related to AD (Table S5) and use one-hot codes to represent each SNP [55, 56, 57]. Because each category of data has different dimensions, the number of data categories differs between visits, and the number of historical visits differs between subjects, this work proposes a unified data representation framework. We present an examination category in the subject’s visit by an array with a shape of . The shape of our data is , is the number of categories of data for the subject (Fig. S3).
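Two of the preparation steps above, slicing a 3D image into RGB slices and one-hot coding an SNP, can be sketched in plain Python. All names here are hypothetical, and the genotype coding (0, 1, or 2 copies of the minor allele) is an assumption; the source does not say which coding the one-hot vectors use.

```python
def gray_slice_to_rgb(slice_2d):
    """Replicate each grayscale value into three identical channels."""
    return [[(v, v, v) for v in row] for row in slice_2d]


def volume_to_rgb_slices(volume_3d):
    """Split a 3D volume (a list of 2D slices) into RGB slices along the first axis."""
    return [gray_slice_to_rgb(slice_2d) for slice_2d in volume_3d]


# Assumed genotype coding: number of copies of the minor allele.
GENOTYPES = (0, 1, 2)


def one_hot_snp(genotype):
    """Encode one SNP genotype as a one-hot vector over GENOTYPES."""
    if genotype not in GENOTYPES:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return [1 if genotype == g else 0 for g in GENOTYPES]
```

Replicating the gray channel is the usual way to feed single-channel medical images to a network pretrained on three-channel images, which matches the use of a trained DenseNet201 as a feature extractor.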

The proposed model

Our model consists of five parts: , , , , and ( Fig. S1 ). We name the model consisting of , , and as , which can identify the subject from open clinical settings [16, 58, 11]. We name the model consisting of , and as , which can dynamically develop and adjust the diagnosis strategy according to the situation of subjects and existing medical conditions.

Loss function

The is a multi-task learning model, which simultaneously optimizes the model’s disease diagnosis and data reconstruction ability. The data reconstruction task can improve the diagnosis ability of the model in the open world [58]. The loss function of the model is . The is categorical cross-entropy, and the is mean squared logarithmic error. The is also a multi-task learning model, which simultaneously optimizes, for each of the 12 examinations, whether it should be selected as the next examination for the subject. We introduce a loss function that combines the BCE loss function with uncertainty-based loss weighting [59, 14]. The modified loss function is given by Equation 1:

(1)

where is the total number of examinations as the subsequent examination, is the total number of other examinations as the following examination. is an observation noise scalar of the output of examination [59].
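Equation 1 is not legible in this copy, so the following is not the authors' formula. It is a generic sketch of uncertainty-weighted binary cross-entropy in the style of [59], where each task i has a learned log-variance s_i = log(sigma_i^2) and contributes exp(-s_i) * BCE_i + 0.5 * s_i. The class-count weighting that the surrounding text mentions is omitted, and all function names are hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """BCE for one binary label; the probability is clipped to avoid log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def uncertainty_weighted_bce(y_true, y_prob, log_var):
    """Sum over tasks of exp(-s_i) * BCE_i + 0.5 * s_i, with s_i = log(sigma_i^2)."""
    total = 0.0
    for t, p, s in zip(y_true, y_prob, log_var):
        total += math.exp(-s) * binary_cross_entropy(t, p) + 0.5 * s
    return total
```

A larger s_i down-weights the loss of a noisy examination output, while the 0.5 * s_i term keeps the model from inflating every variance to shrink the loss.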

Label examination

Although researchers have made many efforts on the interpretability and internal logic of deep learning, the current behavior of deep learning is still difficult to understand [60, 61]. We do not know whether the diagnosis strategy of the AI model needs to be consistent with that of human experts. Thus, it is unnecessary to have clinicians label the subsequent examination for each examination strategy and then train a model to simulate the clinicians’ behavior. In this work, the next-examination label is assigned by the examination label algorithm (Algorithm S1). An examination is chosen as the subsequent examination for the subject if it makes the prediction model () obtain a greater predicted probability for the correct category and smaller predicted probabilities for the other categories.
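The labeling rule in the last sentence can be sketched as follows. The conditions inside Algorithm S1 are not legible here, so this sketch assumes that a candidate strategy extends the current one by exactly one examination; the function name and the data layout (strategies as frozensets mapped to per-class probabilities) are also assumptions.

```python
def label_next_examinations(strategy_probs, true_label):
    """Label helpful next examinations for each strategy.

    strategy_probs maps a frozenset of examinations to a dict of class probabilities.
    Returns a dict mapping each strategy to the set of examinations labeled as next.
    """
    labels = {strategy: set() for strategy in strategy_probs}
    for current, p_cur in strategy_probs.items():
        for candidate, p_new in strategy_probs.items():
            added = candidate - current
            # Assumption: the candidate adds exactly one examination to the current strategy.
            if not (current < candidate and len(added) == 1):
                continue
            higher_true = p_new[true_label] > p_cur[true_label]
            lower_others = all(
                p_new[c] < p_cur[c] for c in p_cur if c != true_label
            )
            if higher_true and lower_others:
                labels[current] |= added
    return labels
```

For example, if adding a cognitive examination raises the probability of the correct class and lowers the others, that examination is labeled as a next examination for the smaller strategy.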

OpenMax

OpenMax is a modified SoftMax layer that adopts the concept of Meta-Recognition [62, 11, 63]. OpenMax uses the distance between the activation vector (AV) of the sample and the mean activation vector (the mean computed over only the correctly classified training examples) to identify the unknown categories of the subject [11]. The deep learning network can be regarded as a feature extractor, and the output of the AV layer can be regarded as characteristics of the sample. However, the AV layer usually retains only the features most relevant to the classification task, and the features related to the unknown category are not guaranteed to be retained. To alleviate this problem, we replace the output of the AV layer with the abnormal patterns of 14 selected indicators of known categories, chosen according to the Alzheimer’s diagnosis guidelines, to improve the performance of the AI model [64, 65, 66, 67] (Table S6). The modified OpenMax based on abnormal patterns is shown in Algorithms S2 and S3.
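The distance idea behind OpenMax can be illustrated with a simplified sketch. It computes a class center from correctly classified vectors and a distance threshold at a chosen quantile, then flags a sample as unknown when it is far from every known center. It deliberately leaves out the Weibull fitting (FitHigh from libMR) and the clustering step named in Algorithm S2, and every function name and the 0.95 quantile are assumptions.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_class_center(correct_vectors, quantile=0.95):
    """Return (center, threshold) from correctly classified training vectors."""
    center = mean_vector(correct_vectors)
    distances = sorted(euclidean(v, center) for v in correct_vectors)
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return center, distances[index]


def looks_unknown(vector, class_models):
    """True if the vector is beyond the threshold of every known class."""
    return all(
        euclidean(vector, center) > threshold
        for center, threshold in class_models.values()
    )
```

In the modified version described above, the vectors would be abnormal patterns of the selected indicators rather than raw activation vectors.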

Model training.

The training of our model consists of two stages. The first stage is training the , in which the uses a SoftMax layer as the output layer. The dimension of the output of the in this training stage is 2, corresponding to AD and CN. After training the , a modified OpenMax layer, which estimates the probability of an input being an unknown class, is used to replace the SoftMax layer [11]. The dimension of the output of in the prediction stage is 3, corresponding to AD, CN, and unknown. According to the prediction probabilities of subjects by the , every examination strategy in the training set and validation set is labeled by Algorithm S1. The second stage is training the . The input of the contains raw data and the prediction probability, and the dimension of the output of the is 12, corresponding to the 12 categories of examination. The model was optimized using mini-batch stochastic gradient descent with Adam and a base learning rate of 0.0005 [68]. The experiments are conducted on a Linux server equipped with Tesla P40 and Tesla P100 GPUs.

Because historical information has a significant influence on the diagnosis of Alzheimer’s disease, there is a vast difference between diagnosis at the first visit, without historical information, and diagnosis at later visits, with historical data. Therefore, based on the above model training method, we additionally trained a model for diagnosing Alzheimer’s disease at the first visit based on the subject’s data at the first visit.

Prediction

Unlike other state-of-the-art AI models, the predictions of our model are dynamic. The prediction algorithm comprehensively considers the situation of the subject, the condition of the medical institution, and the ability of our model to dynamically adjust the diagnosis strategy (Algorithm S4). First, our model generates the probability of every category (AD vs. CN vs. unknown) according to the current input data of the subject. Second, if the probability of a category exceeds its threshold (one threshold each for AD, CN, and unknown), the model outputs the corresponding label. Otherwise, it adjusts the examination strategy by selecting the subsequent examination according to the situation of the subject and the medical institution, and returns to the first step. Finally, if all diagnosis strategies have been tried and the model still cannot obtain a probability that exceeds the threshold, it outputs unknown.
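The loop described above can be sketched as follows, under stated assumptions. The callables `predict`, `can_execute`, and `acquire`, the argument names, and the way examinations are tracked are all hypothetical stand-ins; the trained model and the real examination interface are not part of this sketch.

```python
def diagnose(initial_data, predict, thresholds, exam_threshold,
             can_execute, acquire, routine_exams_by_cost):
    """Iteratively request examinations until a class probability passes its threshold.

    predict(data) must return (class_probs, exam_probs), both dicts.
    routine_exams_by_cost lists routine examinations from cheapest to most expensive.
    """
    data = list(initial_data)
    done = set()
    while True:
        class_probs, exam_probs = predict(data)
        for label, prob in class_probs.items():
            if prob > thresholds[label]:
                return label
        progressed = False
        # Add every recommended examination that the institution can execute.
        for exam, prob in exam_probs.items():
            if exam not in done and prob > exam_threshold and can_execute(exam):
                data.append(acquire(exam))
                done.add(exam)
                progressed = True
        # Otherwise fall back to the cheapest routine examination not yet executed.
        if not progressed:
            for exam in routine_exams_by_cost:
                if exam not in done and can_execute(exam):
                    data.append(acquire(exam))
                    done.add(exam)
                    progressed = True
                    break
        if not progressed:
            return "unknown"
```

The loop terminates because each pass either returns a label or adds at least one new examination, and the set of examinations is finite.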

Statistical analysis

To quantify the uncertainty of each evaluation index of the AI model, a non-parametric bootstrap method is applied to calculate confidence intervals (CI) [69]. In this work, we calculate the 95% CI for every evaluation index. We randomly sample cases from the test set and evaluate the AI model on the sampled set for every evaluation index. repeated trials are executed, and values of the evaluation index are generated. The 95% CI is given by the 2.5th and 97.5th percentiles of the distribution of the evaluation index values.
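A minimal percentile-bootstrap sketch of this procedure follows. The sample size per trial and the number of trials are not legible in the text, so resampling as many cases as the test set holds and using 1000 trials are assumptions, as are the function names and the simple index-based percentile.

```python
import random


def bootstrap_ci(cases, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for metric(cases)."""
    rng = random.Random(seed)
    n = len(cases)
    values = []
    for _ in range(n_resamples):
        # Resample n cases with replacement and recompute the metric.
        sample = [cases[rng.randrange(n)] for _ in range(n)]
        values.append(metric(sample))
    values.sort()
    lower = values[int((alpha / 2) * (n_resamples - 1))]
    upper = values[int((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


def accuracy(cases):
    """Fraction of (true_label, predicted_label) pairs that agree."""
    return sum(1 for y_true, y_pred in cases if y_true == y_pred) / len(cases)
```

The same function applies to any evaluation index, such as sensitivity or AUC, by passing a different `metric` callable.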

Fig. S1: The open, dynamic machine learning framework of OpenClinicalAI. The OpenClinicalAI framework contains four independent modules and one accessory module. processes the input data for the , and processes the input data for the . The introduces the OpenMax mechanism to identify unknown categories of subjects. The accessory module is used to help the retain features of the sample and improve the ability to identify unknown categories of subjects. The is used to select the examination to be carried out in the next step. In addition, the length of input data of the OpenClinicalAI framework is variable to adapt to data of different subjects at different visits.
Fig. S2: The real-world setting of Clinical AIBench. Subjects in real-world settings differ and present in various situations. They include pre-known categories as well as categories that are unknown or unfamiliar to the specific clinician or AI diagnostic system. The visit of a subject to a particular medical institution cannot be pre-specified and is hence uncertain. Medical institutions in real-world settings also differ in their ability to execute examinations, from small-scale country clinics to large-scale hospitals. In addition, it is difficult to know in advance all the specific medical institutions that will deploy the AI system and their particular abilities to execute examinations.
Fig. S3: Data framework for a single subject. Our data representation framework comprehensively considers the historical visit information and current visit information of the subject. The data with the earlier time is farther away from the current data.
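| Characteristic | Group | Data set | Training set | Validation set | Test set |
| --- | --- | --- | --- | --- | --- |
| Age | 54-59.9 | 80 | 36 | 2 | 59 |
| Age | 60-69.9 | 596 | 246 | 10 | 442 |
| Age | 70-79.9 | 1048 | 528 | 46 | 695 |
| Age | 80-89.9 | 395 | 213 | 14 | 259 |
| Age | 90-91.9 | 6 | 1 | 1 | 4 |
| Gender | Female | 1130 | 560 | 44 | 785 |
| Gender | Male | 997 | 465 | 29 | 675 |
| Education | 4-7 | 11 | 4 | 0 | 8 |
| Education | 8-10 | 40 | 18 | 2 | 23 |
| Education | 11-13 | 353 | 176 | 13 | 243 |
| Education | 14-16 | 823 | 403 | 26 | 558 |
| Education | 17-20 | 900 | 424 | 32 | 628 |
| Ethnic category | Hisp/Latino | 73 | 32 | 5 | 49 |
| Ethnic category | Not Hisp/Latino | 2042 | 986 | 67 | 1404 |
| Ethnic category | Unknown | 12 | 7 | 1 | 7 |
| Racial category | Asian | 40 | 20 | 0 | 25 |
| Racial category | Black | 88 | 41 | 5 | 57 |
| Racial category | Hawaiian/Other PI | 2 | 0 | 0 | 2 |
| Racial category | More than one | 25 | 10 | 0 | 18 |
| Racial category | White | 1964 | 954 | 68 | 1350 |
| Racial category | Am Indian/Alaskan | 4 | 0 | 0 | 4 |
| Racial category | Unknown | 4 | 0 | 0 | 4 |
| Marriage | Married | 1618 | 805 | 59 | 1100 |
| Marriage | Never married | 73 | 30 | 3 | 48 |
| Marriage | Widowed | 238 | 114 | 8 | 165 |
| Marriage | Divorced | 191 | 75 | 3 | 141 |
| Marriage | Unknown | 7 | 1 | 0 | 6 |
| Category | AD | 740 | 587 | 44 | 109 |
| Category | CN | 589 | 466 | 31 | 92 |
| Category | MCI | 1082 | 0 | 0 | 1082 |
| Category | SMC | 280 | 0 | 0 | 280 |

Table S1: Characteristics of subjects.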
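| Visit | Data set | Training set | Validation set | Test set |
| --- | --- | --- | --- | --- |
| first visit | 2126 | 705 | 53 | 1368 |
| m06 | 1515 | 604 | 45 | 866 |
| m12 | 1475 | 621 | 43 | 811 |
| m18 | 329 | 79 | 5 | 245 |
| m24 | 1217 | 591 | 32 | 594 |
| m36 | 804 | 338 | 23 | 443 |
| m48 | 638 | 311 | 14 | 313 |
| m60 | 399 | 178 | 13 | 208 |
| m72 | 395 | 207 | 14 | 174 |
| m84 | 268 | 128 | 6 | 134 |
| m96 | 146 | 74 | 3 | 69 |
| m108 | 100 | 55 | 1 | 44 |
| m120 | 75 | 39 | 1 | 35 |
| m132 | 55 | 33 | 0 | 22 |
| m144 | 39 | 19 | 1 | 19 |
| m156 | 12 | 4 | 0 | 8 |

Table S2: The visit distribution of subjects.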
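| No. | Base | Cog | CE | Neur | FB | PE | Blood | Urine | MRI | FDG | AV45 | Gene | CSF | AD | CN | Unknown | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 244 | 280 | 2680 | 3204 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 16 | 23 | 697 | 736 |
| 3 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 3 | 10 | 81 | 94 |
| 4 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 6 | 8 | 124 | 138 |
| 5 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 47 | 43 | 98 |
| 6 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 6 | 183 | 199 |
| 7 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 27 | 30 |
| 8 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 2 | 11 | 14 |
| 9 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 16 | 18 |
| 10 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 2 | 6 | 232 | 240 |
| 11 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 3 | 16 | 19 |
| 12 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 13 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 33 | 34 |
| 14 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 2 |
| 15 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 16 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 0 | 113 | 115 |
| 17 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 5 | 14 | 10 | 29 |
| 18 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 3 | 21 | 24 |
| 19 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 3 | 0 | 13 | 16 |
| 20 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 42 | 43 |
| 21 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 3 |
| 22 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 5 | 21 | 26 |
| 23 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 9 | 10 |
| 24 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 54 | 54 |
| 25 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 60 | 60 |
| 26 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 67 | 67 |
| 27 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 55 | 55 |
| 28 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| 29 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 3 |
| 30 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 2 | 2 |
| 31 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 13 | 13 |
| 32 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 33 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 34 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 35 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |

Table S3: The diagnosis strategies for the test set. Columns Base to CSF give the diagnosis strategy; columns AD to Total give the visit number of subjects.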
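| No. | Base | Cog | CE | Neur | FB | PE | Blood | Urine | MRI | FDG | AV45 | Gene | CSF | AD | CN | Unknown | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 23 | 35 | 952 | 1010 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 2 | 11 | 13 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 3 | 8 | 63 | 74 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 4 | 23 | 28 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 2 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 2 | 39 | 41 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 4 | 5 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 19 | 22 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 3 | 4 |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 | 13 |
| 11 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 13 | 14 |
| 12 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 26 | 26 |
| 13 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 69 | 69 |
| 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 16 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 19 | 19 |
| 17 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 6 | 7 |
| 18 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 19 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 |
| 20 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 5 | 5 |
| 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 12 | 45 | 85 | 142 |
| 22 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 3 | 21 | 24 |
| 23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 24 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 4 | 10 | 14 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 9 | 10 |
| 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 60 | 60 |
| 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 54 | 54 |
| 28 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 3 | 3 |
| 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 5 | 5 |
| 31 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 9 | 9 |
| 32 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 33 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 34 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 2 | 2 |
| 35 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 5 | 5 |
| 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 37 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 38 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 39 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 40 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |

  • Columns Base to CSF describe a medical institution without certain examination capabilities; columns AD to Total give the visit number of subjects under that condition. An examination marked as 1 means that the medical institution cannot perform this examination for the subject. An examination marked as 0 indicates that (1) the medical institution can perform this examination for the subject, or (2) OpenClinicalAI does not request this examination during the diagnosis of the subject, although the medical institution may not be able to perform it. It is worth noting that the examination ability recorded in the test set may therefore differ for other AI systems, since 0 may only mean that OpenClinicalAI did not request the examination.

Table S4: Medical institutions with different examination abilities in the test set.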
SNP_NAME SNP_NAME SNP_NAME SNP_NAME
rs429358 rs7412 rs10948363 rs7274581
rs17125944 rs4147929 rs6656401 rs11771145
rs6733839 rs983392 rs10498633 rs28834970
rs9271192 rs35349669 rs9331896 rs1476679
rs10792832 rs2718058 rs190982 rs10838725
rs11218343 rs4844610 rs10933431 rs9271058
rs75932628 rs9473117 rs12539172 rs10808026
rs73223431 rs3740688 rs7933202 rs3851179
rs17125924 rs12881735 rs3752246 rs6024870
rs7920721 rs138190086 rs4723711 rs4266886
rs61822977 rs6733839 rs10202748 rs115124923
rs115675626 rs1109581 rs17265593 rs2597283
rs1476679 rs78571833 rs12679874 rs2741342
rs7831810 rs1532277 rs9331888 rs7920721
rs3740688 rs7116190 rs526904 rs543293
rs11218343 rs6572869 rs12590273 rs7145100
rs74615166 rs2526378 rs117481827 rs7408475
rs3752246 rs7274581
Table S5: SNPs related to AD.
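| Group | Indicator | AD_Normal Low | AD_Normal High | CN_Normal Low | CN_Normal High |
| --- | --- | --- | --- | --- | --- |
| Medical history | Psychiatric | 0 | 0 | 0 | 0 |
| Medical history | Neurologic (other than AD) | 0 | 0 | 0 | 0 |
| Symptoms | Present_count_21 | 0 | 6 | 0 | 6 |
| Symptoms | Present_count_28 | 0 | 8 | 0 | 8 |
| Cognitive Change Index | Score_12 | 32.2188 | 60 | 12 | 13.5634 |
| Cognitive Change Index | Score_20 | 50.3438 | 100 | 20 | 22.0845 |
| CDRSB | CDRSB | 2 | 18 | 0 | 0 |
| Alzheimer’s Disease Assessment Scale | ADAS11 | 10 | 70 | 0 | 11.264 |
| Alzheimer’s Disease Assessment Scale | ADAS13 | 18 | 85 | 0 | 17.67 |
| Alzheimer’s Disease Assessment Scale | ADASQ4 | 5 | 10 | 0 | 6 |
| MMSE | MMSE | 0 | 27 | 25 | 30 |
| MOCA | MOCA | 0 | 23 | 26 | 30 |
| Preclinical Alzheimer’s Cognitive Composite | mPACCdigit | -30.0745 | -7.6955 | -5.1733 | 4.7304 |
| Preclinical Alzheimer’s Cognitive Composite | mPACCtrailsB | -29.7277 | -6.7798 | -4.8523 | 4.3338 |

Table S6: The normal range of indicators.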
Input:  The label set , the prediction set , diagnosis strategy set for a subject in a visit.
Output:  Next examination set
1:  Sort the by the number of examinations in a diagnosis strategy.
2:  for exam_strategy do
3:     for exam_strategy do
4:        if [i][j] then
5:           
6:           if  then
7:              The next examination of the current examination strategy [i] is labeled by [j].
8:           end if
9:        end if
10:     end for
11:  end for
Algorithm S1: The examination label algorithm.
Input:  The abnormal pattern dataset , the FitHigh function from libMR [63], the MiniBatchKMeans function from scikit-learn [70], the number of the center of known categories of subject , quantiles .
Output:  The centers of known categories of subject , and libMR models , the threshold of known categories of subject .
1:   is the abnormal pattern dataset of known categories of subject, in which every sample belongs to a known category of subject and is correctly classified by the trained AI model. is the number of the known categories of subject.
2:  for  do
3:     
4:  end for
5:  
6:  for  do
7:     for  do
8:        .add(distance(x,,)  //
9:     end for
10:  end for
11:  for  do
12:     =FitHigh()
13:      is the quantile of the
14:  end for
15:  Return , ,
Algorithm S2: The modified OpenMax algorithm.
Input:  Abnormal pattern of the subject , raw data of subject , activation vector , the centers of known categories of subject , and libMR models , the threshold of known categories of subject , flag , the number of “top” classes to revise .
Output:  The prediction probability .
1:   is the number of the known categories of subject.
2:  Let
3:  Let
4:  for  do
5:     
6:  end for
7:  for  do
8:     dist
9:  end for
10:  Revise activation vector
11:  Define
12:  
13:  if  then
14:     
15:     for  do
16:        
17:        if  then
18:           .append(0)
19:        else
20:           
21:           if  then
22:              
23:           end if
24:           .append()
25:        end if
26:     end for
27:     for  do
28:        
29:     end for
30:     
31:  end if
32:  Return
Algorithm S3: OpenMax probability estimation.
Input:  The base information and history records for a subject in a visit, the trained model . The threshold , and .
Output:  The label of the subject.
1:  = concatenates
2:  while True do
3:     , =
4:     for  do
5:        if  then
6:           Return i   // When , the result is representing unknown
7:        end if
8:     end for
9:     
10:     for  do
11:        if  then
12:           if the medical institution is able to execute the examination then
13:              = concat
14:              
15:           end if
16:        end if
17:     end for
18:     if not  then
19:        Select the lowest-cost routine examination that has not been executed in this visit and that the medical institution is able to execute.
20:        if  examination is selected then
21:           = concat
22:           
23:        end if
24:     end if
25:     if not  then
26:        Return unknown
27:     end if
28:  end while
Algorithm S4: The prediction algorithm.

References

  • [48] R. C. Petersen, et al., Neurology 74, 201 (2010).
  • [49] M. W. Weiner, et al., Alzheimer’s & Dementia 6, 202 (2010).
  • [50] M. W. Weiner, et al., Alzheimer’s & Dementia 11, 865 (2015).
  • [51] M. W. Weiner, et al., Alzheimer’s & Dementia 13, 561 (2017).
  • [52] S. Darkner, Fdg-pet template mni152 1mm (2013).
  • [53] S. M. Smith, et al., Neuroimage 23, S208 (2004).
  • [54] N. Seneca, C. Burger, I. Florea pp. 1–3 (2011).
  • [55] J.-C. Lambert, et al., Nature genetics 45, 1452 (2013).
  • [56] B. W. Kunkle, et al., Nature genetics 51, 414 (2019).
  • [57] R. S. Desikan, et al., PLoS medicine 14, e1002258 (2017).
  • [58] P. Perera, et al., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 11814–11823.
  • [59] A. Kendall, Y. Gal, R. Cipolla, Proceedings of the IEEE conference on computer vision and pattern recognition (2018), pp. 7482–7491.
  • [60] Y. Zhang, Q. V. Liao, R. K. Bellamy, Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (2020), pp. 295–305.
  • [61] P. Linardatos, V. Papastefanopoulos, S. Kotsiantis, Entropy 23, 18 (2021).
  • [62] Z. Ge, S. Demyanov, Z. Chen, R. Garnavi, British Machine Vision Conference 2017 (British Machine Vision Association and Society for Pattern Recognition, 2017).
  • [63] W. J. Scheirer, A. Rocha, R. Michaels, T. E. Boult, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 33, 1689 (2011).
  • [64] C. R. Jack Jr, et al., Alzheimer’s & dementia 7, 257 (2011).
  • [65] R. A. Sperling, et al., Alzheimer’s & dementia 7, 280 (2011).
  • [66] M. S. Albert, et al., Alzheimer’s & dementia 7, 270 (2011).
  • [67] M. C. Donohue, et al., JAMA neurology 71, 961 (2014).
  • [68] D. P. Kingma, J. Ba, ICLR (Poster) (2015).
  • [69] B. Efron, R. J. Tibshirani, An introduction to the bootstrap (CRC press, 1994).
  • [70] D. Sculley, Proceedings of the 19th international conference on World wide web (2010), pp. 1177–1178.