High-Throughput Machine Learning from Electronic Health Records

07/03/2019
by Ross S. Kleiman, et al.

The widespread digitization of patient data via electronic health records (EHRs) has created an unprecedented opportunity to use machine learning algorithms to better predict disease risk at the patient level. Although predictive models have previously been constructed for a few important diseases, such as breast cancer and myocardial infarction, we currently know very little about how accurately the risk for most diseases or events can be predicted, and how far in advance. Machine learning algorithms use training data rather than preprogrammed rules to make predictions and are well suited for the complex task of disease prediction. Although patients can encounter thousands of conditions and illnesses, no prior research has simultaneously predicted risks for thousands of diagnosis codes and thereby established a comprehensive patient risk profile. Here we show that such pandiagnostic prediction is possible with a high level of performance across diagnosis codes. For the tasks of predicting diagnosis risks both 1 and 6 months in advance, we achieve average areas under the receiver operating characteristic curve (AUCs) of 0.803 and 0.758, respectively, across thousands of prediction tasks. Finally, our research contributes a new clinical prediction dataset in which researchers can explore how well a diagnosis can be predicted and what health factors are most useful for prediction. For the first time, this gives a much more complete picture of how well risks for thousands of different diagnosis codes can be predicted.
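
To make the headline metric concrete, the sketch below shows one way an average AUC across many per-diagnosis prediction tasks could be computed. It is an illustration only, not the authors' pipeline: the abstract does not state how the average was taken, so an unweighted mean over tasks is assumed here, and the function names (`roc_auc`, `mean_auc`), the task labels, and the toy scores are all hypothetical.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(tasks):
    """Unweighted mean AUC over tasks (an assumption, not the paper's stated method).

    `tasks` maps a task name to a (labels, scores) pair.
    """
    per_task = {name: roc_auc(y, s) for name, (y, s) in tasks.items()}
    return mean(list(per_task.values())), per_task


# Hypothetical toy data: two prediction tasks with four patients each.
toy_tasks = {
    "code_A": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    "code_B": ([1, 0, 0, 1], [0.3, 0.5, 0.1, 0.8]),
}
average, per_task = mean_auc(toy_tasks)
print(per_task, average)
```

The pairwise definition is quadratic in the number of patients, which is fine for a toy example; at EHR scale a rank-based computation or an established library routine would be the more practical choice.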


research
04/25/2019

Predicting Stroke from Electronic Health Records

Studies have identified various risk factors associated with the onset o...
research
06/24/2020

Diagnosis Prevalence vs. Efficacy in Machine-learning Based Diagnostic Decision Support

Many recent studies use machine learning to predict a small number of IC...
research
09/06/2016

A Bootstrap Machine Learning Approach to Identify Rare Disease Patients from Electronic Health Records

Rare diseases are very difficult to identify among large number of other...
research
11/19/2019

Examining the impact of data quality and completeness of electronic health records on predictions of patients risks of cardiovascular disease

The objective is to assess the extent of variation of data quality and c...
research
06/18/2022

Tree-Guided Rare Feature Selection and Logic Aggregation with Electronic Health Records Data

Statistical learning with a large number of rare binary features is comm...
research
10/16/2020

Interpretable Disease Prediction based on Reinforcement Path Reasoning over Knowledge Graphs

Objective: To combine medical knowledge and medical data to interpretabl...
research
03/22/2023

ExBEHRT: Extended Transformer for Electronic Health Records to Predict Disease Subtypes Progressions

In this study, we introduce ExBEHRT, an extended version of BEHRT (BERT ...
