A Multidatabase ExTRaction PipEline (METRE) for Facile Cross Validation in Critical Care Research

02/26/2023
by   Wei Liao, et al.

Transforming raw electronic health record (EHR) data into machine learning model-ready inputs requires considerable effort. One widely used EHR database is the Medical Information Mart for Intensive Care (MIMIC). Extraction pipelines built for MIMIC-III cannot query the updated and improved MIMIC-IV. In addition, the need to use multicenter datasets further highlights the challenge of EHR data extraction. We therefore developed an extraction pipeline that works on both MIMIC-IV and the eICU Collaborative Research Database and allows for model cross validation using these two databases. Under the default choices, the pipeline extracted 38,766 and 126,448 ICU records for MIMIC-IV and eICU, respectively. Using the extracted time-dependent variables, we compared the Area Under the Curve (AUC) performance with prior works on clinically relevant tasks such as in-hospital mortality prediction. METRE achieved comparable performance, with AUCs of 0.723 to 0.888 across all tasks. Additionally, when we evaluated a model trained on eICU directly on MIMIC-IV data, we observed that the AUC change can be as small as +0.019 or -0.015. Our open-source pipeline transforms MIMIC-IV and eICU into structured data frames and allows researchers to perform model training and testing using data collected from different institutions, which is of critical importance for model deployment in clinical contexts.
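The cross-database comparison described in the abstract can be illustrated with a minimal sketch. It is not METRE's code: the function names `roc_auc` and `auc_change`, the toy labels and scores, and the sign convention (external AUC minus internal AUC) are all assumptions made for illustration. The sketch computes AUC as the fraction of positive/negative pairs that the model ranks correctly, then reports how that value shifts when the same model is scored on data from a second source.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_change(internal_auc: float, external_auc: float) -> float:
    """Assumed convention: positive means the model scored better on the
    external database than on held-out data from its training database."""
    return external_auc - internal_auc


# Toy data only (hypothetical, not drawn from MIMIC-IV or eICU).
# "internal": held-out records from the training database.
internal_labels = [0, 0, 1, 1]
internal_scores = [0.10, 0.40, 0.35, 0.80]
# "external": records from the other database, scored by the same model.
external_labels = [0, 1, 0, 1]
external_scores = [0.20, 0.90, 0.30, 0.60]

internal_auc = roc_auc(internal_labels, internal_scores)
external_auc = roc_auc(external_labels, external_scores)
delta = auc_change(internal_auc, external_auc)
print(f"internal AUC={internal_auc:.3f} external AUC={external_auc:.3f} change={delta:+.3f}")
```

The pairwise definition is quadratic in the number of records, so it suits small illustrations only; on full cohorts a rank-based implementation from a statistics library would be the usual choice.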

research
10/17/2021

Real-time Mortality Prediction Using MIMIC-IV ICU Data Via Boosted Nonparametric Hazards

Electronic Health Record (EHR) systems provide critical, rich and valuab...
research
07/19/2019

MIMIC-Extract: A Data Extraction, Preprocessing, and Representation Pipeline for MIMIC-III

Robust machine learning relies on access to data that can be used with s...
research
11/30/2018

Rethinking clinical prediction: Why machine learning must consider year of care and feature aggregation

Machine learning for healthcare often trains models on de-identified dat...
research
04/29/2022

An Extensive Data Processing Pipeline for MIMIC-IV

An increasing amount of research is being devoted to applying machine le...
research
12/06/2018

Generalizability of predictive models for intensive care unit patients

A large volume of research has considered the creation of predictive mod...
research
04/27/2023

Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

Clinical notes are assigned ICD codes - sets of codes for diagnoses and ...
