MIMIC-Extract: A Data Extraction, Preprocessing, and Representation Pipeline for MIMIC-III

07/19/2019
by Shirly Wang, et al.

Robust machine learning relies on access to data that can be used with standardized frameworks in important tasks and on the ability to develop models whose performance can be reasonably reproduced. In machine learning for healthcare, the community faces reproducibility challenges due to a lack of publicly accessible data and a lack of standardized data processing frameworks. We present MIMIC-Extract, an open-source pipeline for transforming raw electronic health record (EHR) data for critical care patients contained in the publicly available MIMIC-III database into dataframes that are directly usable in common machine learning pipelines. MIMIC-Extract addresses three primary challenges in making complex health record data accessible to the broader machine learning community. First, it provides standardized data processing functions, including unit conversion, outlier detection, and aggregation of semantically equivalent features, thus accounting for duplication and reducing missingness. Second, it preserves the time series nature of clinical data and can be easily integrated into clinically actionable prediction tasks in machine learning for health. Finally, it is highly extensible so that other researchers with related questions can easily use the same pipeline. We demonstrate the utility of this pipeline by showcasing several benchmark tasks and baseline results.
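The three processing steps named in the abstract (unit conversion, outlier detection, and aggregation of semantically equivalent features) can be illustrated with a small pandas sketch. This is not the MIMIC-Extract code or its API: the column names, the label-to-feature mapping, the plausibility bounds, the hourly bucketing, and the toy measurements are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical raw chart events: two labels for one concept, in mixed units.
raw = pd.DataFrame({
    "subject_id": [1, 1, 1, 1],
    "charttime": pd.to_datetime([
        "2101-01-01 00:00", "2101-01-01 00:30",
        "2101-01-01 01:10", "2101-01-01 02:20",
    ]),
    "label": ["Temperature F", "Temperature C", "Temperature C", "Temperature F"],
    "value": [98.6, 37.5, 370.0, 100.4],
})

# Assumed mapping from raw labels to a single grouped feature name.
GROUPS = {"Temperature F": "temperature", "Temperature C": "temperature"}
# Assumed plausibility bounds in degrees Celsius (illustrative, not clinical guidance).
VALID_RANGE = {"temperature": (25.0, 45.0)}


def preprocess(events: pd.DataFrame) -> pd.DataFrame:
    df = events.copy()

    # 1. Unit conversion: bring Fahrenheit readings onto the Celsius scale.
    is_f = df["label"] == "Temperature F"
    df.loc[is_f, "value"] = (df.loc[is_f, "value"] - 32.0) * 5.0 / 9.0

    # 2. Aggregation: map semantically equivalent labels to one feature.
    df["feature"] = df["label"].map(GROUPS)

    # 3. Outlier detection: values outside the assumed bounds become missing.
    lo = df["feature"].map(lambda f: VALID_RANGE[f][0])
    hi = df["feature"].map(lambda f: VALID_RANGE[f][1])
    df["value"] = df["value"].where(df["value"].between(lo, hi))

    # 4. Keep the time series structure: hourly buckets since each
    #    patient's first measurement, one column per grouped feature.
    first = df.groupby("subject_id")["charttime"].transform("min")
    df["hours_in"] = ((df["charttime"] - first).dt.total_seconds() // 3600).astype(int)
    return (
        df.groupby(["subject_id", "hours_in", "feature"])["value"]
        .mean()
        .unstack("feature")
    )


hourly = preprocess(raw)
print(hourly)
```

In this toy input, the two readings in hour 0 are averaged after conversion, and the implausible 370.0 reading in hour 1 is left as a missing value rather than dropped, so the hourly grid stays intact for downstream time series models.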


