An Extensive Data Processing Pipeline for MIMIC-IV

04/29/2022
by Mehak Gupta, et al.

A growing body of research applies machine learning methods to electronic health record (EHR) data for various clinical tasks. This work has exposed two limitations: EHR datasets are not equally accessible to all researchers, and modeling frameworks are difficult to reproduce. One reason for both is the lack of standardized pre-processing pipelines. MIMIC is a freely available EHR dataset, distributed in raw format, that has been used in numerous studies. The absence of standardized pre-processing steps is a major barrier to wider adoption of the dataset. It also leads studies to use different cohorts for downstream tasks, which limits the ability to compare results among similar studies. Different studies also report different performance metrics, which further reduces the comparability of model results. In this work, we provide an end-to-end, fully customizable pipeline to extract, clean, and pre-process data from the fourth version of the MIMIC dataset (MIMIC-IV), and to make and evaluate predictions on it, for ICU and non-ICU clinical time-series prediction tasks.

research
07/19/2019

MIMIC-Extract: A Data Extraction, Preprocessing, and Representation Pipeline for MIMIC-III

Robust machine learning relies on access to data that can be used with s...
research
11/16/2021

HiRID-ICU-Benchmark – A Comprehensive Machine Learning Benchmark on High-resolution ICU Data

The recent success of machine learning methods applied to time series co...
research
03/27/2023

Adapting Pretrained Language Models for Solving Tabular Prediction Problems in the Electronic Health Record

We propose an approach for adapting the DeBERTa model for electronic hea...
research
01/26/2018

Methodological variations in lagged regression for detecting physiologic drug effects in EHR data

We studied how lagged linear regression can be used to detect the physio...
research
04/27/2023

Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

Clinical notes are assigned ICD codes - sets of codes for diagnoses and ...
research
02/26/2023

A Multidatabase ExTRaction PipEline (METRE) for Facile Cross Validation in Critical Care Research

Transforming raw EHR data into machine learning model-ready inputs requi...
research
08/13/2020

A Comprehensive Pipeline for Hotel Recommendation System

This paper addresses a comprehensive pipeline to build a hotel recommend...
