Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

04/27/2023
by   Thanh-Tung Nguyen, et al.

Clinical notes are assigned ICD codes, which are sets of codes for diagnoses and procedures. In recent years, predictive machine learning models have been built for automatic ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public electronic health record (EHR) data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction tasks in order to standardize data preprocessing and establish a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.
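
To make the multi-label framing concrete, the following is a minimal sketch of how a set of ICD codes per note can be turned into a multi-hot target vector, the usual input format for multi-label classifiers. It is not taken from the authors' code: the function names `build_label_index` and `multi_hot` are hypothetical, and the three code sets are illustrative examples rather than MIMIC-IV data.

```python
from typing import Dict, List, Set


def build_label_index(code_sets: List[Set[str]]) -> Dict[str, int]:
    """Map every distinct code to a column index, in sorted order."""
    vocabulary = sorted(set().union(*code_sets))
    return {code: i for i, code in enumerate(vocabulary)}


def multi_hot(codes: Set[str], index: Dict[str, int]) -> List[int]:
    """Encode one note's code set as a 0/1 vector over the label index."""
    vector = [0] * len(index)
    for code in codes:
        # Codes missing from the index are skipped (an assumption of this sketch).
        if code in index:
            vector[index[code]] = 1
    return vector


# Illustrative ICD-10 code sets for three hypothetical notes (not MIMIC-IV data).
note_codes = [{"I10", "E11.9"}, {"I10"}, {"N17.9", "E11.9"}]
label_index = build_label_index(note_codes)
targets = [multi_hot(codes, label_index) for codes in note_codes]
```

In a real benchmark the label index would span thousands of codes, which is what makes the task an extreme multi-label problem; each row of `targets` then becomes a long, sparse vector.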


