PLM-ICD: Automatic ICD Coding with Pretrained Language Models

07/12/2022
by Chao-Wei Huang, et al.

Automatically classifying electronic health records (EHRs) into diagnostic codes has been a challenge for the NLP community. State-of-the-art methods treat the task as a multilabel classification problem and propose various architectures to model it. However, these systems do not leverage pretrained language models, which have achieved superb performance on natural language understanding tasks. Prior work has shown that pretrained language models underperform on this task under the regular fine-tuning scheme. This paper therefore aims to analyze the causes of the underperformance and to develop a framework for automatic ICD coding with pretrained language models. Our experiments reveal three main issues: 1) a large label space, 2) long input sequences, and 3) domain mismatch between pretraining and fine-tuning. We propose PLM-ICD, a framework that tackles these challenges with various strategies. Experimental results show that the proposed framework overcomes the challenges and achieves state-of-the-art performance on multiple metrics on the benchmark MIMIC data. The source code is available at https://github.com/MiuLab/PLM-ICD
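To make two of the terms in the abstract concrete, here is a minimal sketch of what "multilabel classification" and "long input sequences" mean in practice. It is not the PLM-ICD implementation: the 512-token limit, the 0.5 threshold, the function names, the logits, and the example ICD-9 codes are all assumptions chosen for illustration.

```python
import math
from typing import List


def split_into_segments(token_ids: List[int], max_len: int = 512) -> List[List[int]]:
    """Split a long token sequence into consecutive chunks of at most max_len tokens.

    Assumption: the encoder accepts at most max_len tokens per input, so a long
    clinical note has to be processed as several segments.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]


def predict_codes(logits: List[float], labels: List[str], threshold: float = 0.5) -> List[str]:
    """Multilabel decision: apply an independent sigmoid to each label's logit
    and keep every label whose probability reaches the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(labels, probs) if p >= threshold]


# A hypothetical 1,300-token note is split into three segments.
note = list(range(1300))
segments = split_into_segments(note)
segment_lengths = [len(s) for s in segments]

# Hypothetical logits for three illustrative ICD-9 codes; any number of
# codes can be predicted for the same note.
predicted = predict_codes([2.0, -1.5, 0.3], ["401.9", "428.0", "250.00"])
```

Because each code is scored independently, a note can receive zero, one, or many codes. That is why a large label space makes the problem harder than single-label classification.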


