Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

12/03/2021
by   Vithya Yogarajan, et al.

Multi-label learning predicts a subset of labels from a given label set for an unseen instance while considering label correlations. A known challenge in multi-label classification is the long-tailed distribution of labels. Many studies focus on improving the overall predictions of the model and thus do not prioritise tail-end labels. Improving tail-end label predictions in multi-label classification of medical text makes it possible to understand patients better and improve care. The knowledge gained from one or more infrequent labels can affect the course of medical decisions and treatment plans. This research presents variations of concatenated domain-specific language models, including multi-BioMed-Transformers, to achieve two primary goals: first, to improve F1 scores of infrequent labels across multi-label problems, especially those with long-tail labels; second, to handle long medical text and multi-sourced electronic health records (EHRs), a challenging task for standard transformers designed to work on short input sequences. A key contribution of this research is a set of new state-of-the-art (SOTA) results obtained using TransformerXL for predicting medical codes. A variety of experiments are performed on the Medical Information Mart for Intensive Care (MIMIC-III) database. Results show that concatenated BioMed-Transformers outperform standard transformers in terms of overall micro and macro F1 scores and individual F1 scores of tail-end labels, while incurring lower training times than existing transformer-based solutions for long input sequences.
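To illustrate why the abstract reports micro F1, macro F1, and per-label F1 separately, the following minimal sketch computes all three on a toy multi-label problem. The data and the helper names (`per_label_counts`, `f1_from_counts`) are hypothetical and are not taken from the paper; the formulas are the standard definitions of micro- and macro-averaged F1.

```python
from typing import List, Sequence, Tuple

def per_label_counts(y_true: Sequence[Sequence[int]],
                     y_pred: Sequence[Sequence[int]]) -> List[Tuple[int, int, int]]:
    """Return (tp, fp, fn) for each label column of binary indicator matrices."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    return counts

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*tp / (2*tp + fp + fn); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy data (hypothetical): label 0 is frequent, label 2 is a tail-end label.
y_true = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0]]  # the rare label is missed

counts = per_label_counts(y_true, y_pred)
label_f1 = [f1_from_counts(*c) for c in counts]

# Micro F1 pools the counts over all labels, so frequent labels dominate.
micro_f1 = f1_from_counts(sum(c[0] for c in counts),
                          sum(c[1] for c in counts),
                          sum(c[2] for c in counts))

# Macro F1 averages per-label F1, so every label counts equally.
macro_f1 = sum(label_f1) / len(label_f1)

print("per-label F1:", label_f1)
print("micro F1:", round(micro_f1, 3))
print("macro F1:", round(macro_f1, 3))
```

In this toy case the model never predicts the rare label, yet micro F1 stays high because the frequent labels dominate the pooled counts, while macro F1 and the per-label score expose the failure. This is the reason a study that targets tail-end labels would look beyond micro F1 alone.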

research
11/24/2022

Multi-label Few-shot ICD Coding as Autoregressive Generation with Prompt

Automatic International Classification of Diseases (ICD) coding aims to ...
research
01/24/2021

Does Head Label Help for Long-Tailed Multi-Label Text Classification

Multi-label text classification (MLTC) aims to annotate documents with t...
research
05/18/2020

Interaction Matching for Long-Tail Multi-Label Classification

We present an elegant and effective approach for addressing limitations ...
research
10/01/2021

Predicting COVID-19 Patient Shielding: A Comprehensive Study

There are many ways machine learning and big data analytics are used in ...
research
04/06/2023

Automatic ICD-10 Code Association: A Challenging Task on French Clinical Texts

Automatically associating ICD codes with electronic health data is a wel...
research
11/19/2022

Pairwise Instance Relation Augmentation for Long-tailed Multi-label Text Classification

Multi-label text classification (MLTC) is one of the key tasks in natura...
research
07/27/2023

ARC-NLP at PAN 2023: Hierarchical Long Text Classification for Trigger Detection

Fanfiction, a popular form of creative writing set within established fi...
