Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining

09/13/2023
by   Xianghao Zhan, et al.

Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise were introduced to the training labels through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), [...] enhancements in all 48 COVID-19 experiments (up to 23.8% [...]), [...] accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% [...]). [...] substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.
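The abstract does not spell out the cleaning procedure, so the following is only a minimal sketch of how ICP-based label cleaning could look, not the authors' implementation. It assumes that a classifier trained on part of the clean data supplies class probabilities, that the nonconformity score is one minus the predicted probability of the candidate label, and that relabeling and outlier removal are driven by thresholds on confidence and credibility. The function names `icp_p_values` and `clean_labels` and the default thresholds are hypothetical.

```python
import numpy as np


def icp_p_values(cal_probs, cal_labels, probs):
    """Return an (n_samples, n_classes) array of ICP p-values.

    cal_probs  : predicted class probabilities for the small, trusted
                 calibration set (assumed to be accurately labeled).
    cal_labels : integer labels of the calibration set.
    probs      : predicted class probabilities for the noisy samples.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    probs = np.asarray(probs, dtype=float)

    # Assumed nonconformity score: 1 - probability of the given label.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    scores = 1.0 - probs  # one score per (sample, candidate label)

    # p-value: share of calibration scores at least as nonconforming,
    # with the usual +1 correction for the sample under test.
    ge = (cal_scores[None, None, :] >= scores[:, :, None]).sum(axis=2)
    return (ge + 1.0) / (len(cal_scores) + 1.0)


def clean_labels(p_values, noisy_labels, relabel_conf=0.8, outlier_cred=0.1):
    """Relabel or drop noisy samples using hypothetical thresholds.

    Returns (cleaned_labels, keep_mask).
    """
    p_values = np.asarray(p_values, dtype=float)
    noisy_labels = np.asarray(noisy_labels)

    ranked = np.sort(p_values, axis=1)
    credibility = ranked[:, -1]        # largest p-value
    confidence = 1.0 - ranked[:, -2]   # 1 - second largest p-value
    predicted = p_values.argmax(axis=1)

    # Samples that conform to no label are treated as outliers.
    keep = credibility >= outlier_cred
    # Replace the noisy label only when the ICP prediction is confident.
    relabel = keep & (predicted != noisy_labels) & (confidence >= relabel_conf)
    cleaned = np.where(relabel, predicted, noisy_labels)
    return cleaned, keep
```

In this sketch, the cleaned training set would be the samples where `keep` is true, paired with `cleaned` as their labels. The thresholds would need tuning per task, and the rule used in the paper may differ.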


research
12/04/2018

A Deep Learning Framework for Semi-Supervised Cross-Modal Retrieval with Label Prediction

Due to abundance of data from multiple modalities, cross-modal retrieval...
research
01/11/2020

Bayesian Semi-supervised learning under nonparanormality

Semi-supervised learning is a classification method which makes use of b...
research
04/08/2021

GKD: Semi-supervised Graph Knowledge Distillation for Graph-Independent Inference

The increased amount of multi-modal medical data has opened the opportun...
research
10/07/2016

Temporal Ensembling for Semi-Supervised Learning

In this paper, we present a simple and efficient method for training dee...
research
07/27/2020

Semi-Supervised Learning with Data Augmentation for End-to-End ASR

In this paper, we apply Semi-Supervised Learning (SSL) along with Data A...
research
05/27/2019

Label Prediction Framework for Semi-Supervised Cross-Modal Retrieval

Cross-modal data matching refers to retrieval of data from one modality,...
