Coupling Deep Imputation with Multitask Learning for Downstream Tasks on Genomics Data

04/28/2022
by   Sophie Peacock, et al.

Genomics data such as RNA gene expression, methylation and micro RNA expression are valuable sources of information for various clinical predictive tasks. For example, survival outcomes, cancer histology type and other patient-related information can be predicted from molecular data as well as clinical data. Moreover, using these data sources together, for example in multitask learning, can boost performance. However, in practice many data points are missing, which leads to significantly lower patient numbers when analysing only full cases, which in our setting are patients with all modalities present. In this paper, we investigate how imputing missing values with deep learning, coupled with multitask learning, can help reach state-of-the-art performance using the combined genomics modalities: RNA, micro RNA and methylation. We propose a generalised deep imputation method to impute values for patients who have all modalities present except one. Interestingly, deep imputation alone outperforms multitask learning alone for the classification and regression tasks across most combinations of modalities. In contrast, when using all modalities for survival prediction, we observe that multitask learning alone outperforms deep imputation alone with statistical significance (adjusted p-value 0.03). Thus, both approaches are complementary when optimising performance for downstream predictive tasks.
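To make the notions of "full cases" and "all modalities present except one" concrete, the following minimal Python sketch partitions a toy cohort into three groups: patients with every modality, patients missing exactly one modality (the candidates for imputation), and patients missing more. It is only an illustration under stated assumptions and not the authors' pipeline. The modality keys, the `split_cohort` helper and the toy values are all hypothetical, and the deep imputation model itself is not shown because the abstract does not describe its architecture.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical modality names, following the three modalities named in the abstract.
MODALITIES = ("rna", "mirna", "methylation")

# A patient maps each modality to a feature vector, or to None when it is missing.
Patient = Dict[str, Optional[List[float]]]


def missing_modalities(patient: Patient) -> List[str]:
    """Return the names of the modalities that are absent for this patient."""
    return [m for m in MODALITIES if patient.get(m) is None]


def split_cohort(cohort: Dict[str, Patient]) -> Tuple[List[str], List[str], List[str]]:
    """Partition patient IDs into full cases, imputation candidates and the rest.

    - full: every modality is present
    - imputable: exactly one modality is missing
    - excluded: two or more modalities are missing
    """
    full, imputable, excluded = [], [], []
    for pid, patient in cohort.items():
        n_missing = len(missing_modalities(patient))
        if n_missing == 0:
            full.append(pid)
        elif n_missing == 1:
            imputable.append(pid)
        else:
            excluded.append(pid)
    return full, imputable, excluded


# Toy data (made up purely for illustration).
cohort = {
    "p1": {"rna": [0.1, 0.2], "mirna": [1.0], "methylation": [0.5]},
    "p2": {"rna": [0.3, 0.1], "mirna": None, "methylation": [0.4]},
    "p3": {"rna": None, "mirna": None, "methylation": [0.9]},
    "p4": {"rna": [0.2, 0.2], "mirna": [0.7], "methylation": None},
}

full, imputable, excluded = split_cohort(cohort)
print("full cases:", full)
print("missing exactly one modality:", imputable)
print("missing two or more:", excluded)
```

In this toy cohort, a full-case analysis would keep only one of four patients, whereas imputing the single missing modality would make three of the four usable for downstream tasks.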


