SubOmiEmbed: Self-supervised Representation Learning of Multi-omics Data for Cancer Type Classification

02/03/2022
by   Sayed Hashim, et al.

High-dimensional omics data carry intrinsic information that is crucial for personalized medicine, but this information is difficult to capture because of the large number of molecular features and the small number of available samples. Different types of omics data reveal different aspects of a sample. Integrating and analysing multi-omics data gives a broad view of tumours, which can improve clinical decision making. Omics data, mainly DNA methylation and gene expression profiles, are usually high dimensional, with a large number of molecular features. In recent years, variational autoencoders (VAEs) have been used extensively to embed image and text data into lower-dimensional latent spaces. In our project, we extend the idea of using a VAE for low-dimensional latent space extraction with the self-supervised learning technique of feature subsetting. With VAEs, the key idea is to make the model learn meaningful representations from different types of omics data, which can then be used for downstream tasks such as cancer type classification. The main goals are to overcome the curse of dimensionality and to integrate methylation and expression data, combining information about different aspects of the same tissue samples and, ideally, extracting biologically relevant features. Our extension trains the encoder and decoder to reconstruct the data from just a subset of it, which forces the model to encode the most important information in the latent representation. We also add an identity to each subset so that the model knows which subset is being fed into it during training and testing. In our experiments, SubOmiEmbed produces results comparable to the baseline OmiEmbed with a much smaller network and while using just a subset of the data. This work could be extended to integrate mutation-based genomic data as well.
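The feature-subsetting idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a contiguous split of the feature vector into equal subsets, a one-hot vector as the subset identity, single linear layers for the encoder and decoder, and a decoder that reconstructs the full feature vector; the abstract confirms none of these details. All function names (`make_subset_input`, `init_linear`, `vae_forward`, `vae_loss`) and the toy dimensions are hypothetical, and only a single forward pass with its loss is shown, not training.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_subset_input(x, subset_idx, n_subsets):
    """Keep one contiguous feature subset of x and append a one-hot subset identity.

    Assumption: features are split into contiguous chunks; the source does not
    say how subsets are formed.
    """
    chunks = np.array_split(np.arange(x.shape[1]), n_subsets)
    cols = chunks[subset_idx]
    one_hot = np.zeros((x.shape[0], n_subsets))
    one_hot[:, subset_idx] = 1.0
    return np.concatenate([x[:, cols], one_hot], axis=1)

def init_linear(n_in, n_out):
    """Small random weights and zero bias for one linear layer (hypothetical helper)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def vae_forward(x_sub, params):
    """Encode the subset input, sample a latent vector, decode to the full feature space."""
    (w_mu, b_mu), (w_lv, b_lv), (w_dec, b_dec) = params
    mu = x_sub @ w_mu + b_mu
    log_var = x_sub @ w_lv + b_lv
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    x_hat = z @ w_dec + b_dec
    return x_hat, mu, log_var

def vae_loss(x_full, x_hat, mu, log_var):
    """Squared-error reconstruction term plus KL divergence to a standard normal prior."""
    recon = np.mean(np.sum((x_full - x_hat) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return recon + kl

# Toy stand-in for concatenated methylation and expression features.
n_samples, n_features, n_subsets, latent_dim = 8, 20, 4, 3
x = rng.random((n_samples, n_features))

# Feed only subset 1 (features 5..9) plus its identity to the encoder.
x_sub = make_subset_input(x, subset_idx=1, n_subsets=n_subsets)
in_dim = x_sub.shape[1]
params = (
    init_linear(in_dim, latent_dim),      # encoder mean
    init_linear(in_dim, latent_dim),      # encoder log-variance
    init_linear(latent_dim, n_features),  # decoder (assumed to target all features)
)
x_hat, mu, log_var = vae_forward(x_sub, params)
loss = vae_loss(x, x_hat, mu, log_var)
```

In this sketch the encoder input has 9 columns (5 features plus a 4-way identity) instead of 20, which is one way a subset-based model can use a smaller network than one that reads every feature. If the number of features is not divisible by the number of subsets, the chunk sizes differ and the input width would need padding or per-subset handling.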

research
08/17/2019

Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

Different aspects of a clinical sample can be revealed by multiple types...
research
02/03/2021

OmiEmbed: reconstruct comprehensive phenotypic information from multi-omics data using multi-task deep learning

High-dimensional omics data contains intrinsic biomedical information th...
research
11/20/2019

Learning Embeddings from Cancer Mutation Sets for Classification Tasks

Analysis of somatic mutation profiles from cancer patients is essential ...
research
01/13/2022

Reproducible, incremental representation learning with Rosetta VAE

Variational autoencoders are among the most popular methods for distilli...
research
06/17/2022

Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE

The human perception system is often assumed to recruit motor knowledge ...
research
08/07/2018

Inferring Molecular Pathology and micro-RNA Transcriptome from mRNA Profiles of Cancer Biopsies through Deep Multi-Task Learning

Despite great advances, molecular cancer pathology is often limited to u...
research
06/18/2019

Learning data representation using modified autoencoder for the integrative analysis of multi-omics data

In integrative analyses of omics data, it is often of interest to extrac...
