
GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization

by Hung-I Harry Chen, et al.

Bioinformatics tools have been developed to interpret gene expression data at the gene set level, and these gene-set-based analyses improve biologists' ability to discover the functional relevance of their experimental design. However, gene sets are usually elucidated individually, and associations between gene sets are rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate an unbiased combination of gene sets and, by leveraging large genomic data sets, to determine the biological relevance and analysis consistency of these combined gene sets. In this study, we proposed a gene superset autoencoder (GSAE), a multi-layer autoencoder model that incorporates a priori defined gene sets so that the latent layer retains the crucial biological features. We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. With the model trained on genomic data from TCGA and evaluated against the accompanying clinical parameters, we showed the ability of gene supersets to discriminate tumor subtypes, as well as their prognostic capability. We further demonstrated the biological relevance of the top component gene sets in the significant supersets. Using an autoencoder model with gene supersets at its latent layer, we demonstrated that gene supersets retain sufficient biological information with respect to tumor subtypes and clinical prognostic significance. Supersets also provide high reproducibility in survival analysis and accurate prediction of cancer subtypes.
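The layered structure described in the abstract (genes, then gene-set nodes, then superset nodes in the latent layer) can be illustrated with a minimal sketch. It is not the authors' implementation, and every detail in it is an assumption made for illustration: a binary membership mask restricts each gene-set node to its member genes, tanh is used as the activation, the decoder is a single linear layer, biases and training are omitted, and the gene and set names (`G1`, `SET_A`, and so on) are hypothetical.

```python
import math
import random


def build_mask(gene_names, gene_sets):
    """Return (mask, set_names); mask[i][j] is 1.0 if gene i belongs to set j."""
    set_names = sorted(gene_sets)
    members = [set(gene_sets[name]) for name in set_names]
    mask = [[1.0 if g in m else 0.0 for m in members] for g in gene_names]
    return mask, set_names


def init_params(n_genes, n_sets, n_supersets, seed=0):
    """Small random weights; biases are omitted to keep the sketch short."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_set": mat(n_genes, n_sets),        # genes -> gene-set nodes (masked)
        "W_super": mat(n_sets, n_supersets),  # gene sets -> superset nodes
        "W_dec": mat(n_supersets, n_genes),   # supersets -> reconstruction
    }


def dense(vec, weights, mask=None, activation=math.tanh):
    """One layer for a single sample: activation(vec @ (weights * mask))."""
    out = []
    for j in range(len(weights[0])):
        total = 0.0
        for i, v in enumerate(vec):
            keep = 1.0 if mask is None else mask[i][j]
            total += v * weights[i][j] * keep
        out.append(activation(total))
    return out


def forward(x, params, mask):
    """Forward pass: expression profile -> gene sets -> supersets -> output."""
    set_scores = dense(x, params["W_set"], mask)
    supersets = dense(set_scores, params["W_super"])
    reconstruction = dense(supersets, params["W_dec"], activation=lambda t: t)
    return set_scores, supersets, reconstruction


def top_gene_sets(params, set_names, superset, k=2):
    """Gene sets with the largest absolute weight into one superset node."""
    ranked = sorted(
        range(len(set_names)),
        key=lambda i: abs(params["W_super"][i][superset]),
        reverse=True,
    )
    return [set_names[i] for i in ranked[:k]]
```

Under these assumptions, a superset is simply a latent node whose incoming weights define a weighted combination of gene sets, and `top_gene_sets` ranks the component gene sets of one superset by absolute weight. In practice the weights would be learned by minimizing a reconstruction loss, which this sketch leaves out.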




Distinguishing between Normal and Cancer Cells Using Autoencoder Node Saliency

Gene expression profiles have been widely used to characterize patterns ...

Learning data representation using modified autoencoder for the integrative analysis of multi-omics data

In integrative analyses of omics data, it is often of interest to extrac...

An Enhanced MA Plot with R-Shiny to Ease Exploratory Analysis of Transcriptomic Data

MA plots are used to analyze the genome-wide differences in gene express...

Graph-in-Graph Network for Automatic Gene Ontology Description Generation

Gene Ontology (GO) is the primary gene function knowledge base that enab...

Redundancy-aware unsupervised ranking based on game theory – application to gene enrichment analysis

Gene set collections are a common ground to study the enrichment of gene...