Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data

10/22/2018
by Ehsan Hajiramezanali, et al.

Precision medicine aims for personalized prognosis and therapeutics by utilizing recent genome-scale high-throughput profiling techniques, including next-generation sequencing (NGS). However, translating NGS data faces several challenges. First, NGS count data are often overdispersed, requiring appropriate modeling. Second, compared to the number of involved molecules and the system complexity, the number of available samples for studying complex diseases such as cancer is often limited, especially considering disease heterogeneity. The key question is whether we can integrate available data from all the different sources or domains to achieve reproducible disease prognosis based on NGS count data. In this paper, we develop a Bayesian Multi-Domain Learning (BMDL) model that derives domain-dependent latent representations of overdispersed count data based on hierarchical negative binomial factorization, for accurate cancer subtyping even when the number of samples for a specific cancer type is small. Experimental results on both simulated data and NGS datasets from The Cancer Genome Atlas (TCGA) demonstrate the promising potential of BMDL for effective multi-domain learning without the "negative transfer" effects often seen in existing multi-task learning and transfer learning methods.
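
To make the term "overdispersed" concrete, the following is a minimal sketch, assuming NumPy, of how negative binomial counts differ from Poisson counts with the same mean. It is not the BMDL model itself. The helper name `sample_nb_counts` and the values `mean = 50` and `r = 2` are hypothetical and chosen only for illustration. The sketch uses the standard construction of the negative binomial as a gamma-Poisson mixture, for which the variance is mean + mean²/r rather than the Poisson variance, which equals the mean.

```python
import numpy as np


def sample_nb_counts(mean, r, size, rng):
    """Draw negative binomial counts as a gamma-Poisson mixture.

    rate  ~ Gamma(shape=r, scale=mean / r)
    count ~ Poisson(rate)
    so that E[count] = mean and Var[count] = mean + mean**2 / r.
    (Hypothetical helper, for illustration only.)
    """
    rates = rng.gamma(shape=r, scale=mean / r, size=size)
    return rng.poisson(rates)


rng = np.random.default_rng(0)

# Illustrative values, not taken from the paper.
mean, r, n = 50.0, 2.0, 200_000

nb_counts = sample_nb_counts(mean, r, n, rng)
poisson_counts = rng.poisson(mean, size=n)

# Theoretical negative binomial variance under this parameterization.
nb_var_theory = mean + mean**2 / r

print("Poisson  mean/variance:", poisson_counts.mean(), poisson_counts.var())
print("NegBin   mean/variance:", nb_counts.mean(), nb_counts.var())
print("NegBin theoretical variance:", nb_var_theory)
```

Both samples share roughly the same mean, but the negative binomial variance is far larger. A smaller dispersion parameter `r` widens that gap, which is why a Poisson model tends to understate the variability of such counts.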
