Logistic Regression Augmented Community Detection for Network Data with Application in Identifying Autism-Related Gene Pathways

09/07/2018
by Yunpeng Zhao, et al.

When searching for gene pathways leading to specific disease outcomes, additional information on gene characteristics is often available that may help differentiate disease-related genes from irrelevant background genes when connections involving both types of genes are observed and their relationships to the disease are unknown. We propose a method that singles out irrelevant background genes with the help of auxiliary information through a logistic regression and clusters the relevant genes into cohesive groups using the adjacency matrix. The expectation-maximization algorithm is modified to maximize a joint pseudo-likelihood that assumes latent indicators for relevance to the disease and latent group memberships, as well as Poisson- or multinomial-distributed link numbers within and between groups. A robust version allowing arbitrary linkage patterns within the background is further derived. Asymptotic consistency of label assignments under the stochastic blockmodel is proven. Superior performance and robustness in finite samples are observed in simulation studies. The proposed robust method identifies previously missed gene sets underlying autism-related neurological diseases using diverse data sources, including de novo mutations, gene expressions, and protein-protein interactions.
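
To make the combination of a logistic relevance model with Poisson link counts concrete, the following is a minimal sketch of a single E-step style update. It is an illustration under stated assumptions, not the authors' algorithm: the function name `posterior_labels`, the argument layout, the uniform prior over relevant groups, and the convention that label 0 denotes the background are all hypothetical choices made for this example.

```python
import numpy as np


def posterior_labels(block_counts, covariates, beta, rates):
    """Posterior label probabilities for each node (illustrative only).

    block_counts: (n, K+1) link counts from each node into each current block;
                  column 0 is the background, columns 1..K are relevant groups.
    covariates:   (n, p) auxiliary node features (e.g. an intercept plus scores).
    beta:         (p,) logistic-regression coefficients for relevance.
    rates:        (K+1, K+1) Poisson rates; rates[l, m] is the expected number
                  of links from a node with label l into block m.
    """
    n, n_labels = block_counts.shape
    k = n_labels - 1

    # Logistic prior for relevance, computed in log space for stability.
    eta = covariates @ beta
    log_p_relevant = -np.logaddexp(0.0, -eta)
    log_p_background = -np.logaddexp(0.0, eta)

    # Assumption: a relevant node is a priori equally likely to be in any group.
    log_prior = np.empty((n, n_labels))
    log_prior[:, 0] = log_p_background
    log_prior[:, 1:] = (log_p_relevant - np.log(k))[:, None]

    # Poisson log-likelihood of the block counts under each candidate label,
    # dropping the factorial term, which does not depend on the label.
    log_lik = block_counts @ np.log(rates).T - rates.sum(axis=1)

    # Normalize in log space to obtain posterior probabilities per node.
    log_post = log_prior + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


# Toy inputs: two relevant groups plus a sparsely connected background.
rates = np.array([[0.5, 0.5, 0.5],
                  [0.5, 5.0, 0.5],
                  [0.5, 0.5, 5.0]])
beta = np.array([0.0, 2.0])
covariates = np.array([[1.0, 2.0],    # strong auxiliary evidence of relevance
                       [1.0, -2.0]])  # auxiliary evidence against relevance
block_counts = np.array([[0, 6, 0],   # many links into group 1
                         [1, 0, 0]])  # a single link into the background
post = posterior_labels(block_counts, covariates, beta, rates)
labels = post.argmax(axis=1)
```

In a full procedure these posterior probabilities would feed an M-step that refits `beta` and `rates`; that part, the multinomial variant, and the robust version are omitted here.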


research
04/08/2023

Parameter-Expanded ECME Algorithms for Logistic and Penalized Logistic Regression

Parameter estimation in logistic regression is a well-studied problem wi...
research
07/12/2019

Towards Probabilistic Generative Models Harnessing Graph Neural Networks for Disease-Gene Prediction

Disease-gene prediction (DGP) refers to the computational challenge of p...
research
09/12/2017

Identifying Genetic Risk Factors via Sparse Group Lasso with Group Graph Structure

Genome-wide association studies (GWA studies or GWAS) investigate the re...
research
12/29/2015

Sparse group factor analysis for biclustering of multiple data sources

Motivation: Modelling methods that find structure in data are necessary ...
research
02/18/2011

Inferring Disease and Gene Set Associations with Rank Coherence in Networks

A computational challenge to validate the candidate disease genes identi...
research
10/28/2019

RCRnorm: An integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data

Formalin-fixed paraffin-embedded (FFPE) samples have great potential for...
research
01/11/2023

Optirank: classification for RNA-Seq data with optimal ranking reference genes

Classification algorithms using RNA-Sequencing (RNA-Seq) data as input a...
