A Mixture Model to Detect Edges in Sparse Co-expression Graphs

04/03/2018
by   Haim Bar, et al.

In the early days of microarray data, the medical and statistical communities focused on gene-level data, and particularly on finding differentially expressed genes. This usually involved the simplifying assumption that genes are independent, which made likelihood derivations feasible and allowed for relatively simple implementations. This assumption is not realistic, however, and in recent years the scope has expanded to include pathway and 'gene set' analysis in an attempt to understand the relationships between genes. In this paper we develop a method to recover a gene network's structure from co-expression data, which we measure in terms of normalized Pearson correlation coefficients between gene pairs. We treat these co-expression measurements as weights in the complete graph in which nodes correspond to genes. We assume that the network is sparse, so that only a small fraction of the putative edges are actually present ('non-null' edges). To decide which edges exist in the gene network, we fit a three-component mixture model in which the observed weights of 'null' edges follow a normal distribution with mean 0, and the weights of non-null edges follow a mixture of two log-normal distributions, one for positively correlated pairs and one for negatively correlated pairs. We show that this so-called L_2N mixture model outperforms other methods in terms of power to detect edges. We also show that using the L_2N model allows for control of the false discovery rate. Importantly, the method makes no assumptions about the true network structure.
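As a rough illustration of the three-component mixture described above, the following sketch evaluates, for fixed parameters, the posterior probability that an edge weight comes from the null (normal) component. It is not the authors' implementation: the function and parameter names (`p_null`, `p_pos`, `sigma`, `theta`, `kappa`) are hypothetical, the two log-normal components are assumed to share the same parameters, the negative component is assumed to be a log-normal on the negated weight, and parameter estimation is not shown.

```python
import numpy as np

def normal_pdf(w, sigma):
    """Density of a normal distribution with mean 0 and standard deviation sigma."""
    return np.exp(-0.5 * (w / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def lognormal_pdf(x, theta, kappa):
    """Log-normal density with log-scale mean theta and sd kappa; zero for x <= 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = np.exp(-0.5 * ((np.log(x[pos]) - theta) / kappa) ** 2) / (
        x[pos] * kappa * np.sqrt(2.0 * np.pi)
    )
    return out

def posterior_null_prob(w, p_null, p_pos, sigma, theta, kappa):
    """Posterior probability that each weight belongs to the null component.

    Assumed mixture (illustrative only):
      null edges:      w ~ Normal(0, sigma^2), with weight p_null
      positive edges:  w ~ LogNormal(theta, kappa^2), with weight p_pos
      negative edges: -w ~ LogNormal(theta, kappa^2), with weight 1 - p_null - p_pos
    """
    w = np.asarray(w, dtype=float)
    p_neg = 1.0 - p_null - p_pos
    f_null = p_null * normal_pdf(w, sigma)
    f_pos = p_pos * lognormal_pdf(w, theta, kappa)
    f_neg = p_neg * lognormal_pdf(-w, theta, kappa)
    return f_null / (f_null + f_pos + f_neg)

# Hypothetical edge weights (normalized correlations) and made-up parameters.
weights = np.array([-2.5, -0.1, 0.0, 0.2, 3.0])
post = posterior_null_prob(weights, p_null=0.9, p_pos=0.05,
                           sigma=0.3, theta=1.0, kappa=0.3)
print(np.round(post, 4))
```

Weights near zero receive a posterior null probability close to 1, while weights far out in either tail are attributed to a non-null component. Flagging edges whose posterior null probability falls below a threshold is one common way to relate such a mixture to false discovery rate control; whether it matches the procedure in the paper is not established by the abstract.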


research
08/14/2019

A grouped, selectively weighted false discovery rate procedure

False discovery rate (FDR) control in structured hypotheses testing is a...
research
12/05/2019

A sparse negative binomial mixture model for clustering RNA-seq count data

Clustering with variable selection is a challenging but critical task fo...
research
09/21/2022

SGC: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks

A widely used approach for extracting information from gene expression d...
research
03/11/2023

Bayesian Inference of Gene Expression Dynamics in Alzheimer Brains

Alzheimer's disease (AD) is a serious neurodegenerative disease consisti...
research
08/04/2023

Network Inference Using the Hub Model and Variants

Statistical network analysis primarily focuses on inferring the paramete...
research
09/08/2021

Computational methods for differentially expressed gene analysis from RNA-Seq: an overview

The analysis of differential gene expression from RNA-Seq data has becom...
research
07/22/2014

Sequential Changepoint Approach for Online Community Detection

We present new algorithms for detecting the emergence of a community in ...
