SGC: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks

09/21/2022
by Niloofar Aghaieabiane, et al.

A widely used approach for extracting information from gene expression data employs the construction of a gene co-expression network and the subsequent application of algorithms that discover network structure. In particular, a common goal is the computational discovery of gene clusters, commonly called modules. When this approach is applied to a novel gene expression dataset, the quality of the computed modules can be evaluated automatically using Gene Ontology enrichment, a method that measures the frequencies of Gene Ontology terms in the computed modules and evaluates their statistical likelihood. In this work we propose SGC, a novel pipeline for gene clustering based on relatively recent seminal work in the mathematics of spectral network theory. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. Unlike all existing frameworks, it also incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method, further improving the quality of the computed modules. Compared with well-known existing frameworks, SGC yields higher enrichment on real data: across 12 real gene expression datasets, it outperforms them on all but one.
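To make the idea of Gene Ontology enrichment concrete, here is a minimal sketch of how the statistical likelihood of a term's frequency in a module might be scored. The abstract does not say which test SGC uses. This example assumes a one-sided hypergeometric test, a common choice for enrichment analysis. The function name and its parameters are hypothetical and are not taken from the SGC software.

```python
from math import comb


def enrichment_p_value(population_size, annotated_in_population,
                       module_size, annotated_in_module):
    """Upper-tail hypergeometric probability P(X >= annotated_in_module).

    Assumption (not from the SGC paper): enrichment of one GO term in one
    module is scored with a one-sided hypergeometric test.
    """
    # Number of ways to draw a module of this size from all genes.
    total = comb(population_size, module_size)
    # A module cannot hold more annotated genes than exist or than it has slots.
    upper = min(annotated_in_population, module_size)
    not_annotated = population_size - annotated_in_population
    # Count draws containing at least the observed number of annotated genes.
    tail = sum(
        comb(annotated_in_population, k) * comb(not_annotated, module_size - k)
        for k in range(annotated_in_module, upper + 1)
    )
    return tail / total


# Toy numbers: 20 genes in total, 5 carry the term, and a module of
# 5 genes contains 4 of them.
p = enrichment_p_value(20, 5, 5, 4)
print(f"p-value: {p:.6f}")
```

A small p-value means the term appears in the module more often than random sampling of genes would explain. In practice such a test is run once per term and per module, so the p-values are usually corrected for multiple testing.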


research
10/20/2019

Identification of Interaction Clusters Using a Semi-supervised Hierarchical Clustering Method

Motivation: Identifying interaction clusters of large gene regulatory ne...
research
06/03/2019

Sea of Genes: Combining Animation and Narrative Strategies to Visualize Metagenomic Data for Museums

We examine the application of narrative strategies to present a complex ...
research
05/09/2012

Using the Gene Ontology Hierarchy when Predicting Gene Function

The problem of multilabel classification when the labels are related thr...
research
01/04/2023

l_1-2 GLasso: L_1-2 Regularized Multi-task Graphical Lasso for Joint Estimation of eQTL Mapping and Gene Network

A critical problem in genetics is to discover how gene expression is reg...
research
10/30/2022

ISG: I can See Your Gene Expression

This paper aims to predict gene expression from a histology slide image ...
research
11/09/2012

LAGE: A Java Framework to reconstruct Gene Regulatory Networks from Large-Scale Continues Expression Data

LAGE is a systematic framework developed in Java. The motivation of LAGE...
research
04/03/2018

A Mixture Model to Detect Edges in Sparse Co-expression Graphs

In the early days of microarray data, the medical and statistical commun...
