Mining Functionally Related Genes with Semi-Supervised Learning

11/05/2020
by Kaiyu Shen, et al.

The study of biological processes can greatly benefit from tools that automatically predict gene functions or directly cluster genes based on shared functionality. Existing data mining methods predict protein function by exploiting data obtained from high-throughput experiments or meta-scale information from public databases. Most existing prediction tools target protein functions that are described in the Gene Ontology (GO). However, biologists often wish to discover functionally related genes for which GO terms are inadequate. In this paper, we introduce a rich set of features and use them in conjunction with semi-supervised learning approaches to expand an initial set of seed genes into a larger cluster of functionally related genes. Among all the semi-supervised methods that were evaluated, the framework of learning with positive and unlabeled examples (LPU) is shown to be especially appropriate for mining functionally related genes. When evaluated on experimentally validated benchmark data, the LPU approaches significantly outperform a standard supervised learning algorithm as well as an established state-of-the-art method. Given an initial set of seed genes, our best-performing approach could be used to mine functionally related genes in a wide range of organisms.
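The abstract does not say which LPU variant performs best, so the following is only a minimal sketch of the general idea, not the authors' method. It swaps in one well-known LPU recipe (the calibration trick of Elkan and Noto, 2008): train a classifier to separate seed genes (labeled) from all other genes (unlabeled), estimate c = P(labeled | positive) on held-out seeds, and divide the classifier's scores by c to rank unlabeled genes by how likely they are to belong to the seed cluster. It assumes every gene is already encoded as a numeric feature vector; the function names (`train_logreg`, `pu_rank`), hyperparameters, and toy 2-D vectors are hypothetical stand-ins for the paper's richer feature set.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=500, lr=0.1, l2=1e-3):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # Average gradient plus a small L2 penalty on the weights.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def pu_rank(seeds, unlabeled, holdout_frac=0.2, rng_seed=0):
    """Hypothetical helper: rank unlabeled genes with Elkan-Noto style PU learning.

    seeds: feature vectors of known positive (seed) genes.
    unlabeled: feature vectors of all other genes (positives and negatives mixed).
    Returns (indices of unlabeled sorted by score, per-gene scores in [0, 1]).
    """
    if len(seeds) < 2:
        raise ValueError("need at least two seed genes (one is held out)")
    pos = list(seeds)
    random.Random(rng_seed).shuffle(pos)
    k = max(1, int(len(pos) * holdout_frac))
    held, train_pos = pos[:k], pos[k:]
    # "Labeled vs. unlabeled" classifier: s = 1 for seeds, s = 0 for the rest.
    X = train_pos + list(unlabeled)
    s = [1] * len(train_pos) + [0] * len(unlabeled)
    model = train_logreg(X, s)
    # c estimates P(s = 1 | y = 1) as the mean score on held-out seeds.
    c = sum(predict(model, x) for x in held) / len(held)
    # P(y = 1 | x) is approximately P(s = 1 | x) / c, clipped to a probability.
    scores = [min(1.0, predict(model, x) / c) for x in unlabeled]
    order = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data (hypothetical): seeds cluster near (1, 1); unlabeled genes 1 and 3
# sit in the same region, while the others lie near (-1, -1).
seeds = [[1.0, 0.9], [0.9, 1.1], [1.1, 1.0], [1.0, 1.2], [0.8, 1.0]]
unlabeled = [[-1.0, -0.9], [1.0, 1.0], [-1.1, -1.0], [0.9, 0.8], [-0.9, -1.2]]
order, scores = pu_rank(seeds, unlabeled)
print(order, [round(v, 3) for v in scores])
```

The top of `order` is the candidate expansion of the seed set. Logistic regression is used only to keep the sketch dependency-free; any classifier that outputs calibrated probabilities could play the same role.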

research
05/09/2012

Using the Gene Ontology Hierarchy when Predicting Gene Function

The problem of multilabel classification when the labels are related thr...
research
06/22/2021

Recent Deep Semi-supervised Learning Approaches and Related Works

The author of this work proposes an overview of the recent semi-supervis...
research
05/16/2020

Machine Learning for Exploring Spatial Affordance Patterns

This dissertation uses supervised and unsupervised data mining technique...
research
12/03/2012

Hypergraph and protein function prediction with gene expression data

Most network-based protein (or gene) function prediction methods are bas...
research
11/19/2012

Application of three graph Laplacian based semi-supervised learning methods to protein function prediction problem

Protein function prediction is the important problem in modern biology. ...
research
08/11/2016

Semi-Supervised Prediction of Gene Regulatory Networks Using Machine Learning Algorithms

Use of computational methods to predict gene regulatory networks (GRNs) ...
research
05/07/2020

Improving supervised prediction of aging-related genes via dynamic network analysis

This study focuses on supervised prediction of aging-related genes from ...
