Combining Cost-Sensitive Classification with Negative Selection for Protein Function Prediction

05/18/2018
by Marco Frasca, et al.

Motivation: Computational methods play a central role in annotating the functions of the large numbers of proteins delivered by high-throughput technologies. Despite the encouraging results achieved by these methods, many functions still have very few verified protein annotations, which leads to a pronounced imbalance between annotated and unannotated proteins. Furthermore, functional taxonomies rarely report negative annotations. This leaves the set of negative examples ill-defined, even though that set is crucial for training most machine learning methods. In practice, neglecting data imbalance and the problem of selecting negative examples can strongly limit the accuracy of protein function prediction.

Results: We present a novel approach that combines a cost-sensitive (imbalance-aware) classification strategy, addressing the scarcity of annotated proteins, with an active learning strategy for selecting the most reliable negative examples. When implemented in a Support Vector Machine (SVM), this combined approach shows improved accuracy on the yeast and human proteomes over a standard SVM and over top-performing function prediction tools.
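The two ingredients named in the Results can be illustrated with a minimal Python sketch. It assumes a linear SVM fitted by full-batch subgradient descent, per-class misclassification costs set from the class ratio (each positive costs n_neg/n_pos, each negative costs 1), and a simple rule that treats the unannotated proteins scored most confidently negative as "reliable negatives". The function names (`train_cost_sensitive_svm`, `select_negatives`), the cost rule and the toy feature vectors are hypothetical; this is not the authors' imbalance-aware strategy or their active learning procedure.

```python
from typing import List, Sequence

Vector = List[float]


def _augment(x: Sequence[float]) -> Vector:
    # Append a constant 1.0 so the bias is learned as the last weight.
    return list(x) + [1.0]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed decision value w . [x, 1]; > 0 means 'predicted positive'."""
    return sum(wi * xi for wi, xi in zip(w, _augment(x)))


def train_cost_sensitive_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Vector:
    """Linear SVM fitted by full-batch subgradient descent on
    lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i * w . x_i),
    with labels in {+1, -1}, c_i = n_neg / n_pos for positives and 1 for negatives."""
    n_pos = sum(1 for label in y if label == 1)
    n_neg = len(y) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    c_pos = n_neg / n_pos  # the rarer class gets a proportionally larger cost
    Xa = [_augment(x) for x in X]
    n = len(Xa)
    w = [0.0] * len(Xa[0])
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 regularizer
        for xi, yi in zip(Xa, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # hinge term is active for this example
                cost = c_pos if yi == 1 else 1.0
                for j, xj in enumerate(xi):
                    grad[j] -= cost * yi * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def select_negatives(
    w: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
    threshold: float = 0.0,
) -> List[int]:
    """Indices of up to k unannotated candidates with the lowest decision
    value below `threshold`, i.e. the most confidently negative ones."""
    scored = [(score(w, x), i) for i, x in enumerate(candidates)]
    below = sorted(pair for pair in scored if pair[0] < threshold)
    return [i for _, i in below[:k]]


# Toy, made-up 2-D features: 2 annotated positives vs 4 negatives (imbalanced).
X = [[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5], [-2.5, -1.0], [-1.0, -3.0]]
y = [1, 1, -1, -1, -1, -1]
w = train_cost_sensitive_svm(X, y)

unannotated = [[1.8, 2.2], [-3.0, -3.0], [0.1, -0.2], [-2.0, -1.5]]
print(select_negatives(w, unannotated, k=2))  # [1, 3]
```

Weighting the hinge loss per class is one common way to make an SVM imbalance-aware; the selection rule here is a plain confidence ranking, standing in for whatever active learning criterion the full text specifies.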

Related research
05/23/2018

Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples

Public repositories for genome and proteome annotations, such as the Gen...
10/10/2019

Modeling of negative protein-protein interactions: methods and experiments

Protein-protein interactions (PPIs) are of fundamental importance for th...
01/28/2017

Deep Recurrent Neural Network for Protein Function Prediction from Sequence

As high-throughput biological sequencing becomes faster and cheaper, the...
01/24/2018

Support Vector Machine Active Learning Algorithms with Query-by-Committee versus Closest-to-Hyperplane Selection

This paper investigates and evaluates support vector machine active lear...
11/03/2016

Multitask Protein Function Prediction Through Task Dissimilarity

Automated protein function prediction is a challenging problem with dist...
09/27/2016

Multiple protein feature prediction with statistical relational learning

High-throughput sequencing techniques have highly impacted on modern biol...
09/04/2020

Investigation of the Cyprus donkey milk bacterial diversity by 16SrDNA high-throughput sequencing in a Cyprus donkey farm

The interest in milk originating from donkeys is growing worldwide due t...
