Analysis of Novel Annotations in the Gene Ontology for Boosting the Selection of Negative Examples

05/23/2018
by Maryam Sepehri, et al.

Public repositories for genome and proteome annotations, such as the Gene Ontology (GO), rarely store negative annotations, i.e., proteins that do not possess a given function. This leaves the set of negative examples undefined or ill-defined, even though it is crucial for training most machine learning methods that infer protein functions. Automated techniques for choosing reliable negative proteins are therefore required to train accurate function prediction models. This study proposes the first extensive analysis of the temporal evolution of protein annotations in the GO repository. Novel annotations registered over the years were analyzed to verify the presence of annotation patterns in the GO hierarchy. Our research supplied fundamental clues about proteins likely to be unreliable as negative examples, which we verified in a novel algorithm of our own construction, validated genome-wide on two organisms against existing approaches for choosing negative examples in the context of functional prediction.

research
05/18/2018

Combining Cost-Sensitive Classification with Negative Selection for Protein Function Prediction

Motivation: Computational methods play a central role in annotating the ...
research
02/08/2023

DDeMON: Ontology-based function prediction by Deep Learning from Dynamic Multiplex Networks

Biological systems can be studied at multiple levels of information, inc...
research
01/03/2021

Segmentation and genome annotation algorithms

Segmentation and genome annotation (SAGA) algorithms are widely used to ...
research
08/07/2023

Biomedical Knowledge Graph Embeddings with Negative Statements

A knowledge graph is a powerful representation of real-world entities an...
research
05/09/2012

Using the Gene Ontology Hierarchy when Predicting Gene Function

The problem of multilabel classification when the labels are related thr...
research
05/21/2023

Gene Set Summarization using Large Language Models

Molecular biologists frequently interpret gene lists derived from high-t...
research
07/13/2022

Hierarchy exploitation to detect missing annotations on hierarchical multi-label classification

The availability of genomic data has grown exponentially in the last dec...
