Gene Similarity-based Approaches for Determining Core-Genes of Chloroplasts

12/17/2014
by Bassam AlKindy, et al.

In computational biology and bioinformatics, understanding the evolutionary processes within groups of related organisms has received considerable attention in recent decades. However, accurate methodologies are still needed to uncover how gene content evolves. In a previous work, two novel approaches based on sequence similarities and gene features were proposed. More precisely, we proposed to use gene names, sequence similarities, or both, obtained either from NCBI or from the DOGMA annotation tool. DOGMA has the advantage of being an up-to-date, accurate automatic tool specifically designed for chloroplasts, whereas NCBI contains high-quality, human-curated genes (together with wrongly annotated ones). The key idea of the earlier proposal was to take the best from these two sources. However, that first proposal was limited by name variations and spelling errors on the NCBI side, leading to core trees of low quality. In this paper, these flaws are fixed by improving the comparison of NCBI and DOGMA results, and by relaxing constraints on gene names while adding a post-validation stage on gene sequences. Two stages of similarity measures, first on names and then on sequences, are thus proposed for sequence clustering. This improves on the results that can be obtained using either NCBI or DOGMA alone. Results obtained with this quality-control test are further investigated and compared with previously released ones, from both computational and biological perspectives, on a set of 99 chloroplast genomes.
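The two-stage idea (a relaxed comparison on gene names, followed by post-validation on sequences, then keeping the clusters shared by every genome as core genes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `normalize_name`, `two_stage_clusters` and `core_clusters`, the name normalization rule, and the thresholds (0.8 for names, 0.65 for sequences) are all hypothetical choices for illustration. Python's `difflib.SequenceMatcher.ratio()` is swapped in as a stand-in similarity score; on real chloroplast genes, a proper alignment-based score would be the more appropriate choice.

```python
import re
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple

# A gene record: (genome_id, gene_name, nucleotide_sequence). Hypothetical format.
Gene = Tuple[str, str, str]


def normalize_name(name: str) -> str:
    """Hypothetical relaxation of gene names: lowercase and drop
    non-alphanumeric characters, so 'psb-A' and 'psbA' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] (difflib ratio, not the paper's measure).
    autojunk=False avoids difflib's junk heuristic on long sequences."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def two_stage_clusters(genes: Sequence[Gene],
                       name_threshold: float = 0.8,
                       seq_threshold: float = 0.65) -> List[List[Gene]]:
    """Greedily cluster genes: stage 1 compares relaxed names against each
    cluster's first member; stage 2 validates the candidate on sequences."""
    clusters: List[List[Gene]] = []
    for gene in genes:
        _, name, seq = gene
        placed = False
        for cluster in clusters:
            _, rep_name, rep_seq = cluster[0]
            # Stage 1: relaxed gene-name similarity.
            if similarity(normalize_name(name), normalize_name(rep_name)) < name_threshold:
                continue
            # Stage 2: post-validation on the sequences themselves.
            if similarity(seq, rep_seq) >= seq_threshold:
                cluster.append(gene)
                placed = True
                break
        if not placed:
            clusters.append([gene])
    return clusters


def core_clusters(clusters: List[List[Gene]],
                  genome_ids: Sequence[str]) -> List[List[Gene]]:
    """Keep clusters containing at least one gene from every genome."""
    wanted = set(genome_ids)
    return [c for c in clusters if {g for g, _, _ in c} >= wanted]


if __name__ == "__main__":
    # Toy data (made up): two genomes, a spelling variant of psbA, and a
    # gene (ycf1) present in only one genome.
    genes = [
        ("g1", "psbA", "ATGACTGCAATTTTAGAGAGACGC"),
        ("g2", "psb-A", "ATGACTGCAATTTTAGAGAGACGG"),
        ("g1", "rbcL", "ATGTCACCACAAACAGAGACTAAA"),
        ("g2", "rbcL", "ATGTCACCACAAACAGAAACTAAA"),
        ("g2", "ycf1", "ATGAAAGGACATCAATTTAAATCC"),
    ]
    for cluster in core_clusters(two_stage_clusters(genes), ["g1", "g2"]):
        print([name for _, name, _ in cluster])
```

In this toy run, the name variant `psb-A` is merged with `psbA` because the relaxed names match and the sequences pass validation, while `ycf1` is excluded from the core set because it appears in only one genome. A pair with matching names but dissimilar sequences would be kept apart by the second stage, which is the role the post-validation step plays.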
