BASiNETEntropy: an alignment-free method for classification of biological sequences through complex networks and entropy maximization

03/24/2022
by Murilo Montanini Breve, et al.

The discovery of nucleic acids and of the structure of DNA brought considerable advances in the understanding of life. Next-generation sequencing technologies now generate data on a large scale, making computational methods essential for analysis and knowledge discovery. RNAs in particular have received much attention because of the diversity of their roles in the organism and the discovery of distinct classes with different functions in many biological processes. Correctly identifying RNA sequences is therefore increasingly important for understanding how organisms function. This work presents a new alignment-free method for classifying biological sequences through complex networks and entropy maximization. The maximum entropy principle is used to identify the edges that are most informative about the RNA class, yielding a filtered complex network. The method was evaluated on the classification of different RNA classes from 13 species and compared with PLEK, CPC2 and BASiNET, outperforming all three. BASiNETEntropy classified all RNA sequences with high accuracy and a low standard deviation across results, indicating that it is both accurate and robust. The method is implemented as an open-source R package, freely available at https://cran.r-project.org/web/packages/BASiNETEntropy.
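To make the idea of an entropy-filtered sequence network more concrete, the following is a minimal Python sketch, not the BASiNETEntropy implementation. It rests on several assumptions that the abstract does not state: sequences are mapped to directed edges between consecutive overlapping words of length 3, and edges are ranked by information gain (the reduction in class entropy from knowing whether an edge is present), which stands in here for the paper's entropy-maximization criterion. The function names `kmer_edges`, `edge_information_gain` and `filter_edges` are hypothetical.

```python
from collections import Counter
from math import log2


def kmer_edges(seq, k=3):
    """Directed edges between consecutive overlapping k-mers (k=3 is an assumption)."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return set(zip(words, words[1:]))


def entropy(count_a, count_b):
    """Shannon entropy, in bits, of a two-class count."""
    total = count_a + count_b
    h = 0.0
    for c in (count_a, count_b):
        if c:
            p = c / total
            h -= p * log2(p)
    return h


def edge_information_gain(seqs_a, seqs_b, k=3):
    """Score each edge by how much its presence reduces class entropy.

    Information gain is used as an illustrative stand-in; it is not
    claimed to be the criterion used by the paper.
    """
    edges_a = [kmer_edges(s, k) for s in seqs_a]
    edges_b = [kmer_edges(s, k) for s in seqs_b]
    n_a, n_b = len(edges_a), len(edges_b)
    n = n_a + n_b
    base = entropy(n_a, n_b)
    # Number of sequences per class that contain each edge.
    seen_a = Counter(e for es in edges_a for e in es)
    seen_b = Counter(e for es in edges_b for e in es)
    gains = {}
    for e in set(seen_a) | set(seen_b):
        with_a, with_b = seen_a[e], seen_b[e]
        without_a, without_b = n_a - with_a, n_b - with_b
        conditional = (
            (with_a + with_b) / n * entropy(with_a, with_b)
            + (without_a + without_b) / n * entropy(without_a, without_b)
        )
        gains[e] = base - conditional
    return gains


def filter_edges(gains, top_n):
    """Keep the top_n highest-scoring edges (ties broken by edge label)."""
    ranked = sorted(gains, key=lambda e: (-gains[e], e))
    return set(ranked[:top_n])
```

In this sketch the filtered edge set would define which connections are retained when building each sequence's network before extracting features for a classifier; the choice of `top_n` is left open and would need to be tuned.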

