Random Fragments Classification of Microbial Marker Clades with Multi-class SVM and N-Best Algorithm

04/19/2019
by Jingwei Liu, et al.

Modeling microbial clades from microarray genome sequences is a challenging problem in biology, especially for the discovery and categorization of new species gene isolates. Marker family genome sequences play an important role in describing specific microbial clades within species. A framework for support vector machine (SVM) based microbial species classification with an N-best algorithm is constructed to classify centroid marker genome fragments randomly generated from marker genome sequences in MetaRef. A time series feature extraction method is proposed that segments the centroid gene sequences and maps them into spaces of different dimensions. Two ways of splitting the data are investigated: randomly splitting fragments along each genome sequence (DI), or separating each genome sequence into two parts (DII). Two strategies for the fragment recognition task, dimension-by-dimension and sequence-by-sequence, are investigated. The choice of k-mer size, the overlap between segments, and the effect of the random split percentage are also discussed. Experiments on 12390 marker genome sequences belonging to marker families of 17 species from MetaRef show that, for both DI and DII and for both dimension-by-dimension and sequence-by-sequence recognition, the overall recognition accuracy exceeds 28% for the top-1 candidate and 91% for the top-10 candidates on both the training and testing sets.
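The abstract does not spell out how sequences are segmented or how N-best candidates are scored, so the following is only a minimal sketch under stated assumptions: fragments are fixed-length windows of size `k` that share `overlap` characters with their neighbor, and a classifier (for example a multi-class SVM) has already produced one score per species. The function names `segment`, `n_best`, and `top_n_hit` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def segment(seq: str, k: int, overlap: int = 0) -> List[str]:
    """Split seq into length-k fragments; neighbors share `overlap` characters.

    Assumption: trailing characters that do not fill a whole window are dropped.
    """
    if k <= 0 or not 0 <= overlap < k:
        raise ValueError("need k > 0 and 0 <= overlap < k")
    step = k - overlap
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def n_best(scores: Sequence[float], labels: Sequence[str], n: int) -> List[str]:
    """Return the n labels with the highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order[:n]]


def top_n_hit(scores: Sequence[float], labels: Sequence[str],
              true_label: str, n: int) -> bool:
    """True if the correct label appears among the n best candidates."""
    return true_label in n_best(scores, labels, n)
```

Under this reading, top-1 accuracy counts a fragment as correct only when the best-scoring species is the true one, while top-10 accuracy counts it as correct when the true species appears anywhere among the ten best candidates, which is consistent with the gap between the two figures reported above.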


