Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification

by Xiangxie Zhang, et al.

The classification of DNA sequences is a key research area in bioinformatics, as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification. Furthermore, we introduce a novel feature extraction method based on the Levenshtein distance and randomly generated DNA sub-sequences to compute information-rich features from the DNA sequences. We also use an existing feature extraction method based on 3-grams to represent amino acids, and we combine both feature extraction methods with a multitude of machine learning algorithms. Four datasets, each concerning a viral disease (Covid-19, AIDS, Influenza, or Hepatitis C), are used to evaluate the different approaches. The results of the experiments show that all methods obtain high accuracies on the different DNA datasets. Furthermore, the domain-specific 3-gram feature extraction method generally leads to the best results in the experiments, while the newly proposed technique outperforms all other methods on the smallest dataset, Covid-19.
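To make the Levenshtein-based idea concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not specify how the random sub-sequences are generated or how distances are turned into features, so everything here is an assumption: the function names (`levenshtein`, `random_subsequences`, `levenshtein_features`), the `count`, `length`, and `seed` parameters, and the choice to use the plain edit distance from the whole input sequence to each random sub-sequence as one feature.

```python
import random

ALPHABET = "ACGT"  # DNA nucleotides


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def random_subsequences(count: int, length: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: draw `count` random DNA strings of a fixed `length`."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(count)]


def levenshtein_features(sequence: str, references: list[str]) -> list[int]:
    """Assumed feature map: one edit distance per random reference sub-sequence."""
    return [levenshtein(sequence, reference) for reference in references]
```

Under these assumptions, `levenshtein_features("ACGTTGCA", random_subsequences(count=5, length=4))` returns a five-dimensional integer vector that could then be passed to any standard classifier. Reusing the same references (same seed) for every sequence keeps the feature dimensions comparable across the dataset.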






