Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification

11/01/2020
by Xiangxie Zhang, et al.

The classification of DNA sequences is a key research area in bioinformatics, as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification. Furthermore, we introduce a novel feature extraction method based on the Levenshtein distance and randomly generated DNA sub-sequences to compute information-rich features from the DNA sequences. We also use an existing feature extraction method based on 3-grams to represent amino acids, and we combine both feature extraction methods with a multitude of machine learning algorithms. Four different datasets, covering viral diseases such as Covid-19, AIDS, Influenza, and Hepatitis C, are used to evaluate the different approaches. The results of the experiments show that all methods obtain high accuracies on the different DNA datasets. Moreover, the domain-specific 3-gram feature extraction method generally leads to the best results in the experiments, while the newly proposed technique outperforms all other methods on the smallest Covid-19 dataset.
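To make the two feature extraction ideas more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give details, so several points are assumptions: each Levenshtein feature is taken to be the edit distance between the whole DNA sequence and one randomly generated sub-sequence, the 3-gram features are taken to be counts of non-overlapping triplets read from position 0, and all function names and parameters (`random_subsequences`, `levenshtein_features`, `three_gram_counts`, `count`, `length`, `seed`) are hypothetical.

```python
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def random_subsequences(count: int, length: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: generate `count` random DNA strings of a fixed length."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(length))
            for _ in range(count)]


def levenshtein_features(sequence: str, references: list[str]) -> list[int]:
    """Assumed reading: one feature per random sub-sequence, its distance to `sequence`."""
    return [levenshtein(sequence, ref) for ref in references]


def three_gram_counts(sequence: str) -> list[int]:
    """Assumed reading: counts of non-overlapping triplets, one entry per possible 3-gram."""
    grams = ["".join(p) for p in product(ALPHABET, repeat=3)]
    counts = Counter(sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3))
    return [counts[g] for g in grams]


if __name__ == "__main__":
    refs = random_subsequences(count=5, length=8, seed=1)
    dna = "ACGTTGCAACGTACGT"
    print(levenshtein_features(dna, refs))  # 5 integer features
    print(sum(three_gram_counts(dna)))      # number of complete triplets
```

Either feature vector could then be passed to a standard classifier. The number and length of the random sub-sequences are free parameters in this sketch and would need tuning on real data.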
