Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification

11/01/2020
by Xiangxie Zhang, et al.

The classification of DNA sequences is a key research area in bioinformatics, as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification. Furthermore, we introduce a novel feature extraction method that computes information-rich features from the DNA sequences, based on the Levenshtein distance and randomly generated DNA sub-sequences. We also use an existing feature extraction method based on 3-grams to represent amino acids, and we combine both feature extraction methods with a multitude of machine learning algorithms. Four different datasets, each concerning a viral disease such as Covid-19, AIDS, Influenza, or Hepatitis C, are used for evaluating the different approaches. The results of the experiments show that all methods obtain high accuracies on the different DNA datasets. Furthermore, the domain-specific 3-gram feature extraction method generally leads to the best results in the experiments, while the newly proposed technique outperforms all other methods on the smallest dataset, Covid-19.
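The abstract does not spell out how either feature extractor is implemented, so the following is only a minimal sketch of the two ideas it names. It assumes that each feature is the Levenshtein distance between a sequence and one randomly generated reference sub-sequence, and that the 3-gram features are counts of overlapping 3-mers over the alphabet A, C, G, T. The function names, the number and length of the reference sub-sequences, and the seed are all hypothetical choices for illustration, not values from the paper.

```python
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: sequences use only these four bases


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def random_references(n_refs: int, length: int, seed: int = 0) -> list[str]:
    """Generate random DNA sub-sequences (hypothetical helper; sizes are assumptions)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(length))
            for _ in range(n_refs)]


def levenshtein_features(seq: str, refs: list[str]) -> list[int]:
    """One feature per reference: the edit distance from seq to that reference."""
    return [levenshtein(seq, ref) for ref in refs]


def trigram_features(seq: str) -> list[int]:
    """Counts of all 64 overlapping 3-grams, in a fixed lexicographic order."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return [counts["".join(t)] for t in product(ALPHABET, repeat=3)]


if __name__ == "__main__":
    refs = random_references(n_refs=5, length=8, seed=1)
    sample = "ACGTACGTTAGC"
    print(levenshtein_features(sample, refs))
    print(sum(trigram_features(sample)))
```

Either feature vector could then be passed to any standard classifier. Fixing the seed keeps the reference sub-sequences identical across training and test data, which this sketch needs for the features to be comparable between sequences.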


research
03/07/2018

An Exercise Fatigue Detection Model Based on Machine Learning Methods

This study proposes an exercise fatigue detection model based on real-ti...
research
03/10/2023

Resource saving taxonomy classification with k-mer distributions and machine learning

Modern high throughput sequencing technologies like metagenomic sequenci...
research
03/27/2018

Analyzing DNA Hybridization via machine learning

In DNA computing, it is impossible to decide whether a specific hybridiz...
research
04/30/2023

Predictability of Machine Learning Algorithms and Related Feature Extraction Techniques

This thesis designs a prediction system based on matrix factorization to...
research
09/30/2022

RL-MD: A Novel Reinforcement Learning Approach for DNA Motif Discovery

The extraction of sequence patterns from a collection of functionally li...
research
07/11/2012

Efficient Prediction of DNA-Binding Proteins Using Machine Learning

DNA-binding proteins are a class of proteins which have a specific or ge...
