SARS-Cov-2 RNA Sequence Classification Based on Territory Information

01/09/2021
by   Jingwei Liu, et al.

COVID-19 genetic analysis is critical for determining virus type and variant and for evaluating vaccines. This paper investigates SARS-CoV-2 RNA sequence analysis with respect to region or territory. A uniform sequence SVM framework, covering genetic segment lengths from short to long as well as mixed bases, is developed: each SARS-CoV-2 RNA sequence is projected into spaces of different dimension and then scored by the output probabilities of pre-trained SVM models, in order to explore the territory or origin of the virus. Different ratios of training-set size to test-set size are also discussed in the data analysis. Two SARS-CoV-2 RNA classification tasks are constructed from the GISAID database: one covers mainland China, Hong Kong and Taiwan, and the other is a 6-class classification task (Africa, Asia, Europe, North America, South America & Central America, Oceania) over the 7 continents. For the 3-class classification of China, the Top-1 accuracy reaches 82.45% (train 60%, test 40%); for the 2-class classification of China, the Top-1 accuracy reaches 97.35% (train 80%, test 20%); for the 6-class world task, with a training-to-test ratio of 20% : 80%, the Top-1 accuracy is 30.30%. Some Top-N results are also given.
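The abstract does not spell out how a sequence is projected or how Top-N accuracy is computed, so the following is only a minimal sketch under stated assumptions. It assumes the projection is a k-mer frequency vector (dimension 4^k, so varying k gives spaces of different dimension) and that Top-N accuracy counts a sample as correct when its true label is among the N classes with the highest output probability. The names `kmer_vector` and `top_n_accuracy` are hypothetical and not taken from the paper; the SVM itself is left out, and any classifier that returns per-class probabilities could supply `prob_rows`.

```python
from itertools import product

BASES = "ACGU"  # assumed RNA alphabet; T is mapped to U below


def kmer_vector(seq, k):
    """Map a sequence to a 4**k-dimensional vector of k-mer frequencies.

    Assumption: this stands in for the paper's projection step.
    """
    index = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    counts = [0.0] * len(index)
    seq = seq.upper().replace("T", "U")
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing ambiguous bases such as N
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def top_n_accuracy(prob_rows, true_labels, classes, n):
    """Fraction of samples whose true label is among the n most probable classes."""
    hits = 0
    for probs, truth in zip(prob_rows, true_labels):
        ranked = sorted(range(len(classes)), key=lambda i: probs[i], reverse=True)
        if truth in [classes[i] for i in ranked[:n]]:
            hits += 1
    return hits / len(true_labels)
```

With per-class probabilities for each test sequence, `top_n_accuracy(prob_rows, labels, classes, 1)` gives the Top-1 rate and larger `n` gives the corresponding Top-N rate.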


research
12/13/2018

Training Set Camouflage

We introduce a form of steganography in the domain of machine learning w...
research
10/17/2022

Test-Time Training for Graph Neural Networks

Graph Neural Networks (GNNs) have made tremendous progress in the graph ...
research
08/05/2022

Construction of English Resume Corpus and Test with Pre-trained Language Models

Information extraction (IE) has always been one of the essential tasks of...
research
06/16/2022

DIALOG-22 RuATD Generated Text Detection

Text Generation Models (TGMs) succeed in creating text that matches huma...
research
10/05/2018

Automatic Detection of Arousals during Sleep using Multiple Physiological Signals

The visual scoring of arousals during sleep routinely conducted by sleep...
research
08/22/2021

FEDI: Few-shot learning based on Earth Mover's Distance algorithm combined with deep residual network to identify diabetic retinopathy

Diabetic retinopathy (DR) is the main cause of blindness in diabetic pati...
research
02/08/2021

Model Rectification via Unknown Unknowns Extraction from Deployment Samples

Model deficiency that results from incomplete training data is a form of...
