Using Deep Learning Sequence Models to Identify SARS-CoV-2 Divergence

11/12/2021
by   Yanyi Ding, et al.

SARS-CoV-2 is an upper respiratory system RNA virus that has caused over 3 million deaths and infected over 150 million people worldwide as of May 2021. With thousands of strains sequenced to date, SARS-CoV-2 mutations make it difficult for scientists to keep pace with vaccine development and public health measures. An efficient method for identifying the divergence of lab samples taken from patients would therefore greatly aid the documentation of SARS-CoV-2 genomics. In this study, we propose a neural network model that uses recurrent and convolutional units to take amino acid sequences of spike proteins directly as input and classify their corresponding clades. We also compare our model's performance with that of Bidirectional Encoder Representations from Transformers (BERT) pre-trained on a protein database. Our approach has the potential to provide a more computationally efficient alternative to current homology-based intra-species differentiation.
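To make the data flow described above concrete, the following is a minimal sketch of how an amino acid sequence could pass through a convolutional stage and a recurrent stage before being mapped to clade probabilities. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the one-hot encoding, the layer sizes, the simple Elman-style recurrence, the randomly initialized (untrained) weights, the placeholder clade labels, and all function names. It is not the authors' model.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode each residue as a 20-dim one-hot row; unknown residues become all zeros."""
    rows = []
    for aa in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def rand_matrix(rng, n_rows, n_cols, scale=0.1):
    """Small random weights; a real model would learn these from labeled sequences."""
    return [[rng.uniform(-scale, scale) for _ in range(n_cols)] for _ in range(n_rows)]


def init_params(seed, n_filters, hidden, n_classes, width):
    """Hypothetical parameter layout for the sketch (sizes are arbitrary)."""
    rng = random.Random(seed)
    return {
        "width": width,
        "conv": rand_matrix(rng, n_filters, width * len(AMINO_ACIDS)),
        "w_in": rand_matrix(rng, hidden, n_filters),
        "w_rec": rand_matrix(rng, hidden, hidden),
        "w_out": rand_matrix(rng, n_classes, hidden),
    }


def conv1d_relu(x, kernels, width):
    """Valid 1D convolution along the sequence axis, followed by ReLU."""
    out = []
    for start in range(len(x) - width + 1):
        window = [v for row in x[start:start + width] for v in row]
        out.append([max(0.0, sum(w * v for w, v in zip(k, window))) for k in kernels])
    return out


def rnn_last_state(x, w_in, w_rec):
    """Elman recurrence h_t = tanh(W_in x_t + W_rec h_{t-1}); returns the final state."""
    hidden = [0.0] * len(w_rec)
    for step in x:
        hidden = [
            math.tanh(
                sum(a * b for a, b in zip(w_in[j], step))
                + sum(a * b for a, b in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
    return hidden


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sequence, params, clades):
    """Sequence -> convolutional features -> recurrent summary -> clade probabilities."""
    features = conv1d_relu(one_hot(sequence), params["conv"], params["width"])
    state = rnn_last_state(features, params["w_in"], params["w_rec"])
    logits = [sum(a * b for a, b in zip(row, state)) for row in params["w_out"]]
    return dict(zip(clades, softmax(logits)))


# Placeholder labels and a short toy peptide string, for illustration only.
CLADES = ["clade_A", "clade_B", "clade_C"]
params = init_params(seed=0, n_filters=8, hidden=6, n_classes=len(CLADES), width=3)
print(classify("MFVFLVLLPLVSSQCVNLT", params, CLADES))
```

Because the weights are untrained, the printed probabilities carry no biological meaning; the sketch only shows how a raw sequence can be classified without an alignment or homology search step.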


