Deep learning forward and reverse primer design to detect SARS-CoV-2 emerging variants
Surges in the number of COVID-19 cases observed at different periods are associated with the emergence of multiple SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) variants. The design of methods to support laboratory detection is crucial in monitoring these variants. Hence, in this paper, we develop a semi-automated method to design both forward and reverse primer sets to detect SARS-CoV-2 variants. To proceed, we train deep Convolutional Neural Networks (CNNs) to classify labelled SARS-CoV-2 variants and identify partial genomic features needed for the forward and reverse Polymerase Chain Reaction (PCR) primer design. Our proposed approach supplements existing ones while promoting the emerging concept of neural-network-assisted primer design for PCR. Our CNN model was trained using a database of SARS-CoV-2 full-length genomes from GISAID and tested on a separate dataset from NCBI, with 98% accuracy for the classification of variants. This result is based on the development of three different methods of feature extraction. The selected primer sequences for each SARS-CoV-2 variant (except Omicron) were present in more than 95% of sequences in an independent set of 5000 sequences of the same variant, and in fewer than 5% of sequences in other independent datasets of 5000 sequences of each variant. In total, we obtain 22 forward and reverse primer pairs with flexible lengths (18-25 base pairs) and expected amplicon lengths ranging between 42 and 3322 nucleotides. Beyond feature presence, in-silico primer checks confirmed that the identified primer pairs are suitable for accurate SARS-CoV-2 variant detection by means of PCR tests.
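The presence criterion described above (a primer found in more than 95% of sequences of its own variant and in fewer than 5% of sequences of other variants) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, and exact substring matching on plain strings is an assumption made for brevity, whereas a real check might tolerate mismatches or ambiguous bases. The amplicon length is computed from the start of the forward primer to the end of the reverse primer's binding site.

```python
from typing import List, Optional

# Complement table; any unexpected symbol is mapped to N (assumption).
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def presence_rate(primer: str, genomes: List[str]) -> float:
    """Fraction of genomes that contain the primer as an exact substring."""
    if not genomes:
        return 0.0
    primer = primer.upper()
    hits = sum(1 for genome in genomes if primer in genome.upper())
    return hits / len(genomes)


def is_variant_specific(
    primer: str,
    target_genomes: List[str],
    other_genomes: List[str],
    min_target: float = 0.95,  # threshold taken from the abstract
    max_other: float = 0.05,   # threshold taken from the abstract
) -> bool:
    """Apply the >95% in-variant and <5% out-of-variant presence criterion."""
    return (
        presence_rate(primer, target_genomes) > min_target
        and presence_rate(primer, other_genomes) < max_other
    )


def amplicon_length(genome: str, forward: str, reverse: str) -> Optional[int]:
    """Expected amplicon length on the forward strand, or None if a primer
    site is missing. The reverse primer binds the opposite strand, so its
    reverse complement is searched downstream of the forward primer."""
    genome = genome.upper()
    start = genome.find(forward.upper())
    if start < 0:
        return None
    site = reverse_complement(reverse)
    end = genome.find(site, start + len(forward))
    if end < 0:
        return None
    return end + len(site) - start
```

In practice, `target_genomes` and `other_genomes` would be independent sets of full-length genomes per variant (5000 each in the abstract), and a pair would be kept only if both primers pass `is_variant_specific` and `amplicon_length` falls in a range suitable for the intended PCR assay.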