The 2019-nCoV emerged in the city of Wuhan, in China's Hubei province, as a novel human pathogen causing severe respiratory illness and pneumonia. On 30 December 2019, three bronchoalveolar lavage samples were obtained from a patient with pneumonia of unknown etiology. The complete genome sequence of 2019-nCoV was obtained and indicated a relationship with bat SARS-like coronaviruses.
At present, numerous coronavirus sequences have been published on GenBank, allowing direct study of the virus structure. However, the high mutation and recombination rates of the virus make it difficult to design a wide-spectrum inhibitor against conventional targets [2, 3].
Several approaches are being evaluated. The first is inhibition of the viral protease, which prevents the viral polyprotein from being cleaved so that the viral core proteins cannot be built. For instance, Nelfinavir, an HIV-1 protease inhibitor used to treat HIV, was predicted to be a likely inhibitor of the 2019-nCoV main protease by computational studies based on molecular docking. The second strategy targets the glycosylated spike (S) protein, which mediates fusion with and entry into the host cell; its S-HR1 site represents the most promising target for developing new inhibitors [7, 8]. However, it is still unclear whether 2019-nCoV shares the fusion and entry mechanism of SARS-CoV and MERS-CoV; if it does, the S-HR1 site can also serve as an important target for the development of 2019-nCoV fusion/entry inhibitors. It is likewise unexplored whether 2019-nCoV has other fusion and entry mechanisms comparable to those of SARS-CoV and MERS-CoV.
Moreover, the deep learning methods proposed to date focus on generating new molecules that have not yet been clinically tested. Conversely, computational drug repurposing offers a fast and efficient way to test drugs that are already available. With this repurposing approach, four molecules have previously been selected as the main candidates: Prulifloxacin, Bictegravir, Nelfinavir, and Tegobuvir. Prulifloxacin is a synthetic antibiotic of the fluoroquinolone class; Bictegravir is an antiretroviral that blocks integrase, the enzyme that HIV-1 uses to insert the viral genome into the host genome; Nelfinavir is a protease inhibitor widely used in the treatment of HIV-1; and Tegobuvir is a non-nucleoside inhibitor of the HCV NS5B polymerase that has shown antiviral activity in patients with genotype 1 chronic HCV infection.
However, none of these molecules specifically targets the glycosylated spike (S) protein. The spike (S) protein is made up of two subunits, S1 and S2. The S1 subunit binds the cell receptor through its receptor-binding domain (RBD); this is followed by a series of conformational changes in the S2 subunit that allow the fusion peptide to insert into the membrane of the host cell. The S2 subunit contains a region called heptad repeat 1 (HR1), whose three hydrophobic grooves bind a second region called heptad repeat 2 (HR2), forming a six-helix bundle (6-HB) that brings the viral and cellular membranes close together for fusion and hence entry. The RBD of any CoV family is a highly mutable region and is not suitable for inhibition, while the HR region in the S2 subunit is conserved among various HCoVs.
Recently, several deep learning algorithms have been employed in the search for coronavirus-targeting drugs. For example, 2019-nCoV sequences are translated into proteins and modeled in three dimensions, and a classification network that takes ligand and protein as input then screens their interaction [10, 11, 12]. Generative adversarial approaches have also been used to produce novel target molecular structures [13, 14, 15]. However, these approaches have three main limitations: (i) they mostly rely on supervised training on public datasets that contain no specific ground-truth ligand for 2019-nCoV; (ii) they are based on shallow, non-convolutional networks, whereas convolutional networks are currently the state of the art on many datasets; (iii) they are not directed at a specific biological mechanism that the virus uses to replicate itself or to infect the host cell.
For all these reasons, the purposes of this paper are: (i) to introduce a new artificial intelligence model that speeds up the search for a promising target ligand and does not rely on complex and computationally expensive molecular docking operations; (ii) to rapidly screen the main antibacterial, anticancer and antimicrobial peptides in SATPdb against the HR1 domain which, as already mentioned, is the least mutable site of the 2019-nCoV spike protein; (iii) to introduce a new deep learning method that is not based on a trivial supervised classification of ligand and peptide on public datasets, but on a one-shot learning approach designed specifically to study the glycosylated spike (S) protein of 2019-nCoV.
2 Proposed analysis work-flow
The work-flow is divided into three stages: conversion of the virus genome into protein and splitting of the protein sequence into peptide fragments; conversion of each peptide text string into an image; and comparison of the peptides with a Siamese Neural Network (SNN).
2.1 Virus genome to protein
The 2019-nCoV genome sequence available on GenBank is used. The sequence was obtained from a 41-year-old man hospitalized in the Central Hospital of Wuhan on 26 December 2019. The viral genome organization was determined by sequence alignment with two representative members of the genus Betacoronavirus: a coronavirus associated with humans (SARS-CoV Tor2, GenBank accession number AY274119) and a coronavirus associated with bats (bat SL-CoVZC45, GenBank accession number MG772933). The genomic sequence was then translated into the corresponding protein sequence, from which consecutive runs of ten amino acids were extracted in series, each forming a peptide (Fig. 1).
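The conversion described above can be sketched as follows. This is a minimal illustration, not the released implementation: it assumes the standard genetic code, a single forward reading frame starting at the first base, and non-overlapping windows of ten residues. The function names `translate` and `split_peptides` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(sequence, stop_at_stop=True):
    """Translate a nucleotide sequence into a protein string (frame 0).

    RNA input is accepted (U is read as T); unknown codons become 'X'.
    """
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            if stop_at_stop:
                break
            continue  # assumption: skip stop codons when reading through
        protein.append(aa)
    return "".join(protein)


def split_peptides(protein, size=10):
    """Cut a protein into consecutive, non-overlapping peptides of `size` residues.

    A trailing fragment shorter than `size` is dropped (an assumption).
    """
    return [protein[i:i + size]
            for i in range(0, len(protein) - size + 1, size)]
```

For example, `split_peptides(translate(genome))` would yield the list of ten-residue text peptides that the next stage converts into images, where `genome` is a placeholder for the nucleotide string.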
2.2 Text peptide to image
Subsequently, each peptide is converted into an image (Fig. 2). This method is similar to the DeepVariant work, where the genomic sequence is transformed into an RGB pixel image and then processed directly by a state-of-the-art Convolutional Neural Network (CNN) for genotype prediction. This makes it possible to handle single text strings as images and to exploit all the advantages of CNNs, which currently represent the state of the art in multiple tasks on several datasets. In particular, even the smallest variations in the protein sequence can be efficiently recognized by the multiple nonlinear layers of the neural network.
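One possible way to turn a text peptide into an RGB image is sketched below. The encoding is an assumption made purely for illustration: each of the twenty standard amino acids is given a distinct colour, and each residue is drawn as a square block, so a ten-residue peptide becomes a horizontal strip of ten coloured blocks. The colour palette, the `cell` block size and the function names are hypothetical and do not reproduce the encoding or image size used in this work.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard residues


def residue_colour(aa):
    """Map a residue to a distinct (R, G, B) triple using base-3 digits of its index."""
    idx = AMINO_ACIDS.index(aa)  # raises ValueError for non-standard residues
    return (127 * (idx // 9), 127 * ((idx // 3) % 3), 127 * (idx % 3))


def peptide_to_image(peptide, cell=8):
    """Return an image as a list of rows of (R, G, B) tuples.

    The image is `cell` pixels high and `cell * len(peptide)` pixels wide;
    each residue fills one `cell` x `cell` block of uniform colour.
    """
    row = [residue_colour(aa) for aa in peptide for _ in range(cell)]
    return [list(row) for _ in range(cell)]
```

The nested-list image keeps the sketch free of dependencies; in practice the same layout would normally be stored in an array and resized to the input resolution expected by the CNN.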
2.3 Peptide comparisons with a Siamese neural network
An SNN is then trained to distinguish the whole 2019-nCoV protein structure from that of two viruses of distinct families, Ebola and HIV-1, for which the equivalent genome-to-protein translation has been done. This one-shot learning system reduces the need for specific datasets and makes it possible to work directly on the available biological data. In other words, the network learns to discriminate target examples in its own protein domain from other domains with completely different biological characteristics (such as Ebola and HIV-1) without the explicit use of a specific dataset. A deep SNN is then applied, which (i) learns generic image features in order to make inferences on unknown distributions; (ii) provides a valid approach that explores solutions in the unknown domain without relying on a trivial supervised training dataset (i.e. peptide-ligand correspondence); (iii) uses several state-of-the-art CNNs pre-trained on huge datasets.
The model is designed with two CNNs that take distinct image inputs (Fig. 3). The input of the first network is always a 2019-nCoV peptide sequence, while the second can accept either a peptide sequence randomly chosen from a different virus family (such as Ebola and HIV-1) or the same 2019-nCoV sequence given to the first network. Therefore, if the two inputs are equivalent (i.e. the same 2019-nCoV protein sequence), the ground-truth target used for training the model is one; otherwise it is zero.
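The pairing rule above can be sketched as a small sampling function. This is an illustration under stated assumptions: the probability `p_same` of drawing an identical pair is hypothetical (the text does not state the ratio of positive to negative pairs), and `make_pair` is not a name from the released code.

```python
import random


def make_pair(ncov_peptides, other_peptides, rng, p_same=0.5):
    """Sample one training pair as (first_input, second_input, target).

    The first input is always a 2019-nCoV peptide. The second input is either
    the same peptide (target 1.0) or a peptide drawn from another virus
    family such as Ebola or HIV-1 (target 0.0).
    """
    first = rng.choice(ncov_peptides)
    if rng.random() < p_same:
        return first, first, 1.0
    return first, rng.choice(other_peptides), 0.0
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps the sampled pairs reproducible between runs.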
Two distinct types of CNN have been used: a shallow and a deep version. For the shallow variant, AlexNet is employed, both with and without pre-training on more than a million images from the ImageNet database. For the deep version, ResNeXt, which is fifty layers deep, is used. The final convolutional layer of each CNN is flattened into a vector, and the pair of vectors is passed to a final layer that computes their L1 distance followed by a sigmoid output function. The model is trained for a number of epochs with Stochastic Gradient Descent (SGD) with Nesterov accelerated gradient (NAG) at a given learning rate, with mean squared error (MSE) as the loss function. Random resized crop, random horizontal flip, and random vertical flip are applied to prevent overfitting during the training phase.
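The final comparison layer and the loss can be sketched in plain Python as follows. This shows only the head of the network: the two feature vectors stand in for the flattened CNN outputs, and `weights` and `bias` are hypothetical learned parameters of the last layer (an assumption in the style of a weighted L1 Siamese head; the exact parametrization used here is not stated in the text).

```python
import math


def l1_sigmoid_score(feat_a, feat_b, weights, bias=0.0):
    """Similarity in (0, 1): sigmoid of a weighted L1 distance between two vectors."""
    if not (len(feat_a) == len(feat_b) == len(weights)):
        raise ValueError("feature vectors and weights must have equal length")
    z = sum(w * abs(a - b) for w, a, b in zip(weights, feat_a, feat_b)) + bias
    # Numerically stable sigmoid for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mse_loss(predictions, targets):
    """Mean squared error between predicted similarities and 0/1 targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```

With negative weights and a positive bias, identical inputs give a score near one and distant inputs a score near zero, which matches the one/zero training targets described above.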
The dataset was subsequently divided into 60% training, 20% validation and 20% test data. The model's capacity to distinguish 2019-nCoV protein sequences from those of other viral families is estimated with a sensitivity analysis (Eq. 1):

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (1)$$

where $TP$ are the true positives (i.e. 2019-nCoV sequences correctly identified as 2019-nCoV sequences) and $FN$ are the false negatives (i.e. 2019-nCoV sequences incorrectly identified as Ebola or HIV-1).
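Sensitivity is conventionally the fraction of actual positives that are correctly identified, TP / (TP + FN). A minimal sketch of how it could be computed from predicted labels is given below; the label convention (1 for 2019-nCoV, 0 for Ebola or HIV-1) and the function name are assumptions made for illustration.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples in y_true")
    return tp / (tp + fn)
```

Because the Siamese output is a sigmoid value, predicted labels would first be obtained by thresholding it; the threshold is a further choice not specified here.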
As previously mentioned, two different CNNs (i.e. AlexNet and ResNeXt) were applied in order to obtain the best performance on the validation and test sets. Importantly, the small eight-layer AlexNet has been shown to perform well on several small clinical datasets, and it is confirmed here to be the best network compared with the deeper fifty-layer ResNeXt, also pre-trained on ImageNet. Indeed, as shown in the validation plot (Fig. 4), the pre-trained AlexNet converges more reliably than both the non-pre-trained AlexNet and ResNeXt; this is probably because the dataset is too small for large deep networks, which instead show slow convergence (i.e. blue line in Fig. 4). The sensitivity analysis on the test set confirms what was observed during the validation phase: the pre-trained AlexNet has higher sensitivity than both its non-pre-trained version and ResNeXt.
|Model|Sensitivity (%), mean (SD)|
|---|---|
|AlexNet (pretrained)|83.29 (2.55)|
|ResNeXt-50 (pretrained)|54.20 (3.43)|
The table shows the sensitivity of the AlexNet model (pre-trained and not pre-trained) compared with a deeper model, ResNeXt, in terms of mean and standard deviation.
After this network selection phase, the pre-trained AlexNet model is chosen to perform inference on peptides from SATPdb in order to find the peptide closest to the HR1 domain site of 2019-nCoV.
The inference process is performed as follows: (i) The HR1 sequence is cut into eight sub-peptides of ten amino acids each (Fig. 1), and each is converted into an image (Fig. 2). (ii) Each SATPdb peptide is likewise converted from text to an image (Fig. 2). (iii) Each SATPdb peptide is given as input to the first Siamese CNN, while each of the eight HR1 sub-peptides is passed in turn to the second Siamese CNN. (iv) Finally, the Siamese sigmoid outputs between the target SATPdb peptide and each of the eight sub-peptides are averaged. This inference process shows 93% affinity between the peptidyl-prolyl cis-trans isomerase peptide and the HR1 domain.
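Steps (iii) and (iv) amount to averaging a pairwise score over the HR1 sub-peptides and ranking the candidates by that average. The sketch below assumes a generic `score_fn(candidate, sub_peptide)` returning a value in [0, 1]; in this work that role is played by the trained Siamese network applied to the two images. The `toy_score` function is a hypothetical stand-in used only to make the example runnable, and all names are illustrative.

```python
def mean_affinity(candidate, hr1_subpeptides, score_fn):
    """Average the pairwise scores between one candidate and every HR1 sub-peptide."""
    scores = [score_fn(candidate, sub) for sub in hr1_subpeptides]
    return sum(scores) / len(scores)


def rank_candidates(candidates, hr1_subpeptides, score_fn):
    """Return (mean_affinity, candidate) pairs sorted from best to worst."""
    scored = [(mean_affinity(c, hr1_subpeptides, score_fn), c) for c in candidates]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def toy_score(a, b):
    """Stand-in scorer: fraction of positions of `b` at which `a` has the same letter."""
    return sum(x == y for x, y in zip(a, b)) / len(b)
```

For instance, `rank_candidates(["BBBB", "AAAA"], ["AAAA", "AABB"], toy_score)` places `"AAAA"` first with a mean score of 0.75, ahead of `"BBBB"` at 0.25.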
4 Discussion and conclusion
In this article, a novel drug repurposing approach has been explored, in which an SNN has been designed to find the peptide that best matches the HR1 region, the least mutable domain of any coronavirus [7, 8]. The proposed model achieves reasonable performance, with an overall test-set sensitivity of 83.29% for the best of the two models compared. In particular, a shallow pre-trained network such as AlexNet showed better performance than a deep one such as ResNeXt (Tab. 1). An image dataset of protein sequences was extracted from 2019-nCoV, Ebola and HIV-1 and used to train the SNN to distinguish protein sequences similar to 2019-nCoV from those belonging to other virus families (i.e. HIV-1, Ebola). The small size of the training set explains why the shallow network performs better than the deeper one (i.e. ResNeXt). However, the one-shot learning approach applied here is more suitable than a supervised ligand-protein classification [10, 11, 12] because: (i) large training datasets are not needed to learn the correct ligands for 2019-nCoV; (ii) the neural network focuses its learning only on the 2019-nCoV protein sequence of interest; (iii) public datasets contain no training examples of ligand binding for the specific coronavirus 2019-nCoV, which can lead to incorrect classification.
Inference on the peptides in SATPdb was performed, showing a confidence of 93% for the affinity of peptidyl-prolyl cis-trans isomerase with the HR1 domain of 2019-nCoV. Peptidyl-prolyl cis-trans isomerase (also known as peptidylprolyl isomerase or PPIase) is an enzyme (Fig. 5) that interconverts the cis and trans isomers of peptide bonds with the amino acid proline. The two major families of PPIase are the cyclophilins (Cyps) and the FK506-binding proteins (FKBPs); the Cyps are implicated in a broad spectrum of cellular processes including cell signaling, protein folding, and protein trafficking. The affinity between HR1 and PPIase may point to PPIase inhibitors, such as the immunosuppressive drug cyclosporin A (CsA), as candidate therapies [37, 38]. Several studies have noted that, in cell culture, the replication of different CoVs (including SARS-CoV and MERS-CoV) can be inhibited by CsA treatment [39, 40, 41]. Another well-known immunosuppressive drug is Sirolimus, which binds the PPIase FKBP12 and has been shown to improve the outcomes of patients with severe H1N1 pneumonia; recent computational studies show that Sirolimus can also be selected as a potential drug for 2019-nCoV, in agreement with the affinity between HR1 and PPIase observed in this work. To ensure the reproducibility of this work, the code and data have been made available to the scientific community at https://github.com/bionick87/2019-nCoV. Future work is still needed to confirm the association mechanism between HR1 and PPIase through molecular docking studies, which is important for understanding the biological mechanisms of PPIase in 2019-nCoV.
References
- [1] Wu, F., Zhao, S., Yu, B. et al. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020). https://doi.org/10.1038/s41586-020-2008-3
- [2] Xue, X. et al. Structures of two coronavirus main proteases: implications for substrate binding and antiviral drug design. J Virol 82, 2515–2527 (2008).
- [3] Yang, H. et al. Design of wide-spectrum inhibitors targeting coronavirus main proteases. PLoS Biol 3, e324 (2005).
- [4] Sayers EW et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2019 Jan 8;47(D1):D23-D28. doi: 10.1093/nar/gky1069. PubMed PMID: 30395293; PubMed Central PMCID: PMC6323993. Sayers EW, Cavanaugh M, Clark K, Ostell J, Pruitt KD, Karsch-Mizrachi I. GenBank. Nucleic Acids Res. 2019 Jan 8;47(D1):D94-D99. doi: 10.1093/nar/gky989. PubMed PMID: 30365038; PubMed Central PMCID: PMC6323954.
- [5] Xu, Z. et al. Nelfinavir was predicted to be a potential inhibitor of 2019-nCov main protease by an integrative approach combining homology modelling, molecular docking and binding free energy calculation. Pharmacology and Toxicology 264 (2020).
- [6] Wrapp Daniel, Wang Nianshuang, Corbett Kizzmekia S., Goldsmith Jory A., Hsieh Ching-Lin, Abiona Olubukola, Graham Barney S., McLellan Jason S. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science, vol 367, n 6483, p 1260, 2020.
- [7] Liu, S. et al. Interaction between heptad repeat 1 and 2 regions in spike protein of SARS-associated coronavirus: implications for virus fusogenic mechanism and identification of fusion inhibitors. Lancet 363, 938–947 (2004).
- [8] Xia S et al. A pan-coronavirus fusion inhibitor targeting the HR1 domain of human coronavirus spike. Science Advances, vol 5, n 4, 2019. doi: 10.1126/sciadv.aav4580
- [9] Xia, S. et al. Fusion mechanism of 2019-nCoV and fusion inhibitors targeting HR1 domain in spike protein. Cell Mol Immunol (2020). https://doi.org/10.1038/s41423-020-0374-2
- [10] Zhang, H., Saravanan K.M., Yang, Y., Hossain M.T., Li, J., Ren, X., Wei, Y. Deep Learning Based Drug Screening for Novel Coronavirus 2019-nCov. Preprints 2020, 2020020061 (doi: 10.20944/preprints202002.0061.v1).
- [11] Rishikesh Magar, Prakarsh Yadav, Amir Barati Farimani. Potential Neutralizing Antibodies Discovered for Novel Corona Virus Using Machine Learning. arXiv 2003.08447, 2020.
- [12] Markus Hofmarcher et al. Large-scale ligand-based virtual screening for SARS-CoV-2 inhibitors using deep neural networks. arXiv 2004.00979, 2020.
- [13] Daniil Polykovskiy, Alexander Zhebrak, Dmitry Vetrov, Yan Ivanenkov, Vladimir Aladinskiy, Polina Mamoshina, Marine Bozdaganyan, Alexander Aliper, Alex Zhavoronkov, Artur Kadurin. Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery. Mol. Pharm. 2018 Oct 1;15(10):4398-4405. doi: 10.1021/acs.molpharmaceut.8b00839. Epub 2018 Sep 19.
- [14] Zhavoronkov Alex et al. (2020): Potential COVID-2019 3C-like Protease Inhibitors Designed Using Generative Deep Learning Approaches. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.11829102.v2
- [15] Kaifu Gao, Duc Duy Nguyen, Rui Wang, Guo-Wei Wei. Machine intelligence design of 2019-nCoV drugs. bioRxiv 2020.01.30.927889; doi: https://doi.org/10.1101/2020.01.30.927889
- [16] Karaman, B., Sippl, W. Computational Drug Repurposing: Current Trends. Curr. Med. Chem. 26, 5389–5409 (2019).
- [17] Li, Y. et al. Therapeutic Drugs Targeting 2019-nCoV Main Protease by High-Throughput Screening. Pharmacology and Toxicology (2020).
- [18] Nelson JM, Chiller TM, Powers JH, Angulo FJ (2007). "Food Safety: Fluoroquinolone-Resistant Campylobacter Species and the Withdrawal of Fluoroquinolones from Use in Poultry: A Public Health Success Story". Clinical Infectious Diseases. 44 (7): 977–80. doi: 10.1086/512369. PMID 17342653.
- [19] Tsiang M et al. "Antiviral Activity of Bictegravir (GS-9883), a Novel Potent HIV-1 Integrase Strand Transfer Inhibitor with an Improved Resistance Profile". Antimicrobial Agents and Chemotherapy. 60 (12): 7086–7097. doi: 10.1128/AAC.01474-16. PMC 5118987. PMID 27645238, December 2016.
- [20] Zhang KE et al. "Circulating metabolites of the human immunodeficiency virus protease inhibitor nelfinavir in humans: structural identification, levels in plasma, and antiviral activities". Antimicrob. Agents Chemother. 45 (4): 1086–93. doi: 10.1128/AAC.45.4.1086-1093.2001. PMC 90428. PMID 11257019, April 2001.
- [21] Hebner CM et al. The HCV non-nucleoside inhibitor Tegobuvir utilizes a novel mechanism of action to inhibit NS5B polymerase function. PLoS One. 2012;7(6):e39163. doi: 10.1371/journal.pone.0039163. Epub 2012 Jun 13.
- [22] Sandeep Singh et al. SATPdb: a database of structurally annotated therapeutic peptides. Nucleic Acids Res. 2016 Jan 4; 44(Database issue): D1119–D1126. doi: 10.1093/nar/gkv1114, 2015.
- [23] ADAM: A comprehensive antimicrobial peptide database with sequence-structure relationships. http://bioinformatics.cs.ntou.edu.tw/ADAM/adam_info.php?f=ADAM_3143
- [24] Volchkov VE, Volchkova VA, Chepurnov AA, Blinov VM, Dolnik O, Netesov SV, Feldmann H. Characterization of the L gene and 5' trailer region of Ebola virus. J Gen Virol. 1999 Feb;80 (Pt 2):355-362. doi: 10.1099/0022-1317-80-2-355. PubMed PMID: 10073695.
- [25] Frank Kirchhoff et al. A novel proviral clone of HIV-2: Biological and phylogenetic relationship to other primate immunodeficiency viruses. Volume 177, Issue 1, July 1990, Pages 305-311.
- [26] Daniel Wrapp et al. Cryo-EM Structure of the 2019-nCoV Spike in the Prefusion Conformation. bioRxiv preprint, doi: https://doi.org/10.1101/2020.02.11.944462.
- [27] Gurjit S. Randhawa, Maximillian P.M. Soltysiak, Hadi El Roz, Camila P.E. de Souza, Kathleen A. Hill, Lila Kari. bioRxiv 2020.02.03.932350; doi: https://doi.org/10.1101/2020.02.03.932350
- [28] Poplin, R., Chang, P., Alexander, D. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983–987, https://doi.org/10.1038/nbt.4235, 2018.
- [29] Asifullah Khan, Anabia Sohail, Umme Zahoora, Aqsa Saeed Qureshi. A Survey of the Recent Architectures of Deep Convolutional Neural Networks. CoRR, abs/1901.06032, http://arxiv.org/abs/1901.06032, 2019.
- [30] Koch Gregory, Zemel Richard, Salakhutdinov Ruslan. Siamese Neural Networks for One-shot Image Recognition. https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf, 2015.
- [31] Krizhevsky Alex, Sutskever Ilya, Hinton Geoffrey E. ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS'12, 2012.
- [32] Deng J., Dong W., Socher R., Li L.-J., Li K., Fei-Fei L. ImageNet: A Large-Scale Hierarchical Image Database. CVPR09, 2009.
- [33] Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He. Aggregated Residual Transformations for Deep Neural Networks. CoRR, abs/1611.05431, http://arxiv.org/abs/1611.05431, 2016.
- [34] Savioli, N., Grisan, E., Visentin, S. et al. Real-time diameter of the fetal aorta from ultrasound. Neural Comput Applic, https://doi.org/10.1007/s00521-019-04646-3 (2019).
- [35] K. Lang, F.X. Schmid, G. Fischer. Catalysis of protein folding by prolyl isomerase. Nature, 329, pp. 268-270, 1987.
- [36] N.V. Naoumov. Cyclophilin inhibition as potential therapy for liver diseases. J. Hepatol., 61 (2014), pp. 1166-1174.
- [37] S.D. Frausto, E. Lee, H. Tang. Cyclophilins as modulators of viral replication. Viruses, 5 (2013), pp. 1684-1701.
- [38] S. Hopkins, P.A. Gallay. The role of immunophilins in viral infection. Biochim. Biophys. Acta, 1850 (2015), pp. 2103-2110.
- [39] A.H. de Wilde, V.S. Raj, D. Oudshoorn, T.M. Bestebroer, S. van Nieuwkoop, R.W. Limpens, C.C. Posthuma, Y. van der Meer, M. Barcena, B.L. Haagmans, E.J. Snijder, B.G. van den Hoogen. MERS-coronavirus replication induces severe in vitro cytopathology and is strongly inhibited by cyclosporin A or interferon-alpha treatment. J. Gen. Virol., 94 (2013), pp. 1749-1760.
- [40] S. Pfefferle et al. The SARS-coronavirus-host Interactome: identification of cyclophilins as target for pan-coronavirus inhibitors. PLoS Pathog., 7 (2011), p. e1002331.
- [41] Y. Tanaka, Y. Sato, S. Osawa, M. Inoue, S. Tanaka, T. Sasaki. Suppression of feline coronavirus replication in vitro by cyclosporin A. Vet. Res., 43 (2012), p. 41.
- [42] Wang, C. H. et al. Adjuvant treatment with a mammalian target of rapamycin inhibitor, sirolimus, and steroids improves outcomes in patients with severe H1N1 pneumonia and acute respiratory failure. Crit. Care Med. 42, 313–321 (2014).
- [43] Zhou, Y., Hou, Y., Shen, J. et al. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discov 6, 14, https://doi.org/10.1038/s41421-020-0153-3 (2020).