Adverse drug events (ADEs) are “injuries resulting from medical intervention related to a drug”, and are distinct from medication errors (inappropriate prescription, dispensing, usage, etc.) as they are caused by drugs at normal dosages. According to the National Center for Health Statistics, 48.9% of Americans took at least one prescription drug in the last 30 days, 23.1% took at least three, and 11.9% took at least five. These numbers rise sharply to 90.6%, 66.8% and 40.7%, respectively, among older adults (65 years or older). This means that the potential for ADEs is very high in a variety of health care settings, including inpatient, outpatient and long-term care settings. For example, in inpatient settings, ADEs can account for as many as one-third of hospital-related complications, affect up to 2 million hospital stays annually, and prolong hospital stays by 2–5 days.
The economic impact of these issues is as widespread as the various healthcare settings and can be staggering. Estimates suggest that ADEs contributed to $3.6 billion in excess healthcare costs in the US alone. Unsurprisingly, older adults are at the highest risk of being affected by an ADE, and are seven times more likely than younger persons to require hospital admission. In the US, as a large number of older adults are Medicare beneficiaries, this economic impact is borne by an already overburdened Medicare system and ultimately passed on to taxpayers and society at large. Beyond older adults, several other patient populations are also vulnerable to ADEs, including children, those with lower socio-economic means, those with limited access to healthcare services, and certain minorities.
Recent research has identified, somewhat surprisingly, that many of these ADEs can be attributed to very common medications, and that many of them are preventable or ameliorable. This issue motivates our long-term goal of developing accessible and robust means of identifying ADEs in a disease/drug-agnostic manner and across a variety of healthcare settings. Here, we focus on the problem of drug-drug interactions (DDIs), which are a type of ADE. An ADE is characterized as a DDI when multiple medications are co-administered and cause an adverse effect on the patient. DDIs, often caused by inadequate understanding of various drug-drug contraindications, are a major cause of hospital admissions, rehospitalizations, emergency room visits, and even death.
Identifying DDIs is an important task during drug design and testing, and regulatory agencies such as the U.S. Food and Drug Administration require large controlled clinical trials before approval. Beyond being expensive and time-consuming, such clinical trials cannot discover all possible interactions. This creates a need for computational methods for DDI prediction. A substantial amount of work on DDIs focuses on text mining [24, 9] to extract DDIs from large text corpora; however, this type of information extraction does not discover new interactions, and only serves to extract in vivo or in vitro discoveries from publications.
Our goal is to discover DDIs in large drug databases by exploiting various properties of the drugs and identifying patterns in drug interaction behaviors. Recent approaches consider phenotypic, therapeutic, structural, genomic and reactive properties of drugs [10] or their combinations [14] to characterize drug interactivity. We take a fresh perspective on DDI prediction through the lens of molecular images (a few examples are shown in Figure 1), via deep learning. Our work is novel in the following significant ways:
- we formulate DDI discovery as a link prediction problem;
- we aim to perform DDI discovery directly on molecular structure images of the drugs, rather than on lossy, string-based representations such as SMILES strings and molecular fingerprints; and
- we utilize deep learning, specifically Siamese networks [12] trained in a contrastive manner, to build a DDI discovery engine that can be integrated into a drug database seamlessly.
Related Work
The social and economic impacts of drug-drug interactions have also been well studied and understood. The effect of DDIs on medication management and social care is studied in [1], and their economic impact is shown in [31]. The impact of DDIs on elderly patients in six European countries was documented in [5]. In a similar vein, the study by Becker et al. [3] finds that the elderly have a risk 9 times higher than that of the general population, and the clinical significance of DDIs is studied in [28]. DDIs can be identified either through clinical trials or through in vitro and in vivo experiments, but these approaches are highly labor-intensive, costly and time-consuming. Thus, a system that can mitigate these factors is highly desirable.
Drug-drug interactions have been studied extensively from both medical and machine learning points of view. From a medical standpoint, [20], [23] and [34] showed the effect of important individual drugs and enzymes such as substrates on various drug-drug interactions. The problem of DDI discovery/prediction is a pairwise classification task, and kernel-based methods [32] are a natural fit since kernels are well suited to representing pairwise similarities. Most similarity-based methods for DDI discovery/prediction have used biomedical research literature as the underlying data source and construct NLP-based kernels from these medical documents [30, 13]. Some work has also been done on learning kernels from different types of data, such as molecular and structural properties of the drugs, and then using these multiple kernels to predict DDIs [10, 14].
Siamese Convolutional Network for Drug-Drug Interactions
A discriminative approach for learning a similarity metric using a Siamese architecture was introduced in [12]. It maps a pair of inputs into a target space such that the distance between the mappings is minimized for similar pairs of examples and maximized for dissimilar pairs.
We adapt the Siamese architecture to the task of link prediction, where a link indicates whether two drugs interact. Since the Siamese architecture yields a measure of similarity between a pair of inputs, its output can be thresholded to obtain a classification. We use the contrastive loss [18], based on a distance metric (Euclidean distance in our case), to learn a parameterized function $F_W$ that maps the input space to the target space; minimizing this loss pushes semantically similar examples together. An important property of the loss function is that it is calculated on a pair of examples. The loss function is formulated as follows. Let $X_1$ and $X_2$ be a pair of drug images and let $Y$ be the label assigned to the pair, with $Y = 0$ if the pair of drugs does not interact and $Y = 1$ if the pair interacts. Also, let

$$D_W(X_1, X_2) = \lVert F_W(X_1) - F_W(X_2) \rVert_2$$

be the Euclidean distance between the vectors obtained after the image pair is processed by the underlying Siamese network, where $W$ are the parameters of the function $F$. Following [18], the contrastive loss function can then be given as

$$L(W, Y, X_1, X_2) = (1 - Y)\,\tfrac{1}{2}\,D_W^2 + Y\,\tfrac{1}{2}\,\bigl[\max(0,\; m - D_W)\bigr]^2,$$

where $D_W$ is the Euclidean distance between the outputs obtained after the input pair is processed by the sub-networks. Also, $m$ is a margin with $m > 0$, which signifies that dissimilar pairs beyond this margin do not contribute to the loss.
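As a minimal sketch of the pairwise loss described above, the following pure-Python function evaluates a contrastive loss in its standard form, assuming pairs labeled 0 are pulled together and pairs labeled 1 are pushed apart up to the margin. The function names and the default margin of 1.0 are illustrative assumptions; the text only requires the margin to be positive.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, y, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    y = 0: the pair is pulled together (loss grows with the distance).
    y = 1: the pair is pushed apart until its distance reaches `margin`.
    The default margin is a placeholder, not a value from the text.
    """
    d = euclidean_distance(u, v)
    pull = (1 - y) * 0.5 * d ** 2
    push = y * 0.5 * max(0.0, margin - d) ** 2
    return pull + push


# A pair at distance 5 is penalized when labeled 0 ...
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=0))
# ... and contributes nothing when labeled 1, since it is already past the margin.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=1, margin=1.0))
```

In a training loop this quantity would be averaged over a batch of pairs and differentiated with respect to the network parameters; the sketch only covers the forward computation.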
Figure 2 shows our complete architecture. It consists of two identical sub-networks, i.e., networks with the same configuration that share the same parameters and weights. Each sub-network takes a gray-scale image of size $500 \times 500 \times 1$ as input (the original color images are converted to gray-scale before being fed to the sub-networks) and consists of 4 convolutional layers with 64, 128, 128 and 256 filters, respectively. The kernel size of each convolutional layer is $(9 \times 9)$ and the activation function is ReLU, a non-linear activation function given as $f(x) = \max(0, x)$. Each convolutional layer is followed by a max-pooling layer with a pool size of $(3 \times 3)$ and a batch normalization layer. After the convolutional layers, the sub-network has 3 fully connected layers with 256, 128 and 20 neurons, respectively. Thus, after an image pair is processed by the Siamese sub-networks, two vectors of dimension $20 \times 1$ are obtained. Contrastive loss is then applied to this pair of vectors, and the resulting distance between the input pair can be thresholded to obtain a prediction.
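To make the layer sizes concrete, the sketch below traces the spatial size of the feature maps through the four convolution and pooling stages. The text does not state padding or strides, so the code assumes unpadded ("valid") convolutions with stride 1 and non-overlapping pooling; under other settings the numbers would differ.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded convolution (assumed: no padding, stride 1)."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Output width of non-overlapping max pooling (assumed: stride equals pool size)."""
    return size // pool


def trace_shapes(size=500, filters=(64, 128, 128, 256), kernel=9, pool=3):
    """Return the (height, width, channels) shape after each conv + pool stage."""
    shapes = []
    for n_filters in filters:
        size = pool_out(conv_out(size, kernel), pool)
        shapes.append((size, size, n_filters))
    return shapes


shapes = trace_shapes()
for stage, shape in enumerate(shapes, start=1):
    print(f"after stage {stage}: {shape}")

# Number of features handed to the first fully connected layer under these assumptions.
height, width, channels = shapes[-1]
print("flattened features:", height * width * channels)
```

Batch normalization does not change the shape, so it is omitted from the trace.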
Initially we keep the threshold at 0.5 and then use a precision-recall curve to identify the best threshold, 0.65. Note that the convolutions in the convolutional sub-network provide translational invariance, but rotational invariance is also important in our problem domain. This is because isomers (chiral forms) of a drug are expected to react differently when interacting with a given drug [27, 11]. For example, Fenfluramine and Dexfenfluramine are isomers of each other, and Fenfluramine interacts with Acebutolol whereas Dexfenfluramine does not (Figure 4). Another example is that the L-isomer of methorphan, Levomethorphan, is an opioid analgesic, while the D-isomer, Dextromethorphan, is a dissociative cough suppressant (https://en.wikipedia.org/wiki/Enantiopure_drug). We discuss our results next.
Experiments
Figure 3 shows the results of using the Siamese network to predict drug-drug interactions from drug molecular structure images. Our data set consists of images of 373 drugs downloaded from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/), from which we generate a data set of 19936 drug pairs that interact with each other ($Y = 1$) and 47424 drug pairs that do not interact with each other ($Y = 0$). We optimize our Siamese network using the Adam optimization algorithm [21], with the learning rate chosen by line search. We also tried the Root Mean Square Propagation (RMSprop) optimizer [19], an adaptive learning rate method, but the results were not encouraging and we therefore use the Adam optimizer. As mentioned before, we keep an initial threshold of 0.5 to obtain predictions from the distance that the Siamese convolutional network produces for a pair of drug images. We divide the data set into 44457 training and 22903 testing examples. To introduce rotational invariance, we rotate the drug images and use the rotated images to train and test the Siamese network. After including the rotated images, the data set size increases to 88914 training and 45806 testing examples. Figure 3(a) shows the accuracy after thresholding the obtained distance at 0.5 and training the network for 20 epochs. The results show that, at a threshold of 0.5, the network performs better when the data set includes rotated drug images than when it does not. We also tried the earth mover's (Wasserstein) distance (https://en.wikipedia.org/wiki/Wasserstein_metric) as the distance metric [36] for the Siamese architecture, but it did not make much difference in the final results.
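The data set sizes quoted above can be cross-checked with a few lines of arithmetic. The sketch below also shows, with a hypothetical `augment` helper, how adding one rotated copy of every pair doubles a split; the quarter-turn rotation is only a placeholder transform and is not taken from the text.

```python
from math import comb

n_drugs = 373
n_interacting = 19936        # pairs labeled 1
n_non_interacting = 47424    # pairs labeled 0
n_train, n_test = 44457, 22903

n_pairs = n_interacting + n_non_interacting
possible_pairs = comb(n_drugs, 2)              # all unordered pairs of 373 drugs
positive_fraction = n_interacting / n_pairs    # share of interacting pairs


def rot90(image):
    """Rotate a list-of-rows image a quarter turn clockwise (placeholder transform)."""
    return [list(row) for row in zip(*image[::-1])]


def augment(pairs, rotate):
    """Return the original (image_a, image_b, label) pairs plus one rotated copy of each."""
    return pairs + [(rotate(a), rotate(b), y) for a, b, y in pairs]


print(n_pairs, n_train + n_test)    # the two totals agree
print(possible_pairs)               # upper bound on distinct pairs
print(2 * n_train, 2 * n_test)      # sizes after adding one rotated copy per pair
print(round(positive_fraction, 3))
```

The labeled pairs cover most, but not all, of the possible unordered pairs, and roughly 30% of them are interacting, which is worth keeping in mind when reading accuracy figures.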
We then pick the best threshold using a precision-recall curve, choosing the value that gives the best trade-off between precision and recall, and obtain a threshold of 0.65. Figures 3(b) and 3(c) present different metrics after training the network for 20 and 50 epochs, respectively. The results show that increasing the number of epochs does not change the results much. We present the results for the data set that includes rotated images, using the rotation that gives the best accuracy in Figure 3(a). Another important point is that, in our problem formulation, recall is the most important metric. The reason is simple: we do not want to miss any interaction, since a false negative has much more serious consequences (fatalities in patients) than a false positive (monetary losses such as new clinical trials) [14]. Our network achieves a recall of 85%, showing the effectiveness of using a Siamese architecture for predicting DDIs. Along with the high recall, we also obtain a high precision of 75%, showing that the Siamese architecture can also extract relevant DDIs.
Another important factor to consider is the effect of the threshold on the obtained results. With a threshold of 0.5, the effect of rotational invariance is evident: the network trained with rotated images performs better than the network trained without them. Once the best threshold is used, the effect of rotational invariance becomes negligible.
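The threshold selection discussed in this section can be sketched as a sweep over candidate thresholds. The code assumes that a pair is predicted as interacting when its distance exceeds the threshold, and it uses the F1 score as a stand-in for the precision-recall trade-off; both choices, the function names, and the toy distances are illustrative assumptions rather than details from the experiments.

```python
def precision_recall(distances, labels, threshold):
    """Precision and recall when pairs with distance > threshold are predicted as 1."""
    predicted = [1 if d > threshold else 0 for d in distances]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(distances, labels, candidates):
    """Pick the candidate threshold with the highest F1 score (assumed criterion)."""
    def f1(threshold):
        p, r = precision_recall(distances, labels, threshold)
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(candidates, key=f1)


# Toy distances and labels, made up purely for illustration.
toy_distances = [0.1, 0.3, 0.55, 0.6, 0.7, 0.9, 0.8, 0.4]
toy_labels = [0, 0, 0, 1, 1, 1, 1, 0]

print(precision_recall(toy_distances, toy_labels, 0.5))
print(best_threshold(toy_distances, toy_labels, [0.4, 0.5, 0.58, 0.65]))
```

Because recall matters most in this setting, an alternative criterion would be to take the threshold with the highest recall among those that keep precision above an acceptable floor.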
Conclusion and Future Work
In this work we focus on using the molecular images of drugs in a pairwise fashion and feeding them to a rotation-invariant Siamese architecture to predict whether two drugs interact with each other. Our evaluations on drug images obtained from the PubChem database establish the superiority of our proposed approach, which is distinct from current approaches that generally use the drug molecular structure in text format, such as Simplified Molecular Input Line Entry System (SMILES) strings [35] and SMiles ARbitrary Target Specification (SMARTS) strings [29].
A natural next step is to combine the current work, which uses images of the drug structure, with our previous work [14], which used different similarity measures obtained from a directed graph of known chemical reactions between drugs and enzymes, transporters and inhibitors, as well as the structure of the drugs in the form of SMILES and SMARTS strings. Refining the Siamese architecture and feeding more drug images to the network are also interesting directions for future work.
References
- [1] (2018) Impact of definitive drug–drug interaction testing on medication management and patient care. Drugs-real world outcomes.
- [2] (2007) Committee on identifying and preventing medication errors. Preventing medication errors: quality chasm series, pp. 1269–1272.
- [3] (2007) Hospitalisations and emergency department visits due to drug–drug interactions: a literature review. Pharmacoepidemiology and drug safety.
- [4] (2016) Fully-convolutional siamese networks for object tracking. In ECCV.
- [5] (2002) Drug–drug interactions in the elderly. Annals of Pharmacotherapy.
- [6] Signature verification using a “siamese” time delay neural network. In NIPS.
- [7] (2011) Emergency hospitalizations for adverse drug events in older americans. New England Journal of Medicine 365 (21), pp. 2002–2012. PMID: 22111719.
- [8] (2006-10) National Surveillance of Emergency Department Visits for Outpatient Adverse Drug Events. JAMA 296 (15), pp. 1858–1866.
- [9] (2011) Predicting adverse drug events from personal health messages. In AMIA Annual Symposium Proceedings.
- [10] (2014) Machine learning-based prediction of drug–drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties. Journal of the American Medical Informatics Association.
- [11] (2013) A review of drug isomerism and its significance. International journal of applied and basic medical research.
- [12] (2005) Learning a similarity metric discriminatively, with application to face verification. In CVPR (1).
- [13] (2013) FBK-irst: a multi-phase kernel based approach for drug-drug interaction detection and classification that exploits linguistic information. In SEM.
- [14] (2018) Drug-drug interaction discovery: kernel learning from heterogeneous similarities. Smart Health.
- [15] (2010) U.S. Department of Health and Human Services, Office of Inspector General (OIG). Adverse Events in Hospitals: National Incidence Among Medicare Beneficiaries, Report No.: OEI-06-09-00090. https://oig.hhs.gov/oei/reports/oei-06-09-00090.pdf [Online; accessed 21-April-2019].
- [16] (2005-04) Adverse drug events occurring following hospital discharge. J Gen Intern Med 20 (4), pp. 317–323.
- [17] (2003) Incidence and preventability of adverse drug events among older persons in the ambulatory setting. JAMA.
- [18] (2006) Dimensionality reduction by learning an invariant mapping. In CVPR.
- [19] (2012) Lecture 6d: a separate, adaptive learning rate for each connection. Slides of lecture neural networks for machine learning. Technical report.
- [20] (2006) Drug-drug interaction between pitavastatin and various drugs via oatp1b1. Drug metabolism and disposition.
- [21] (2014) Adam: a method for stochastic optimization. arXiv preprint.
- [22] (2015) Siamese neural networks for one-shot image recognition. In ICML deep learning workshop.
- [23] (2003) Atorvastatin reduces the ability of clopidogrel to inhibit platelet aggregation: a new drug–drug interaction. Circulation.
- [24] (2013) AZDrugMiner: an information extraction system for mining patient-reported adverse drug events in online patient forums. In ICSH.
- [25] (2014) National Center for Health Statistics, Prescription drug use in the past 30 days, by sex, race and Hispanic origin, and age: United States, selected years 1988–1994 through 2011–2014. Centers for Disease Control and Prevention. https://www.cdc.gov/nchs/data/hus/2017/079.pdf [Online; accessed 21-April-2019].
- [26] (2004-05) Clarifying Adverse Drug Events: A Clinician’s Guide to Terminology, Documentation, and Reporting. Annals of Internal Medicine 140 (10), pp. 795–801.
- [27] (2006) Chiral drugs: an overview. International journal of biomedical science: IJBS.
- [28] (1996) Quantifying the clinical significance of drug–drug interactions: scaling pharmacists’ perceptions of a common interaction classification scheme. Annals of Pharmacotherapy.
- [29] (1997) 1st-class smarts patterns. In EuroMUG 97.
- [30] (2011) Using a shallow linguistic kernel for drug–drug interaction extraction. Journal of biomedical informatics.
- [31] (2001) The economic consequences of a drug-drug interaction. Journal of clinical psychopharmacology.
- [32] (2004) Kernel methods for pattern analysis. Cambridge Univ. Press.
- [33] Gated siamese convolutional neural network architecture for human re-identification. In ECCV.
- [34] (2000) Human cytochrome p-450 3a4: in vitro drug-drug interaction patterns are substrate-dependent. Drug Metabolism and Disposition.
- [35] (1988) SMILES, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences.
- [36] On the earth mover’s distance as a histogram similarity metric for image retrieval. In 2005 IEEE International Conference on Multimedia and Expo.