FRnet-DTI: Convolutional Neural Networks for Drug-Target Interaction

by Farshid Rayhan, et al.
United International University

The task of drug-target interaction prediction holds significant importance in pharmacology and therapeutic drug design. In this paper, we present FRnet-DTI, an auto-encoder and a convolutional classifier for feature manipulation and drug-target interaction prediction. Two convolutional neural networks are proposed, where one model is used for feature manipulation and the other one for classification. Using the first method, FRnet-1, we generate 4096 features for each of the instances in each of the datasets, and we use the second method, FRnet-2, to identify interaction probability employing those features. We have tested our method on four gold standard datasets used exhaustively by other researchers. Experimental results show that our method significantly improves over the state-of-the-art method on three of the four drug-target interaction gold standard datasets on both the area under the Receiver Operating Characteristic curve (auROC) and the area under the Precision Recall curve (auPR) metrics. We also introduce twenty new potential drug-target pairs for interaction based on high prediction scores. Codes Available: https://github.com/farshidrayhanuiu/FRnet-DTI/ Web Implementation: http://farshidrayhan.pythonanywhere.com/FRnet-DTI/




1 Introduction

The task of drug-target interaction prediction is very important in pharmacology and therapeutic drug design. This problem can be addressed in several ways. Firstly, for an already developed drug compound, the task is to find new targets with which the drug might interact. Secondly, for a given target protein, one might search for potential drugs in a library. Another way to tackle the problem is to estimate the possibility of interaction given a pair of drug and target protein. In this paper, we are interested in the last kind. Experimental methods for predicting drug-protein interactions are expensive and time consuming, and hence computational methods have been used extensively in recent years Haggarty et al. (2003); Kuruvilla et al. (2002).

One of the most successful computational methods in drug-target interaction prediction is docking simulation Kitchen et al. (2004). This method largely depends on the availability of the three dimensional native structure of the target protein, determined by sophisticated methods like X-ray crystallography. However, X-ray crystallography is itself a time-consuming and expensive process, and thus the native structures of target proteins are often unavailable. This has encouraged researchers to apply machine learning based methods to the prediction problem by formulating it in a supervised learning setting Mousavian and Masoudi-Nejad (2014).

The success of supervised learning methods largely depends on the training datasets. In a pioneering work on drug-target interaction prediction, Yamanishi et al. (2008) proposed gold standard datasets with four sets of target proteins. Machine learning methods used in the prediction of drug-target interaction often use features generated from molecular fingerprints of drugs Mousavian et al. (2016) and sequence or structure based information Rayhan et al. (2017c). A good number of machine learning algorithms have been used in the literature of supervised drug-target interaction prediction, including Support Vector Machines (SVM) Mousavian et al. (2016), Boosting Rayhan et al. (2017c), Deep Learning Chan et al. (2016), etc. One of the major obstacles in drug-target interaction prediction is the imbalance in the dataset. Since the number of known validated interactions among drug-target pairs is not large, most approaches consider the unknown interactions as negative samples, and thus negative samples outnumber positive samples. The representation of the drug-target pair in the supervised learning dataset is another challenge.

In recent years, chemo-genomic methods have received a lot of attention for identifying drug-target interaction. They usually include methods like graph theory Wang et al. (2013); Chen et al. (2012), deep learning Chan et al. (2016), machine learning Yamanishi et al. (2008); Bleakley and Yamanishi (2009) and network analysis methods Alaimo et al. (2013); Cheng et al. (2012). In the supervised learning setting, K-Nearest Neighbor (KNN) He et al. (2010), fuzzy logic Xiao et al. (2013) and support vector machines Mousavian et al. (2016); Keum and Nam (2017) are the most commonly used classification algorithms. In Yamanishi et al. (2008), the drug-target interaction problem was first introduced as a supervised problem and a gold standard dataset was proposed. Those datasets have since been used exhaustively by researchers. The same authors, Yamanishi et al. (2008), applied distance based learning to the association among pharmacological spaces of drug-target interactions. A non-linear kernel fusion with regularized least squares method was proposed by Hao et al. (2016).

Gönen (2012) used chemical and genomic kernels and Bayesian factorization. Another method, DBSI (drug-based similarity inference) Cheng et al. (2012), proposed two dimensional chemical-structural similarity for drug similarity. Later, methods like DASPfind Ba-Alawi et al. (2016), NetCBP Chen and Zhang (2013) and SELF-BLM Keum and Nam (2017) were proposed to solve the problem. Bigram based features as fingerprints, extracted from position specific scoring matrices, were found very helpful in solving the drug-target interaction problem Mousavian et al. (2016). Most supervised learning methods do not exploit structure based features because the three dimensional native structures of most protein targets are not available. Huang et al. (2016) used extremely randomized trees as the classifier. The authors of that paper represented the drugs as molecular fingerprints and the proteins as pseudo substitution matrices. These matrices were generated from amino acid sequence information. Other relevant works include self-organizing theory Daminelli et al. (2015); Durán et al. (2017), similarity based methods Yuan et al. (2016) and ensemble methods Ezzat et al. (2016, 2017). An in-depth literature review of computational methods for this particular problem was done by Chen et al. (2015).

One of the most recent works is by Wen et al. (2017), where the authors presented a model consisting of multiple stacked RBMs. The output layer consisted of 2 neurons predicting the interaction and non-interaction probability, respectively.

Chan et al. (2016) also presented a model that used deep representations for drug-target interaction prediction. In our recent work, iDTI-ESBoost Rayhan et al. (2017c), we exploited evolutionary features along with structural features to predict drug-protein interaction. SPIDER3, a successful secondary structure prediction tool López et al. (2017); Taherzadeh et al. (2017), was used to generate a novel set of features for supervised learning. The novel set of features includes seven primary feature sets. A short description of each feature set is given in S1 of the supplementary material. In that paper, two balancing methods were used to handle the imbalance ratio of the datasets, and AdaBoost Freund and Schapire (1995) was used for classification.

In this paper, we propose two deep convolutional methods for feature manipulation and for predicting drug-target interaction. They are named FRnet-1 and FRnet-2, where FRnet-1 is used to generate 4096 features, or a deep representation, of each dataset and FRnet-2 is used for classification using the extracted features. We use the latest version of the 4 gold standard datasets with 1476 features to test our method. In the experimental results, using both of our methods together, we observed high auROC and auPR scores that improve over the state of the art on three of the four datasets, and we therefore consider our method a strong alternative to most other proposed methods for Drug-Target Interaction (DTI) prediction.

Figure 1: Architecture of FRnet-1

2 Methodology

This section provides a description of the methodology used in this paper: algorithmic details, datasets and performance evaluation methods.

2.1 Convolutional Models

In this paper, we propose two novel deep learning architectures for drug-target interaction, each network having its own purpose and goal. Throughout the rest of the paper these two models are referred to as FRnet-1 and FRnet-2. FRnet-1 is used as an auto-encoder that extracts 4096 features from the given feature sets, which are then fed as input to FRnet-2 for classification.

The proposed models FRnet-1 and FRnet-2 both have millions of parameters and a large number of hyper-parameters to tune. While exhaustively tuning every hyper-parameter would provide the best result, it is seldom done because it requires tremendous computational cost and time and risks over-fitting. Four of the most effective hyper-parameters Goodfellow et al. (2016) are chosen and tuned over a given range. Aggressive regularization is also employed to reduce over-fitting to the maximum extent.

2.2 Intuition

Recently, deep learning methods have been receiving a lot of attention for biological applications. In Du et al. (2018), the authors used a model called Wide-and-Deep. The authors of Wen et al. (2017) showed impressive improvement in prediction capability using deep learning.

The authors of Wang et al. (2017) used stacked auto-encoders to further improve DTI prediction. We closely follow that article and employ an auto-encoder to extract features and a classifier for final classification. While the above-mentioned papers each address the DTI problem using deep learning, the intuition behind their architectural designs was not made clear.

We follow the architectural design of GoogLeNet Szegedy et al. (2014), which is regarded as one of the most successful classification networks Szegedy et al. (2016, 2017). It uses a module called the inception module (see Fig. 3), where convolution operations with different filter sizes are done in parallel with a pooling operation. The outputs are then merged together as the final output of the module. This process relieves the burden of choosing a proper filter size or choosing between convolution and pooling. Following this intuition, we design our two models, which are further explained below.

2.2.1 FRnet-1

In order to perform a convolution operation, a 4D tensor with the shape [X, H, W, C] is required Abadi et al. (2016), where H, W and C (height, width and channels) are the 3D representation of the features and X is the input batch size. The latest version of the gold standard datasets was represented with 1476 features by Rayhan et al. (2017c) and showed very promising results. For convenience, a new feature with value 0 was added, which extended the feature length to 1477 so that the input can be reshaped into [X, 211, 7, 1]. Here 211 and 7 are the prime factors of 1477, interchanging them has no effect, and 1 indicates that the input has only 1 channel. Basically, each instance of the dataset is represented as a gray scale image of size 211 × 7.
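The padding and reshaping step can be illustrated with a minimal sketch in plain Python. It assumes a row-major layout and a zero appended at the end of the vector; the function name `pad_and_reshape` and the dummy feature values are hypothetical and are not taken from the authors' code. The arithmetic it relies on is that 1476 + 1 = 1477 = 211 × 7.

```python
def pad_and_reshape(features, height=211, width=7):
    """Zero-pad a flat feature vector and reshape it to [height][width][1]."""
    target = height * width
    if len(features) > target:
        raise ValueError("too many features for the requested shape")
    # Append zeros until the vector fills a height x width grid (assumed placement).
    padded = list(features) + [0.0] * (target - len(features))
    # Row-major layout: row r holds features r*width .. r*width + width - 1.
    return [[[padded[r * width + c]] for c in range(width)] for r in range(height)]


# Stand-in for one instance with 1476 feature values (dummy data).
instance = [float(i) for i in range(1476)]
image = pad_and_reshape(instance)
print(len(image), len(image[0]), len(image[0][0]))
```

Stacking X such instances gives the [X, 211, 7, 1] tensor described above; swapping `height` and `width` gives the alternative [X, 7, 211, 1] layout.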

This model consists of several convolutional layers, Max-Pooling layers and fully connected layers. Fig. 1 shows the visual representation of the complete network. The input layer takes the input in the shape [X, 211, 7, 1] and passes it to a convolutional layer with 32 filters and a stride of 2, which outputs a tensor with the shape [X, 106, 4, 32]. ‘Relu’ activation, ‘SAME’ padding and an ‘L2’ regularizer were used in each layer. After convolution, a Max-Pool operation is done with a kernel and stride value of 2 to reduce the tensor shape to [X, 53, 2, 32]. This network output is then fed to four parallel processes. They are depicted in Fig. 1 from left to right. The first process is a convolution with 8 filters followed by a convolution with 64 filters. Next is a convolution with 8 filters followed by a convolution with 64 filters. Then there is a convolution with 8 filters followed by a convolution with 32 filters and, lastly, a Max-pool operation with kernel and stride 1. In the next stage, a merge operator combines the 4 network outputs on the 3rd axis, resulting in a tensor with the shape [X, 53, 2, 192]. A detailed description is provided in Table 1. The merged network is then fed to fully connected layers with 4096 and 2048 neurons, followed by a dropout operation with value 0.5. Finally, the output layer consists of 1476 neurons, each neuron representing a feature value of an instance in the dataset. The model was trained with a learning rate of 0.001 and ‘Adam’ as optimizer Kingma and Ba (2014). The fully connected layer with 4096 neurons is used as features to predict interaction probability using the FRnet-2 method. The model achieved 85% accuracy after just 3 iterations and reached over 90% after 20 iterations. Accuracy curves with respect to each iteration are shown in Figure 2.
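The tensor shapes in the first two layers follow from the usual rule for ‘SAME’ padding, where each spatial dimension becomes ceil(size / stride). The sketch below is only a shape calculator under that assumption; the helper names are hypothetical, and no actual convolution is computed.

```python
import math


def same_out(size, stride):
    """Spatial output size of a layer with 'SAME' padding."""
    return math.ceil(size / stride)


def conv_same(shape, filters, stride):
    """Shape after a 'SAME'-padded convolution; channels become `filters`."""
    batch, height, width, _ = shape
    return (batch, same_out(height, stride), same_out(width, stride), filters)


def pool_same(shape, stride):
    """Shape after a 'SAME'-padded pooling layer; channels are unchanged."""
    batch, height, width, channels = shape
    return (batch, same_out(height, stride), same_out(width, stride), channels)


input_shape = ("X", 211, 7, 1)                              # X is the batch size
after_conv = conv_same(input_shape, filters=32, stride=2)   # first convolution
after_pool = pool_same(after_conv, stride=2)                # first Max-Pool
print(after_conv, after_pool)
```

The result of the pooling step matches the input shape [X, 53, 2, 32] listed for every branch in Table 1.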

Figure 2: Accuracy curves of FRnet-1 on four datasets: (a) enzymes (b) ion channels (ic) (c) GPCRs (d) nuclear receptors (nr).
Index | Input shape | Output shape after 1st ConV | Output shape after 2nd ConV/Max-pool
a | [X, 53, 2, 32] | [X, 53, 2, 16] | [X, 53, 2, 64]
b | [X, 53, 2, 32] | [X, 53, 2, 16] | [X, 53, 2, 64]
c | [X, 53, 2, 32] | [X, 53, 2, 16] | [X, 53, 2, 32]
d | [X, 53, 2, 32] | - | [X, 53, 2, 32]
Tensor shape after merge operation: [X, 53, 2, 192]
Table 1: Shapes of the tensor after each convolution operation leading up to the merge operation, also known as the Inception operation.

The merge operation removes the burden of choosing a single filter size among several candidates. Instead, the model exploits all of them and chooses the better set of features by itself. This concept was inspired by the model of Szegedy et al. (2014), which was later used by the same authors to build the various versions of the Inception network Szegedy et al. (2016, 2017). However, this design also increases computational complexity, as filters of different sizes are used at the same time. To reduce the computational cost of the model, FRnet-1 employs a 1 × 1 convolution before and after each convolution operation with a different filter size. This concept was first introduced in 2013 by Lin et al. (2013), who showed that convolutional operations can be used as a tool to reduce the channel size of a tensor. Following Lin et al. (2013), applying a large-filter convolution directly to a tensor with many channels costs much more than first reducing the number of channels with a convolution using 16 filters and then applying the large-filter convolution, while having the same effect on the network. The same methodology is also incorporated in FRnet-2.
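The saving from reducing channels first can be made concrete by counting multiply-accumulate operations. The sketch below uses the [53, 2, 32] branch input and the 16-channel reduction from Table 1; the 5 × 5 filter size is an assumption made purely for illustration, since the filter sizes of the individual branches are not stated here, and `conv_macs` is a hypothetical helper.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates of a stride-1, 'SAME'-padded convolution."""
    return height * width * kernel * kernel * c_in * c_out


H, W = 53, 2            # spatial size of the branch input (Table 1)
C_IN, C_OUT = 32, 64    # input channels and output channels of one branch
K = 5                   # assumed large filter size, for illustration only
C_MID = 16              # channels after the reducing 1 x 1 convolution

# Direct route: apply the large filter to all 32 input channels.
direct = conv_macs(H, W, K, C_IN, C_OUT)

# Bottleneck route: 1 x 1 convolution down to 16 channels, then the large filter.
bottleneck = conv_macs(H, W, 1, C_IN, C_MID) + conv_macs(H, W, K, C_MID, C_OUT)

print(direct, bottleneck, round(direct / bottleneck, 2))
```

Under these assumptions the bottleneck route needs roughly half the operations of the direct route, and the gap widens as the number of input channels grows.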

2.2.2 FRnet-2

FRnet-2 serves the purpose of predicting the interaction probability of a given drug-target pair. Similar to FRnet-1, this model employs the inception module (a detailed figure of the model is given in Figure 4). It uses the 4096 features generated by FRnet-1, reshaped into a 4D tensor, as input. In this model, the first convolution and Max-Pool operations are kept the same as in the previous method. Following those operations, the tensors are fed in parallel to 2 inception modules, one with a stride size of 1 (left module of Fig. 4) and another with a stride size of 2.

The model merges those outputs in order to take the benefit of both stride sizes and puts the result through a final inception layer before connecting it to fully connected layers of 2048, 512 and finally 1 neuron for prediction. Similar to FRnet-1, this model also uses L2 regularization and ‘Adam’ as optimizer, with the learning rate set to 0.001 and the value set to 0.5 in the dropout layer.

Figure 3: Inception module from Szegedy et al. (2016, 2014, 2017)
Figure 4: Architecture of FRnet-2

2.3 Datasets

The benchmark datasets used in this article were first introduced by Yamanishi et al. (2010) in 2010 using DrugBank Wishart et al. (2008), KEGG Kanehisa et al. (2008), BRENDA Schomburg et al. (2004) and SuperTarget Günther et al. (2008) to extract information about drug-target interactions. These datasets are regarded as gold standard and have been used exhaustively by researchers throughout the years Mousavian et al. (2016); Chan et al. (2016); Chen et al. (2015); Rayhan et al. (2017c). These datasets are publicly available.

In this paper, an extended version of those datasets is used, which consists of structural and evolutionary features. This version of the datasets was first introduced in 2016 by Mousavian et al. (2016) and further extended in 2017 by Rayhan et al. (2017c). The exact datasets from Rayhan et al. (2017c) were used in this project for experimentation. A short description of each dataset is given in Table 2.

Dataset | Drugs | Proteins | Positive Interactions | Imbalance Ratio
Enzyme | 445 | 664 | 2926 | 99.98
Ion Channel | 210 | 204 | 1476 | 28.02
GPCR | 223 | 95 | 635 | 32.36
Nuclear Receptor | 54 | 26 | 90 | 14.6
Table 2: Description of the gold standard datasets with structural and evolutionary features Rayhan et al. (2017c)
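The imbalance ratios in Table 2 can be reproduced from the other three columns if one assumes that every drug-protein pair without a known interaction is counted as a negative sample and that the ratio is negatives divided by positives. The sketch below checks this arithmetic; the function name is hypothetical.

```python
# (drugs, proteins, positive interactions) as listed in Table 2
DATASETS = {
    "Enzyme": (445, 664, 2926),
    "Ion Channel": (210, 204, 1476),
    "GPCR": (223, 95, 635),
    "Nuclear Receptor": (54, 26, 90),
}


def imbalance_ratio(drugs, proteins, positives):
    """Negatives per positive, treating every unlabeled pair as negative."""
    total_pairs = drugs * proteins
    negatives = total_pairs - positives
    return negatives / positives


ratios = {name: round(imbalance_ratio(*row), 2) for name, row in DATASETS.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

For the Enzyme dataset, for example, 445 × 664 = 295480 pairs minus 2926 positives leaves 292554 negatives, or about 99.98 negatives per positive.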

2.4 Performance Evaluation

A wide variety of performance metrics are available to showcase and compare the performance of classification models. Even though the accuracy metric is sufficient to show the percentage of correct predictions of a model, in highly imbalanced datasets, such as the gold standard datasets used in this experiment, that value holds little to no significance. In imbalanced binary datasets, the samples of one class heavily outnumber the samples of the other class, so a measure of accuracy makes little sense. In these types of cases, sensitivity and specificity are very useful metrics.

Assume P denotes the number of positive samples and N denotes the number of negative samples of a dataset. Also assume TN is the number of true negatives and TP is the number of true positives. Similarly, FP represents false positives and FN false negatives. A false positive is a negative sample that the classifier wrongly predicted as positive, and a false negative is a positive sample that the classifier wrongly predicted as negative. Conversely, true positives and true negatives denote the correctly classified positive and negative samples.

Now from these, we can represent the true positive rate or sensitivity as follows:

$$\text{Sensitivity} = \frac{TP}{P} = \frac{TP}{TP + FN} \qquad (1)$$

Sensitivity represents the ratio of correctly predicted positive samples. Precision is the positive predictive value (PPV). It is defined as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

Precision is the percentage of the classifier's positive predictions that are correct. Specificity (SPC) or true negative rate is another important metric. It is defined as in Eq. 3. False positive rate (FPR) is the ratio of misclassified negative samples.

$$\text{SPC} = \frac{TN}{N} = \frac{TN}{TN + FP}, \qquad \text{FPR} = \frac{FP}{N} = 1 - \text{SPC} \qquad (3)$$
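The definitions of sensitivity, precision, specificity and false positive rate can be written out as a short sketch that works from binary labels and predictions. The function names and the toy labels are illustrative only; they are not part of the authors' evaluation code.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)        # true positive rate, TP / P


def precision(tp, fp):
    return tp / (tp + fp)        # positive predictive value


def specificity(tn, fp):
    return tn / (tn + fp)        # true negative rate, TN / N


def false_positive_rate(tn, fp):
    return fp / (tn + fp)        # FP / N, equal to 1 - specificity


# Toy example: 3 positives and 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), precision(tp, fp), specificity(tn, fp))
```

In this toy case accuracy is 0.8 even though one in three positives is missed, which is the kind of effect that makes accuracy a weak measure on imbalanced data.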


There are also two other metrics that remain informative on imbalanced datasets, called the area under the Receiver Operating Characteristic curve (auROC) and the area under the Precision Recall curve (auPR). Because they do not depend on a single decision threshold, they have been widely used Rayhan et al. (2017c); Chan et al. (2016); Chen and Zhang (2013); Cao et al. (2012) as standard metrics for comparison. Both metrics range from 0 to 1, where a perfect classification model has an auPR and auROC score of 1. A random classifier has an auROC of 0.5, while its auPR is close to the fraction of positive samples in the dataset. In both cases, the higher the value the better.
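Both scores can be computed directly from labels and prediction scores. The sketch below is one possible formulation, not necessarily the one used in the experiments: auROC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, and auPR is approximated by average precision, a common step-wise estimator. Function names and toy data are hypothetical.

```python
def au_roc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of precision@k taken at the rank k of every positive sample."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(au_roc(y_true, scores), average_precision(y_true, scores))
```

The pairwise auROC loop is quadratic in the number of samples, so for datasets of the size used here a rank-based implementation from a library would be the practical choice.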

Another important factor is the balance of the bias and variance trade-off Friedman (1997). k-fold cross validation and jackknife tests are mostly used as an attempt to address the bias-variance problem. In our experiment, we used 5-fold shuffled cross validation on each dataset. Each time, the dataset is shuffled and split into 5 equal parts. Then 4 of them are used for training and the remaining one for testing.
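A shuffled 5-fold split of the kind described can be sketched with the standard library alone. The generator below is illustrative: the name `shuffled_k_fold`, the fixed seed and the sample count are assumptions, and it does not stratify by class, which may differ from the setup actually used.

```python
import random


def shuffled_k_fold(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(shuffled_k_fold(100, k=5, seed=42))
for fold_id, (train, test) in enumerate(splits):
    print(fold_id, len(train), len(test))
```

Each sample appears in exactly one test fold, and repeating the procedure with different seeds gives the repeated runs whose scores can then be averaged.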

3 Results and Discussion

In the experiments reported in this paper, Python v3.6, the TensorFlow library and Scikit-learn Pedregosa et al. (2011) were used for the implementation. Each experiment was executed 10 times and the average result was considered. Each dataset was split into a train set and a test set using 5-fold cross validation.

Dataset | Classifier | auPR | auROC
enzymes | Decision Tree | 0.28 | 0.9376
enzymes | SVM | 0.53 | 0.9010
enzymes | MEBoost | 0.41 | 0.9404
enzymes | CUSBoost | 0.71 | 0.9345
enzymes | FRnet-2 | 0.70 | 0.9754
GPCR | Decision Tree | 0.31 | 0.9038
GPCR | SVM | 0.44 | 0.8859
GPCR | MEBoost | 0.46 | 0.9075
GPCR | CUSBoost | 0.65 | 0.8989
GPCR | FRnet-2 | 0.69 | 0.9512
Ion Channel | Decision Tree | 0.29 | 0.933
Ion Channel | SVM | 0.40 | 0.8904
Ion Channel | MEBoost | 0.39 | 0.928
Ion Channel | CUSBoost | 0.45 | 0.8851
Ion Channel | FRnet-2 | 0.49 | 0.9478
NR | Decision Tree | 0.46 | 0.8147
NR | SVM | 0.41 | 0.7605
NR | MEBoost | 0.23 | 0.9165
NR | CUSBoost | 0.71 | 0.8989
NR | FRnet-2 | 0.73 | 0.9241

Table 3: A comparison of performances among FRnet-2 and other classifiers on the gold standard datasets in terms of auROC and auPR using 4096 features generated by FRnet-1.

The FRnet-1 method is a multilayer deep auto-encoder that uses convolution, max-pool and fully connected layers to regenerate the input as output in the final fully connected layer. For each of the datasets, the model was trained to achieve an accuracy of over 95%. Due to the aggressive regularization used to avoid over-fitting (a dropout value of 0.5 and L2 regularization in each layer, with a learning rate of 0.001), the models were unable to achieve accuracy higher than 97% for any of the datasets. The first fully connected layer in the network has 4096 neurons in the output, and those were used to extract 4096 features from each dataset. For the sake of a fair comparison, FRnet-2 was tested against several widely used machine learning algorithms: Decision Tree Safavian and Landgrebe (1991), SVM Joachims (1998), MEBoost Rayhan et al. (2017b) and CUSBoost Rayhan et al. (2017a). Each of these classifiers was fed the 4096 features generated by FRnet-1. Results in terms of auPR and auROC are given in Table 3. Table 3 shows that FRnet-2 produces better auROC for all the datasets. In terms of auPR, its results on three datasets are better than those of the competitor algorithms, and for the ‘enzymes’ dataset the performance of FRnet-2 is very close to the best performing CUSBoost. Note that the other classifiers also achieved good auROC and auPR scores, which shows the effectiveness of the features generated by FRnet-1. Therefore, even though FRnet-1 is designed for feature manipulation of the four datasets mentioned in this article, it may also be useful as a feature manipulation tool in other domains.

Dataset | Reference | Classifier | auPR | auROC
enzymes | Rayhan et al. (2017c) | AdaBoost | 0.68 | 0.9689
enzymes | Rayhan et al. (2017c) | Random Forest | 0.43 | 0.9457
enzymes | Mousavian et al. (2016) | SVM | 0.54 | 0.9194
enzymes | | FRnet-2 | 0.70 | 0.9754
GPCR | Rayhan et al. (2017c) | AdaBoost | 0.31 | 0.9128
GPCR | Rayhan et al. (2017c) | Random Forest | 0.30 | 0.9168
GPCR | Mousavian et al. (2016) | SVM | 0.28 | 0.8720
GPCR | | FRnet-2 | 0.69 | 0.9512
Ion Channel | Rayhan et al. (2017c) | AdaBoost | 0.48 | 0.9369
Ion Channel | Rayhan et al. (2017c) | Random Forest | 0.40 | 0.9234
Ion Channel | Mousavian et al. (2016) | SVM | 0.39 | 0.8890
Ion Channel | | FRnet-2 | 0.49 | 0.9512
NR | Rayhan et al. (2017c) | AdaBoost | 0.79 | 0.9285
NR | Rayhan et al. (2017c) | Random Forest | 0.29 | 0.7723
NR | Mousavian et al. (2016) | SVM | 0.41 | 0.8690
NR | | FRnet-2 | 0.73 | 0.9241

Table 4: A performance comparison of FRnet-2 with AdaBoost, Support Vector Machine and Random Forest classifiers on the gold standard datasets in terms of auROC and auPR.

We have compared our method with other state-of-the-art classifiers mentioned in recent literature, such as SVM, AdaBoost and random forest. FRnet-2 shows superior performance in both metrics on all the datasets except for NR, which holds only 1048 instances and is the smallest of the datasets. The comparison with other classification models is shown in Table 4. Results for the other methods were taken from the experiments reported in the literature Rayhan et al. (2017c); Mousavian et al. (2016). For the NR dataset, the performance of FRnet-2 is close to that of the best performing boosting classifier. Its auPR value is second best, probably because this dataset is highly clustered and the clustered sampling techniques for balancing used in Rayhan et al. (2017c) make that method perform better in this particular case.

Method | Enzyme | GPCR | Ion channels | Nuclear receptor
Yamanishi et al. (2008) | 0.904 | 0.8510 | 0.8990 | 0.8430
Yamanishi et al. (2010) | 0.8920 | 0.8120 | 0.8270 | 0.8350
Cheng et al. (2012) | 0.8075 | 0.8029 | 0.8022 | 0.7578
Gönen (2012) | 0.8320 | 0.7990 | 0.8570 | 0.8240
Chen and Zhang (2013) | 0.8251 | 0.8034 | 0.8235 | 0.8394
Wang et al. (2013) | 0.8860 | 0.8930 | 0.8730 | 0.8240
Mutowo et al. (2016) | 0.9480 | 0.8990 | 0.8720 | 0.8690
Rayhan et al. (2017c) | 0.9689 | 0.9369 | 0.9222 | 0.9285
Our Method | 0.9754 | 0.9478 | 0.9512 | 0.9241

Table 5: Performance of FRnet-2 on the four benchmark gold standard datasets in terms of auROC, compared with other state-of-the-art methods.

We have also compared our results against unsupervised and semi-supervised methods reported in the literature Cheng et al. (2012); Gönen (2012); Chen and Zhang (2013); Yamanishi et al. (2008, 2010); Wang et al. (2013). Table 5 shows a comparison of the auROC scores of these methods, together with supervised methods Mousavian et al. (2016); Rayhan et al. (2017c). Note that our proposed method achieves the highest auROC for three of the four datasets, and for the NR dataset its performance is second best and very close to the best performing one.

Predictor | enzymes | GPCRs | ion channels | nuclear receptors
Mousavian et al. (2016) | 0.54 | 0.39 | 0.28 | 0.41
Rayhan et al. (2017c) | 0.68 | 0.48 | 0.48 | 0.79
FRnet-2 | 0.70 | 0.69 | 0.49 | 0.73
Table 6: Comparison of the performance of FRnet-2 on the four benchmark gold standard datasets from Rayhan et al. (2017c) in terms of auPR with other state-of-the-art methods.

In the literature on imbalanced classification problems, it has often been argued that, between the area under the Precision Recall curve (auPR) and the area under the Receiver Operating Characteristic curve (auROC), auPR should be considered more significant. However, only Mousavian et al. (2016) and Rayhan et al. (2017c) reported auPR scores in their papers. A comparison in terms of auPR score is given in Table 6. Here too, it is interesting to note the superior performance of our proposed model on three of the four datasets, the exception being the nuclear receptor dataset.

We have also tested our method with input shape [X, 7, 211, 1] instead of [X, 211, 7, 1] and found similar results, which suggests that changing the input shape has little to no effect on the performance of the models. Results using input shape [X, 7, 211, 1] are provided in Table 7.

Dataset | Classifier | auPR | auROC
enzymes | Decision Tree | 0.27 | 0.9299
enzymes | SVM | 0.54 | 0.9035
enzymes | FRnet-2 | 0.70 | 0.9713
GPCR | Decision Tree | 0.32 | 0.9038
GPCR | SVM | 0.48 | 0.8859
GPCR | FRnet-2 | 0.70 | 0.9255
Ion Channel | Decision Tree | 0.60 | 0.9235
Ion Channel | SVM | 0.52 | 0.8894
Ion Channel | FRnet-2 | 0.50 | 0.9507
NR | Decision Tree | 0.43 | 0.8207
NR | SVM | 0.42 | 0.7588
NR | FRnet-2 | 0.62 | 0.9134

Table 7: Performance comparison of FRnet-2 and other classifiers on the gold standard datasets in terms of auROC and auPR using 4096 features generated by FRnet-1 with input shape [X, 7, 211, 1].

Since predictions with high confidence scores are interesting in practical applications, a list of the top five false positive interactions based on FRnet-2's prediction score is given in S2 of the supplementary materials. These are pairs that are recorded as non-interacting but for which the model strongly suggests otherwise.

4 Conclusion

In this paper, we propose two novel deep neural net architectures, FRnet-1 and FRnet-2, where FRnet-1 aims to extract convolutional features and FRnet-2 tries to identify drug-target interaction using the extracted features. We apply our algorithm to the datasets from Rayhan et al. (2017c), which consist of both structural and evolutionary features, and with the help of FRnet-1 we generate 4096 informative features. These datasets are regarded as gold standard datasets and are used exhaustively by researchers. We have conducted extensive experiments and reported the results in terms of auROC and auPR scores. In previous literature such as Mousavian et al. (2016); Rayhan et al. (2017c), it was argued that in the case of drug-target interaction it is more appropriate to use the auPR metric over auROC, as the gold standard datasets are highly imbalanced with very few interaction samples. For this reason, FRnet-2 was focused on getting a superior auPR score, even at the cost of auROC score. We also proposed 5 new possible interaction pairs for each of the 4 datasets based on prediction score. At the time of writing, our proposed method outperforms other state-of-the-art methods on 3 of the 4 benchmark datasets in both auPR and auROC. We believe the performance of our method will motivate other practitioners and researchers to exploit both methods, not only for drug-target interaction but also in other domains.

Author Contributions

FR and SS initiated the project with the idea of using structural features. FR proposed and implemented the convolutional architecture under supervision of SS. All the methods, algorithms and results have been analyzed and verified by MSR and the other 2 authors. MSR and SS provided significant biological insights. FR prepared the manuscript with help from SS where the other authors contributed in the process and approved the final version.

Competing Interest

The authors declare that they have no competing interests.


  • Abadi et al. (2016) Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al., 2016. Tensorflow: A system for large-scale machine learning. In: OSDI. Vol. 16. pp. 265–283.

  • Alaimo et al. (2013) Alaimo, S., Pulvirenti, A., Giugno, R., Ferro, A., 2013. Drug–target interaction prediction through domain-tuned network-based inference. Bioinformatics 29 (16), 2004–2008.
  • Ba-Alawi et al. (2016) Ba-Alawi, W., Soufan, O., Essack, M., Kalnis, P., Bajic, V. B., 2016. Daspfind: new efficient method to predict drug–target interactions. Journal of cheminformatics 8 (1), 15.
  • Bleakley and Yamanishi (2009) Bleakley, K., Yamanishi, Y., 2009. Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics 25 (18), 2397–2403.
  • Cao et al. (2012) Cao, D.-S., Liu, S., Xu, Q.-S., Lu, H.-M., Huang, J.-H., Hu, Q.-N., Liang, Y.-Z., 2012. Large-scale prediction of drug–target interactions using protein sequences and drug topological structures. Analytica chimica acta 752, 1–10.
  • Chan et al. (2016) Chan, K. C., You, Z.-H., et al., 2016. Large-scale prediction of drug-target interactions from deep representations. In: Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, pp. 1236–1243.
  • Chen and Zhang (2013) Chen, H., Zhang, Z., 2013. A semi-supervised method for drug-target interaction prediction with consistency in networks. PloS one 8 (5), e62975.
  • Chen et al. (2012) Chen, X., Liu, M.-X., Yan, G.-Y., 2012. Drug–target interaction prediction by random walk on the heterogeneous network. Molecular BioSystems 8 (7), 1970–1978.
  • Chen et al. (2015) Chen, X., Yan, C. C., Zhang, X., Zhang, X., Dai, F., Yin, J., Zhang, Y., 2015. Drug–target interaction prediction: databases, web servers and computational models. Briefings in bioinformatics 17 (4), 696–712.
  • Cheng et al. (2012) Cheng, F., Liu, C., Jiang, J., Lu, W., Li, W., Liu, G., Zhou, W., Huang, J., Tang, Y., 2012. Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol 8 (5), e1002503.
  • Daminelli et al. (2015) Daminelli, S., Thomas, J. M., Durán, C., Cannistraci, C. V., 2015. Common neighbours and the local-community-paradigm for topological link prediction in bipartite networks. New Journal of Physics 17 (11), 113037.
  • Du et al. (2018) Du, Y., Wang, J., Wang, X., Chen, J., Chang, H., 2018. Predicting drug-target interaction via wide and deep learning. In: Proceedings of the 2018 6th International Conference on Bioinformatics and Computational Biology. ACM, pp. 128–132.
  • Durán et al. (2017) Durán, C., Daminelli, S., Thomas, J. M., Haupt, V. J., Schroeder, M., Cannistraci, C. V., 2017. Pioneering topological methods for network-based drug–target prediction by exploiting a brain-network self-organization theory. Briefings in Bioinformatics, bbx041.
  • Ezzat et al. (2016) Ezzat, A., Wu, M., Li, X.-L., Kwoh, C.-K., 2016. Drug-target interaction prediction via class imbalance-aware ensemble learning. BMC bioinformatics 17 (19), 509.
  • Ezzat et al. (2017) Ezzat, A., Wu, M., Li, X.-L., Kwoh, C.-K., 2017. Drug-target interaction prediction using ensemble learning and dimensionality reduction. Methods.
  • Freund and Schapire (1995) Freund, Y., Schapire, R. E., 1995. A decision-theoretic generalization of on-line learning and an application to boosting. In: European conference on computational learning theory. Springer, pp. 23–37.
  • Friedman (1997) Friedman, J. H., 1997. On bias, variance, 0/1—loss, and the curse-of-dimensionality. Data mining and knowledge discovery 1 (1), 55–77.
  • Gönen (2012) Gönen, M., 2012. Predicting drug–target interactions from chemical and genomic kernels using bayesian matrix factorization. Bioinformatics 28 (18), 2304–2310.
  • Goodfellow et al. (2016) Goodfellow, I., Bengio, Y., Courville, A., 2016. Practical methodology. Deep Learning.
  • Günther et al. (2008) Günther, S., Kuhn, M., Dunkel, M., Campillos, M., Senger, C., Petsalaki, E., Ahmed, J., Urdiales, E. G., Gewiess, A., Jensen, L. J., et al., 2008. Supertarget and matador: resources for exploring drug-target relationships. Nucleic acids research 36 (suppl 1), D919–D922.
  • Haggarty et al. (2003) Haggarty, S. J., Koeller, K. M., Wong, J. C., Butcher, R. A., Schreiber, S. L., 2003. Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays. Chemistry & biology 10 (5), 383–396.
  • Hao et al. (2016) Hao, M., Wang, Y., Bryant, S. H., 2016. Improved prediction of drug-target interactions using regularized least squares integrating with kernel fusion technique. Analytica chimica acta 909, 41–50.
  • He et al. (2010) He, Z., Zhang, J., Shi, X.-H., Hu, L.-L., Kong, X., Cai, Y.-D., Chou, K.-C., 2010. Predicting drug-target interaction networks based on functional groups and biological features. PloS one 5 (3), e9603.
  • Huang et al. (2016) Huang, Y.-A., You, Z.-H., Chen, X., 2016. A systematic prediction of drug-target interactions using molecular fingerprints and protein sequences. Current protein & peptide science.
  • Joachims (1998) Joachims, T., 1998. Making large-scale SVM learning practical. Technical report, SFB 475: Komplexitätsreduktion in Multivariaten Datenstrukturen, Universität Dortmund.
  • Kanehisa et al. (2008) Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., et al., 2008. Kegg for linking genomes to life and the environment. Nucleic acids research 36 (suppl 1), D480–D484.
  • Keum and Nam (2017) Keum, J., Nam, H., 2017. Self-blm: Prediction of drug-target interactions via self-training svm. PloS one 12 (2), e0171839.
  • Kingma and Ba (2014) Kingma, D. P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Kitchen et al. (2004) Kitchen, D. B., Decornez, H., Furr, J. R., Bajorath, J., 2004. Docking and scoring in virtual screening for drug discovery: methods and applications. Nature reviews Drug discovery 3 (11), 935–949.
  • Kuruvilla et al. (2002) Kuruvilla, F. G., Shamji, A. F., Sternson, S. M., Hergenrother, P. J., Schreiber, S. L., 2002. Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays. Nature 416 (6881), 653–657.
  • Lin et al. (2013) Lin, M., Chen, Q., Yan, S., 2013. Network in network. arXiv preprint arXiv:1312.4400.
  • López et al. (2017) López, Y., Dehzangi, A., Lal, S. P., Taherzadeh, G., Michaelson, J., Sattar, A., Tsunoda, T., Sharma, A., 2017. Sucstruct: Prediction of succinylated lysine residues by using structural properties of amino acids. Analytical Biochemistry 527.
  • Mousavian et al. (2016) Mousavian, Z., Khakabimamaghani, S., Kavousi, K., Masoudi-Nejad, A., 2016. Drug–target interaction prediction from pssm based evolutionary information. Journal of pharmacological and toxicological methods 78, 42–51.
  • Mousavian and Masoudi-Nejad (2014) Mousavian, Z., Masoudi-Nejad, A., 2014. Drug–target interaction prediction via chemogenomic space: learning-based methods. Expert opinion on drug metabolism & toxicology 10 (9), 1273–1287.
  • Mutowo et al. (2016) Mutowo, P., Bento, A. P., Dedman, N., Gaulton, A., Hersey, A., Lomax, J., Overington, J. P., 2016. A drug target slim: using gene ontology and gene ontology annotations to navigate protein-ligand target space in chembl. Journal of biomedical semantics 7 (1), 59.
  • Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al., 2011. Scikit-learn: Machine learning in python. Journal of Machine Learning Research 12 (Oct), 2825–2830.
  • Rayhan et al. (2017a) Rayhan, F., Ahmed, S., Mahbub, A., Jani, M., Shatabda, S., Farid, D. M., Rahman, C. M., et al., 2017a. Meboost: Mixing estimators with boosting for imbalanced data classification. arXiv preprint arXiv:1712.06658.
  • Rayhan et al. (2017b) Rayhan, F., Ahmed, S., Mahbub, A., Jani, M., Shatabda, S., Farid, D. M., et al., 2017b. Cusboost: Cluster-based under-sampling with boosting for imbalanced classification. arXiv preprint arXiv:1712.04356.
  • Rayhan et al. (2017c) Rayhan, F., Ahmed, S., Shatabda, S., Farid, D. M., Mousavian, Z., Dehzangi, A., Rahman, M. S., 2017c. idti-esboost: Identification of drug target interaction using evolutionary and structural features with boosting. Scientific reports 7 (1), 17731.
  • Safavian and Landgrebe (1991) Safavian, S. R., Landgrebe, D., 1991. A survey of decision tree classifier methodology. IEEE transactions on systems, man, and cybernetics 21 (3), 660–674.
  • Schomburg et al. (2004) Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., Schomburg, D., 2004. Brenda, the enzyme database: updates and major new developments. Nucleic acids research 32 (suppl 1), D431–D433.
  • Szegedy et al. (2017) Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A. A., 2017. Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI. Vol. 4. p. 12.
  • Szegedy et al. (2014) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2014. Going deeper with convolutions. CoRR abs/1409.4842.
  • Szegedy et al. (2016) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2818–2826.
  • Taherzadeh et al. (2017) Taherzadeh, G., Zhou, Y., Liew, A. W.-C., Yang, Y., 2017. Structure-based prediction of protein-peptide binding regions using random forest. Bioinformatics, btx614.
  • Wang et al. (2017) Wang, L., You, Z.-H., Chen, X., Xia, S.-X., Liu, F., Yan, X., Zhou, Y., 2017. Computational methods for the prediction of drug-target interactions from drug fingerprints and protein sequences by stacked auto-encoder deep neural network. In: International Symposium on Bioinformatics Research and Applications. Springer, pp. 46–58.
  • Wang et al. (2013) Wang, W., Yang, S., Li, J., 2013. Drug target predictions based on heterogeneous graph inference. In: Pacific Symposium on Biocomputing. NIH Public Access, p. 53.
  • Wen et al. (2017) Wen, M., Zhang, Z., Niu, S., Sha, H., Yang, R., Yun, Y., Lu, H., 2017. Deep-learning-based drug–target interaction prediction. Journal of proteome research 16 (4), 1401–1409.
  • Wishart et al. (2008) Wishart, D. S., Knox, C., Guo, A. C., Cheng, D., Shrivastava, S., Tzur, D., Gautam, B., Hassanali, M., 2008. Drugbank: a knowledgebase for drugs, drug actions and drug targets. Nucleic acids research 36 (suppl 1), D901–D906.
  • Xiao et al. (2013) Xiao, X., Min, J.-L., Wang, P., Chou, K.-C., 2013. icdi-psefpt: identify the channel–drug interaction in cellular networking with pseaac and molecular fingerprints. Journal of theoretical biology 337, 71–79.
  • Yamanishi et al. (2008) Yamanishi, Y., Araki, M., Gutteridge, A., Honda, W., Kanehisa, M., 2008. Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 24 (13), i232–i240.
  • Yamanishi et al. (2010) Yamanishi, Y., Kotera, M., Kanehisa, M., Goto, S., 2010. Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics 26 (12), i246–i254.
  • Yuan et al. (2016) Yuan, Q., Gao, J., Wu, D., Zhang, S., Mamitsuka, H., Zhu, S., 2016. Druge-rank: improving drug–target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics 32 (12), i18–i27.