Drug-Target Indication Prediction by Integrating End-to-End Learning and Fingerprints

by   Brighter Agyemang, et al.

Computer-Aided Drug Discovery research has proven to be a promising direction in drug discovery. In recent years, Deep Learning approaches have been applied to problems in the domain such as Drug-Target Interaction Prediction and have shown improvements over traditional screening methods. An existing challenge is how to represent compound-target pairs in deep learning models. While several representation methods exist, such descriptor schemes tend to complement one another in many instances, as reported in the literature. In this study, we propose a multi-view architecture trained adversarially to leverage this complementary behavior by integrating both differentiable and predefined molecular descriptors. Our results on empirical data demonstrate that our approach generally improves model accuracy.




I Introduction

Over the years, drug discovery has largely transformed from a crude and serendipitous practice into a well-structured and rational scientific paradigm. The paradox, however, is that the growth experienced in the domain has not made the identification of drugs any easier, as attested to by the high attrition rates and huge budgets typically involved in the process [1]. Traditionally, in vitro screening experiments are conducted in order to identify new interactions. However, considering the enormous number of synthetically feasible compounds, in silico or virtual screening alternatives are mostly used for this process [2]. The two main in silico approaches to Drug-Target Interaction (DTI) prediction are docking simulations and machine learning approaches. Docking simulations utilize compound and target conformations to discover binding sites whereas machine learning methods are based on using features of compounds and/or targets, or their similarities.

A central aspect in applying these computational models is the featurization of compounds and biological targets into numerical vectors. This is achieved using molecular descriptors which encode properties such as bonds, valence structures, sequence information, and other related properties [3]. Digitally, 2D structures of compounds are represented using line notations such as the Simplified Molecular Input Line Entry System (SMILES) [4] whereas targets are represented using sequences, and/or their conformations when available. These digital forms are then used by toolkits (e.g. RDKit [5]) to derive the molecular descriptors.

However, there exist several different kinds of molecular descriptors in the literature, with Extended Connectivity Fingerprints (ECFP) being one of the most widely used. Considering that, for a given compound, different descriptors capture different properties which affect model performance, the choice of a featurization method in model development is a significant one [6, 7]. In certain cases, the performance of certain descriptors tends to be task related [8, 3]. To this end, it is usual in the domain to find different descriptors being combined as input to a model, an approach referred to as Joint Multi-Modal Learning [9]. Nonetheless, the unimodal features that are used in constructing the joint representations tend to be constructed from pre-determined descriptors.

In this study, we propose an integrated view predictive architecture that is trained adversarially [10] for predicting compound-target binding affinities. The motivation is that these descriptors tend to complement one another in many cases and that the different modalities could provide an in-depth perspective about a sample [3]. Additionally, the enormity of the chemical space implies that task-specific representation of compounds is a desideratum of DTI prediction research [11]. Therefore, our departure from constructing joint representations exclusively with predefined descriptors to combining differentiable feature learning with such predefined descriptors dovetails into the concept of tailoring DTI tasks to their input space.

The subsequent sections are organized as follows: section II highlights work related to our study, section III discusses our proposed model, and section IV presents the experimental design of our study. We also discuss the results of our experiments in section V and draw our conclusions in section VI.

II Related Work

The concept of utilizing multiple unimodal descriptors toward predicting bioactivity has been well studied in the literature. Since different descriptors tend to represent different properties of compounds [6], integrating these descriptors has been considered in several studies. In [12], 18 different chemical descriptors are benchmarked on DTI prediction tasks and the findings reveal that integrating multiple descriptors typically improves model performance. The works of Soufan et al. [13, 14] also corroborate this observation. Hence, the combination enables the participating descriptors to create informative feature vectors.

While these studies espouse integrating multiple chemical descriptors, existing studies have mostly employed predefined feature sets, such as structure-based fingerprints and pharmacophore descriptors proposed by experts in the domain, on DTI prediction problems. Recent studies have also proposed end-to-end compound descriptor learning functions toward ensuring a closer relationship between the learning objective and the input space [11, 15, 16]. Although several studies have approached DTI prediction as a binary classification problem [17], the nature of bioactivity is deemed to be continuous [18]. In [19], a DL model using ECFP (with diameter 4) is compared to a Molecular Graph Convolution (GraphConv) model on predicting binding affinities. In both cases, target information, in the form of Protein Sequence Composition (PSC), is combined with the descriptor of a compound for predicting the binding strengths. The work in [19] also shows that using DL methods for predicting bioactivity generally leads to better performance than kernel- and gradient boosting machines-based methods [18, 20].

We propose a predictive generative adversarial network [10] architecture for leveraging the complementary relationship between the seemingly disparate featurization methods in the domain. We show that our approach generally improves on the findings in [19].

III Model

III-A Problem Formulation

We consider the problem of predicting a real-valued binding affinity $y$ between a given compound $c$ and a target $t$. The compound is represented as a SMILES [4] string whereas the target is represented as an amino acid sequence. The target feature vector is constructed using Protein Sequence Composition (PSC), which comprises the Amino Acid Composition (AAC), Dipeptide Composition (DC), and Tripeptide Composition (TC), using ProPy [21]. The SMILES string of $c$ is an encoding of a chemical graph structure $G = (V, E)$, where $V$ is the set of atoms constituting $c$ and $E$ is a set of undirected chemical bonds between these atoms. While predefined fingerprints are computed by directly examining $G$, descriptor learning functions like GraphConv [22] take $G$ as input and learn the numerical representation of $c$ using backpropagation.
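To make the target featurization concrete, the following is a minimal sketch of a PSC-style vector built from k-mer frequencies, written in plain Python so that it is self-contained. It is not the ProPy implementation: the function names `kmer_composition` and `protein_sequence_composition` are hypothetical, the toy sequence is made up, and the normalization (relative frequencies) is an assumption that may differ from the convention ProPy uses.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence: str, k: int) -> list[float]:
    """Relative frequency of every possible k-mer over the 20 residues."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = max(len(sequence) - k + 1, 0)
    for i in range(windows):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing non-standard residues
            counts[window] += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


def protein_sequence_composition(sequence: str) -> list[float]:
    """Concatenate AAC (k=1), DC (k=2), and TC (k=3) into one vector."""
    return [v for k in (1, 2, 3) for v in kmer_composition(sequence, k)]


psc = protein_sequence_composition("MKVLAAGIVGLLLA")  # toy sequence
print(len(psc))  # 20 + 400 + 8000 = 8420
```

The tripeptide block dominates the vector length, which is why PSC vectors are long and sparse for short sequences.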

III-B Proposed Architecture

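The proposed Integrated Views Predictive GAN (IVPGAN) for DTI is shown in figure 1. Given a predefined descriptor vector $\mathbf{d}_c$ of compound $c$, its chemical graph structure $G$, and the PSC vector $\mathbf{p}_t$ of target $t$, we optimize the following mean squared error (MSE) loss over $N$ training pairs:

$$\mathcal{L}_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - f\left( \left[ \mathbf{d}_{c_i};\; g_\theta(G_i);\; \mathbf{p}_{t_i} \right] \right) \right)^2 \qquad (1)$$

where $y_i$ is the ground truth binding affinity of the $i$th pair, $f$ is the prediction model, and $[\cdot\,;\cdot]$ denotes concatenation. Thus, we form a joint representation of the query entities $c$ and $t$ by concatenating the predefined molecular descriptor, the outputs of the parameterized descriptor learning function $g_\theta$, and the PSC feature vector of $t$. We refer to this joint representation as the Combined Input Vector (CIV). Since the MSE of equation 1 accounts for both the bias and the variance of an estimator, it is a good fit for models that predict real-valued outputs.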
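As an illustration of the joint representation, the sketch below concatenates three toy views into a single vector and evaluates an MSE over a few predictions. The helper names `combined_input_vector` and `mse`, the vector sizes, and all numeric values are assumptions made for illustration; in the actual model the learned descriptor would come from a trainable network rather than a fixed list.

```python
def combined_input_vector(fingerprint, learned_descriptor, psc):
    """Concatenate the three views into one flat feature vector (the CIV)."""
    return list(fingerprint) + list(learned_descriptor) + list(psc)


def mse(targets, predictions):
    """Mean squared error between ground truths and predictions."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)


# Toy values: a 4-bit fingerprint, a 3-d learned descriptor, a 2-d PSC slice.
civ = combined_input_vector([1, 0, 0, 1], [0.2, -0.5, 0.7], [0.1, 0.9])
print(len(civ))  # 9

# Squared errors are 0, 1, and 4, so the mean is 5/3.
print(mse([5.0, 6.0, 7.0], [5.0, 7.0, 9.0]))
```

Because the views are simply concatenated, the downstream layers are free to weight the fingerprint, learned, and target features jointly.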

However, the squared loss is sensitive to the overall departure of the samples and tends to represent the output distribution by placing masses in parts of the space with low densities. In computer vision problems, this results in a blurring effect. Therefore, we follow Lotter et al. [10] and train our DTI prediction model adversarially by applying an Adversarial Loss (AL). Specifically, we view the model accepting the CIV input as generating a binding strength for a given pair, as opposed to a vanilla prediction model. This perspective enables us to leverage techniques from Generative Adversarial Networks (GANs) [23] to mitigate the aforementioned problem of equation 1. Also, models trained adversarially are able to identify and model the structured patterns in a target distribution [10].

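To this end, given a set of generated binding strengths $\hat{\mathbf{y}}$ produced by the generator and their corresponding ground truths $\mathbf{y}$, we construct two neighborhood alignment datasets for the adversarial training phase.

Let $S = \{\hat{\mathbf{y}}, \mathbf{y}\}$ be the set of generated and ground truth vectors. We construct the data matrix $X^{(\mathbf{v})}$ from $\mathbf{v}$, $\mathbf{v} \in S$, as:

$$X^{(\mathbf{v})}_{i} = \operatorname{sort}\left( \left\{ |v_i - v_j| \right\}_{j \neq i} \right)_{[:k]} \qquad (2)$$

such that $X^{(\mathbf{v})} \in \mathbb{R}^{N \times k}$, where $N = |\mathbf{v}|$ and $[:k]$ denotes array slicing up to the $k$th element. Intuitively, each row is a vector whose elements are the magnitudes of the differences a specific generated/target datapoint has with its $k$ closest neighbors in its corresponding distribution.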
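The construction of such a neighborhood matrix can be sketched as follows. This is one reading of the description above, not the authors' code: the function name `neighborhood_matrix` is hypothetical, each datapoint is assumed to be excluded from its own neighbor list, and the toy affinities are made up.

```python
def neighborhood_matrix(values, k):
    """Row i holds the k smallest |values[i] - values[j]| over j != i, ascending."""
    n = len(values)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(values)")
    rows = []
    for i, v in enumerate(values):
        diffs = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        rows.append(diffs[:k])  # array slicing up to the k-th element
    return rows


ground_truth = [5.0, 5.5, 7.0, 9.0]  # toy affinities
generated = [5.2, 5.4, 6.5, 9.5]     # toy predictions

x_real = neighborhood_matrix(ground_truth, k=2)
x_fake = neighborhood_matrix(generated, k=2)
print(x_real[0])  # [0.5, 2.0]
```

Each matrix has one row per datapoint and k columns, so the discriminator sees how tightly a value sits among its nearest neighbors rather than the raw value alone.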

This information then serves as the feature vectors of the datapoints for adversarial training. The learning objective is then to generate binding strengths that are closer to their ground truths and that also have a neighborhood structure similar to that of the target distribution. The resulting composite objective for the minimization operation then takes the form:


where the adversarial training elements are computed as:


where the distributions in equations 4-5 are represented by the matrices constructed from the ground truth and predicted values, respectively.

The weighting coefficient is a hyperparameter that is used to control the combination of the MSE and AL losses of the generator.
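The following is a minimal sketch of how a composite generator objective of this kind can be assembled. Several choices here are assumptions rather than the paper's definitions: the weight is called `lam` and enters as a convex combination, the adversarial terms use the standard binary cross-entropy GAN losses in their non-saturating form, and the discriminator outputs are hard-coded toy probabilities standing in for a trained network.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one discriminator output in (0, 1)."""
    eps = 1e-12  # guard against log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants real rows scored 1 and generated rows scored 0."""
    real = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real + fake


def adversarial_loss(d_fake):
    """The generator wants its rows to be scored as real (non-saturating form)."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


def generator_loss(mse_value, d_fake, lam):
    """Assumed convex combination of the MSE and adversarial terms."""
    return lam * mse_value + (1.0 - lam) * adversarial_loss(d_fake)


# d_real / d_fake stand in for discriminator outputs on rows of the
# ground-truth and generated neighborhood matrices, respectively.
d_real, d_fake = [0.9, 0.8], [0.3, 0.4]
print(generator_loss(mse_value=0.25, d_fake=d_fake, lam=0.5))
print(discriminator_loss(d_real, d_fake))
```

Under this assumed form, setting `lam` to 1 recovers plain MSE training, while smaller values give the neighborhood-structure term more influence.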

In what follows, we demonstrate, empirically, how this composite loss, together with the integration of the predefined descriptors and end-to-end descriptor learning, improves the model (generator) skill at predicting the binding strengths between a given compound-target pair.

Fig. 1: An integrated architecture of different views for DTI prediction using a Predictive GAN approach.

IV Experiments Protocol

In this section, we present the design of our experiments and baselines. We also provide our source code and ancillary files at https://github.com/bbrighttaer/ivpgan.

IV-A Datasets and Implementations

The benchmark datasets used are the Metz [24], KIBA [25], and Davis [26] datasets as provided by the work in [19]. In that work, a filter threshold was applied to each dataset such that compounds and targets whose total number of samples does not exceed the threshold are removed. A summary of these datasets is presented in table I.

Dataset   Number of compounds   Number of targets   Total number of pair samples   Filter threshold
Davis     72                    442                 31824                          6
Metz      1423                  170                 35259                          1
KIBA      3807                  408                 160296                         6
TABLE I: Dataset sizes

In our experiments, ECFP8 [27] and Molecular Graph Convolution (GraphConv) [22] serve as representatives of predefined molecular descriptors and differentiable molecular descriptors, respectively. We used the data loading and metrics procedures provided by [19] (with minor modifications) and implemented all models, including the baseline models, using the PyTorch framework. All our experiments were spread over the two servers described in table II.

Model                     # Cores   RAM (GB)   # GPUs
Intel Xeon CPU E5-2687W   48        128        1 GeForce GTX 1080
Intel Xeon CPU E5-2687W   24        128        4 GeForce GTX 1080Ti
TABLE II: Simulation hardware specifications

IV-B Baselines

We compare our proposal to the parametric models proposed in [19]. In our implementation of the ECFP-PSC architecture, we used the ECFP8 variant rather than the ECFP4 used in [19] and in several previous studies. In our preliminary experiments comparing the ECFP4-PSC and ECFP8-PSC architectures, the ECFP8 variant mostly outperformed the ECFP4 model. This finding corroborates the view in [27] that while fewer iterations could lead to good performance in similarity and clustering tasks, activity prediction models tend to perform better with greater structural detail.

IV-C Model Training and Evaluation

In our experiments, we used a 5-fold double Cross Validation (CV) model training approach in which three main splitting schemes were used:

  • Warm split: Every drug or target in the validation set is encountered in the training set.

  • Cold-drug split: Compounds in the validation set are absent from the training set.

  • Cold-target split: Targets in the validation set are absent from the training set.

Since cold-start predictions are typically found in DTI use cases, the cold splits offer interesting and more challenging validation schemes for the trained models.
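The cold splitting schemes above can be sketched by folding over entities instead of over pairs. The code below is an illustrative sketch, not the data loading procedure of [19]: the helper names `kfold_indices` and `cold_split` are hypothetical and the (compound, target, affinity) triples are toy data. A warm split would instead fold over the pairs themselves and then check that every validation compound and target also occurs in the training portion.

```python
import random


def kfold_indices(n_items, n_folds, seed=0):
    """Shuffle item indices and deal them into n_folds disjoint folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[f::n_folds] for f in range(n_folds)]


def cold_split(pairs, entity_index, n_folds=5, seed=0):
    """Yield (train, valid) lists where validation entities never appear in training.

    entity_index=0 gives a cold-drug split, entity_index=1 a cold-target split.
    """
    entities = sorted({pair[entity_index] for pair in pairs})
    for fold in kfold_indices(len(entities), n_folds, seed):
        held_out = {entities[i] for i in fold}
        valid = [p for p in pairs if p[entity_index] in held_out]
        train = [p for p in pairs if p[entity_index] not in held_out]
        yield train, valid


# Toy (compound, target, affinity) triples: 5 compounds x 5 targets.
pairs = [(f"c{i}", f"t{j}", float(i + j)) for i in range(5) for j in range(5)]

for train, valid in cold_split(pairs, entity_index=0):
    train_drugs = {p[0] for p in train}
    valid_drugs = {p[0] for p in valid}
    print(len(train), len(valid), train_drugs.isdisjoint(valid_drugs))
```

Holding out whole entities is what makes the cold schemes harder: the model must predict affinities for compounds or targets it has never seen during training.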

As regards evaluation metrics, we measure the Root Mean Squared Error (RMSE) and Pearson correlation coefficient ($R^2$) on the validation sets in each CV-fold. Additionally, we measure the Concordance Index (CI) on the validation set as proposed by [18].
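For reference, the three metrics can be computed as in the sketch below. It is written in plain Python for self-containedness and is not the metrics code of [19]; the function names and toy values are assumptions, and the tie handling in `concordance_index` (equal true values skipped, tied predictions counted as 0.5) is one common convention that an actual implementation may not share.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    return concordant / comparable


y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.1, 6.4, 6.2, 8.3]
print(rmse(y_true, y_pred))
print(pearson_r(y_true, y_pred) ** 2)
print(concordance_index(y_true, y_pred))  # 5 of 6 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but would need a faster formulation on validation sets of the sizes used here.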

We follow the averaging CV approach where the reported metrics are the averages across the different folds. We also repeat the CV evaluation for different random seeds to reduce the effect of randomness. Consequently, all statistics are also averaged across such seeds. Hyperparameters were searched for on the warm split of the Davis dataset using the Bayesian optimization API of scikit-optimize.

V Results and Discussion

In tables III-V, we present the RMSE, CI, and $R^2$ values as measured on the best trained models on each of the benchmark datasets, respectively. The standard deviation is placed beneath the mean value of each case.

Firstly, model complexity and data sizes were influential in model performances. While the IVPGAN and GraphConv-PSC models took longer training times due to the large number of parameters, ECFP8-PSC took much less time to train. The simplicity of ECFP8-PSC also ensured less overfitting with adequate regularization in the face of fewer samples in a number of cases.

Notwithstanding the foregoing observation, IVPGAN mostly achieved better results than the baseline models, with the GraphConv-PSC model being outperformed by the ECFP8-PSC model. Also, just as there exists a general trend of increasing difficulty of prediction from warm split to cold target split in [19], our implementations exhibited this behavior as well, albeit with significant improvements in many cases.

Furthermore, there exists a similar trend across tables III-V. Feng et al. [19] observed that, for datasets where there are more samples of compounds than targets, cold drug splits tended to perform better than cold target splits, with the reverse also being true. However, in our experiments, we observed that such a relationship may be tenuous, at best, and that model skill seems to depend more on model capacity and the hyperparameters used for training. In the case of warm split results, our experiments align with the observation in [19] that they are always better than both cold drug and cold target split performances due to the variation in sample sizes. Indeed, it can be observed that the cold target split proved to be the most challenging splitting scheme for all models in our experiments.

Additionally, on the KIBA dataset, although the number of compounds significantly outweighs that of targets, the difference in cold target and cold drug performances, as measured on IVPGAN, is not as pronounced as seen in the baseline models. Thus, with IVPGAN, diversity in compound features seems more necessary for a richer representation of compound-target samples.

While IVPGAN mostly outperformed the baseline models, and especially so in the cold split schemes, it was mostly outperformed on the Metz dataset. Aside from possible overfitting due to sample size (also for GraphConv), the hyperparameters used may be less suited to the Metz dataset and its CV split schemes.

TABLE III: Performance of regression on benchmark datasets measured in RMSE. Columns: ECFP8, GraphConv, IVPGAN; rows: warm, cold drug, and cold target CV splits for each of the Davis, Metz, and KIBA datasets.
TABLE IV: Performance of regression on benchmark datasets measured in CI. Columns: ECFP8, GraphConv, IVPGAN; rows: warm, cold drug, and cold target CV splits for each of the Davis, Metz, and KIBA datasets.
TABLE V: Performance of regression on benchmark datasets measured in $R^2$. Columns: ECFP8, GraphConv, IVPGAN; rows: warm, cold drug, and cold target CV splits for each of the Davis, Metz, and KIBA datasets.

We also conducted qualitative investigations into the prediction performances of all models trained herein. The resulting scatter and joint plots can be seen at https://github.com/bbrighttaer/ivpgan. These plots align with the trends identified in tables III-V. We observed that IVPGAN is able to better model the target distributions than the baseline models in most of the cases, and more so in the cold target CV schemes when there are significantly fewer targets. The observation that the GraphConv-PSC model is not able to properly model the target distribution for smaller datasets, with the ECFP8-PSC model mostly suffering less from this problem, highlights the importance of our integration and generative approach to bioactivity.

The neighborhood alignment dataset in the adversarial training also enables the reduction of residuals in most of the scatter plots. In summary, these results and findings demonstrate the feasibility and effectiveness of our approach to DTI prediction.

VI Conclusion

In this study, we have discussed the use of DL models in DTI prediction and emphasized the significance of the choice of featurization schemes in model training. Using the ECFP8-PSC and GraphConv-PSC models as baselines, we have demonstrated that the IVPGAN approach results in improved model skill in most cases, including challenging use cases such as cold target splits.

Future studies could examine how coordinated multi-view representation learning mechanisms compare to the joint approach adopted in this study. In addition, other proposed GAN training techniques could be adopted to address problems associated with GAN training.


  • [1] J. A. DiMasi, R. W. Hansen, and H. G. Grabowski, “The price of innovation: New estimates of drug development costs,” Journal of Health Economics, 2003.
  • [2] P. G. Polishchuk, T. I. Madzhidov, and A. Varnek, “Estimation of the size of drug-like chemical space based on GDB-17 data,” Journal of Computer-Aided Molecular Design, 2013.
  • [3] A. S. Rifaioglu, H. Atas, M. J. Martin, R. Cetin-Atalay, V. Atalay, and T. Doğan, “Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases,” Briefings in Bioinformatics, no. January, pp. 1–36, 2018. [Online]. Available: https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bby061/5062947
  • [4] D. Weininger, “SMILES, a Chemical Language and Information System: 1: Introduction to Methodology and Encoding Rules,” Journal of Chemical Information and Computer Sciences, 1988.
  • [5] G. Landrum, “RDKit: Open-source Cheminformatics,” 2006.

  • [6] A. Cereto-Massagué, M. J. Ojeda, C. Valls, M. Mulero, S. Garcia-Vallvé, and G. Pujadas, “Molecular fingerprint similarity search in virtual screening,” Methods, vol. 71, pp. 58–63, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1046202314002631
  • [7] T. Kogej, O. Engkvist, N. Blomberg, and S. Muresan, “Multifingerprint based similarity searches for targeted class compound selection,” Journal of Chemical Information and Modeling, 2006.
  • [8] J. Duan, S. L. Dixon, J. F. Lowrie, and W. Sherman, “Analysis and comparison of 2d fingerprints: Insights into database screening performance using eight fingerprint methods,” Journal of Molecular Graphics and Modelling, vol. 29, no. 2, pp. 157 – 170, 2010. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1093326310000781
  • [9] T. Baltrusaitis, C. Ahuja, and L. P. Morency, “Multimodal Machine Learning: A Survey and Taxonomy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 423–443, 2019.
  • [10] W. Lotter, G. Kreiman, and D. Cox, “Unsupervised Learning of Visual Structure using Predictive Generative Networks,” in International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, may 2016. [Online]. Available: http://arxiv.org/pdf/1511.06380v2.pdf
  • [11] Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande, “MoleculeNet: A benchmark for molecular machine learning,” Chemical Science, vol. 9, no. 2, pp. 513–530, 2018.
  • [12] R. Sawada, M. Kotera, and Y. Yamanishi, “Benchmarking a wide range of chemical descriptors for drug-target interaction prediction using a chemogenomic approach,” 2014.
  • [13] O. Soufan, W. Ba-Alawi, M. Afeef, M. Essack, P. Kalnis, and V. B. Bajic, “DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning,” Journal of Cheminformatics, 2016.
  • [14] O. Soufan, W. Ba-Alawi, M. Afeef, M. Essack, V. Rodionov, P. Kalnis, and V. B. Bajic, “Mining Chemical Activity Status from High-Throughput Screening Assays,” PLoS ONE, 2015.
  • [15] J. Gomes, B. Ramsundar, E. N. Feinberg, and V. S. Pande, “Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity,” arXiv e-prints, pp. 1–17, 2017. [Online]. Available: http://arxiv.org/abs/1703.10603
  • [16] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley, “Molecular graph convolutions: moving beyond fingerprints,” Journal of Computer-Aided Molecular Design, vol. 30, no. 8, pp. 595–608, 2016.
  • [17] I. Lee, J. Keum, and H. Nam, “DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences,” PLoS Computational Biology, vol. 15, no. 6, pp. 1–21, 2019.
  • [18] T. Pahikkala, A. Airola, S. Pietilä, S. Shakyawar, A. Szwajda, J. Tang, and T. Aittokallio, “Toward more realistic drug-target interaction predictions,” Briefings in Bioinformatics, 2015.
  • [19] Q. Feng, E. Dueva, A. Cherkasov, and M. Ester, “PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction,” arXiv e-prints, pp. 1–21, 2018. [Online]. Available: http://arxiv.org/abs/1807.09741
  • [20] T. He, M. Heidemeyer, F. Ban, A. Cherkasov, and M. Ester, “SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines,” Journal of Cheminformatics, vol. 9, no. 1, pp. 1–14, 2017.
  • [21] D. S. Cao, Q. S. Xu, and Y. Z. Liang, “Propy: A tool to generate various modes of Chou’s PseAAC,” Bioinformatics, vol. 29, no. 7, pp. 960–962, 2013.
  • [22] H. Altae-Tran, B. Ramsundar, A. S. Pappu, and V. Pande, “Low Data Drug Discovery with One-Shot Learning,” ACS Central Science, 2017.
  • [23] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in NIPS, 2014.
  • [24] J. T. Metz, E. F. Johnson, N. B. Soni, P. J. Merta, L. Kifle, and P. J. Hajduk, “Navigating the kinome,” Nature Chemical Biology, 2011.
  • [25] J. Tang, A. Szwajda, S. Shakyawar, T. Xu, P. Hintsanen, K. Wennerberg, and T. Aittokallio, “Making sense of large-scale kinase inhibitor bioactivity data sets: A comparative and integrative analysis,” Journal of Chemical Information and Modeling, 2014.
  • [26] M. I. Davis, J. P. Hunt, S. Herrgard, P. Ciceri, L. M. Wodicka, G. Pallares, M. Hocker, D. K. Treiber, and P. P. Zarrinkar, “Comprehensive analysis of kinase inhibitor selectivity.” Nature biotechnology, 2011.
  • [27] D. Rogers and M. Hahn, “Extended-Connectivity Fingerprints,” Journal of Chemical Information and Modeling, vol. 50, no. 5, pp. 742–754, 2010. [Online]. Available: https://doi.org/10.1021/ci100050t