Multi-View Self-Attention for Interpretable Drug-Target Interaction Prediction

by Brighter Agyemang, et al.

The drug discovery stage is a vital part of the drug development process and forms part of the initial stages of the development pipeline. In recent times, machine learning-based methods are actively being used to model drug-target interactions for rational drug discovery due to the successful application of these methods in other domains. In machine learning approaches, the numerical representation of molecules is vital to the performance of the model. While significant progress has been made in molecular representation engineering, this has resulted in several descriptors for both targets and compounds. Also, the interpretability of model predictions is a vital feature that could have several pharmacological applications. In this study, we propose a self-attention-based, multi-view representation learning approach for modeling drug-target interactions. We evaluated our approach using three large-scale kinase datasets and compared six variants of our method to 16 baselines. Our experimental results demonstrate the ability of our method to achieve high accuracy and offer biologically plausible interpretations using neural attention.




1 Introduction

In the pharmaceutical sciences, drug discovery is the process of elucidating the roles of compounds in bioactivity for developing novel drugs. The drug discovery stage is a vital part of the drug development process and forms part of the initial stages of the development pipeline. In recent times, traditional in vivo and in vitro methods for analyzing bioactivities have been enhanced with automated methods such as large-scale High-Throughput Screening (HTS). The automation is motivated by the quest to reduce the cost and time-to-market challenges that are associated with the drug development process. The cost of developing a single drug is estimated to be 1.8 billion US dollars, and the process could take 10-15 years to complete Hopkins (2009). While HTS provides a better alternative to wet-lab experiments, it is time-consuming (it takes about 2-3 years) Lee et al. (2019) and requires advanced chemogenomic libraries. Also, with HTS, an exhaustive screening of the known human proteome and the synthetically feasible compounds is intractable Rifaioglu et al. (2018); Polishchuk et al. (2013). Additionally, HTS has a high failure rate Doman et al. (2002).

Lately, the availability of large-scale chemogenomic and pharmacological data (such as DrugBank Knox et al. (2011), KEGG Kanehisa et al. (2012), STITCH Szklarczyk et al. (2016), ChEMBL Bento et al. (2014), Davis Davis et al. (2011), KIBA Tang et al. (2014), and PubChem Kim et al. (2016)), coupled with advances in computational resources and algorithms, has engendered the growth of the in silico (computer-based) Virtual Screening (VS) domain. In silico methods have the potential to address the challenges mentioned above that plague HTS due to their ability to analyze assay data, unmask inherent relationships, and exploit such latent information for drug discovery tasks Schierz (2009).

In VS, data-driven models are used to examine and predict Drug-Target Interactions (DTI) to systematically guide subsequent HTS or in vitro validation methods. DTI research using VS methods has applications in drug side-effects studies Wen et al. (2017) and could be a key contributor in developing personalized medications Shin et al. (2019), and in drug-repurposing Ezzat et al. (2017). Also, it is worth noting that the use of in silico methods to optimize the drug development process could reduce healthcare costs and encourage accessibility of healthcare services.

Consequently, there are several in silico proposals in the literature for DTI prediction Rifaioglu et al. (2018). With respect to data usage, existing in silico DTI studies fall into three groups: structure-based methods, ligand-based approaches, and proteochemometric modeling (PCM). Structure-based methods use the 3D conformations of targets and compounds for bioactivity studies. Docking simulations are well-known instances of structure-based methods. Since the 3D conformations of several targets, such as G-Protein Coupled Receptors (GPCR) and Ion Channels (IC), are unknown, structure-based methods are limited in their application. They are also computationally expensive since a protein could assume multiple conformations depending on its rotatable bonds Rifaioglu et al. (2018). Ligand-based methods operate on the assumption that similar compounds would interact with similar targets and vice versa, tersely referred to as ‘guilt-by-association.’ Hence, ligand-based methods perform poorly when a target has only a few known binding ligands. The same applies in reverse.

On the other hand, PCM or chemogenomic methods, proposed in Lapinsh et al. (2001), model interactions using a drug (compound)-target (protein) pair as input. Since PCM methods do not suffer from the drawbacks of ligand-based and structure-based methods, there have been many studies in using such chemogenomic methods to study DTIs Lapinsh et al. (2005); Cortés-Ciriano et al. (2015); Manoharan et al. (2015). Also, PCM methods can use a wide range of drug and target representations. Qiu et al. provide a well-documented growth of the PCM domain Qiu et al. (2017).

As regards computational methodologies, Chen et al. categorize existing models for DTI prediction into network-based, Machine Learning (ML)-based, and other models Chen et al. (2016). Network-based methods approach the DTI prediction task using graph-theoretic algorithms where the nodes represent drugs and targets while the edges model the interactions between the nodes Yamanishi et al. (2008). As a corollary, the DTI prediction task becomes a link prediction problem. While network-based methods can work well even on datasets with few samples, they do not generalize to samples outside the training set, among other shortcomings. ML methods tackle the DTI prediction problem by training a parametric or non-parametric model iteratively with a finite independent and identically distributed training set made up of drug-target pairs using supervised, unsupervised, or semi-supervised algorithms. Probabilistic Matrix Factorization (MF) of an interaction matrix and certain forms of similarity-based methods also exist in the domain Cobanoglu et al. (2013); Chen and Zeng (2013).

Rifaioglu et al., in their analysis of recent progress of in silico methods, show that researchers in the domain are increasingly studying supervised ML methods Rifaioglu et al. (2018). In this context, similarity-based and feature-based methods have been the main ML approaches. Similarity-based methods leverage the drug-drug, target-target, and drug-target similarities to predict new interactions Shi et al. (2015); Perualila-Tan et al. (2016); Pahikkala et al. (2015a). Feature-based methods represent each drug or target using a numerical vector, which may reflect the entity’s physicochemical and molecular properties. These feature vectors are used to train an ML model to predict unknown interactions. Sachdev et al. provide a thorough discussion of the feature-based DTI methods Sachdev and Gupta (2019). Additionally, some proposals combine feature-based and similarity-based methods to model interactions Liu et al. (2015); He et al. (2017). Due to the recent success of Deep Learning (DL), a form of ML, in areas such as computer vision Xu et al. (2015) and Natural Language Processing (NLP) Bahdanau et al. (2014), recent feature-based approaches have mainly been DL algorithms Wallach et al. (2015); Kearnes et al. (2016); Gomes et al. (2017); Altae-Tran et al. (2017); Lee et al. (2019); Shin et al. (2019).

In feature-based methods, the construction of numerical vectors from the digital forms of drugs or targets is significant. This process is called featurization. The 2D structure of a compound can be represented using a line notation algorithm, such as the Simplified Molecular Input Line Entry System (SMILES) Weininger (1988). Likewise, a target can be encoded using amino-acid sequencing. The compound and target features can then be computed using libraries such as RDKit Landrum (2006) and ProPy Cao et al. (2013), respectively. While Wen et al. draw a line between descriptors and fingerprints, we refer to both as descriptors herein since they can be composed to form molecular representations Wen et al. (2017).

While significant progress has been made in molecular representation engineering, this has resulted in several descriptors for both targets and compounds Todeschini and Consonni (2010); Rifaioglu et al. (2018); Mahmud et al. (2020). Since the choice of descriptors or features significantly affects model skill, researchers face an unavoidable dilemma in feature selection Cereto-Massagué et al. (2015); Kogej et al. (2006). In some instances, the performance of molecular descriptors tends to be task-related Duan et al. (2010), and different descriptors offer complementary behaviors Sawada et al. (2014); Soufan et al. (2016); Mahmud et al. (2020). Therefore, integrating these predefined descriptors to construct joint molecular views is common and espoused by researchers Baltrusaitis et al. (2019); Rifaioglu et al. (2018). Although these descriptors tend to provide domain-related information, their predefined nature means they are unable to establish a closer relationship between the input and output space concerning the task at hand.

Indeed, over the past few years, several algorithms have been proposed to learn compound and target features directly from their sequences or their 2D or 3D forms using backpropagation Duvenaud et al. (2015); Wallach et al. (2015); Kearnes et al. (2016); Gomes et al. (2017); Wu et al. (2018); Tsubaki et al. (2019); Lee et al. (2019); Shin et al. (2019). It has been shown that DTI models constructed in such a manner usually outperform models built on predefined descriptors or provide competitive results Kearnes et al. (2016); Feng et al. (2018); Agyemang et al. (2019). Nonetheless, the proliferation of these end-to-end descriptor learning methods only exacerbates the dilemma mentioned above since these studies also demonstrate the capabilities of predefined methods such as the Extended Connectivity Fingerprint (ECFP) Rogers and Hahn (2010) method.

In another vein, most of the existing DTI studies in the literature have formulated the DTI prediction task as a Binary Classification (BC) problem. However, the nature of bioactivity is continuous. Also, DTI depends on the concentration of the two query molecules and their intermolecular associations Pahikkala et al. (2015b). Indeed, it is rare to have a ligand that binds to only one target Rifaioglu et al. (2018). While the binary classification approach provides a uniform way to benchmark DTI proposals in the domain using the GPCR, IC, Enzymes (E), and Nuclear Receptor (NR) datasets of Yamanishi et al. (2008), treating DTI prediction as a binding affinity prediction problem leads to the construction of more realistic datasets Öztürk et al. (2018); Tang et al. (2014). Accordingly, the Davis Davis et al. (2011), KIBA Tang et al. (2014), and Metz Metz et al. (2011) datasets serve as the benchmark datasets for regression-based DTI proposals, and their output values are measured in dissociation constant ($K_d$), KIBA metric Tang et al. (2014), and inhibition constant ($K_i$), respectively. Another significant feature of the regression-based datasets is that they do not introduce the class-imbalance problems seen with the BC datasets mentioned above. The BC-based algorithms typically address the class-imbalance problem using sampling techniques Mahmud et al. (2020) or assume samples without reported interaction information to be non-interacting pairs. We argue that predicting continuous values enables the entire spectrum of interaction to be well-captured in developing DTI prediction models.
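To make the contrast between the regression and BC formulations concrete, the following minimal sketch converts dissociation constants to a logarithmic scale and then thresholds them into binary labels. The transform $pK_d = -\log_{10}(K_d / 10^9)$ (with $K_d$ in nM), the threshold of 7, and the sample values are illustrative assumptions, not choices made in this study.

```python
import math


def to_pkd(kd_nm: float) -> float:
    """Convert a dissociation constant given in nM to pKd = -log10(Kd in molar)."""
    return -math.log10(kd_nm / 1e9)


def binarize(pkd: float, threshold: float = 7.0) -> int:
    """Collapse a continuous affinity into an interacting / non-interacting label."""
    return 1 if pkd >= threshold else 0


# Hypothetical Kd measurements in nM (made up for illustration).
kd_values_nm = [1.0, 50.0, 99.0, 101.0, 10000.0]
pkd_values = [to_pkd(v) for v in kd_values_nm]
labels = [binarize(p) for p in pkd_values]

for kd, p, label in zip(kd_values_nm, pkd_values, labels):
    print(f"Kd={kd:>8.1f} nM  pKd={p:.3f}  label={label}")
```

In this toy example, the pairs at 99 nM and 101 nM receive different binary labels despite nearly identical affinities, while the pairs at 1 nM and 99 nM receive the same label. The continuous values retain this distinction.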

Furthermore, since in silico DTI models are typically not replacements for in vitro and in vivo validations, the interpretability of their predictions is vital to guiding domain experts toward realizing the benefits of the advances discussed above. However, the application of multiple levels of non-linear transformation of the input means that DL models do not lend themselves easily to interpretation. In some studies, less powerful alternatives such as decision trees and regularization of linear models have been used to achieve the interpretability of prediction results Pliakos et al. (2019); Tabei et al. (2012). Recent progress in pooling and attention-based techniques Bahdanau et al. (2014); dos Santos et al. (2016); Vaswani et al. (2017) has also aided the ability to gain insights into DL-based prediction results Yingkai Gao et al. (2018); Shin et al. (2019). We posit that such attention-based mechanisms offer a route to provide biologically plausible insights into DL-based DTI prediction models while leveraging the strength of DL models. Also, since attention-based methods can learn rich molecular representations, they could facilitate accurate predictions in other domains such as ligand-catalyst-target reactions Rifaioglu et al. (2018).

To this end, our contributions to the domain are as follows:

  • We propose a multi-view attention-based architecture for learning the representation of compounds and targets from different unimodal descriptor schemes (including end-to-end schemes) for DTI prediction.

  • Our usage of neural attention enables our proposed approach to lend itself to the interpretation and discovery of biologically plausible insights in compound-target interactions across multiple views.

  • We also experiment with several baselines and show how these seemingly different compound and target featurization proposals in the literature could be aggregated to leverage their complementary relationships for modeling DTIs.

The rest of our study is organized as follows: section 2 discusses the related work and the baseline models of our study, and section 3 presents the featurization methods we use and our proposed architecture. The experiments we conducted are described in section 4, and we discuss the results in section 5. Finally, we conclude our work in section 6.

2 Related Work

In silico methods provide a promising route to tackle some critical challenges in drug discovery effectively. Over the last decade, several studies have been conducted in modeling interactions, which has led to substantial progress in DTI prediction and other related tasks. We review some of these notable works which relate to our study in what follows.

One of the seminal works on integrating unimodal representations of drugs and targets is Yamanishi et al. (2010). The authors note that the challenges with DTI prediction mean that the development of models that can leverage heterogeneous data is vital to the domain. Hence, the chemical space, genomic space, and pharmacological space are integrated. Subsequently, the compound-target pairwise relationships are studied using network or graph analysis. Shi et al. (2015) also augment similarity information with non-structural features to perform DTI prediction using a network-based approach.

Additionally, Luo et al. (2017) argue that multi-view representations enable modeling of bioactivities using diverse information. As a result, a DTI model is proposed in Luo et al. (2017) that learns the contextual and topological properties of drug, disease, and target networks. Likewise, Wang et al. (2018) propose a random forest-based DTI prediction model that integrates features from drug, disease, target, and side-effect networks learned using GraRep Cao et al. (2015). These network-based DTI models are not scalable to large datasets and cannot be used on samples outside the dataset.

Also, other researchers have adopted collaborative filtering methods to predict DTIs. In Liu et al. (2016), the authors propose a Matrix Factorization (MF) method for predicting the probability that a compound would interact with a given target. Noting that traditional MF methods are unable to detect nonlinear properties, a deep MF (DMF) method is proposed in Manoochehri and Nourani (2018). The DMF approach first constructs negative samples using a K-Nearest Neighbor (kNN) method and then builds an interaction matrix. The rows and columns of the interaction matrix then serve as the features of drugs and targets in a DL model, which finds the low-rank decomposition of the interaction matrix.

Similarly, Yasuo et al. (2018) use a probabilistic MF approach to decompose an interaction matrix into a target-feature matrix and a feature-ligand matrix. While these DL-based MF methods are able to learn nonlinear properties, viewing DTI prediction as a BC problem, as seen in these works, does not address the entire spectrum of bioactivity. In Nair and George (2018), the graph-regularized MF approach of Ezzat et al. (2017) is also extended to a multi-view approach that integrates both chemical and structural views of compounds and targets. As mentioned earlier, in the BC setting, true negatives are mostly lacking, and using kNN, as in Manoochehri and Nourani (2018), introduces arbitrariness in determining negative samples.

On the other hand, similarity-based ML methods have also been proposed for DTI prediction. In this setting, compound and target similarity matrices are constructed and used in kernel-based algorithms such as Support Vector Machines (SVM) Jacob and Vert (2008); Bleakley and Yamanishi (2009), and other well-known ML algorithms such as kNN and Regularized Least Squares (RLS). While compound similarities are typically constructed by considering their topological and chemical properties Öztürk et al. (2016), target similarities are usually computed using metrics such as the Smith-Waterman (SW) score, which considers the alignment between sequences Ding et al. (2013). Nonetheless, these approaches use the BC problem formulation. Conversely, the work in Pahikkala et al. (2015b) proposed a Kronecker RLS (KronRLS) method that predicts binding affinity measured in $K_d$ and $K_i$.
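The Smith-Waterman score mentioned above can be sketched as a small dynamic program. This is a minimal illustration under stated assumptions: it uses a flat match/mismatch score and a linear gap penalty instead of a substitution matrix such as BLOSUM, and the normalization by the self-alignment scores is one common convention rather than something specified in this section. The sequences are toy examples.

```python
import math


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below zero, which is what makes the alignment local.
            h[i][j] = max(0, h[i - 1][j - 1] + sub, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


def normalized_similarity(a: str, b: str) -> float:
    """Scale the SW score by both self-alignment scores (non-empty inputs assumed)."""
    return smith_waterman(a, b) / math.sqrt(smith_waterman(a, a) * smith_waterman(b, b))


# Toy target sequences; a target-target similarity matrix is filled pairwise.
targets = ["MKTAYIAK", "MKTAYLAK", "GGSWPLRR"]
similarity = [[normalized_similarity(s, t) for t in targets] for s in targets]
```

The resulting symmetric matrix plays the role of the target similarity matrix that kernel-based methods consume.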

Concerning ensemble ML algorithms, SimBoost is proposed in He et al. (2017) as a Gradient Boosting Trees (GBT)-based DTI prediction model. While KronRLS is a linear model, SimBoost can learn non-linear properties for predicting real-valued binding affinities. While He et al. (2017) uses a feature-engineering step to select compound-target features for GBT training, the work in Mahmud et al. (2020) integrates different representations of a target and uses a feature-selection algorithm to construct representations for GBT training. The work in Rayhan et al. (2019) also proposes a feature-selection method for determining feature-subspaces for GBT training. Additionally, Orellana M et al. (2018) proposes an AdaBoost model for DTI prediction. However, as noted in Niculescu-Mizil and Caruana (2005), boosting methods are not well-suited for predicting probabilities.

In another vein, several DL methods have been proposed to learn the features of compounds and targets for DTI prediction Wu et al. (2018); Wallach et al. (2015); Gomes et al. (2017); Kearnes et al. (2016), whereas others have proposed DL models that take predefined features as inputs. The work in Wen et al. (2017) proposed a deep-belief network to model interactions using ECFP and Protein Sequence Composition (PSC) of compounds and targets, respectively. Yang et al. (2018) also propose a DTI model that uses generative modeling to oversample the minority class in order to address the class imbalance problem. In Lee et al. (2019), the sequence of a target is processed using a Convolutional Neural Network (CNN), whereas a compound is represented using its structural fingerprint. The compound and target feature vectors are concatenated and serve as input to a fully connected DL model. Using a CNN means the temporal structure in the target sequence is sacrificed to capture local residue information.

In contrast, Yingkai Gao et al. (2018) used a Recurrent Neural Network (RNN) and Molecular Graph Convolution (MGC) to learn the representations of targets and compounds, respectively. These representations are then processed by a siamese network to predict interactions. A limitation of the approach in Yingkai Gao et al. (2018) is that extending it to multi-task networks requires training several siamese models. While all these works formulate the DTI prediction as a BC problem, Öztürk et al. (2018) proposes a DL model that predicts binding affinities given compound and protein encoding. The work in Shin et al. (2019) also proposed a self-attention based DL model that predicts binding affinities. Using self-attention enables atom-atom relationships in a molecule to be adequately captured. Nonetheless, these studies do not leverage other unimodal representations of compounds and targets. Also, they do not adopt the split schemes proposed in Pahikkala et al. (2015b) for developing chemogenomic models.

In what follows, we provide an introduction to the existing regression ML models for DTI prediction that are used as baselines in this study for completeness.

2.1 KronRLS

The KronRLS method proposed in Pahikkala et al. (2015b) is a generalization of the RLS method in which the data is assumed to consist of pairs (compounds and targets, in this case). It is a kernel-based approach for predicting the binding affinity between a compound-target pair. Specifically, given a set $\{x_i\}_{i=1}^{m}$ of compound-target pairs as training data with their corresponding binding-affinity values $\{y_i\}_{i=1}^{m}$, where $x_i = (d_i, t_i)$ and $y_i \in \mathbb{R}$, KronRLS learns a real-valued function $f$ that minimizes the objective,

$$J(f) = \sum_{i=1}^{m} \left(y_i - f(x_i)\right)^2 + \lambda \lVert f \rVert_k^2 \qquad (1)$$

where $\lambda > 0$ is a regularization parameter and $\lVert f \rVert_k$ is the norm of the minimizer associated with the kernel $k$ in equation 2. Based on the representer theorem, Pahikkala et al. (2015b) defines the minimizer as,

$$f(x) = \sum_{i=1}^{m} a_i \, k(x, x_i) \qquad (2)$$

where the kernel function $k$ is a symmetric similarity measure between two compound-target pairs. Given a dataset of $m$ samples, $k$ can be represented as a matrix $K \in \mathbb{R}^{m \times m}$, computed as $K = K_d \otimes K_t$ if the training set contains all possible compound-target pairs. Here, $K_d$ and $K_t$ are the kernel matrices of the compounds and targets, respectively. In this context, the parameters $a$ of $f$ can be determined in closed form by solving a system of linear equations:

$$(K + \lambda I)\, a = y \qquad (3)$$

where $D$ is the set of compounds, $T$ is the set of targets, $K_d \in \mathbb{R}^{|D| \times |D|}$, $K_t \in \mathbb{R}^{|T| \times |T|}$, and $I$ is an identity matrix. Equation 3 assumes that $y$ contains the binding affinities of all pairs in order to be solved in closed form. In cases where this assumption does not hold, conjugate gradient with Kronecker algebraic optimization could be employed to determine $a$. However, other imputation strategies have been employed to maintain the closed-form evaluation of equation 3 Pahikkala et al. (2015b).
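As an illustration of the closed-form solution, the sketch below builds the pairwise kernel as a Kronecker product, $K = K_d \otimes K_t$, and solves the standard regularized least-squares system $(K + \lambda I)a = y$ with plain Gaussian elimination. The kernel values, affinities, and $\lambda$ are made up, and forming $K$ explicitly is only feasible at toy scale; this is not how Pahikkala et al. implement the method.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def kron_rls_fit(k_d, k_t, y, lam):
    """Return the dual coefficients a solving (K_d (x) K_t + lam * I) a = y."""
    k = kron(k_d, k_t)
    for i in range(len(k)):
        k[i][i] += lam
    return solve(k, y)


# Two compounds and two targets; y is ordered as pair index i * |T| + j.
k_d = [[1.0, 0.5], [0.5, 1.0]]
k_t = [[1.0, 0.2], [0.2, 1.0]]
y = [5.0, 7.0, 6.0, 8.0]
lam = 0.1
a = kron_rls_fit(k_d, k_t, y, lam)

# Fitted values on the training pairs are K a.
k_full = kron(k_d, k_t)
fitted = [sum(k_full[i][j] * a[j] for j in range(len(a))) for i in range(len(a))]
```

The ordering of `y` has to match the row ordering of the Kronecker product, which is why the example indexes pairs as compound-major.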

2.2 SimBoost

SimBoost, proposed in He et al. (2017), is a gradient boosting approach to predict the binding affinity between a compound and a target. The authors propose three types of features to construct the feature vector of a given compound-target pair in the training set:


  1. Type 1: features for each compound and target based on average similarity values, and information about their frequency in the dataset.

  2. Type 2: features for entities determined from their respective similarity matrices.

  3. Type 3: features for each compound-target pair computed using a compound-target network.

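Given compound $d_i$ and target $t_j$, the feature vector $x_{ij}$ of the pair $(d_i, t_j)$ is constructed by concatenating the type 1 and type 2 features of both $d_i$ and $t_j$, and the type 3 features of the pair $(d_i, t_j)$. The corresponding binding affinity $\hat{y}_{ij}$ of $x_{ij}$ is computed as,

$$\hat{y}_{ij} = \sum_{m=1}^{M} f_m(x_{ij}), \quad f_m \in \mathcal{F} \qquad (4)$$

where $\mathcal{F}$ is the space of all possible trees and $M$ is the number of regression trees. Using the additive ensemble training approach, the set of trees $\{f_m\}$ is learned by minimizing the following regularized objective:

$$\mathcal{L} = \sum_{i,j} l(y_{ij}, \hat{y}_{ij}) + \sum_{m=1}^{M} \Omega(f_m) \qquad (5)$$

where $\Omega$ determines model complexity to control overfitting, $l$ is a differentiable loss function which evaluates the prediction error, and $y_{ij}$ is the true binding affinity corresponding to $x_{ij}$.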
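The additive ensemble training described above can be sketched with depth-1 regression trees (stumps) fitted to residuals under a squared-error loss. This is a simplified illustration: the feature vectors and affinities are made up, the complexity penalty on each tree is replaced by a fixed learning rate, and none of SimBoost's actual type 1-3 features are computed.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to the residuals by exhaustive search."""
    best = None
    for f in range(len(xs[0])):
        for threshold in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[f] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            err = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, f, threshold, left_mean, right_mean)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    feature, threshold, left_value, right_value = stump
    return left_value if x[feature] <= threshold else right_value


def fit_boosting(xs, ys, n_trees=20, learning_rate=0.5):
    """Add trees one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, trees, learning_rate


def predict(model, x):
    base, trees, learning_rate = model
    return base + learning_rate * sum(predict_stump(t, x) for t in trees)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical pair features and affinities.
xs = [[0.1, 0.9], [0.4, 0.8], [0.5, 0.3], [0.9, 0.2]]
ys = [5.0, 5.5, 7.0, 8.0]
model = fit_boosting(xs, ys)
mse_base = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_boost = mse(ys, [predict(model, x) for x in xs])
```

The final prediction is a sum over the fitted trees, which mirrors the additive form of the ensemble.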

2.3 PADME

In Feng et al. (2018), PADME is proposed to model DTIs. The authors propose two variants of PADME: PADME-ECFP and PADME-GraphConv. The former variant constructs feature vectors of compounds using the ECFP scheme, whereas the latter learns the representations of compounds using Molecular Graph Convolution Altae-Tran et al. (2017). On the other hand, targets are represented using PSC Cao et al. (2013). After that, for a given compound-target pair, the feature vector $x_i$ is constructed as the concatenation of the compound and target feature vectors. This constructed feature vector then becomes an input to a Fully Connected Neural Network (FCNN) which minimizes the regularized Mean Square Error (MSE) objective,

$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - f_\theta(x_i)\right)^2 + \lambda \lVert \theta \rVert_2^2 \qquad (6)$$

where $f_\theta$ outputs $\hat{y}_i = f_\theta(x_i)$ as the predicted value using parameters $\theta$, $N$ is the number of training pairs, and $\lambda$ is a regularization parameter to control overfitting.
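As a small worked example of a regularized MSE objective over concatenated compound-target features, the sketch below assumes an L2 penalty on the parameters; the exact form of PADME's regularizer is not recoverable from this section, so that choice, like all the numbers, is an assumption.

```python
def concat_features(compound_vec, target_vec):
    """Build the pair representation by concatenating the two feature vectors."""
    return list(compound_vec) + list(target_vec)


def regularized_mse(y_true, y_pred, params, lam):
    """Mean squared error plus an (assumed) L2 penalty on the parameters."""
    n = len(y_true)
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    l2 = sum(w * w for w in params)
    return mse + lam * l2


# Toy compound fingerprint bits and target composition values.
pair = concat_features([1, 0, 1], [0.25, 0.75])
loss = regularized_mse([1.0, 2.0], [1.0, 4.0], params=[1.0, -2.0], lam=0.1)
```

Here the data term is $(0^2 + 2^2)/2 = 2$ and the penalty is $0.1 \times (1 + 4) = 0.5$, so the objective evaluates to 2.5.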

2.4 IVPGAN

In our previous study Agyemang et al. (2019), we propose IVPGAN to predict DTIs using a multi-view approach to represent a compound and PSC to construct the target feature vector. While ECFP is used to represent predefined compound features, MGC is used to learn the representation of a compound given the graphical structure encoded in its SMILES notation. Using an Adversarial Loss (AL) training technique, the following objective is minimized:




, and are trainable parameters, , is a concatenation operator, is a norm operator,

is a hyperparameter that is used to control the combination of MSE and the AL objectives, and

is a regularization parameter that controls overfitting. is the MSE objective of the DTI prediction model, which is treated as the generator of a Generative Adversarial Network (GAN). is the generator objective component of the GAN whose discriminator objective is expressed as,


where the distributions and of equations 9 and 10 are derived from the neighborhood alignment matrices constructed from the labels and predicted values, respectively, as explained in Agyemang et al. (2019).

3 Methods

3.1 Problem Formulation

We consider the problem of predicting a real-valued binding affinity $y_i$ between a given compound $c_i$ and target $t_i$, $y_i \in \mathbb{R}$. The compound $c_i$ takes the form of a SMILES Weininger (1988) string, whereas the target $t_i$ is encoded as an amino acid sequence. The SMILES string of $c_i$ is an encoding of a chemical graph structure $G_i = (V_i, E_i)$, where $V_i$ is the set of atoms constituting $c_i$ and $E_i$ is a set of undirected chemical bonds between these atoms. Therefore, each data point in the training set is a tuple $(c_i, t_i, y_i)$. In this study, we refer to the SMILES of a compound and the amino acid sequence of a target as the ‘raw’ form of these entities, respectively.

In order to use the compounds and targets in VS models, their respective raw forms have to be quantized to reflect their inherent physicochemical properties. Accurately representing such properties is vital to reducing the generalization error of VS models Rifaioglu et al. (2018). We discuss the featurization methods considered in our study in sections 3.2 and 3.3.

3.2 Compound Featurization

3.2.1 Extended Connectivity Fingerprint

The ECFP algorithm is a state-of-the-art circular fingerprint scheme for numerically encoding the topological features of a compound Rogers and Hahn (2010). ECFP decomposes a compound into substructures and assigns a unique identifier to each fragment. In the algorithm, larger substructures are composed through bond relations. A diameter parameter controls the extent to which these larger substructures can be composed. For instance, with a diameter of 4, (written as ECFP4), the largest substructure has a width of 4 bonds. Subsequently, the unique identifiers of all fragments are hashed to produce a fixed-length binary vector. This final representation indicates the presence of particular substructures. We use RDKit’s Landrum (2006) implementation of the ECFP algorithm in our study.
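The final hashing step, in which fragment identifiers are folded into a fixed-length binary vector, can be illustrated with a toy sketch. The integer identifiers below are hypothetical stand-ins for the substructure identifiers that ECFP would generate, and the modulo folding is a simplification for illustration; it is not RDKit's implementation.

```python
def fold_identifiers(identifiers, n_bits=16):
    """Fold integer fragment identifiers into a fixed-length binary vector."""
    bits = [0] * n_bits
    for ident in identifiers:
        # Distinct fragments may land on the same position (a bit collision).
        bits[ident % n_bits] = 1
    return bits


# Hypothetical fragment identifiers for one compound.
fragment_ids = [3, 19, 40]
fingerprint = fold_identifiers(fragment_ids, n_bits=16)
```

In this example, identifiers 3 and 19 collide on the same bit, so three fragments set only two bits. Longer vectors reduce such collisions at the cost of a larger input dimension.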

3.2.2 Molecular Graph Convolution

Motivated by recent progress in end-to-end representation learning, MGC is a class of algorithms that, for a given layer, apply the same differentiable function to the atoms of a molecule to learn the features of the molecule from its raw form. This operation is akin to the use of kernels in the CNN architecture. Also, information about distant atoms is propagated radially through bonds, as found in circular fingerprints. Thus, composing several layers facilitates the learning of useful representations that are related to the learning objective. The earliest form of MGC is the work in Duvenaud et al. (2015). It has been used in a notable number of studies and in various forms, such as that of Yingkai Gao et al. (2018), to model bioactivity. In Altae-Tran et al. (2017), graph pool and graph gather operations are proposed to augment the neural graph fingerprints algorithm of Duvenaud et al. (2015). Recent progress in the domain has also produced other forms of MGCs Wu et al. (2018). In our study, we use the GraphConv algorithm proposed by Altae-Tran et al. (2017). Atom vectors are initialized using predefined physicochemical properties. The main operations of GraphConv are:

  1. Graph convolution: applies molecular graph convolution to each atom.

  2. Graph pool: applies a pooling function to an atom and its neighbors to get the updated feature vector of the atom.

  3. Graph gather: takes the feature vectors of all atoms and applies a downsampling function to compute the fixed-length compound feature vector.

We refer to the GraphConv implementation without the graph gather operation as GraphConv2D in this study. Hence, for a compound of $n$ atoms, where $a_i \in \mathbb{R}^{d}$ is the vector of the $i$th atom, the output of GraphConv2D is the matrix $[a_1, a_2, \ldots, a_n] \in \mathbb{R}^{n \times d}$.
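The three operations listed above can be sketched on a toy molecular graph. This is a deliberately simplified illustration: the convolution uses two fixed scalar weights where the real layer uses learned weight matrices, pooling is an element-wise maximum, and gathering is a sum. The 3-atom chain and its initial atom vectors are made up.

```python
def graph_convolution(atoms, neighbors, w_self=0.5, w_nbr=0.5):
    """Update each atom from its own vector and the sum of its neighbors' vectors."""
    out = []
    for i, vec in enumerate(atoms):
        agg = [sum(atoms[j][d] for j in neighbors[i]) for d in range(len(vec))]
        # ReLU non-linearity applied element-wise.
        out.append([max(0.0, w_self * v + w_nbr * a) for v, a in zip(vec, agg)])
    return out


def graph_pool(atoms, neighbors):
    """Element-wise max over each atom and its bonded neighbors."""
    return [[max(atoms[j][d] for j in [i] + neighbors[i]) for d in range(len(atoms[i]))]
            for i in range(len(atoms))]


def graph_gather(atoms):
    """Downsample the per-atom vectors into one fixed-length compound vector."""
    return [sum(vec[d] for vec in atoms) for d in range(len(atoms[0]))]


# A 3-atom chain: atom 1 is bonded to atoms 0 and 2.
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = [[1], [0, 2], [1]]

per_atom = graph_pool(graph_convolution(atoms, neighbors), neighbors)
compound_vector = graph_gather(per_atom)
```

Here `per_atom` keeps one vector per atom, which corresponds to the GraphConv2D-style output, whereas `compound_vector` has a fixed length regardless of the number of atoms.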

3.2.3 Weave

Weave featurization, proposed in Kearnes et al. (2016), is another form of MGC. In the weave algorithm, atom-atom pairs are constructed using all atoms in a molecule. The features of an atom are then updated using the information of all other atoms and their respective pairs. This form of update enables the propagation of information from distant atoms, albeit with increased complexity. While predefined physicochemical features are used to initialize atom vectors, topological properties are used to initialize atom-atom pair vectors. The following are the main operations of the weave featurization scheme:

  1. Weave: applies the weave operation as described above.

  2. Weave gather: computes the compound feature vector as a function of all atom feature vectors.

We refer to the Weave implementation without the weave gather operation as Weave2D in this study. Thus, for a compound of $n$ atoms, where $a_i \in \mathbb{R}^{d}$ is the vector of the $i$th atom, the output of Weave2D is the matrix $[a_1, a_2, \ldots, a_n] \in \mathbb{R}^{n \times d}$.

3.2.4 Graph Neural Network

In Tsubaki et al. (2019), a Graph Neural Network (GNN) is proposed for molecular graphs. GNN maps a given molecular graph to a fixed-length feature vector using two differentiable functions: transition and output functions. Atoms are depicted as nodes, and the bonds within a molecule form the edges in the molecular graph. For each entity in the graph, substructures within a specified radius are encoded to form the embedding profile of the entity. These profiles are then mapped to indices of an embedding matrix that is trained using backpropagation. The transition function is used to update the features of atoms and bonds towards determining the vector representation of the molecule. Thus, applying different transition functions hierarchically recapitulates the convolution operation in a CNN since the same transition function is applied to all entities in the graph in a layer. The output function downsamples the set of node vectors from the transition phase to get the fixed-length molecular representation, .

In our study, we use a variant of GNN dubbed GNN2D. This variant omits the downsampling phase of the GNN operation. Thus, for a compound of atoms, where is the vector the th atom, the output of GNN2D is .

3.3 Target Featurization

3.3.1 Protein Sequence Composition

With regard to target featurization, PSC is a well-known predefined scheme for capturing subsequence information. It consists of Amino Acid Composition (AAC), Dipeptide Composition (DC), and Tripeptide Composition (TC). AAC provides the frequency of each amino acid. DC gives the frequency of every dipeptide (ordered pair of amino acids), whereas TC gives the frequency of every tripeptide (ordered triple of amino acids). With the 20 standard amino acids, this yields 20 + 400 + 8000 = 8420 features, so the dimension of a PSC feature vector is 8420.
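The composition features can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function names are hypothetical, and normalizing counts by the number of windows is an assumption. It shows how the three compositions concatenate into an 8420-dimensional vector.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence, k):
    """Relative frequency of every possible k-mer over the 20 amino acids."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = max(len(sequence) - k + 1, 0)
    for i in range(total):
        kmer = sequence[i:i + k]
        if kmer in counts:  # k-mers with non-standard residues are skipped
            counts[kmer] += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def psc_features(sequence):
    """Concatenate AAC (20), DC (400), and TC (8000) into one vector."""
    return (kmer_composition(sequence, 1)
            + kmer_composition(sequence, 2)
            + kmer_composition(sequence, 3))


features = psc_features("MKVLAAGIVGLK")  # arbitrary example sequence
print(len(features))
```

Because the vector length depends only on the alphabet, every target maps to the same dimension regardless of sequence length.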

3.3.2 Prot2Vec

Similar to compound featurization, efforts have been made to learn protein representations directly from their raw forms. Learning protein vectors is typically achieved by learning embedding vectors using NLP techniques such as the word2vec and GloVe models Mikolov et al. (2013); Pennington et al. (2014). This approach also maintains the temporal properties in the target sequence. In Asgari and Mofrad (2015), it is shown that the NLP approach could be used to develop rich target representations. Therefore, we construct a vocabulary of n-gram subsequences (biological words) following the splitting scheme of Asgari and Mofrad (2015). We set n = 3 in this study. In Figure 1, the approach we use to construct the 3-gram profile of a protein sequence is illustrated. The raw form of the protein is split into three non-overlapping representations. The words of all three sequences make up the vocabulary used in this study. We then move across the three splits to construct the overlapping 3-gram target profile. Each word in the dictionary is mapped to a randomly initialized vector , , that is updated during training.
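The splitting scheme described above can be sketched as follows. The function names are hypothetical and the example sequence is arbitrary; the sketch only illustrates how three non-overlapping splits interleave into the overlapping 3-gram profile.

```python
def ngram_splits(sequence, n=3):
    """Split a sequence into n non-overlapping n-gram sequences, one per offset."""
    return [
        [sequence[i:i + n] for i in range(offset, len(sequence) - n + 1, n)]
        for offset in range(n)
    ]


def profile_from_splits(splits):
    """Move across the splits in turn to build the overlapping n-gram profile."""
    n = len(splits)
    total = sum(len(s) for s in splits)
    return [splits[i % n][i // n] for i in range(total)]


splits = ngram_splits("MKVLAAGIV")
profile = profile_from_splits(splits)
vocabulary = sorted({word for split in splits for word in split})
```

For this sequence, the first split is `MKV, LAA, GIV`, the second is `KVL, AAG`, and the third is `VLA, AGI`; reading across them yields every 3-gram in sequence order.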

In order to make computations tractable, we group subsequences using a non-overlapping window approach similar to the method in Tsubaki et al. (2019).

Specifically, given the target profile , we retrieve the vectors of each word to construct the set of vectors . Setting the window size to 3, for didactic purposes, we group as:

where is a concatenation operator. Also, denotes the window where is the window size. Note that if by elements, we add to the window times. Here, is a vector of all zeros. Thus, each window is a -dimensional vector. Pooling functions or RNN could then be used to process these windows/segments into a fixed-length representation of the target. In section 3.4 we show how we use our proposed approach to construct the fixed-length vector of a target.
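The grouping step can be sketched as below, assuming word vectors are plain lists of floats. The function name is hypothetical; the sketch shows non-overlapping windows formed by concatenation, with all-zero vectors appended when the last window is short.

```python
def group_into_windows(vectors, window_size):
    """Concatenate consecutive word vectors into non-overlapping windows.

    The last window is padded with all-zero vectors if it has fewer than
    window_size elements, so every window has the same dimension.
    """
    dim = len(vectors[0])
    zero = [0.0] * dim
    windows = []
    for start in range(0, len(vectors), window_size):
        chunk = vectors[start:start + window_size]
        chunk = chunk + [zero] * (window_size - len(chunk))
        windows.append([x for vec in chunk for x in vec])
    return windows


word_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
windows = group_into_windows(word_vectors, window_size=3)
```

With four 2-dimensional word vectors and a window size of 3, this gives two 6-dimensional windows, the second of which contains two zero vectors.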

Figure 1: Target sequence 3-gram representation. The original target sequence is split into three non-overlapping sequences (split 1, split 2, and split 3). The overlapping 3-gram profile of the protein is constructed by moving across the three sequences as depicted by the arrow.

3.3.3 Protein Convolutional Neural Network

Protein Convolutional Neural Network (PCNN), proposed by Tsubaki et al. (2019), is another end-to-end representation learning scheme for target sequences. It uses a similar approach to the Prot2Vec method (see section 3.3.2), but with overlapping windows, to construct target representations. The subsequent discussion on the PCNN uses Prot2Vec to encode target data and also has a minor variation of the convolution operation in Tsubaki et al. (2019). Given to be the th window of , where denotes the th convolution layer, PCNN computes of as,



is a nonlinear activation function, we let be the kernel, and . Applying equation 11 multiple times enables nonlinear properties to be learned at different levels of abstraction. In order to produce a -dimensional vector for the last PCNN layer , we let and . Thus, the final output is . We refer to the rows of as segments.

To compute the vector representation of the target, Tsubaki et al. (2019) propose using the average pooling function. Other differentiable pooling functions, such as the max and sum functions, could also be employed. Moreover, an attention mechanism is proposed in Tsubaki et al. (2019), where the compound representation is used to compute attention weights for the segments of the target representation. In this context, the compound vector dimension and the segment dimension must be equal. In this study, we refer to the attention variant as PCNN with Attention (PCNNA). We refer the reader to Tsubaki et al. (2019) for the exposition of PCNNA.

Additionally, we use a variant of the PCNN architecture called PCNN2D. This variant omits the downsampling and attention phases of the PCNN method.

3.4 Joint View Attention for DTI prediction

We propose a Joint View self-Attention (JoVA) approach to learn rich representations from different unimodal representations of compounds and targets for modeling bioactivity. Such a technique is significant when one considers that there exist several molecular representations, and that other novel methods are likely to be proposed, in the domain.

In Figure 2, we present our proposed DL architecture for predicting binding affinities between compounds and targets. Before discussing the details of the architecture, we explain the terminology it uses:

  • Entity: this refers to a compound or target.

  • View: this refers to a unimodal representation of an entity.

  • Segment: for an entity represented as , we refer to the rows as the segments.

  • Projector: projects an entity representation into , where is the latent space dimension.

  • Concatenation function: We denote the concatenation (concat) function as .

  • Combined Input Vector (CIV): a vector that is constructed by concatenating two or more vectors and used as the input of a function.

For a set of views , JoVA represents of an entity as where denotes the number of elements that compose the entity and is the dimension of the feature vector of each of these elements of the -th view. We write as in subsequent discussions to simplify notation. For a compound, the segments are the atoms, whereas a window of n-gram subsequences is a segment of a target. Note that in the case where the result of an entity featurization is a vector before applying the JoVA method (e.g., ECFP and PSC), this is seen as . Thus, .

Thereafter, a projection function of projects into a latent space of dimension to get . Note that the dimension of each projection function is . We refer to this operation as the latent dimension projection. We use the format (seg. denotes segment(s)),

Ψ(No. of seg., No. of samples, seg. dimension)

to organize samples at this stage, employing zero-padding where necessary due to possible variation in the number of segments in a batch. This data structure follows the usual NLP tensor representation format, where the number of segments is referred to as sequence length. Hence, the output of for a single entity is written as . This enables the concatenation of all projected representations to form the joint representation , where K is computed as,


then serves as the input to the joint view attention module. Since we use a single data point in our discussion, we use in subsequent discussions.
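The latent dimension projection and segment-wise concatenation can be sketched for a single sample as follows. This is an illustration under stated assumptions: each projector is taken to be a plain linear map without bias, the random matrices stand in for trained weights, and all names and toy values are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_projector(in_dim, latent_dim, rng):
    """Hypothetical stand-in for a trained projection matrix (in_dim x latent_dim)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(latent_dim)] for _ in range(in_dim)]


def joint_representation(views, projectors):
    """Project every view to the latent dimension, then stack all segments."""
    projected = [matmul(view, proj) for view, proj in zip(views, projectors)]
    segment_counts = [len(p) for p in projected]
    joint = [segment for p in projected for segment in p]
    return joint, segment_counts


rng = random.Random(0)
latent_dim = 4
views = [
    [[1.0, 0.0, 1.0, 1.0, 0.0, 0.0]],                     # vector view: one segment
    [[0.2, 0.1, 0.3], [0.5, 0.4, 0.0], [0.1, 0.9, 0.7]],  # atom-level view: three segments
]
projectors = [random_projector(len(v[0]), latent_dim, rng) for v in views]
joint, segment_counts = joint_representation(views, projectors)
```

The joint matrix has one row per segment across all views, and `segment_counts` records how many rows belong to each view so that they can be separated again later.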

Figure 3 illustrates the detailed processes between the segment-wise concat and view-wise concat layers of Figure 2. Given the multi-view representation of an entity, we apply a multihead self-attention mechanism and segment-wise input transformation Vaswani et al. (2017). An attention mechanism could be thought of as determining the relationships between a query and a set of key-value pairs to compute an output. Here, the query, keys, values, and outputs are vectors. Therefore, given a matrix of queries $Q$, a matrix of keys $K$, and a matrix of values $V$, the output of the attention function is expressed as,

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \qquad (13)$$

where $d_k$ is the dimension of $K$. In self-attention, $Q$, $K$, and $V$ are all set to the joint multi-view representation. Using the same representation as query, key, and value enables different unimodal segments to be related to all other views to compute the final representation of the compound-target pair. Thus, each view becomes aware of all other views in learning its representation. This method addresses the challenge of extending the two-way attention mechanism Yingkai Gao et al. (2018) to multiple unimodal representations. A single computation of equation 13 is referred to as a ‘head’.
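A single attention head in the scaled dot-product form of Vaswani et al. (2017) can be sketched in plain Python as follows. This is an illustration only: it omits the learned linear projections of the multihead layer, and the helper names and toy input are assumptions.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns the outputs and the attention weights."""
    d_k = len(keys[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, values), weights


def self_attention(segments):
    """Self-attention: the same matrix serves as queries, keys, and values."""
    return attention(segments, segments, segments)


# Toy joint representation: three segments, possibly from different views.
segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(segments)
```

Row i of `weights` indicates how strongly segment i attends to every segment of every view, which is the quantity one would inspect when interpreting a prediction.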

In order to learn a rich representation of a compound-target pair, is linearly projected into different subspaces, and the attention representation of each projection is then computed. The resulting attention outputs are concatenated and also linearly projected to compute the output of the multihead sub-layer. For a set of self-attention heads , the multihead function is expressed as,


where , , , , is the dimension of , and .

Additionally, a segment-wise transformation sub-layer is used to transform each segment of the multihead attention sub-layer output non-linearly. Specifically, we compute


where denotes the -th segment, , . We set in this study, same as found in Vaswani et al. (2017).

Furthermore, the Add and Norm layers in Figure 3 implement a residual connection around the multihead and segment-wise transformation sublayers. This is expressed as


Figure 2: Joint View Attention(JoVA) for Drug-Target Interaction Prediction
Figure 3: Architecture for constructing the Combined Input Vector (CIV) using self-attention and pooling, given the set of projected unimodal representations.

At the segments splitter layer, is split into the constituting view representations . Note that for a single sample. To construct the final vector representation out of , pooling functions could then be applied to each view’s representation. This enables our approach to be independent of the number of segments of each view, which could vary among samples. In this study, is computed as,


where and denotes the -th segment of . The view-wise concat layer subsequently computes the final representation of the compound-target pair as the concatenation of to get . We refer to as the Combined Input Vector (CIV). The CIV therefore becomes the input to a prediction model. In our implementation of JoVA, the prediction model is an FCNN with 2-3 hidden layers.
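The segments splitter, pooling, and view-wise concat steps can be sketched as follows. Sum pooling is used here purely as an assumption for illustration, since any pooling function that maps a variable number of segments to one vector would fit the description; the function names are hypothetical.

```python
def split_by_view(joint, segment_counts):
    """Split the joint matrix back into one block of segments per view."""
    views, start = [], 0
    for count in segment_counts:
        views.append(joint[start:start + count])
        start += count
    return views


def sum_pool(segments):
    """Element-wise sum over segments (assumed pooling function)."""
    return [sum(col) for col in zip(*segments)]


def combined_input_vector(joint, segment_counts, pool=sum_pool):
    """Pool each view to a fixed-length vector and concatenate the results."""
    return [x for view in split_by_view(joint, segment_counts) for x in pool(view)]


joint = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three segments, latent dimension 2
civ = combined_input_vector(joint, segment_counts=[1, 2])
```

The CIV length is the number of views times the latent dimension, so it stays fixed even when the number of atoms or sequence windows varies between samples.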

4 Experimental Design

In this section, we present the details of the experiments used to evaluate our proposed approach for DTI prediction.

4.1 Datasets

The benchmark datasets used in this study are the Metz Metz et al. (2011), KIBA Tang et al. (2014), and Davis Davis et al. (2011) datasets. These are kinase datasets that have been used to benchmark previous DTI studies using the regression problem formulation Pahikkala et al. (2015b); Öztürk et al. (2016); He et al. (2017); Feng et al. (2018); Shin et al. (2019). Members of the kinase family of proteins play active roles in cancer, cardiovascular, and other inflammatory diseases. However, their similarity makes it challenging to discriminate within the family. This similarity results in target promiscuity problems for binding ligands and, as a result, presents a challenging prediction task for ML models Pahikkala et al. (2015b). We use the version of these datasets curated by Feng et al. (2018). In Feng et al. (2018), a filter threshold is applied to each dataset, and compounds and targets whose total number of samples does not exceed the threshold are removed. We maintain these thresholds in our study. The summary of these datasets, after filtering, is presented in Table 1. Figure 4 shows the distribution of the binding affinities for the datasets.

Figure 4: Distribution of the binding affinities (labels) in the Davis, Metz, and KIBA datasets used in our experiments.
Figure 5: Structure of each fold in the CV scheme used.

4.2 Baselines

In line with the multi-view representation learning espoused by this study, we use the compound and target views listed in Table 3.

We compare our proposed approach to the works in Pahikkala et al. (2015b); He et al. (2017); Feng et al. (2018); Tsubaki et al. (2019). While Tsubaki et al. (2019) is a binary classification model, we replace the endpoint with a regression layer in our experiments. The labels we give to Pahikkala et al. (2015b); He et al. (2017) and Tsubaki et al. (2019) are KronRLS, SimBoost, and CPI, respectively. SimBoost and KronRLS are implemented as XGBoost and NumPy models, respectively, in our experiments.

As discussed in section 2.3, two DL models were proposed for DTI: (1) PADME-ECFP4 and (2) PADME-GraphConv. Here, we consider these two architectures under a bigger umbrella of models that use a single view of a compound and a single view of a target. The nomenclature of such models is compound view-target view.

In summary, the list of baselines used in this study is presented in Table 4.

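| Dataset | Number of compounds | Number of targets | Total number of pair samples | Filter threshold |
|---|---|---|---|---|
| Davis | 72 | 442 | 31824 | 6 |
| Metz | 1423 | 170 | 35259 | 1 |
| KIBA | 3807 | 408 | 160296 | 6 |

Table 1: Dataset sizes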
| Model | # Cores |  | # GPUs | GPU model |
|---|---|---|---|---|
| Intel Xeon CPU E5-2687W | 48 | 128 | 1 | GeForce GTX 1080 |
| Intel Xeon CPU E5-2687W | 24 | 128 | 4 | GeForce GTX 1080Ti |

Table 2: Simulation hardware specifications
| View | Entity | Remark |
|---|---|---|
| ECFP8 | Compound | See section 3.2.1 |
| GraphConv | Compound | See section 3.2.2 |
| Weave | Compound | See section 3.2.3 |
| GNN | Compound | See section 3.2.4 |
| PSC | Target | See section 3.3.1 |
| RNN | Target | Uses an RNN based on the Prot2Vec target data organization |
| PCNN | Target | See section 3.3.3 |

Table 3: Compound and target views used in the experiments
GraphConv-PSC ECFP8-RNN SimBoost
ECFP8-PCNNA GraphConv-RNN IntView (Integrated View)
Table 4: Baselines used in experiments
| Compound Views | Target View(s) |
|---|---|
| ECFP8, GraphConv | PSC |
| ECFP8, GraphConv | PCNN2D |
| ECFP8, Weave | PSC |
| ECFP8, GraphConv | RNN, PSC |

Table 5: JoVA variants used in experiments

4.3 JoVA Models

In order to show the versatility of JoVA, we propose six variants using combinations of the views listed in Table 3. However, other representations not considered herein could be utilized. The primary condition is ensuring that a view’s representation of an entity, before the joint view attention module of Figure 2, is in a matrix form. Indeed, that is the rationale for the 2D variants of the GraphConv, GNN, Weave, and PCNN models. Nonetheless, as mentioned earlier, the feature vector representations could be treated as a one-row matrix in order to make the JoVA computations possible. The six variants are shown in Table 5, and they are implemented as PyTorch models herein.

4.4 Model Training and Evaluation

In our experiments, we used a 5-fold Cross-Validation (CV) model training approach. The structure of each CV-fold is shown in Figure 5. Also, the following three main splitting schemes were used:

  • Warm split: Every drug or target in the validation and test sets is encountered in the training set.

  • Cold-drug split: Every compound in the validation and test sets is absent from the training set.

  • Cold-target split: Every target in the validation and test set is absent from the training set.

Since cold-start predictions are typically found in DTI use cases, the cold splits provide realistic and more challenging evaluation schemes for the models.
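The cold splits can be sketched as follows. This is a simplified illustration, not the fold structure of Figure 5: it partitions samples by entity so that a held-out fold never shares a drug (or target) with the remaining folds. The function name, the triple layout, and the toy data are assumptions.

```python
import random


def cold_split(samples, key_index, n_folds=5, seed=0):
    """Partition (drug, target, affinity) triples so that each fold holds out
    whole entities. Use key_index=0 for a cold-drug split and key_index=1 for
    a cold-target split."""
    entities = sorted({s[key_index] for s in samples})
    random.Random(seed).shuffle(entities)
    fold_of = {entity: i % n_folds for i, entity in enumerate(entities)}
    folds = [[] for _ in range(n_folds)]
    for s in samples:
        folds[fold_of[s[key_index]]].append(s)
    return folds


# Toy data: 10 drugs x 3 targets with made-up affinity values.
samples = [(f"d{i}", f"t{j}", float(i + j)) for i in range(10) for j in range(3)]
folds = cold_split(samples, key_index=0)

held_out = folds[0]
training = [s for fold in folds[1:] for s in fold]
```

A warm split, by contrast, would shuffle the individual pair samples, so the same drug or target can appear on both sides of the split.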

We used Soek, a Python module based on scikit-learn, to determine the best-performing hyperparameters for each model. We used the warm split of the Davis dataset and the validation set of each fold for the search. The determined hyperparameters were then kept fixed for all split schemes and datasets. This was done due to the enormous time and resource requirements needed to repeat the search in each case of the experiment. The only exception to this approach is the SimBoost model, for which we searched for the best-performing latent dimension of the matrix factorization stage for each dataset. The test set of each fold was used to evaluate trained models.

As regards evaluation metrics, we measure the Root Mean Squared Error (RMSE) and Pearson correlation coefficient on the test set in each CV-fold. Additionally, we measure the Concordance Index (CI) on the test set, as proposed by Pahikkala et al. (2015b).

We follow the averaging CV approach, where the reported metrics are the averages across the different folds. We also repeat the CV evaluation for different random seeds to minimize randomness. After that, all metrics are averaged across the random seeds.
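For reference, the three metrics can be computed as sketched below. The CI follows the common pairwise definition, in which pairs with equal true values are skipped and tied predictions count as one half; treating ties this way is an assumption, and the function names and toy values are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (std_t * std_p)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are ordered like the labels."""
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal labels are not comparable
            comparable += 1
            agreement = (y_pred[i] - y_pred[j]) * (y_true[i] - y_true[j])
            if agreement > 0:
                concordant += 1.0
            elif agreement == 0:
                concordant += 0.5  # tied predictions count as half
    return concordant / comparable


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = (rmse(y_true, y_pred), pearson(y_true, y_pred), concordance_index(y_true, y_pred))
```

In a CV setting, these functions would be applied to the test set of each fold and the results averaged over folds and then over random seeds.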

5 Results and Discussion

In this section, we discuss the results of all baseline and JoVA models of our study. Here, performance refers to the CI, RMSE, and Pearson correlation results of a given model. When comparing two models, a smaller RMSE is desirable, whereas larger values of CI and the Pearson correlation coefficient indicate better performance.

In Figure 6, we present the performances of both the baseline and JoVA models on the Davis dataset. Generally, the cold drug split proved to be the most challenging scheme on the Davis dataset, with the cold target and warm splits following in that order. This trend on the Davis dataset implies that the entity with fewer samples may offer the toughest challenge in the cold splitting schemes of Pahikkala et al. (2015b).

We observed that the models that utilized multiple unimodal representations of entities usually achieved the best or competitive performance on the RMSE, CI, and Pearson correlation metrics. In particular, the IntView and IVPGAN models performed best amongst all the models, with the IntView model marginally outperforming the IVPGAN model. The IVPGAN results observed in this study are an improvement on the work in Agyemang et al. (2019). Nonetheless, the ECFP8-PSC model performed almost as well as the best performing multi-view methods. We argue that the simplicity (in terms of the number of trainable parameters) of the ECFP8-PSC model makes it suitable to perform well on the Davis dataset. Thus, we reckon that the susceptibility of the CPI, ECFP8-PCNNA, GNN-RNN, GraphConv-PCNNA/RNN, and Weave-RNN models to overfitting accounts for their respective gap in performance, given the size of the Davis dataset.

Also, while the CPI (GNN-PCNNA) model performed poorly, the GNN-PSC model attained competitive performance, especially on the warm and cold target splits. Thus, we show that the GNN method proposed in Tsubaki et al. (2019) could be paired with target representations other than PCNNA to improve performance. Interestingly, the richness of the PSC representation is seen in the inability of the PCNNA and RNN baseline models to perform well on the Davis dataset. An instance of this phenomenon is seen in the reported results of the Weave-PSC and Weave-RNN models. While this presents a counter-intuitive observation, we posit that on more massive datasets, such end-to-end target representation methods could, at the minimum, produce results comparable to models that use predefined features.

As regards the traditional ML models, KronRLS recorded modest results for a linear model, whereas SimBoost achieved results comparable to those of the multi-view baseline models. While the performance of GBT is well documented in the literature, we note that our approach to determining the MF latent dimension also contributes to the improvement in the results since He et al. (2017) shows the significance of the MF features to model predictions.

On the other hand, the ECFP8-GraphConv-RNN-PSC and ECFP8-GNN-PSC models demonstrate the effectiveness of the proposed attention-based approach for integrating multiple unimodal representations of entities for DTI. Similar to the DL baseline models, more complex JoVA models performed somewhat poorly, albeit less so in juxtaposition with their baseline analogs. We argue that this is due to the attention mechanism’s ability to actively encourage learning representations that are highly related to the learning objective. While the IVPGAN and IntView models attained the best performance on the Davis dataset, the best performing JoVA models offer the ability to interpret prediction results via examining the attention weights, aside from the high prediction performance. Additionally, the reported results of the best JoVA models seem to imply that the attention-based multi-view representation learning approach reduces the challenge of the cold splitting schemes. For instance, comparing the results of the GNN-PSC and ECFP8-GNN-PSC models emphasizes the ability of JoVA on the cold target scheme. Thus, our proposed method of modeling bioactivity attains respectable results on the Davis benchmark dataset.

The performance of the baseline and JoVA models on the Metz dataset is shown in Figure 7. Similar to the general trend of difficulty seen on the Davis dataset, the cold target regime proved to be the most challenging since the Metz dataset has fewer targets (see Table 1). This phenomenon is more evident among the baseline models than the JoVA models.

Furthermore, the DL-based baselines mostly performed poorly on the Metz dataset. In particular, the GraphConv-RNN/PCNNA and CPI models attained performances similar to the KronRLS model. This observation connotes that massive bioactivity datasets are required in the domain for training unimodal end-to-end DTI models in order to learn abstract nonlinear patterns from samples properly. It is noteworthy that while SimBoost edged the multi-view models to become the best performing model, the nature of SimBoost’s feature engineering phase renders it inapplicable to the cold splitting schemes. Additionally, while all other baselines and JoVA models maintain the hyperparameters identified using the warm splitting scheme of the Davis dataset, the MF phase of SimBoost uses hyperparameters explicitly identified for the Metz dataset.

We also observe from Figure 7(b) that the results of the JoVA models on the Metz dataset consistently follow their respective results on the Davis dataset. We argue that this behavior is a direct result of the attention-based multi-view representation learning approach proposed in this study. Here, the ECFP8-GNN-PSC, ECFP8-GraphConv-PSC, and ECFP8-GraphConv-RNN-PSC models recorded the best results. An interesting highlight is how the JoVA models’ performances are nearly invariant across all three splitting schemes as compared to the variations seen among the baselines. For instance, comparing the ECFP8-GNN-PSC JoVA model to the GNN-PSC, ECFP8-PSC, and IVPGAN baselines illustrates this phenomenon.

Likewise, comparing the CPI (GNN-PCNNA) baseline to the ECFP8-GNN-PCNN2D-PSC JoVA model gives another perspective into the strengths of our proposed approach. ECFP8-GNN-PCNN2D-PSC, as could be deduced from our earlier discussions, uses the GNN and PCNN modules of the CPI architecture. However, while CPI performs poorly on the Metz dataset, the joint view attention mechanism leverages ECFP8 and PSC to yield better results in the ECFP8-GNN-PCNN2D-PSC model.

Figure 6: Davis dataset results. (a) Performance of baseline models. (b) Performance of JoVA models.
Figure 7: Metz dataset results. (a) Performance of baseline models. (b) Performance of JoVA models.
Figure 8: KIBA dataset results. (a) Performance of baseline models. (b) Performance of JoVA models.
Figure 9: ECFP8-GraphConv-RNN-PSC results on the Davis dataset. (From right to left) Column 1 shows the scatter plots of the ground truth (red line) against predicted values (blue dots). Column 2 shows the joint distribution plots of the ground truth against predicted values. The first, second, and third rows correspond to the warm, cold drug, and cold target splits, respectively.

Figure 10: ECFP8-GraphConv-RNN-PSC results on the Metz dataset. (From right to left) Column 1 shows the scatter plots of the ground truth (red line) against predicted values (blue dots). Column 2 shows the joint distribution plots of the ground truth against predicted values. The first, second, and third rows correspond to the warm, cold drug, and cold target splits, respectively.
Figure 11: ECFP8-GraphConv-RNN-PSC results on the KIBA dataset. (From right to left) Column 1 shows the scatter plots of the ground truth (red line) against predicted values (blue dots). Column 2 shows the joint distribution plots of the ground truth against predicted values. The first, second, and third rows correspond to the warm, cold drug, and cold target splits, respectively.

On the KIBA dataset, while most of the baselines varied in their performance (see Figure 8), the JoVA models performed similarly to the previous experiments. This demonstrates the consistency of our approach across different datasets. In particular, it can be seen that the ECFP8-GraphConv-RNN-PSC model performed just as well as it did on the Metz and Davis datasets. Additionally, Figures 9-11 present the plots of the ECFP8-GraphConv-RNN-PSC model on the three datasets used in this study. The foregoing performance consistency claim on all three CV splits also agrees with the scatter and joint plots shown in these figures.

Taken together, we believe that using self-attention to align multiple unimodal representations of atoms and amino acid residues to each other enables a better representational capacity, as is typical of most neural attention-based DL models.

Figure 12: Epidermal Growth Factor Receptor (EGFR-1M17) tyrosine kinase domain in complex with (a) Brigatinib and (b) Zanubrutinib. The amino acid residues in yellow represent the top-10 subsequences predicted by the JoVA model. For both complexes, the corresponding interaction analysis of the ligand in the binding pocket of EGFR-1M17 is shown on the right. The top-10 atoms of ligand predicted by the JoVA model to be influential in the interaction are depicted in transparent red circles. The amino acids shown in the interaction analysis and also among the top-10 residues in each complex are highlighted using red circles as borders.

5.1 DrugBank Case Study

In this section, we discuss a case study performed using the Drugbank Wishart et al. (2018) database. The ECFP8-GraphConv-RNN-PSC model trained on the KIBA dataset using the warm split scheme was selected to evaluate the ability of our approach to predict novel and existing interactions.

The human Epidermal Growth Factor Receptor (EGFR) was selected to be the target for the case study. While other targets could equally be chosen, EGFR was selected since it is implicated in breast cancer and is a popular target for cancer therapeutics. As regards this Drugbank case study, we refer to both the approved and investigational drug relations of EGFR as interactions.

We downloaded compounds from the Drugbank database containing interaction records for EGFR. Since the Drugbank database contains small and biological molecules, we filtered out all biologics. The filtered dataset contained small molecules, of which are reported to target EGFR. Also, we removed all compounds that are present in the KIBA dataset to ensure that all drugs used for the case study were not part of the training set. As a result, the size of the final Drugbank dataset used for this case study was , with EGFR interactions. Thus, of the small molecules in the Drugbank database are also present in the KIBA dataset.

In Table 6, we present the top-50 predictions of the JoVA model. The model was able to have of the EGFR interactions in its first drugs, ranked according to the KIBA score. Also, it can be seen that the predicted KIBA scores for all the reported drugs fall under the KIBA value threshold used in Pahikkala et al. (2015a) to indicate true interactions. Using the unfiltered small molecules, the predicted KIBA values of of the EGFR interactions were all below the threshold mentioned above, with the remaining falling under .

While these results demonstrate the ability of our proposed approach to improve the virtual screening stage of drug discovery, the novel predictions reported herein could become possible cancer therapeutics upon further investigations.

| Rank | Drugbank ID | Drug | Predicted KIBA score |
|---|---|---|---|
| 1 | DB11963 | Dacomitinib | 1.314 |
| 2 | DB06021 | AV-412 | 1.516 |
| 3 | DB07788 |  | 1.693 |
| 4 | DB12818 | NM-3 | 1.775 |
| 5 | DB14944 | Tarloxotinib | 1.834 |
| 6 | DB02848 |  | 1.901 |
| 7 | DB05944 | Varlitinib | 1.912 |
| 8 | DB12669 | 4SC-203 | 1.915 |
| 9 | DB12114 | Poziotinib | 1.993 |
| 10 | DB14993 | Pyrotinib | 1.997 |
| 11 | DB06346 | Fiboflapon | 2.172 |
| 12 | DB11652 | Tucatinib | 2.301 |
| 13 | DB01933 | 7-Hydroxystaurosporine | 2.414 |
| 14 | DB12381 | Merestinib | 2.423 |
| 15 | DB07270 |  | 2.467 |
| 16 | DB06469 | Lestaurtinib | 2.534 |
| 17 | DB13517 | Angiotensinamide | 2.582 |
| 18 | DB09027 | Ledipasvir | 2.591 |
| 19 | DB11747 | Barasertib | 2.645 |
| 20 | DB12668 | Metenkefalin | 2.654 |
| 21 | DB03482 |  | 2.692 |
| 22 | DB11613 | Velpatasvir | 2.693 |
| 23 | DB07321 |  | 2.706 |
| 24 | DB03005 |  | 2.708 |
| 25 | DB13088 | AZD-0424 | 2.708 |
| 26 | DB12673 | ATX-914 | 2.712 |
| 27 | DB12267 | Brigatinib | 2.717 |
| 28 | DB11973 | Tesevatinib | 2.721 |
| 29 | DB12706 | Seletalisib | 2.724 |
| 30 | DB15343 | HM-43239 | 2.755 |
| 31 | DB12183 | Sapitinib | 2.764 |
| 32 | DB15035 | Zanubrutinib | 2.764 |
| 33 | DB15168 | Cilofexor | 2.772 |
| 34 | DB06915 |  | 2.777 |
| 35 | DB11853 | Relugolix | 2.778 |
| 36 | DB15407 | Acalisib | 2.797 |
| 37 | DB05038 | Anatibant | 2.821 |
| 38 | DB14795 | AZD-3759 | 2.837 |
| 39 | DB06638 | Quarfloxin | 2.837 |
| 40 | DB01763 |  | 2.857 |
| 41 | DB15403 | Ziritaxestat | 2.859 |
| 42 | DB12557 | FK-614 | 2.859 |
| 43 | DB07838 |  | 2.864 |
| 44 | DB11764 | Spebrutinib | 2.866 |
| 45 | DB07698 |  | 2.869 |
| 46 | DB13164 | Olmutinib | 2.879 |
| 47 | DB12064 | BMS-777607 | 2.912 |
| 48 | DB09183 | Dasabuvir | 2.914 |
| 49 | DB06666 | Lixivaptan | 2.934 |
| 50 | DB06734 | Bafilomycin B1 | 2.937 |

Table 6: The top 50 drugs predicted to interact with the Epidermal Growth Factor Receptor by the ECFP8-GraphConv-RNN-PSC JoVA model. Entries in bold print are drugs reported to target EGFR in the Drugbank database. The chemical formula of a drug is used if the name of the drug is long.

5.2 Interpretability Case Study

As mentioned earlier, the interpretability of DTI predictions could facilitate the drug discovery process. Also, being able to interpret an interaction in both the compound and target directions of the complex could reveal abstract intermolecular relationships.

Therefore, we performed an interpretability case study using Brigatinib and Zanubrutinib as the ligands and EGFR (Protein Data Bank ID: 1M17) as the macromolecule in two case studies. The EGFR structure was retrieved from the PDB and the ligand structures from DrugBank for docking experiments. We used PyRx Dallakyan and Olson (2015) to perform in-silico docking and Discovery Studio (v20.1.0) to analyze the docking results. We then mapped the top-10 atoms and top-10 amino acid residues predicted by the JoVA model used in the Drugbank case study above onto the docking results. The attention outputs of the model were used in selecting these top-k segments. In Figure 12, the yellow sections of the macromolecule indicate the top-10 amino acid residues, whereas the top-10 atoms of the ligand are shown in red transparent circles in the interaction analysis results on the right of each complex.

In the case of the EGFR-Brigatinib complex (see Figure 12(a)), we observed that the selected amino acid residues were mostly around the binding pocket of the complex. While we show only the best pose of the ligand in Figure 12, the other selected amino acid residues were identified by the docking results to be associated with other poses of the ligand. Also, selected atoms of the ligand happen to be either involved in an intermolecular bond or around regions identified by the docking results analysis to be essential for the interaction. Interestingly, the amino acids of the macromolecule identified to be intimately involved in the interaction and also among the top-10 residues are predominantly in van der Waals interactions with the ligand. Thus, the model considered stability of the interaction at the active site to be significant in determining the binding affinity.

Likewise, the EGFR-Zanubrutinib case study yielded interpretable results upon examination. As can be seen in Figure 11(b), the top-10 amino acid residues selected in the EGFR-Brigatinib case study were identified again. Thus, the model has learned to consistently detect the binding site in both case studies. Indeed, this consistency was also observed in several other experiments using EGFR-1M17 and other ligands (we provide a CSV file containing all the results online). This aligns with domain knowledge that an active site could be targeted by multiple ligands. The highlighted top-10 amino acid residues also contain three phosphorylation sites (Thr686, Tyr740, Ser744), according to the NetPhos 3.1 Blom et al. (1999) server prediction results. Additionally, the interaction analysis of the EGFR-Zanubrutinib case study reveals that a number of the amino acids selected in the top-10 segments are involved in pi-interactions, which are vital to protein-ligand recognition. We also note that some of the selected atoms of Zanubrutinib are in the aromatic regions where these pi-interactions take place. In addition, other selected amino acids are involved in van der Waals interactions, which reinforces the notion that stability is significant in determining the binding affinity.
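The consistency described above can be quantified in a simple way, for instance by the overlap between the residue sets selected for two ligands on the same target. The sketch below computes a Jaccard index over two collections of residue indices; the function name `jaccard_overlap` and the residue indices are hypothetical placeholders, not values from our experiments.

```python
from typing import Iterable


def jaccard_overlap(first: Iterable[int], second: Iterable[int]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two residue-index collections."""
    a, b = set(first), set(second)
    union = a | b
    if not union:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(union)


# Hypothetical top-5 residue indices for two ligands docked to the same target.
ligand_a_residues = [12, 40, 41, 77, 103]
ligand_b_residues = [12, 40, 41, 77, 150]
overlap = jaccard_overlap(ligand_a_residues, ligand_b_residues)
print(round(overlap, 3))  # 0.667
```

A value close to 1 would indicate that nearly the same residues are selected for both ligands, whereas a value close to 0 would indicate little agreement.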

In a nutshell, our approach is also able to offer biologically plausible cues to experts for understanding DTIs. Such an ability could be invaluable in improving existing virtual screening methods in rational drug discovery.

6 Conclusion

In this study, we have discussed the significance of studying DTI as a regression problem and highlighted the advantages of leveraging multiple entity representations for DTI prediction. Our experimental results indicate the effectiveness of our proposed self-attention-based method in predicting binding affinities and in offering biologically plausible interpretations via the examination of the attention outputs. The ability to learn rich representations using the self-attention method could have applications in other cheminformatic and bioinformatic domains such as drug-drug and protein-protein studies.


We would like to thank Siqing Zhang and Chenquan Huang for their help in setting up the experiment platforms. We are also grateful to Orlando Ding, Obed Tettey Nartey, Daniel Addo, and Sandro Amofa for their insightful comments. We thank all reviewers of this study.


  • B. Agyemang, W. Wu, M. Y. Kpiebaareh, and E. Nanor (2019) Drug-target indication prediction by integrating end-to-end learning and fingerprints. In 2019 16th International Computer Conference on Wavelet Active Media Technology and Information Processing, Vol. , pp. 266–272. External Links: Document, Link Cited by: §1, §2.4, §5.
  • H. Altae-Tran, B. Ramsundar, A. S. Pappu, and V. Pande (2017) Low Data Drug Discovery with One-Shot Learning. ACS Central Science. External Links: Document, 1611.03199, ISBN 2374-7943, ISSN 23747951 Cited by: §1, §2.3, §3.2.2.
  • E. Asgari and M. R.K. Mofrad (2015) Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE 10 (11), pp. 1–15. External Links: Document, ISSN 19326203 Cited by: §3.3.2.
  • D. Bahdanau, K. Cho, and Y. Bengio (2014) Neural Machine Translation by Jointly Learning to Align and Translate. pp. 1–15. External Links: 1409.0473, Link Cited by: §1, §1.
  • T. Baltrusaitis, C. Ahuja, and L. P. Morency (2019) Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (2), pp. 423–443. External Links: Document, 1705.09406v2, ISBN 0022-5223, ISSN 19393539 Cited by: §1.
  • A. P. Bento, A. Gaulton, A. Hersey, L. J. Bellis, J. Chambers, M. Davies, F. A. Krüger, Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos, and J. P. Overington (2014) The ChEMBL bioactivity database: An update. Nucleic Acids Research. External Links: Document, ISSN 03051048 Cited by: §1.
  • K. Bleakley and Y. Yamanishi (2009) Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics. External Links: Document, ISSN 13674803 Cited by: §2.
  • N. Blom, S. Gammeltoft, and S. Brunak (1999) Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. Journal of Molecular Biology. External Links: Document, ISSN 00222836 Cited by: §5.2.
  • D. S. Cao, Q. S. Xu, and Y. Z. Liang (2013) Propy: A tool to generate various modes of Chou’s PseAAC. Bioinformatics 29 (7), pp. 960–962. External Links: Document, ISSN 13674803 Cited by: §1, §2.3.
  • S. Cao, W. Lu, and Q. Xu (2015) GraRep: Learning graph representations with global structural information. In International Conference on Information and Knowledge Management, Proceedings, External Links: Document, ISBN 9781450337946 Cited by: §2.
  • A. Cereto-Massagué, M. J. Ojeda, C. Valls, M. Mulero, S. Garcia-Vallvé, and G. Pujadas (2015) Molecular fingerprint similarity search in virtual screening. Methods 71, pp. 58 – 63. Note: Virtual Screening External Links: ISSN 1046-2023, Document, Link Cited by: §1.
  • L. Chen and W. Zeng (2013) A Two-step Similarity-based Method for Prediction of Drug’s Target Group. Protein & Peptide Letters. External Links: Document, ISSN 09298665 Cited by: §1.
  • X. Chen, C. C. Yan, X. Zhang, X. Zhang, F. Dai, J. Yin, and Y. Zhang (2016) Drug-target interaction prediction: Databases, web servers and computational models. Briefings in Bioinformatics 17 (4), pp. 696–712. External Links: Document, ISBN 1477-4054 (Electronic) 1467-5463 (Linking), ISSN 14774054 Cited by: §1.
  • M. C. Cobanoglu, C. Liu, F. Hu, Z. N. Oltvai, and I. Bahar (2013) Predicting drug-target interactions using probabilistic matrix factorization. Journal of Chemical Information and Modeling. External Links: Document, ISSN 15205142 Cited by: §1.
  • I. Cortés-Ciriano, Q. U. Ain, V. Subramanian, E. B. Lenselink, O. Méndez-Lucio, A. P. IJzerman, G. Wohlfahrt, P. Prusis, T. E. Malliavin, G. J. P. van Westen, and A. Bender (2015) Polypharmacology modelling using proteochemometrics (pcm): recent methodological developments, applications to target families, and future prospects. Med. Chem. Commun. 6, pp. 24–50. External Links: Document, Link Cited by: §1.
  • S. Dallakyan and A. J. Olson (2015) Small-molecule library screening by docking with PyRx. In Chemical Biology: Methods and Protocols, pp. 243–250. External Links: ISBN 978-1-4939-2269-7, Document, Link Cited by: §5.2.
  • M. I. Davis, J. P. Hunt, S. Herrgard, P. Ciceri, L. M. Wodicka, G. Pallares, M. Hocker, D. K. Treiber, and P. P. Zarrinkar (2011) Comprehensive analysis of kinase inhibitor selectivity. Nature Biotechnology. External Links: Document, ISSN 1546-1696 Cited by: §1, §1, §4.1.
  • H. Ding, I. Takigawa, H. Mamitsuka, and S. Zhu (2013) Similarity-based machine learning methods for predicting drug-target interactions: A brief review. Briefings in Bioinformatics 15 (5), pp. 734–747. External Links: Document, ISBN 1477-4054 (Electronic) 1467-5463 (Linking), ISSN 14774054 Cited by: §2.
  • T. N. Doman, S. L. McGovern, B. J. Witherbee, T. P. Kasten, R. Kurumbail, W. C. Stallings, D. T. Connolly, and B. K. Shoichet (2002) Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. Journal of Medicinal Chemistry. External Links: Document, ISSN 00222623 Cited by: §1.
  • C. dos Santos, M. Tan, B. Xiang, and B. Zhou (2016) Attentive Pooling Networks. External Links: 1602.03609, Link Cited by: §1.
  • J. Duan, S. L. Dixon, J. F. Lowrie, and W. Sherman (2010) Analysis and comparison of 2d fingerprints: insights into database screening performance using eight fingerprint methods. Journal of Molecular Graphics and Modelling 29 (2), pp. 157 – 170. External Links: ISSN 1093-3263, Document, Link Cited by: §1.
  • D. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gómez-Bombarelli, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional Networks on Graphs for Learning Molecular Fingerprints. pp. 1–9. External Links: 1509.09292, Link Cited by: §1, §3.2.2.
  • A. Ezzat, P. Zhao, M. Wu, X. L. Li, and C. K. Kwoh (2017) Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 14 (3), pp. 646–656. External Links: Document, ISSN 15455963 Cited by: §1, §2.
  • Q. Feng, E. Dueva, A. Cherkasov, and M. Ester (2018) PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction. pp. 1–21. External Links: 1807.09741, Link Cited by: §1, §2.3, §4.1, §4.2.
  • J. Gomes, B. Ramsundar, E. N. Feinberg, and V. S. Pande (2017) Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity. arXiv e-prints, pp. 1–17. External Links: Document, 1703.10603, ISBN 0920-5691, ISSN 0148396X, Link Cited by: §1, §1, §2.
  • T. He, M. Heidemeyer, F. Ban, A. Cherkasov, and M. Ester (2017) SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines. Journal of Cheminformatics 9 (1), pp. 1–14. External Links: Document, ISSN 17582946 Cited by: §1, §2.2, §2, §4.1, §4.2, §5.
  • A. L. Hopkins (2009) Drug discovery: Predicting promiscuity. External Links: Document, ISSN 00280836 Cited by: §1.
  • L. Jacob and J. P. Vert (2008) Protein-ligand interaction prediction: An improved chemogenomics approach. Bioinformatics. External Links: Document, ISBN 1367-4803, ISSN 13674803 Cited by: §2.
  • M. Kanehisa, S. Goto, Y. Sato, M. Furumichi, and M. Tanabe (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research. External Links: Document, ISSN 03051048 Cited by: §1.
  • S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley (2016) Molecular graph convolutions: moving beyond fingerprints. Journal of Computer-Aided Molecular Design 30 (8), pp. 595–608. External Links: Document, ISSN 15734951 Cited by: §1, §1, §2, §3.2.3.
  • S. Kim, P. A. Thiessen, E. E. Bolton, J. Chen, G. Fu, A. Gindulyte, L. Han, J. He, S. He, B. A. Shoemaker, J. Wang, B. Yu, J. Zhang, and S. H. Bryant (2016) PubChem substance and compound databases. Nucleic Acids Research. External Links: Document, ISSN 13624962 Cited by: §1.
  • C. Knox, V. Law, T. Jewison, P. Liu, S. Ly, A. Frolkis, A. Pon, K. Banco, C. Mak, V. Neveu, Y. Djoumbou, R. Eisner, A. C. Guo, and D. S. Wishart (2011) DrugBank 3.0: A comprehensive resource for ’Omics’ research on drugs. Nucleic Acids Research. External Links: Document, ISSN 03051048 Cited by: §1.
  • T. Kogej, O. Engkvist, N. Blomberg, and S. Muresan (2006) Multifingerprint based similarity searches for targeted class compound selection. Journal of Chemical Information and Modeling. External Links: Document, ISSN 15499596 Cited by: §1.
  • G. Landrum (2006) RDKit: Open-source Cheminformatics. External Links: Document, ISBN 00028282, ISSN 00028282 Cited by: §1, §3.2.1.
  • M. Lapinsh, P. Prusis, A. Gutcaits, T. Lundstedt, and J. E.S. Wikberg (2001) Development of proteo-chemometrics: a novel technology for the analysis of drug-receptor interactions. Biochimica et Biophysica Acta (BBA) - General Subjects 1525 (1), pp. 180 – 190. External Links: ISSN 0304-4165, Document, Link Cited by: §1.
  • M. Lapinsh, P. Prusis, S. Uhlén, and J. E. S. Wikberg (2005) Improved approach for proteochemometrics modeling: application to organic compound—amine G protein-coupled receptor interactions. Bioinformatics 21 (23), pp. 4289–4296. External Links: ISSN 1367-4803, Document, Link, Cited by: §1.
  • I. Lee, J. Keum, and H. Nam (2019) DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Computational Biology 15 (6), pp. 1–21. External Links: Document, 1811.02114, ISSN 15537358 Cited by: §1, §1, §1, §2.
  • Y. Liu, M. Wu, C. Miao, P. Zhao, and X. L. Li (2016) Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction. PLoS Computational Biology. External Links: Document, ISSN 15537358 Cited by: §2.
  • Z. Liu, F. Guo, J. Gu, Y. Wang, Y. Li, D. Wang, L. Lu, D. Li, and F. He (2015) Similarity-based prediction for Anatomical Therapeutic Chemical classification of drugs by integrating multiple data sources. In Bioinformatics, External Links: Document, ISSN 14602059 Cited by: §1.
  • Y. Luo, X. Zhao, J. Zhou, J. Yang, Y. Zhang, W. Kuang, J. Peng, L. Chen, and J. Zeng (2017) A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nature Communications 8 (1). External Links: Document, ISSN 20411723, Link Cited by: §2.
  • S. M. Mahmud, W. Chen, H. Meng, H. Jahan, Y. Liu, and S. M. Hasan (2020) Prediction of drug-target interaction based on protein features using undersampling and feature selection techniques with boosting. Analytical Biochemistry 589 (November 2019), pp. 113507. External Links: Document, ISSN 10960309, Link Cited by: §1, §1, §2.
  • P. Manoharan, K. Chennoju, and N. Ghoshal (2015) Target specific proteochemometric model development for bace1 – protein flexibility and structural water are critical in virtual screening. Mol. BioSyst. 11, pp. 1955–1972. External Links: Document, Link Cited by: §1.
  • H. E. Manoochehri and M. Nourani (2018) Predicting Drug-Target Interaction Using Deep Matrix Factorization. 2018 IEEE Biomedical Circuits and Systems Conference, BioCAS 2018 - Proceedings, pp. 1–4. External Links: Document, ISBN 9781538636039 Cited by: §2, §2.
  • J. T. Metz, E. F. Johnson, N. B. Soni, P. J. Merta, L. Kifle, and P. J. Hajduk (2011) Navigating the kinome. Nature Chemical Biology. External Links: Document, ISSN 15524469 Cited by: §1, §4.1.
  • T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013) Efficient estimation of word representations in vector space. 1st International Conference on Learning Representations, ICLR 2013 - Workshop Track Proceedings, pp. 1–12. External Links: 1301.3781 Cited by: §3.3.2.
  • S. M. Nair and J. George (2018) A Novel Method For Drug Target Interaction Prediction. International Journal of Computer Engineering & Technology (IJCET) 9 (3), pp. 105–114. External Links: ISSN 0976-6375, Link Cited by: §2.
  • A. Niculescu-Mizil and R. Caruana (2005) Obtaining calibrated probabilities from boosting. Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, UAI 2005, pp. 413–420. External Links: 1207.1403, ISBN 0974903914 Cited by: §2.
  • C. Orellana M, R. Ñanculef, and C. Valle (2018) Boosting collaborative filters for drug-target interaction prediction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), External Links: Document, ISBN 9783030134686, ISSN 16113349 Cited by: §2.
  • H. Öztürk, A. Özgür, and E. Ozkirimli (2018) DeepDTA: Deep drug-target binding affinity prediction. Bioinformatics 34 (17), pp. i821–i829. External Links: Document, 1801.10193, ISSN 14602059 Cited by: §1, §2.
  • H. Öztürk, E. Ozkirimli, and A. Özgür (2016) A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction. BMC Bioinformatics. External Links: Document, ISSN 14712105 Cited by: §2, §4.1.
  • T. Pahikkala, A. Airola, S. Pietilä, S. Shakyawar, A. Szwajda, J. Tang, and T. Aittokallio (2015a) Toward more realistic drug-target interaction predictions. Briefings in Bioinformatics 16 (2), pp. 325–337. External Links: Document, ISSN 14774054 Cited by: §1, §5.1.
  • T. Pahikkala, A. Airola, S. Pietilä, S. Shakyawar, A. Szwajda, J. Tang, and T. Aittokallio (2015b) Toward more realistic drug-target interaction predictions. Briefings in Bioinformatics. External Links: Document, ISBN 1477-4054 (Electronic), ISSN 14774054 Cited by: §1, §2.1, §2, §2, §4.1, §4.2, §4.4, §5.
  • J. Pennington, R. Socher, and C. D. Manning (2014) GloVe: Global vectors for word representation. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, External Links: Document, ISBN 9781937284961 Cited by: §3.3.2.
  • N. J. Perualila-Tan, Z. Shkedy, W. Talloen, H. W.H. Göhlmann, M. V. Van Moerbeke, and A. Kasim (2016) Weighted similarity-based clustering of chemical structures and bioactivity data in early drug discovery. Journal of Bioinformatics and Computational Biology. External Links: Document, ISSN 17576334 Cited by: §1.
  • K. Pliakos, C. Vens, and G. Tsoumakas (2019) Predicting drug-target interactions with multi-label classification and label partitioning. IEEE/ACM Transactions on Computational Biology and Bioinformatics PP (JANUARY), pp. 1–1. External Links: Document, Link Cited by: §1.
  • P. G. Polishchuk, T. I. Madzhidov, and A. Varnek (2013) Estimation of the size of drug-like chemical space based on GDB-17 data. Journal of Computer-Aided Molecular Design. External Links: Document, ISSN 0920654X Cited by: §1.
  • T. Qiu, J. Qiu, J. Feng, D. Wu, Y. Yang, K. Tang, Z. Cao, and R. Zhu (2017) The recent progress in proteochemometric modelling: Focusing on target descriptors, cross-term descriptors and application scope. Briefings in Bioinformatics. External Links: Document, ISSN 14774054 Cited by: §1.
  • F. Rayhan, S. Ahmed, D. Md Farid, A. Dehzangi, and S. Shatabda (2019) CFSBoost: Cumulative feature subspace boosting for drug-target interaction prediction. Journal of Theoretical Biology. External Links: Document, ISSN 10958541 Cited by: §2.
  • A. S. Rifaioglu, H. Atas, M. J. Martin, R. Cetin-Atalay, V. Atalay, and T. Doğan (2018) Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Briefings in Bioinformatics (January), pp. 1–36. External Links: Document, ISBN 14774054 (Electronic), ISSN 1467-5463, Link Cited by: §1, §1, §1, §1, §1, §1, §3.1.
  • D. Rogers and M. Hahn (2010) Extended-Connectivity Fingerprints. Journal of Chemical Information and Modeling 50 (5), pp. 742–754. Note: PMID: 20426451 External Links: Document, Link Cited by: §1, §3.2.1.
  • K. Sachdev and M. K. Gupta (2019) A comprehensive review of feature based methods for drug target interaction prediction. External Links: Document, ISSN 15320464 Cited by: §1.
  • R. Sawada, M. Kotera, and Y. Yamanishi (2014) Benchmarking a wide range of chemical descriptors for drug-target interaction prediction using a chemogenomic approach. External Links: Document, ISSN 18681751 Cited by: §1.
  • A. C. Schierz (2009) Virtual screening of bioassay data. Journal of Cheminformatics. External Links: Document, ISSN 17582946 Cited by: §1.
  • J. Y. Shi, S. M. Yiu, Y. Li, H. C.M. Leung, and F. Y.L. Chin (2015) Predicting drug-target interaction for new drugs using enhanced similarity measures and super-target clustering. Methods 83, pp. 98–104. External Links: Document, ISBN 9781479956692, ISSN 10959130, Link Cited by: §1, §2.
  • B. Shin, S. Park, K. Kang, and J. C. Ho (2019) Self-Attention Based Molecule Representation for Predicting Drug-Target Interaction. pp. 1–18. External Links: 1908.06760, Link Cited by: §1, §1, §1, §1, §2, §4.1.
  • O. Soufan, W. Ba-Alawi, M. Afeef, M. Essack, P. Kalnis, and V. B. Bajic (2016) DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning. Journal of Cheminformatics. External Links: Document, ISSN 17582946 Cited by: §1.
  • D. Szklarczyk, A. Santos, C. Von Mering, L. J. Jensen, P. Bork, and M. Kuhn (2016) STITCH 5: Augmenting protein-chemical interaction networks with tissue and affinity data. Nucleic Acids Research. External Links: Document, ISSN 13624962 Cited by: §1.
  • Y. Tabei, E. Pauwels, V. Stoven, K. Takemoto, and Y. Yamanishi (2012) Identification of chemogenomic features from drug-target interaction networks using interpretable classifiers. Bioinformatics 28 (18), pp. 487–494. External Links: Document, ISSN 13674803 Cited by: §1.
  • J. Tang, A. Szwajda, S. Shakyawar, T. Xu, P. Hintsanen, K. Wennerberg, and T. Aittokallio (2014) Making sense of large-scale kinase inhibitor bioactivity data sets: A comparative and integrative analysis. Journal of Chemical Information and Modeling. External Links: Document, ISSN 15205142 Cited by: §1, §1, §4.1.
  • R. Todeschini and V. Consonni (2010) Molecular Descriptors for Chemoinformatics. Wiley. External Links: Document, ISBN 9783527628766, ISSN 1549-9596 Cited by: §1.
  • M. Tsubaki, K. Tomii, and J. Sese (2019) Compound-protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics 35 (2), pp. 309–318. External Links: Document, ISSN 14602059 Cited by: §1, §3.2.4, §3.3.2, §3.3.3, §3.3.3, §4.2, §5.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention Is All You Need. Advances in Neural Information Processing Systems (NIPS), pp. 5998–6008. External Links: Link Cited by: §1, §3.4, §3.4.
  • I. Wallach, M. Dzamba, and A. Heifets (2015) AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. pp. 1–11. External Links: Document, 1510.02855, ISBN 1384-5810, ISSN 1384-5810, Link Cited by: §1, §1, §2.
  • Y. Wang, H. Chang, J. Wang, and Y. Shi (2018) Drug-target interaction prediction based on heterogeneous networks. ACM International Conference Proceeding Series Part F1432, pp. 14–18. External Links: Document Cited by: §2.
  • D. Weininger (1988) SMILES, a Chemical Language and Information System: 1: Introduction to Methodology and Encoding Rules. Journal of Chemical Information and Computer Sciences. External Links: Document, ISBN 0095-2338, ISSN 00952338 Cited by: §1, §3.1.
  • M. Wen, Z. Zhang, S. Niu, H. Sha, R. Yang, Y. Yun, and H. Lu (2017) Deep-Learning-Based Drug-Target Interaction Prediction. Journal of Proteome Research. External Links: Document, ISSN 15353907 Cited by: §1, §1, §2.
  • D. S. Wishart, Y. D. Feunang, A. C. Guo, E. J. Lo, A. Marcu, J. R. Grant, T. Sajed, D. Johnson, C. Li, Z. Sayeeda, N. Assempour, I. Iynkkaran, Y. Liu, A. MacIejewski, N. Gale, A. Wilson, L. Chin, R. Cummings, D. Le, A. Pon, C. Knox, and M. Wilson (2018) DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Research. External Links: Document, ISSN 13624962 Cited by: §5.1.
  • Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande (2018) MoleculeNet: A benchmark for molecular machine learning. Chemical Science 9 (2), pp. 513–530. External Links: Document, 1703.00564, ISBN 1754-5706, ISSN 20416539 Cited by: §1, §2, §3.2.2.
  • K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio (2015) Show, attend and tell: neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, F. Bach and D. Blei (Eds.), Proceedings of Machine Learning Research, Vol. 37, Lille, France, pp. 2048–2057. External Links: Link Cited by: §1.
  • Y. Yamanishi, M. Araki, A. Gutteridge, W. Honda, and M. Kanehisa (2008) Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. External Links: Document, ISBN 1367-4811 (Linking), ISSN 13674803 Cited by: §1, §1.
  • Y. Yamanishi, M. Kotera, M. Kanehisa, and S. Goto (2010) Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics. External Links: Document, ISSN 13674803 Cited by: §2.
  • K. Yang, Z. Zhang, S. He, and X. Bo (2018) Prediction of DTIs for high-dimensional and class-imbalanced data based on CGAN. Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018, pp. 788–791. External Links: Document, ISBN 9781538654880 Cited by: §2.
  • N. Yasuo, Y. Nakashima, and M. Sekijima (2018) CoDe-DTI: Collaborative Deep Learning-based Drug-Target Interaction Prediction. Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018, pp. 792–797. External Links: Document, ISBN 9781538654880 Cited by: §2.
  • K. Yingkai Gao, A. Fokoue, H. Luo, A. Iyengar, S. Dey, and P. Zhang (2018) Interpretable drug target prediction using deep neural representation. IJCAI International Joint Conference on Artificial Intelligence 2018-July, pp. 3371–3377. External Links: ISBN 9780999241127, ISSN 10450823 Cited by: §1, §2, §3.2.2, §3.4.