Molecular activity prediction using graph convolutional deep neural network considering distance on a molecular graph

07/02/2019 ∙ by Masahito Ohue, et al. ∙ 8

Machine learning is often used in virtual screening to find compounds that are pharmacologically active on a target protein. The weave module is a type of graph convolutional deep neural network that uses not only features focusing on atoms alone (atom features) but also features focusing on atom pairs (pair features); thus, it can consider information of nonadjacent atoms. However, the correlation between the distance on the graph and the three-dimensional coordinate distance is uncertain. In this paper, we propose three improvements for modifying the weave module. First, the distances between ring atoms on the graph were modified to bring the distances on the graph closer to the coordinate distance. Second, different weight matrices were used depending on the distance on the graph in the convolution layers of the pair features. Finally, a weighted sum, by distance, was used when converting pair features to atom features. The experimental results show that the performance of the proposed method is slightly better than that of the weave module, and the improvement in the distance representation might be useful for compound activity prediction.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

page 5

page 6

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

In drug research and development, it takes at least ten years to produce a single drug, and development costs are estimated to be several billion US dollars or more 

[2]. High-throughput screening methods for screening compounds that show activity against proteins targeted by drug discovery from large-scale compound libraries are popular [3]; however, screening vast numbers of compounds is expensive. In contrast, virtual screening is expected to be able to predict active compounds (hit compounds) efficiently using a computer [4].

One of the frameworks of virtual screening is a ligand-based method that uses machine learning to predict activity using known activity information as a teacher label [5, 6]

. In particular, in recent years, each atom of a compound is regarded as a node, and a bond is considered as an edge graph. Based on this, feature extraction can be performed using neural networks

[7, 8, 9]. The graph convolutional neural network (GCN), which realizes the convolutional deep neural network by using a convolution operation on the graph structure, is used for such applications.

For graph feature extraction using GCN, neural graph fingerprints (NGF) [7], the GCN by Han et al. [8] and the weave module [9]

are often used. These methods do not generate compound descriptors (feature vectors) based on a specific rule like ordinary fingerprints and have the advantage of being able to represent feature vectors by learning molecular structures flexibly.

NGF and Han’s GCN do not consider edge features in the molecular graph but focus on learning the relationship with the nearest neighbor node. On the other hand, the weave module of Kearnes et al. transforms feature vectors using pair features with distant atoms in addition to atom features focused only on atoms. Thus, the Weave module can consider features between distant atoms. However, the number of atoms forming a pair is different for each distance. Furthermore, the pair features of the Weave module cannot be considered in that respect.

In this paper, we propose a new improved GCN that can consider features between distant atoms by modifying the Weave module. In order to make effective use of the distance features on the molecular graph in the Weave module, we considered three improvements: correction of the distance on the molecular graph with respect to atoms in the ring structure, convolution method of pair features, and assembling of the pair features.

Ii Weave Module [9]

The network architecture of the Weave module [9] is shown in Fig. 1. The Weave module consists of the seven transformations shown in 1⃝7⃝ in Fig. 1. This study was targeted at improving the method of generating the initial feature and transformation operation 3⃝ (transforming from the pair feature to the intermediate atom feature). These operations are described as follows, and their further details can be found in  [9].

Fig. 1: Weave module [9]

Ii-a Initial features

Initial atom feature and pair feature , which are inputs to the network, use simple descriptors such as atom and bond types. is the maximum number of atoms in the molecule. is a matrix in which number of -dimensional feature vectors (row vectors) corresponding to one atom are vertically arranged. is a matrix in which number of -dimensional feature vectors (row vectors) corresponding to one atom pair are vertically arranged.

Ii-B Transformation operation 3⃝: transform intermediate atom feature from pair feature

In Weave-module layer , the following operation, as shown in Fig. 2 is performed on all the atom pairs comprising atom . The intermediate atom feature for atom is calculated by adding them.

(1)

where is an input pair feature vector of atom pair in the -th layer, is an output atom feature vector of atom , is a weight matrix, and

is a bias vector.

is an activation function that applies ReLU to all elements of a vector.

Atom feature is vertically arranged as for all atoms .

Fig. 2: Converting pair features to atom features

Ii-C Point of issue

The following issues are present in the Weave module.

  1. Distance on the graph for atoms in a ring structure
    An uncertainty exists as to whether the distance on the graph and the real three-dimensional distance are correlated between atom pairs in the ring structure.

  2. Convolution of pair features
    Uniform weights are used for all pair features regardless of the distance length on the graph.

  3. Assembling of pair features
    All atoms in the pair are uniformly added to the convoluted pair feature, and the difference due to the distance between the pairs is not reflected.

Iii Proposed Method

Here, we introduce three improvements to solve the above-mentioned issues of the Weave module.

Iii-a Correction of distances related to atoms in ring structures (Prop. A)

The pair feature of the Weave module defines the distance between atom pairs as the length of the shortest path on the graph. The ring structure is relatively rigid in terms of the actual molecular conformation compared to the chain structure. Moreover, the distance on the conformation is shorter than the distance on the graph, considering two atoms in the molecule. Therefore, with respect to the atom of interest, the atom pair at the orthoposition and metaposition is distance 1, and the atom pair at the para position is distance 2 (Figs. 4 and 4).

To realize this definition of distance, we redefined the molecular graph by adding new edges to the atoms contained in the ring structure in the compound. The modified molecular graph is generated through Algorithm 1.

0:  molecular graph
0:  redefined molecular graph
1:  
2:  
3:  
4:  for  in  do
5:     for  in  do
6:        
7:        
8:        for  in  do
9:           if  then
10:              
11:           end if
12:        end for
13:        
14:        for  in  do
15:           
16:        end for
17:     end for
18:  end for
Algorithm 1 Redefinition of distance on ring structure
Fig. 3: Examples of ring structure
Fig. 4: Examples of redefined ring structure (Prop. A)
Fig. 3: Examples of ring structure

In Algorithm 1, GetSymmSSSR() is used to obtain the symmetrized smallest set of smallest rings (SSSR) for a molecule, flatten() converts a nested list into a one-dimensional array, ShortestPathLength() obtains the shortest path length to a certain vertex , and in the dictionary data type is the number of hops from the focused atom to the atom at which the search is stopped. Further, items() is used to obtain the searched atoms and the shortest path length from . GetSymmSSSR() is implemented in RDKit (version 2018.03.4) [10] and ShortestPathLength() is implemented in NetworkX (version 2.2) [11].

In this study, we defined . In other words, the distance between atoms in the ring structure is regarded as . is the shortest path length between the atoms in a pair on the original graph. Taking furan (5-atom ring), benzene (6-atom ring), and naphthalene (polycyclic molecule) as examples, the graph redefined in this study is shown in Fig. 4.

Iii-B Convolution of pair features with different weights (Prop. B)

We improved the weights for pair features to be determined by learning the use of neural networks. In the Weave module, pair features were convoluted using the same weight matrix, regardless of the distance length. Therefore, we labeled distances from the focus atom to distinguish each pair feature. Here, represents all distances greater than the maximum atomic pair distance . We used different weight matrices corresponding to these distances in the convolution of pair features.

In Prop. B, weight matrices according to the distance were used for atom pairs and convolution was performed. The intermediate atom feature of atom was calculated by taking the sum of atom pairs of atom . This operation is shown in Fig. 5.

Fig. 5: Convolution of pair features using different weights (Prop. B)

Iii-C Assembling pair features based on distance (Prop. C)

If the interatomic distance on the graph is large, the interatomic distance on conformation does not become constant. Atom pairs with large interatomic distances appear to be less important than those with small interatomic distances. Therefore, when finding the intermediate atom feature of atom , the closer the distance is, the larger is the weighting performed by the three kinds of coefficients :

(2)
(3)
(4)

This modifies Eq. (1) as follows:

(5)

Iv Experiments

Iv-a Dataset

We used the Biophysics datasets HIV, MUV, and PCBA from MoleculeNet [12]. Molecular data are provided in SMILES format and converted to 2-D molecular graphs using RDKit [10]. Hydrogen atoms were omitted, and compounds with the huge number of heavy atoms exceeding maximum number of atoms, , were excluded from the dataset. The number of tasks in each dataset, number of active compounds, number of inactive compounds, number of compounds, and number of excluded compounds are shown in Table I. Given that the same compound is registered with different labels between each task, the numbers of active and inactive compounds were counted in duplicate.

dataset #tasks #pos11footnotemark: 1 #neg11footnotemark: 1 #cmpds #excluded
HIV [13] 1 1,319 39,065 40,384 743
MUV [14] 17 489 249,397 93,087 0
PCBA [15] 128 471,273 33,509,569 437,035 894
11footnotemark: 1

Given that the same compound is registered with different labels between each task, the number counted in duplicate as described.

TABLE I: Details of datasets

Iv-B Training and evaluation

The GCN model was implemented using the deep learning library, Chainer Chemistry (version 0.4.0) [16]

. The hyperparameters of GCN are listed in Table 

II. These were the same as those used by Kearnes et al. [9]. We attempted to set maximum atom pair distance, , to 1–5.

In this study, the prediction performance of the model was evaluated using the ROC curve [17] and area under the curve (AUC), as shown in Eq. (6), as well as the enrichment factor (EF) [18]

in Eq. (7) using the compound order arranged in descending order of prediction probability.

(6)
(7)

where is the number of active compounds, is the number of inactive compounds, is the number of inactive compounds ranked higher than the -th active compound, is the number of compounds, is the number of active compounds in the top , and is the number of compounds in the dataset (i.e., ).

The AUC is 0.5 and 1.0 for random and complete predictions, respectively. is a value indicating how many times the active compound can be concentrated to the top through compound ranking, and EF = 1 in random prediction. In this study, and were used.

hyperparameter value
maximum number of atoms in molecule 60
maximum atomic pair distance 1–5
Weave modulek 2
50
128
#fully connected layers 2000, 100
training batch size 96
optimizer Adam
learning rate 0.001
epoch 100
trainvalidtest HIV
PCBA, MUV
trial HIV 10
PCBA, MUV 5
TABLE II: Model hyperparameters

Each dataset was divided into the training data (train), validation data (valid), and test data (test) according to the ratio shown in Table II. For each task in the dataset, we selected an epoch (learning checkpoint) that gives the best AUC for the validation data and applied it to the test data to calculate the averaged AUC value for each task. The AUC used in evaluation (AUC_eval) is calculated as follows.

(8)
(9)

where represents each task, is the epoch, and is the trial. is an AUC value of task of validation data using trained GCN with epoch in trial . is the AUC value of task of test data using a trained GCN with epoch in trial . The division of the dataset at each trial is randomly performed each time. EF (EF_eval) is also obtained in the same way as in Eq. (9).

V Results

V-a Performance of Props. A and B

The results of comparing the AUC of each dataset is shown in Table III for the models of the Weave module, Prop. A, Prop. B, and Prop. A&B. Prop. A provided higher prediction performance than the Weave module in the MUV dataset but remained as accurate as the Weave module in HIV and PCBA. The accuracy of the Prop. B alone is almost the same as that of the Weave module, while the combination of the Prop. A and B yields a slightly higher accuracy.

dataset model distance
1 2 3 4 5
HIV Weave 0.796 0.798 0.795 0.793 0.801
Prop. A 0.796 0.803 0.799 0.794 0.798
Prop. B 0.794 0.797 0.797 0.799 0.806
Prop. A&B 0.806 0.798 0.801 0.800 0.800
MUV Weave 0.680 0.720 0.739 0.689 0.743
Prop. A 0.706 0.783 0.735 0.741 0.754
Prop. B 0.723 0.738 0.714 0.671 0.736
Prop. A&B 0.757 0.760 0.704 0.737 0.693
PCBA Weave 0.822 0.824 0.821 0.821 0.823
Prop. A 0.821 0.825 0.823 0.823 0.824
Prop. B 0.822 0.821 0.820 0.822 0.823
Prop. A&B 0.819 0.821 0.823 0.822 0.821
TABLE III: AUC of each dataset using Props. A and B. The value in bold-font is the best value for each model.

The distribution of EF of Prop. A in the HIV dataset is shown in Figs. 7 and 7. EF was the highest at 19.2 when . When , EF was also slightly higher. This suggests that the feature between the distant atoms could be reflected. For the MUV dataset, no such results were seen for both EF and EF, and the performance was almost the same as that of the Weave module.

Fig. 6: EF of HIV dataset using Prop. A
Fig. 7: EF of HIV dataset using Prop. A
Fig. 6: EF of HIV dataset using Prop. A

Figs. 9 and 9 show the distributions of the AUC values. Fig. 9 shows the distribution by AUC for each task before median operation using Eq. (9). The comparison of the model combining Prop. A and B with the Weave module and Prop. A shows that the variation of the AUCs on MUV tasks were decreased. The AUC value of Prop. A&B was higher than others, in particular by separating the weights of nearby atoms. In addition, the model combining Props. A and B obtained an EF value of 18.8 when . When , the performance was better compared to that of Prop. A.

Fig. 8: The EF of HIV dataset using Prop. B
Fig. 9: AUC of MUV dataset using Prop. B
Fig. 8: The EF of HIV dataset using Prop. B

V-B Results of Prop. C

The prediction results for the HIV and MUV datasets are listed in Table IV with respect to Prop. C, three functions, and the Weave module for assembling pair features. The models of the linear and quadratic functions of Prop. C have a higher AUC value. The improvement by the model of the step function was not significant.

dataset model distance
1 2 3 4 5
HIV Weave 0.796 0.798 0.795 0.793 0.801
step 0.766 0.767 0.765 0.769 0.772
linear 0.799 0.798 0.803 0.799 0.807
quadratic 0.796 0.791 0.803 0.798 0.803
MUV Weave 0.680 0.720 0.739 0.689 0.743
step 0.629 0.721 0.692 0.677 0.690
linear 0.731 0.749 0.687 0.713 0.729
quadratic 0.752 0.742 0.713 0.722 0.702
TABLE IV: AUC of each dataset using Prop. C. The value in bold-font represents the best value for each model.

Vi Discussion

Vi-a Effect of distance correction on molecular graph

Because the number of ring structures was approximately three times the number of compounds in the dataset, it was useful to correct the distances on the graph for atoms in the ring structure using Prop. A. In this study, we focused on the atoms in the ring structure, and considered the inter-atom distance in the ring structure as , not based on covalent bonds. This unusual molecular graph structure allows us to modify the interatomic distance on the graph in the ring structure to correlate with the interatomic distance on the conformation. Given the assumption that the learning of the ring structure by the GCN can be done by correcting the distance on the graph in a ring structure from the result of an experiment, it is better to omit the feature that an atom pair belongs to the same ring. Furthermore, different rings with different interatomic bonds, such as benzene and cyclohexane, cannot be distinguished by GCNs, including the Weave module and the proposed method. The prediction performance can be expected to improve by considering the type of bond in the ring.

Vi-B Transition of weight matrix norm

For the Weave module and Prop. B, we examined how the weighting matrix changed as the learning progressed. The Frobenius norm of each weight matrix at the HIV dataset and maximum atomic pair distance 5 was determined. The 0th and 1st layers of the Weave module are as shown in Fig. 11 and Fig. 11, respectively. dist_ in each figure is the norm of the weight matrix when the distance is , and dist_over is the norm of the weight matrix when the maximum atomic pair distance is greater than 5.

Fig. 10: Comparison of weight matrix norms ( and ) in HIV dataset
Fig. 11: Comparison of weight matrix norms ( and ) in HIV dataset
Fig. 10: Comparison of weight matrix norms ( and ) in HIV dataset

According to Fig. 11, dist_0, dist_1, and dist_2 have roughly equal slopes compared to the Weave module; however, dist_5 and dist_over have gentle slopes. In the 0th layer of the Weave module, distant atom pairs were not considered very important because the value of the weight matrix does not fluctuate excessively, and they are incorporated with more emphasis on the close atom pairs. Accordingly, it is possible to improve the model performance by using different weights for the atom pair distance 0 to 2 and the other atom pairs. On the other hand, in Fig. 11, the slopes of the Weave module and dist_ were approximately equal. The values of the weight matrix were found to fluctuate significantly in the 1st layer of the Weave module, thus emphasizing not only pair features with close interatomic distances but also those with larger interatomic distances. Therefore, it may not be necessary to divide the weight matrix for each distance in the 1st layer, and it may be effective to change the composition of the weight matrix for each layer of the Weave module.

Vii Conclusion

In this study, three types of improvements were made to the operation of converting a pair feature to an atom feature in the Weave module [9], which is one of the GCN models of compounds.
A. correction of the distance on the graph in the ring structure in the compound: By changing distance of the atom pair contained in the ring structure to , the distance on the graph was corrected to correlate with the distance on conformation. As a result of the evaluation experiment, the prediction accuracy is improved compared to the Weave module, and features between distant atoms were also successfully used.
B. convolution of paired features with different weights for different distances on the graph: We attempted to generalize the model by using different weights for each distance in the convolution process combined with the proposed correction of the distance on the graph in the ring structure in the compound. The prediction accuracy was higher when performing convolution with different weights for each distance compared to the Weave module. According to the analysis of the weight matrix dynamics, the proposed method was found to be useful, especially in the 0th layer of the Weave module.
C. Assembly of paired features based on distances on the graph: We proposed a method of incorporating pair features that emphasize the atoms in the vicinity of the atom of interest by using coefficients according to the distance. We achieved some improvement in the prediction accuracy by assembling paired features by using linear and quadratic weights.

We intend to work on the following topics in the future:

  • Distinguishing ring types based on atomic bonds may improve prediction accuracy.

  • Because Weave-module vector transformation operation is complicated, it may not be possible to achieve a significant improvement in the accuracy simply by improving the transformation operation from pair features to atom features. In other transformation operations, it may be necessary to make improvements by utilizing distance features.

  • It is worthwhile to verify that this improvement is also effective for other tasks of compound supervised learning, e.g., drug-like compound filter 

    [19], side-effect prediction [20], toxicity prediction [21], and stability prediction [22, 23].

Acknowledgment

This work was partially supported by KAKENHI (Grant No. 17H01814, 17J06897, and 18K18149) from the Japan Society for the Promotion of Science (JSPS), the Core Research for Evolutional Science and Technology (CREST) “Extreme Big Data” (Grant No. JPMJCR1303) from the Japan Science and Technology Agency (JST), the Program for Building Regional Innovation Ecosystems “Program to Industrialize an Innovative Middle Molecule Drug Discovery Flow through Fusion of Computational Drug Design and Chemical Synthesis Technology” from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT), the Research Complex Program “Wellbeing Research Campus: Creating new values through technological and social innovation” from JST, and the Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from the Japan Agency for Medical Research and Development (AMED) (Grant No. JP18am0101112).

References

  • [1]
  • [2] A. Mullard, “New drug costs US $2.6 billion to develop,” Nat. Rev. Drug. Discov., vol. 13(12), pp. 877, 2014.
  • [3] R. Macarron, M. N. Banks, D. Bojanic, et al. “Impact of high-throughput screening in biomedical research,” Nat. Rev. Drug. Discov., vol. 10, pp. 188–195, 2011.
  • [4] S. P. Leelananda, S. Lindert, “Computational methods in drug discovery,” Beilstein J. Org. Chem., vol. 12, pp. 2694–2718, 2016.
  • [5] L. J. Colwell, “Statistical and machine learning approaches to predicting protein–ligand interactions,” Curr. Opin. Struct. Biol., vol. 49, pp. 123–128, 2018.
  • [6] B. J. Neves, R. C. Braga, C. C. Melo-Filho, et al. “QSAR-based virtual screening: advances and applications in drug discovery,” Front. Pharmacol., vol. 9, no. 1275, 2018.
  • [7] D. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, et al. “Convolutional networks on graphs for learning molecular fingerprints,” In Proc. NIPS, pp. 2215–2223, 2015.
  • [8] H. Altae-Tran, B. Ramsundar, A. S. Pappu, et al. “Low data drug discovery with one-shot learning,” ACS Cent. Sci., vol. 3, pp. 283–293, 2017.
  • [9] S. Kearnes, K. McCloskey, M. Berndl, et al. “Molecular graph convolutions: moving beyond fingerprints,” J. Comput.-Aided Mol. Des., vol. 30(8), pp. 595–608, 2016.
  • [10] G. Landrum, RDKit: Open-source cheminformatics software. http://www.rdkit.org.
  • [11] A. A. Hagberg, D. A. Schult, P. J. Swart, “Exploring network structure, dynamics, and function using network,” In 7th Python in Sci. Conf. (SciPy), pp. 11–15, 2008.
  • [12] Z. Wu, B. Ramsundar, E. N. Feinberg, et al. “MoleculeNet: a benchmark for molecular machine learning,” Chem. Sci., vol. 9, pp. 513–530, 2018.
  • [13] AIDS Antiviral Screen Data. http://wiki.nci.nih.gov/display/NCIDTPdata/AIDS+Antiviral+Screen+Data (accessed March 28, 2019)
  • [14] S. G. Rohrer, K. Baumann, “Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data,” J. Chem. Inf. Model., vol. 49(2), pp. 169–184, 2009.
  • [15] Y. Wang, J. Xiao, T. O. Suzek, et al. “PubChem’s BioAssay database,” Nucleic Acids Res., vol. 40(D1), pp. D400–D412, 2012.
  • [16] Chainer Chemistry: a library for deep learning in biology and chemistry. http://github.com/pfnet-research/chainer-chemistry (accessed March 28, 2019)
  • [17] A. N. Jain, A. Nicholls, “Recommendations for evaluation of computational methods,” J. Comput.-Aided Mol. Des., vol. 22(3–4), pp. 133–139, 2008
  • [18] A. Hamza, N. N. Wei, C. G. Zhan, “Ligand-based virtual screening approach using a new scoring function,” J. Chem. Inf. Model., vol. 52, pp. 963–974, 2012.
  • [19] M. Mochizuki, S. D. Suzuki, K. Yanagisawa, et al. “QEX: target-specific druglikeness filter enhances ligand-based virtual screening,” Mol. Divers, vol. 23(1), pp. 11–18, 2019.
  • [20] M. Zitnik, M. Agrawal, J. Leskovec, “Modeling polypharmacy side effects with graph convolutional networks,” Bioinformatics, vol. 34(13), pp. i457–i466, 2018.
  • [21] M. Fernandez, F. Ban, G. Woo, et al. “Toxic Colors: the use of deep learning for predicting toxicity of compounds merely from their graphic images,” J. Chem. Inf. Model., vol. 58(8), pp. 1533–1543, 2018.
  • [22] T. Tajimi, N. Wakui, K. Yanagisawa, et al. “Computational prediction of plasma protein binding of cyclic peptides from small molecule experimental data using sparse modeling techniques,” BMC Bioinform., vol. 19(19), no. 527, 2018.
  • [23] R. Watanabe, T. Esaki, H. Kawashima, et al. “Predicting fraction unbound in human plasma from chemical structure: improved accuracy in the low value ranges,” Mol. Pharm., vol. 15(11), pp. 5302–5311, 2018.