Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation

11/24/2018 ∙ by Hyeoncheol Cho, et al. ∙ KAIST 수리과학과 0

Graph convolutional networks (GCNs) for learning graph representation of molecules widely expand their scope on chemical properties to biological activities, but learning the three-dimensional topology of molecules has not been explored. Many GCNs that have achieved state-of-the-art performances rely on node distances only, limiting the spatial information of molecules. In this work, we propose a novel model called three-dimensionally embedded graph convolutional network (3DGCN), which takes a molecular graph embedded in three-dimensional Euclidean space as an input and recursively updates scalar and vector features based on the relative positions of nodes. We demonstrate the capabilities of 3DGCN with tasks on physical and biophysical predictions.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 1. Introduction

Deep learning methods have revolutionized the way problems in chemistry are being solved,1, 2, 3

as they have in other domains. The latest deep-learning models outperform traditional machine-learning algorithms and nonneural networks in studies on quantitative structure–activity relationships (QSARs)

4, 5, 6, computer-aided drug design7, 8, and prediction of chemical reactions9, 10, 11, 12, aided by the rapid progress of hardware and algorithms1. The most notable advance in deep learning is the advent of the graph convolutional network (GCN)13, 14, 15

, which is the derivative of the well-known convolutional neural network (CNN), for learning non-Euclidean graph structures. The GCN proves its capability of interpreting raw molecular representations, interlocked with innate graph-like structures – atoms as nodes and bonds as edges – beyond molecular fingerprints. The attempts to predict molecular properties of solubility

16, 13, 17, ab initio calculations18, 19, and biological activities20, 21

, with relatively small molecules, have provided the basis for the GCN-based generative models such as variational autoencoder

8, 22, adversarial autoencoder23, and generative adversarial network24, 25.

However, these models are based on classic graphs, focusing on the representation of the connectivity; therefore, they do not represent molecules perfectly26, 27. Molecules are intrinsically three-dimensional structures with atomic orientations and dihedral angles, having multiple conformations with a single structural formula. The inadequate representation may deteriorate the performance on tasks involving spatial topology, such as molecular dynamics simulation and protein-ligand interactions28. To bridge the gap, GCN variants with Gaussian feature expansion schemes upon bond distances and angles18, 19, 29, have been suggested, but a GCN strategy to handle the complete three-dimensional molecular topology has not been explored yet.

In this work, we propose a GCN model on graphs embedded in three-dimensional Euclidean space . Our graph convolutional model, coined the 3DGCN, uses a novel convolution mechanism, which enables topography-relevant integration of vector-formed information among neighborhoods. In essence, the 3DGCN updates the node features from adjacent neighbors in the vector form, along with node-to-node direction vectors. We expect that the 3DGCN has discriminative power on conformation-level differences on molecules.

2. Methods

Utilizing the power of graph convolution to interpret molecular structures, we built the 3DGCN based on the GCN by Kipf et al.15. We designed our model to be capable of handling a bidirected graph embedded in Euclidean space, which is how molecules are represented. Each atom is considered as a node with the Cartesian coordinate, and a bond between them is represented by two diametrical vectors with the same length. The motivation behind this representation is to embody the three-dimensional topology of molecules in a graph structure, which can take advantage of the graph convolution with some modifications.

The 3DGCN consists of three modules: convolutional layers, a feature collection layer, and a fully connected layer. As inputs, scalar features of nodes are encoded as previously described, while vector features are initialized using zeros. The convolutional layers intermingle two features from each node and sum them along neighborhoods, generating updated features. The sequence of the convolutional layers provides an effect of expanding receptive fields, integrating information from local regions. After the convolutions, the feature collection layer sums the features along the nodes to make a node-independent, molecule-level feature, as the number of nodes differ from molecule to molecule. The resulting molecule-level vector feature is fed to fully connected neural network for prediction. The mechanism will be described in detail in the following sections.

1.1 2.1 Graph Representation of Molecules

Molecular graph embedded in three-dimensional Euclidean space is represented using a set of three matricies : a feature matrix , an adjacency matrix , and a position matrix , where and are the number of atoms and the atom-level features in a molecule, respectively. The feature matrix contains the characteristics of individual atoms comprising a molecule. Detailed information of the atom-level information is listed in Table 1. The adjacency matrix is a symmetric matrix holding the graph topology. We assume that there are no isolated nodes and self-loops in our graph. To provide translational invariance, the position matrix is defined as a three-dimensional matrix rather than , by utilizing relative position vectors between atoms instead of individual positions.

Feature Description Size
Atom Type Atom type in one-hot vector of H, B, C, N, O, F, P, S, Cl, Br, I, and others. 12
Degree Number of heavy atom neighbors in one-hot vector of 0, 1, 2, 3, 4, 5, and 6. 7
Number of Hydrogens Number of neighboring hydrogens in one-hot vector of 0, 1, 2, 3, and 4. 5
Hybridization Hybrid orbital of an atom in one-hot vector of . 5
Aromaticity Whether an atom is the part of an aromatic system. 1
30
Table 1: Scalar features in the initial representation of molecules.

1.2 2.2 Generation of Position Matrix

To acquire three-dimensional position for each atom in molecules, we generate conformers according to the previously published protocol30 using RDKIT31 when they are not provided on the dataset. Conformers are then optimized by Merck molecular force field (MMFF94)32, 33. The lowest energy conformation is selected for generating the position matrix. Molecules that fail to generate stable conformers are excluded from the dataset.

1.3 2.3 Three-Dimensional Graph Convolution Architecture

The 3DGCN follows a neighborhood aggregation strategy, which iteratively updates the features of nodes based on the aggregated adjacent node features, while each node contains two features: scalar and vector features. Scalar features are composed of individual numbers, and vector features are the collection of three-dimensional vectors with shapes of and , where and are the number of atoms and atom-level features in a molecule, respectively. To transfer information between scalar and vector features, we define four operations updating each feature. The scalar and vector features could recursively update self and the other on the convolutional cycles along with neighborhood feature collection. Overall operations are depicted in Fig 1.

Figure 1:

Four operations between scalar and vector features inside the convolutional layer. Scalar and vector features from the center node (orange) and neighborhood (blue) generate new features. Scalar-to scalar (a) and vector-to-vector (c) operations combine coordinative features from the center and neighbor node. Vector-to-scalar operation (b) employs the scalar projection of vector feature from the neighboring node on the relative direction between the center and neighborhood to transform information as a scalar feature. Scalar-to-vector operation (d) expands the scalar feature of neighborhood to a vector feature by tensor product with the relative direction. W stands for weight matrix, b for bias, and

for nonlinearity.

Operations of scalar-to-scalar and vector-to-vector features combine two adjacent nodes by concatenating the features before the linear combination with rectified linear activation function (ReLU). Specifically, if you have scalar features,

and from node and on layer , the updated scalar feature (scalar-to-scalar) is defined as

(1)

or, with vector features and , the updated vector feature (vector-to-vector) is represented as

(2)

where is concatenation, is trainable weights, and is biases. It should be pointed out that the linear combination of vector-to-vector operation does not flatten the concatenated features to retain equal weights along the x, y, and z axes.

Next, we consider operations to interconnect the scalar and vector features that require changes in dimension. Transforming the features incorporates the relative direction between nodes to decrease or increase the rank, as the features differ only in whether the directionality exists or not. Scalar projection and tensor product are chosen for the operations, which relate the features with relative directions without significant loss of the original information. Formally,

(3)
(4)

where stands for dot product and for tensor product. Note that, unlike appositional operations, scalar-to-vector and vector-to-scalar operations employ neighborhood features only.

Once we have four operations on scalar and vector features, four updated features should be combined into two features for convolution along neighborhoods. Two updated features for scalar and vector features each are in the concatenated and reduced dimension by linear combination with nonlinearity. Afterwards, the features of neighboring nodes are collected and averaged by the normalized adjacency matrix, generating and .

(5)
(6)

At the end of the convolution cycles, information distributed on the entire graph should be accumulated to predict certain properties of a molecule. In the process of accumulation, all vector features from all nodes are summed to make molecule-level features independent from the number of atoms in the molecule. In addition, summation provides order invariance of atoms and bonds encoded in the input.

1.4 2.4 Datasets

For our experiments, we focused on regression tasks to evaluate the ability to quantitatively interpret molecular information which is necessary for further comparison on conformations. We trained our model with three publicly available and commonly used datasets.

FreeSolv

The Free Solvation Database (FreeSolv) is a dataset of experimental and calculated hydration free energies optimized by general AMBER force field (GAFF)34. Atomic coordinates used for GAFF hydration free energy calculations are included with structural information in the SMILES format. Experimental values are used for training our models, and the coordinates are used without modification.

Esol

The ESOL dataset is a small collection of aqueous solubilities in for 1128 compounds35. The dataset targets prediction of log-valued solubilities from chemical structures provided as SMILES encoded strings. To obtain 3D coordinates of individual atoms, we employed a conformer generation strategy with provided SMILES as previously described36. Molecules that failed to generate stable conformers were excluded.

Bace

The BACE dataset has experimental binding affinities of 1522 small molecule inhibitors on human -secretase 1, potential therapeutic target for Alzheimer’s37. Individual values are collected from various laboratories in academia and pharmaceutical companies. The dataset focuses on modeling protein–molecule interactions. For 3D conformer generation, the same procedure as ESOL dataset was employed.

1.5 2.5 Model Training and Evaluation

The 3DGCN was implemented in Python using open-source machine learning library Keras

38

2.1.6 with TensorFlow

39

1.5 as a backend. Training was controlled by learning rate decay and early-stopping techniques, which observed the validation error to lower the learning rate or stop the training. The learning rate was decreased by a factor of 0.9 when the loss reached plateau, with a patience of 5, and the termination was determined with a patience of 25. Enough epochs were set to prevent termination by the end of epochs, not by the early-stopping mechanism. The models were evaluated by mean squared error (MSE) with 10-fold stratified cross-validation (CV). For each fold, the dataset was randomly split into a training set (80%), validation set (10%), and test set (10%).

2 3. Results and Discussion

We demonstrated the proof-of-concept of the 3DGCN via studies on small molecules and their nonlocal properties that are derived from the structures. Three datasets on physical property prediction and biophysics estimation tasks were selected to verify our architecture to learn three-dimensional graph representations. FreeSolv and ESOL datasets are commonly used target for benchmarking deep learning models based on aqueous solubility. Aqueous solubility is a well-known physical property of molecules, which is highly influenced by local motifs such as functional groups. Typical hydrophilic and hydrophobic molecules share their motifs, and integrating the information over the molecule is the key challenge to predict the solubility. Based on these goals, we trained 3DGCN with three-dimensional graph representations of molecules to learn normalized hydration free energies and solubilities.

In addition, though there is ongoing controversy over the effectiveness of three-dimensional structures on deep learning models, protein–ligand interaction is known to be highly influenced by the conformations. Expanding the target scope of the 3DGCN from solubility tasks, we tested our model’s ability of interpreting the relationship between molecular structure and protein. The BACE dataset is a good target for examining the power to learn molecular effectiveness on a -secretase 1 enzyme, representing structural suitability on the active center of the enzyme. As BACE dataset provides both binary labels of effectiveness and values, we trained our model on values to measure the closeness of predictions and observed the structural relationships. The evaluation results using the 3DGCN for the datasets are listed in Table 2.

Datasets Compounds MAE RMSE
Validation Test Validation Test
FreeSolv 643 0.642 0.062 0.651 0.070 0.875 0.100 0.911 0.103
ESOL 1128 0.490 0.030 0.494 0.037 0.623 0.028 0.625 0.056
BACE 1522 0.545 0.042 0.577 0.046 0.654 0.049 0.691 0.056
Table 2: 10-fold cross-validation performances on FreeSolv, ESOL, and BACE datasets.

The 10-fold cross-validation results show that the 3DGCN model achieves a mean RMSE of 0.911 on FreeSolv; in comparison, the state-of-the-art GCNs show RMSEs of over 1 kcal/mol19, 40. A method is generally considered to be comparable with the ab initio prediction, when its RMSE reaches 1.5 kcal/mol. On the ESOL task, our 3DGCN has an RMSE of 0.625, which is similar to or slightly higher than the results of modern GCNs. Moving beyond the predictions on unimolecular properties, the 3DGCN successfully proves its capability of simulating the interactions between -secretase and inhibitors, with an RMSE of 0.691. It should be noted that the molecules in the BACE dataset are relatively larger in size compared with those in the solubility datasets, and the targeted property relies on the structural information on the fragments of molecules.

Figure 2:

(a) Confusion plots for the test subsets of FreeSolv (red), ESOL (orange), and BACE (blue) datasets with Root Mean Squared Errors (RMSE). Trend lines for each predicted set are shown as solid lines, and dashed lines indicate identity lines. The most underpredicted and overpredicted molecules are indicated by black triangles. (b) Molecular structures of the outliers marked in (a) with experimental and predicted values. Units for the datasets are kcal/mol, log(mol/L), and log(mol/L), respectively.

The distribution graphs of the predicted values vs. true ones, as a visualization of the overall predictions, show linear-like relationship on all the tasks (Fig 2A). Test subsets are selected from the cross-validation trial, which show the closest performance to the averaged RMSE. The molecules with the most underpredicted and overpredicted values are marked on the distribution, and their structures are shown in Fig 2B. Relatively large errors on FreeSolv in terms of mean RMSE compared with the other datasets, are distributed in the marginal area of the dataset where the number of molecules for training may be insufficient.

3 4. Conclusions

Molecular interactions take place in a three-dimensional space and are highly influenced by conformations and relative orientations, which are not represented in the conventional graph structure. In this work, we propose a three-dimensionally embedded graph convolutional network (3DGCN) that incorporates three-dimensional graph topology with a graph convolutional neural network. As far as we know, there have been several attempts to incorporate spatial information of a molecule, but the 3DGCN is the first model that uses the innate architecture that handles relative position vectors. We observe the capabilities of predicting basic molecular properties as well as protein-molecule interactions in our experiments. Our key insight is that the 3DGCN has various potential applications for learning conformation- and orientation-dependent phenomena in intermolecular dynamics, protein–ligand interaction predictions, and the simulation of chemical reactions.

4 5. Acknowledgement

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (MSIP 2012R1A3A2026403).

6. References

  • LeCun et al. 2015 LeCun, Y.; Bengio, Y.; Hinton, G. Nature 2015, 521, 436–444.
  • Schmidhuber 2015 Schmidhuber, J. Neural Networks 2015, 61, 85–117.
  • Goh et al. 2017 Goh, G. B.; Hodas, N. O.; Vishnu, A. Journal of Computational Chemistry 2017, 38, 1291–1307.
  • Devillers 1996 Devillers, J. Neural networks in QSAR and drug design; Academic Press, 1996.
  • Dahl et al. 2014 Dahl, G. E.; Jaitly, N.; Salakhutdinov, R. arXiv preprint arXiv:1406.1231 2014,
  • Ma et al. 2015 Ma, J.; Sheridan, R. P.; Liaw, A.; Dahl, G. E.; Svetnik, V. Journal of Chemical Information and Modeling 2015, 55, 263–274.
  • Gawehn et al. 2016 Gawehn, E.; Hiss, J. A.; Schneider, G. Molecular Informatics 2016, 35, 3–14.
  • Gómez-Bombarelli et al. 2018 Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; Aspuru-Guzik, A. ACS Central Science 2018, 4, 268–276.
  • Kayala et al. 2011 Kayala, M. A.; Azencott, C.-A.; Chen, J. H.; Baldi, P. Journal of Chemical Information and Modeling 2011, 51, 2209–2222.
  • Wei et al. 2016 Wei, J. N.; Duvenaud, D.; Aspuru-Guzik, A. ACS Central Science 2016, 2, 725–732.
  • Segler et al. 2018 Segler, M. H.; Preuss, M.; Waller, M. P. Nature 2018, 555, 604–610.
  • Granda et al. 2018 Granda, J. M.; Donina, L.; Dragone, V.; Long, D.-L.; Cronin, L. Nature 2018, 559, 377–381.
  • Duvenaud et al. 2015 Duvenaud, D. K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. Advances in Neural Information Processing Systems. 2015; pp 2224–2232.
  • Defferrard et al. 2016 Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems. 2016; pp 3844–3852.
  • Kipf and Welling 2016 Kipf, T. N.; Welling, M. arXiv preprint arXiv:1609.02907 2016,
  • Lusci et al. 2013 Lusci, A.; Pollastri, G.; Baldi, P. Journal of Chemical Information and Modeling 2013, 53, 1563–1575.
  • Kearnes et al. 2016 Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Journal of Computer-Aided Molecular Design 2016, 30, 595–608.
  • Schütt et al. 2017 Schütt, K. T.; Arbabzadah, F.; Chmiela, S.; Müller, K. R.; Tkatchenko, A. Nature Communications 2017, 8, 13890.
  • Gilmer et al. 2017 Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; Dahl, G. E. arXiv preprint arXiv:1704.01212 2017,
  • Hughes et al. 2015 Hughes, T. B.; Miller, G. P.; Swamidass, S. J. ACS Central Science 2015, 1, 168–180.
  • Fout et al. 2017 Fout, A.; Byrd, J.; Shariat, B.; Ben-Hur, A. Protein interface prediction using graph convolutional networks. Advances in Neural Information Processing Systems. 2017; pp 6530–6539.
  • Simonovsky and Komodakis 2018 Simonovsky, M.; Komodakis, N. arXiv preprint arXiv:1802.03480 2018,
  • Blaschke et al. 2018 Blaschke, T.; Olivecrona, M.; Engkvist, O.; Bajorath, J.; Chen, H. Molecular Informatics 2018, 37, 1700123.
  • Benhenda 2017 Benhenda, M. arXiv preprint arXiv:1708.08227 2017,
  • Sanchez-Lengeling and Aspuru-Guzik 2018 Sanchez-Lengeling, B.; Aspuru-Guzik, A. Science 2018, 361, 360–365.
  • Schultz 1989 Schultz, H. P. Journal of Chemical Information and Computer Sciences 1989, 29, 227–228.
  • Bath et al. 1995 Bath, P. A.; Poirrette, A. R.; Willett, P.; Allen, F. H. Journal of Chemical Information and Computer Sciences 1995, 35, 714–716.
  • Nettles et al. 2006 Nettles, J. H.; Jenkins, J. L.; Bender, A.; Deng, Z.; Davies, J. W.; Glick, M. Journal of Medicinal Chemistry 2006, 49, 6802–6810.
  • Smith et al. 2017 Smith, J. S.; Isayev, O.; Roitberg, A. E. Chemical Science 2017, 8, 3192–3203.
  • Ebejer et al. 2012 Ebejer, J.-P.; Morris, G. M.; Deane, C. M. Journal of Chemical Information and Modeling 2012, 52, 1146–1158.
  • 31 Landrum, G. RDKit: Open-source cheminformatics. http://www.rdkit.org, Accessed 2018-11-22.
  • Halgren 1996 Halgren, T. A. Journal of Computational Chemistry 1996, 17, 490–519.
  • Tosco et al. 2014 Tosco, P.; Stiefl, N.; Landrum, G. Journal of Cheminformatics 2014, 6, 37.
  • Mobley and Guthrie 2014 Mobley, D. L.; Guthrie, J. P. Journal of Computer-Aided Molecular Design 2014, 28, 711–720.
  • Delaney 2004 Delaney, J. S. Journal of Chemical Information and Computer Sciences 2004, 44, 1000–1005.
  • Axen et al. 2017 Axen, S. D.; Huang, X.-P.; Caceres, E. L.; Gendelev, L.; Roth, B. L.; Keiser, M. J. Journal of Medicinal Chemistry 2017, 60, 7393–7409.
  • Subramanian et al. 2016 Subramanian, G.; Ramsundar, B.; Pande, V.; Denny, R. A. Journal of Chemical Information and Modeling 2016, 56, 1936–1949.
  • Chollet 2015 Chollet, F. Keras. https://keras.io, 2015; Accessed 2018-11-22.
  • Abadi et al. 2015 Abadi, M. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https://www.tensorflow.org, 2015; Accessed 2018-11-22.
  • Wu et al. 2018 Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V. Chemical Science 2018, 9, 513–530.