Predicting the properties of molecules is a crucial ingredient for computer-aided drug discovery and for the development of materials with desired properties [16, 13, 8, 29]. Currently, quantum-chemical simulations based on density functional theory (DFT) [2, 5] are widely used to calculate the electronic structure and properties of molecules. However, because of the heavy computational cost of DFT, it is difficult to extensively explore the huge number of potential chemical compounds. To enlarge the search space, much effort has been made to apply machine learning techniques to learning molecular representations in cheminformatics and materials informatics, although these techniques are not yet fully developed. Efficient and accurate machine learning methods for the prediction of molecular properties can have a huge impact on the discovery of novel drugs and materials.
Previous work on molecular modeling with machine learning methods has mainly focused on developing hand-crafted features for molecular representations that can reflect the structural similarities and biological activities of molecules. Examples include Extended-Connectivity Fingerprints , Coulomb Matrix , Symmetry Function , and Bag-of-Bonds . These molecular representations can be used to predict molecular properties with logistic regression and kernel methods.
Recently, deep learning methods have gained a lot of attention for learning molecular representations, thanks to the availability of large-scale training data generated by quantum-chemical simulations. In particular, graph neural networks are reasonable and attractive approaches, since they can learn appropriate molecular representations that are invariant to graph isomorphism in an end-to-end fashion. While a number of graph neural networks have been proposed and applied to molecular modeling, developing neural networks that are accurate and scalable enough to express a variety of molecules is still a challenging problem.
In this work, we present a simple and powerful graph neural network, gated graph recursive neural networks (GGRNet), for learning molecular representations and predicting molecular properties. To construct an expressive and accurate neural network, we model a molecule as a complete directed graph in which each atom has three-dimensional coordinates, and we update the hidden vectors of atoms depending on the distances between them. In our model, the parameters for learning hidden atom vectors are shared across all layers, and the input embeddings are fed into every layer as skip connections to accelerate training. Our model also allows us to incorporate arbitrary features, such as the number of atoms in the molecule, which is helpful for learning better representations of molecules.
We validate our model on three benchmark datasets for molecular property prediction, QM7b, QM8, and QM9, and empirically show that our model outperforms conventional methods, which highlights its potential for molecular graph learning.
2 Related Work
In cheminformatics, hand-crafted features for molecules, referred to as molecular fingerprints, have been actively developed for encoding the structure of molecules [6, 25, 26, 3, 15]. These molecular fingerprints are typically binary or integer vectors that represent the presence of particular substructures in the molecule, and they can be used as feature vectors for the prediction of molecular properties with machine learning [19, 11].
For example, in Extended-Connectivity Fingerprints (ECFP) , atoms are initially assigned integer identifiers; the identifiers are then iteratively updated based on neighboring atoms and collected into the fingerprint set. The Bag-of-Bonds descriptor , inspired by the “bag-of-words” featurization in natural language processing, collects “bags” that correspond to different bond types such as “C-C” and “C-H”, and each bond in a bag is vectorized as

    Z_i Z_j / ||R_i − R_j||,

where Z_i and Z_j are the nuclear charges and R_i and R_j are the positions of the two atoms in the bond.
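As an illustration, the pairwise entries described above can be computed as follows. This is a simplified sketch: the function name and input format are ours, and the padding of each bag to a fixed length, which the full descriptor requires, is omitted.

```python
import itertools
from collections import defaultdict

import numpy as np

def bag_of_bonds(symbols, charges, positions):
    """Group atom pairs into per-bond-type "bags" (e.g. "C-H"), each entry
    being the Coulomb-matrix-style value Z_i * Z_j / ||R_i - R_j||."""
    positions = np.asarray(positions, dtype=float)
    bags = defaultdict(list)
    for i, j in itertools.combinations(range(len(symbols)), 2):
        key = "-".join(sorted((symbols[i], symbols[j])))
        dist = np.linalg.norm(positions[i] - positions[j])
        bags[key].append(charges[i] * charges[j] / dist)
    # Entries within each bag are sorted so the descriptor is invariant
    # to the atom ordering of the input.
    return {key: sorted(values, reverse=True) for key, values in bags.items()}

# Toy example: a single C-H pair 1.09 angstroms apart.
bags = bag_of_bonds(["C", "H"], [6, 1],
                    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.09]])
```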
Graph neural networks for molecules
Recently, graph neural networks have attracted a lot of attention for a wide variety of tasks, including graph link prediction , chemistry and biology [12, 13, 28], natural language processing [21, 30], and computer vision [20, 7]. Neural networks on graphs were first proposed by Gori et al.  and Scarselli et al. , and a large number of architectures have been proposed since then.
Gilmer et al.  proposed the message passing neural network (MPNN) framework for learning molecular representations and showed that many graph neural networks, such as Gated Graph Neural Networks (GG-NN) , Interaction Networks , and Deep Tensor Neural Networks (DTNN) , fall under this framework. They tested MPNNs on the QM9 dataset for the prediction of molecular properties and achieved state-of-the-art results.
Schütt et al.  introduced SchNet, which uses continuous-filter convolutions that map the atom positions in a molecule to the corresponding filter values. The learned filters interact with atom features to generate more sophisticated atom representations. These continuous-filter convolutions are incorporated into graph neural networks for the prediction of molecular energies and interatomic forces. SchNet is similar to our model in that it assumes the atoms have spatial information and learns the interactions between atoms depending on their distances.
Veličković et al.  introduced an attention mechanism to graph neural networks. Their graph attention networks aggregate the hidden representations of vertices in the graph by weighting over their neighbors, following the self-attention mechanism that is widely used in sequential models. They show that the attention mechanism is helpful for node classification tasks on citation networks and protein-protein interaction data.
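A minimal single-head sketch of this attention-weighted aggregation may help make the mechanism concrete. It is simplified from the original formulation, and the names and shapes are illustrative:

```python
import numpy as np

def attention_aggregate(H, A, W, a):
    """Single-head graph-attention aggregation.

    H: (N, d) node features, A: (N, N) 0/1 adjacency (with self-loops),
    W: (d, d2) shared linear transform, a: (2*d2,) attention vector.
    """
    Wh = H @ W
    out = np.zeros_like(Wh)
    for i in range(Wh.shape[0]):
        neigh = np.flatnonzero(A[i])
        # Unnormalized scores: LeakyReLU(a^T [Wh_i ; Wh_j]) for each neighbor j.
        scores = np.array([np.concatenate([Wh[i], Wh[j]]) @ a for j in neigh])
        scores = np.where(scores > 0, scores, 0.2 * scores)  # LeakyReLU
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                                  # softmax over neighbors
        out[i] = alpha @ Wh[neigh]                            # weighted aggregation
    return out

# Toy example: three fully connected nodes with random features.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 2))
A = np.ones((3, 3))
out = attention_aggregate(H, A, rng.normal(size=(2, 2)), rng.normal(size=4))
```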
3 Gated Graph Recursive Neural Networks (GGRNet)
A molecular graph consists of atoms and chemical bonds, where atoms correspond to vertices and chemical bonds correspond to edges. Figure 1 shows an example of the 2D and 3D structures of a molecule. Conventional graph neural networks mainly handle the 2D structure, while we treat the molecule as a 3D graph structure in which the three-dimensional coordinates of each atom are given.
In this section, we propose gated graph recursive neural networks (GGRNet) for accurately learning molecular representations. Since our model can be formulated within the MPNN framework , we basically follow the notation and definitions given in .
To build an expressive graph neural network for molecules, one needs to stack multiple layers for learning hidden representations of atoms and bonds. However, as the number of parameters increases, training becomes inefficient and performance degrades. Our GGRNet alleviates this problem with the following ideas: 1) the parameters for updating hidden representations are shared across all layers, 2) the input representations (atom embeddings and additional input features) are fed into every layer as skip connections to accelerate training, and 3) the feature vector of each atom is updated using those of every other atom in the graph, depending on the distances between atoms.
Formally, suppose we are given a molecular graph G in which each atom has a three-dimensional position. Let x_v and x_w be the d-dimensional atom embeddings of vertices v and w, respectively. In our model, the hidden vector of vertex v at time step t, denoted as h_v^t, is given by the message function M and the update function U as follows:

    m_v^{t+1} = Σ_{w ≠ v} M(h_v^t, h_w^t, x_v, x_w, e_vw)
    h_v^{t+1} = U(h_v^t, m_v^{t+1})
where the values of M are summed over all vertices except v; that is, we assume every pair of vertices in the graph is connected by an edge and communicates at every time step.
In the original MPNN, the message function is defined as M(h_v^t, h_w^t, e_vw), with w ranging over the neighbors N(v) of v. Our model extends it to always feed the input vectors x_v and x_w into the message function. Furthermore, the parameters of M and U are shared across all time steps.
After T time steps of message updates, the readout function R aggregates all the hidden representations:

    ŷ = R({h_v^T | v ∈ G}),

where ŷ is a target value such as a molecular property.
In this work, we consider the following function as M:

    M(h_v^t, h_w^t, x_v, x_w, e_vw) = (W_1 z_vw^t + b_1) ⊙ σ(W_2 z_vw^t + b_2),
    z_vw^t = [h_v^t ; h_w^t ; x_v ; x_w ; c ; d_vw],

where [· ; ·] denotes concatenation, σ is the sigmoid function, ⊙ is element-wise multiplication, and c and d_vw are additional input features described below. W_1, b_1, W_2, and b_2 are model parameters to be learned, which are shared across all time steps. Note that we treat G as a directed graph, hence M(h_v^t, h_w^t, x_v, x_w, e_vw) ≠ M(h_w^t, h_v^t, x_w, x_v, e_wv).
In addition to h_v^t, h_w^t, x_v, and x_w, we consider two additional input features: the counting feature c and the distance feature d_vw. The counting feature c is a real-valued embedding vector that corresponds to the number of atoms in the graph; molecules with the same number of atoms therefore share the same counting embedding. The distance feature d_vw is a one-dimensional vector whose value is the reciprocal of the Euclidean distance between v and w, that is, d_vw = 1 / ||r_v − r_w||, where r_v and r_w are the three-dimensional coordinates of v and w.
Initially, h_v^0 = 0 for all v ∈ G. The gating mechanism (W_1 z + b_1) ⊙ σ(W_2 z + b_2), inspired by the LSTM and gated convolutional neural networks , is used to extract effective features from the previous hidden vectors and the input features.
The update function U is simply an average over the aggregated messages:

    h_v^{t+1} = U(h_v^t, m_v^{t+1}) = m_v^{t+1} / N,

where N is the number of atoms in the molecule.
Finally, the readout function is given as the average of the final hidden vectors followed by a linear output layer:

    ŷ = W_o ((1/N) Σ_{v ∈ G} h_v^T) + b_o,

where W_o and b_o are learned output parameters.
Different from the original MPNN, our model always feeds the same x_v, x_w, and additional input features into every recursive time step. The impact of the counting feature and the distance feature is evaluated in the experimental section.
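One recursive step of the model described above can be sketched in NumPy: gated messages over all atom pairs with the counting and distance features, followed by the average update. The zero initialization, parameter names (W1, b1, W2, b2), and shapes here are illustrative assumptions, not the exact configuration used in the experiments.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggrnet_step(h, x, c, coords, W1, b1, W2, b2):
    """One GGRNet-style time step (sketch).

    h: (N, dh) hidden atom vectors, x: (N, d) atom embeddings,
    c: (dc,) counting feature of the molecule, coords: (N, 3) positions.
    """
    n = h.shape[0]
    h_new = np.zeros_like(h)
    for v in range(n):
        m = np.zeros(h.shape[1])
        for w in range(n):
            if w == v:
                continue
            d_vw = 1.0 / np.linalg.norm(coords[v] - coords[w])  # distance feature
            z = np.concatenate([h[v], h[w], x[v], x[w], c, [d_vw]])
            # Gated message: a linear transform modulated by a sigmoid gate.
            m = m + (W1 @ z + b1) * sigmoid(W2 @ z + b2)
        h_new[v] = m / n  # update function: simple average
    return h_new

# Toy example with 3 atoms; all sizes are illustrative.
rng = np.random.default_rng(0)
n_atoms, dh, d, dc = 3, 4, 2, 2
zdim = 2 * dh + 2 * d + dc + 1
h = np.zeros((n_atoms, dh))                    # h_v^0
x = rng.normal(size=(n_atoms, d))              # atom embeddings
c = rng.normal(size=dc)                        # counting feature
coords = rng.normal(size=(n_atoms, 3))         # 3D positions
params = [0.1 * rng.normal(size=(dh, zdim)), 0.1 * rng.normal(size=dh),
          0.1 * rng.normal(size=(dh, zdim)), 0.1 * rng.normal(size=dh)]
for _ in range(5):                             # parameters shared across steps,
    h = ggrnet_step(h, x, c, coords, *params)  # inputs x and c fed at every step
```

Note that the same parameter set is reused at every step, and the embeddings x and the counting feature c re-enter the concatenation each time, which is the skip-connection behavior described in the text.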
The network architecture of GGRNet is shown in Figure 2.
| Hyper-parameters \ Dataset | QM7b | QM8 | QM9 |
|---|---|---|---|
| Atom embedding size | 50 | 50 | 50 |
| Count embedding size | 50 | 50 | 50 |
| Hidden vector size | 100 | 100 | 100 |
| Number of recursive layers | 5 | 5 | 5 |
| Learning rate decay | 0.01 | 0.01 | 0.05 |
| Number of epochs | | | |
4 Experiments

We validate the performance of our GGRNet on molecular datasets from MoleculeNet . MoleculeNet is a comprehensive benchmark for molecular machine learning that contains multiple datasets for regression and classification tasks. In this work, we use the QM7b , QM8 , and QM9  datasets for the regression task. In QM8 and QM9, the input is a discrete molecular graph with the spatial positions of its atoms. In QM7b, only the spatial positions of the atoms of each molecule are available. The target is a real-valued molecular property such as an electronic energy or the heat capacity. For details about the datasets, please refer to .
MPNN uses an edge network as the message passing function and a set2set model as the readout function. The edge network considers all neighboring atoms, and the feature vectors of atoms are updated with gated recurrent units. In the readout phase, an LSTM with an attention mechanism is applied to a sequence of feature vectors to generate the final feature vector of the molecule.
These three baseline methods achieve state-of-the-art performance on a variety of molecular modeling tasks and outperform conventional methods that use hand-crafted features. All the baseline results are taken from MoleculeNet . The code for the baseline models is publicly available via the DeepChem open-source library .
The hyper-parameters of our GGRNet used in the experiments are shown in Table 1. As shown in the table, we use almost the same hyper-parameters for every dataset. Following MoleculeNet, we randomly split each dataset into training, validation, and test sets with an 80/10/10 ratio. All the reported results are averaged over three independent runs.
Target properties in the training set are normalized to zero mean and unit variance using only the training set. For evaluation, the predicted values on the test set are inverse-transformed using the training mean and variance. The loss function is the mean squared error (MSE) between the model output and the target value. For evaluation, we use the mean absolute error (MAE). We use stochastic gradient descent (SGD) for training our model. The learning rate η_t is given by η_t = η_0 / (1 + k t), where η_0, k, and t are the initial learning rate, the decay rate, and the number of epochs, respectively. All the parameters, including the atom embeddings and counting embeddings, are initialized randomly and updated during training.
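The training details above amount to a few lines of NumPy. The numbers here are toy values for illustration only; the decay formula is the inverse-time schedule described in the text.

```python
import numpy as np

def lr_at_epoch(lr0, decay, epoch):
    """Inverse-time learning-rate decay: eta_t = eta_0 / (1 + k * t)."""
    return lr0 / (1.0 + decay * epoch)

# Target normalization: statistics come from the training split only.
y_train = np.array([1.0, 2.0, 3.0, 4.0])
mean, std = y_train.mean(), y_train.std()
y_train_norm = (y_train - mean) / std

# At evaluation time, model outputs are mapped back to the original scale
# before computing the mean absolute error (MAE).
pred_norm = np.array([-1.0, 0.5])   # hypothetical model outputs (normalized)
pred = pred_norm * std + mean       # inverse transform
y_test = np.array([1.5, 3.0])       # hypothetical test targets
mae = np.abs(pred - y_test).mean()
```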
4.1 QM7b Dataset
| Property \ Model | Unit | DTNN | GGRNet |
|---|---|---|---|
| Atomization energy (PBE0) | kcal / mol | 21.5 | 13.7 |
| Maximal absorption intensity (ZINDO)¹ | eV | 1.26 | 1.02 |
| Excitation energy at maximal absorption (ZINDO)² | Arbitrary | 0.074 | 0.072 |
| First excitation energy (ZINDO) | eV | 0.296 | 0.121 |
| Ionization potential (ZINDO) | eV | 0.214 | 0.176 |
| Electron affinity (ZINDO) | eV | 0.174 | 0.0940 |

¹ The MoleculeNet paper incorrectly states this entry as “Excitation energy of maximal optimal absorption - ZINDO”.
² The MoleculeNet paper incorrectly states this entry as “Highest absorption - ZINDO”.
QM7b consists of 7,211 small organic molecules with 14 properties and is a subset of the GDB-13 database . Each molecule is composed of hydrogen (H), carbon (C), oxygen (O), nitrogen (N), sulfur (S), and chlorine (Cl) atoms. For each molecule, the three-dimensional coordinates of the most stable conformation and electronic properties such as the HOMO and LUMO energies and the electron affinity are provided, calculated by DFT simulation. The discrete graph structure of the molecules is not provided. Following , we train our model per target rather than with multi-task learning, since per-target training gives better performance than joint training.
Table 2 shows the mean absolute error (MAE) on the QM7b dataset. As shown in the table, our GGRNet consistently outperforms DTNN. The results show that GGRNet is expressive enough to learn molecular representations, as we expected.
4.2 QM8 Dataset
| Property \ Model | GC | DTNN | MPNN | GGRNet |
The QM8 dataset  is drawn from the GDB-17 database  and contains 21,786 small organic molecules with 12 properties. In QM8, time-dependent density functional theory (TDDFT) and the second-order approximate coupled-cluster method (CC2) are used to calculate the molecular properties. As with QM7b, we train our model per target.
The results are shown in Table 3. As shown in the table, our GGRNet achieves the best results in 7 of the 12 cases. However, in some cases such as “f1-CC2” and “f2-CC2”, our model gets stuck and suffers from inefficient training. We believe that increasing the number of epochs would slightly improve the performance; however, a more expressive neural architecture is required to learn better molecular representations with a small number of epochs. We hypothesize that the gating function of GGRNet may cause vanishing gradients. One solution is to add batch normalization or other normalization layers to GGRNet.
On the “f1” and “f2” targets, MPNN achieves the best results among all methods. We do not have a clear explanation for this at present; however, one essential difference between MPNN and GGRNet is that MPNN employs an expressive readout function based on an LSTM, while GGRNet uses a simple average.
4.3 QM9 Dataset
| Property \ Model | Unit | GC | DTNN | MPNN | GGRNet |
|---|---|---|---|---|---|
| Cv | cal / (mol K) | 0.65 | 0.27 | 0.42 | 0.15 |
The QM9 dataset  is a widely used comprehensive dataset that provides geometric, energetic, electronic, and thermodynamic properties of small organic molecules. It consists of 130k molecules with 12 properties, calculated by a quantum-mechanical simulation method (DFT). Each molecule is composed of hydrogen (H), carbon (C), oxygen (O), nitrogen (N), and fluorine (F) atoms, and contains up to 30 atoms. In QM9, both the discrete graph structure of the molecules and the atom coordinates are provided, although our model does not use the discrete graph structure explicitly. For details about the molecular properties, please refer to .
Table 4 shows the experimental results on the QM9 dataset. Again, our GGRNet consistently outperforms the other baselines. These results show that our model has the potential to learn representations of small organic molecules across a variety of properties. In particular, our model drastically improves the performance on R2 (the electronic spatial extent), U0 and U (the atomization energy at 0 K and 298.15 K), the enthalpy of atomization (H), and the free energy of atomization (G). On the other hand, the energies of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) are not improved as much.
4.4 Ablation Study
| Full model (GGRNet) | 0.00372 | 1.61 | 0.000861 | 0.049 |
| Full model without counting feature | 0.00374 | 2.00 | 0.000528 | 0.172 |
| Full model without distance feature | 0.0144 | 154 | 0.00137 | 0.0450 |
| Full model without atom embedding | 0.00441 | 1.82 | 0.000939 | 0.0668 |
We performed an ablation study on a subset of the QM9 dataset. Table 5 shows the results. The top row is the original GGRNet, and the rows below are the full model without the counting feature, the distance feature, and the atom embedding feature, respectively.
The counting feature is effective for U0 (the atomization energy at 0 K). Properties such as U, H, and G are expected to show a similar tendency. This result is reasonable, since the atomization energy is expected to correlate with the number of atoms.
The distance feature is indispensable for the accurate prediction of R2 (the electronic spatial extent). This is also reasonable, since this property reflects the spatial distribution of electrons in the molecule. Finally, the atom embedding feature is moderately effective for all properties. Overall, we verified that every additional feature helps to improve the performance of molecular property prediction.
5 Conclusion

In this work, we proposed GGRNet for accurate and efficient molecular property prediction. In our model, the parameters for updating hidden representations are shared across all layers, the input representations are fed into every layer as skip connections to accelerate training, and the hidden representation of each atom is updated using those of every other atom in the graph, which boosts the performance of molecular property prediction. Experiments on standard benchmarks for molecular property prediction generated by quantum-chemical simulations show that our model achieves state-of-the-art performance on every dataset. Future work includes applying more expressive update and readout functions in our model.
References

- (2016) Interaction Networks for Learning about Objects, Relations and Physics. In Advances in Neural Information Processing Systems (NIPS), pp. 4502–4510.
- (1993) Density-functional thermochemistry. III. The role of exact exchange. The Journal of Chemical Physics 98 (7), pp. 5648–5652.
- (2007) Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces. Physical Review Letters 98 (14), pp. 146401.
- (2009) 970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13. Journal of the American Chemical Society 131 (25), pp. 8732–8733.
- (2012) Perspective on density functional theory. The Journal of Chemical Physics 136 (15), pp. 150901.
- (1985) Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Modeling 25 (2), pp. 64–73.
- (2017) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 77–85.
- (2017) Machine learning of accurate energy-conserving molecular force fields. Science Advances 3 (5), pp. e1603015.
- (2017) Language Modeling with Gated Convolutional Networks. In International Conference on Machine Learning (ICML), pp. 933–941. arXiv:1612.08083.
- (2015) Convolutional Networks on Graphs for Learning Molecular Fingerprints. In Advances in Neural Information Processing Systems (NIPS), pp. 2224–2232.
- (2018) Applying machine learning techniques to predict the properties of energetic materials. Scientific Reports 8 (1).
- (2017) Protein Interface Prediction using Graph Convolutional Networks. In Advances in Neural Information Processing Systems (NIPS), pp. 6530–6539.
- (2017) Neural Message Passing for Quantum Chemistry. In International Conference on Machine Learning (ICML), pp. 1263–1272.
- (2005) A new model for learning in graph domains. In Proceedings of the IEEE International Joint Conference on Neural Networks, Vol. 2, pp. 729–734.
- (2015) Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space. The Journal of Physical Chemistry Letters 6 (12), pp. 2326–2331.
- (2016) Molecular Graph Convolutions: Moving Beyond Fingerprints. Journal of Computer-Aided Molecular Design 30 (8), pp. 595–608.
- (2016) Gated Graph Sequence Neural Networks. In International Conference on Learning Representations (ICLR). arXiv:1511.05493.
- (2013) Machine learning of molecular electronic properties in chemical compound space. New Journal of Physics 15 (9), pp. 095003.
- (2012) Molecular Fingerprint-Based Artificial Neural Networks QSAR for Ligand Biological Activity Predictions. Molecular Pharmaceutics 9 (10), pp. 2912–2923.
- (2017) 3D Graph Neural Networks for RGBD Semantic Segmentation. In IEEE International Conference on Computer Vision (ICCV), pp. 5209–5218.
- (2018) Semi-supervised User Geolocation via Graph Convolutional Networks. In Association for Computational Linguistics (ACL), pp. 2009–2019.
- (2014) Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1.
- (2015) Electronic spectra from TDDFT and machine learning in chemical space. The Journal of Chemical Physics 143 (8), pp. 084111.
- (2019) Deep Learning for the Life Sciences. O'Reilly Media.
- (2010) Extended-Connectivity Fingerprints. Journal of Chemical Information and Modeling 50 (5), pp. 742–754.
- (2012) Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Physical Review Letters 108 (5), pp. 058301.
- (2009) The Graph Neural Network Model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80.
- (2017) Quantum-Chemical Insights from Deep Tensor Neural Networks. Nature Communications 8, pp. 13890.
- (2017) SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems (NIPS), pp. 991–1001.
- (2018) Modeling Semantics with Gated Graph Neural Networks for Knowledge Base Question Answering. In The International Conference on Computational Linguistics (COLING), pp. 3306–3317.
- (2018) Graph Attention Networks. In International Conference on Learning Representations (ICLR).
- (2018) MoleculeNet: A Benchmark for Molecular Machine Learning. Chemical Science 9 (2), pp. 513–530.
- (2018) Link Prediction Based on Graph Neural Networks. In Advances in Neural Information Processing Systems (NIPS), pp. 5165–5175.