Hierarchical Inter-Message Passing for Learning on Molecular Graphs
We present a hierarchical neural message passing architecture for learning on molecular graphs. Our model takes in two complementary graph representations: the raw molecular graph representation and its associated junction tree, where nodes represent meaningful clusters in the original graph, e.g., rings or bridged compounds. We then proceed to learn a molecule's representation by passing messages inside each graph, and exchange messages between the two representations using a coarse-to-fine and fine-to-coarse information flow. Our method is able to overcome some of the restrictions known from classical GNNs, like detecting cycles, while still being very efficient to train. We validate its performance on the ZINC dataset and datasets stemming from the MoleculeNet benchmark collection.
Graph neural networks (GNNs) have proven to be very successful for learning on molecular graphs, exceeding the previously predominant approach of manual feature engineering by a large margin (Gilmer et al., 2017; Schütt et al., 2017). In contrast to hand-crafted features, GNNs learn high-dimensional embeddings of atoms that are able to represent their complex interactions by exchanging and aggregating messages between them.
In this work, we present a hierarchical variant of message passing on molecular graphs. Here, we utilize two separate graph neural networks that operate on complementary representations of a molecule simultaneously: its raw molecular representation and its corresponding (coarsened) junction tree representation. Each of the two GNN’s intra-message passing step is strengthened by an inter-message passing step that exchanges intermediate information between the two representations. This allows the network to reason about hierarchy, e.g., rings, in molecules in a natural fashion, and enables the GNN to overcome some of its restrictions, e.g., detecting cycles (Loukas, 2020), without relying on more sophisticated architectures to do so (Morris et al., 2019; Murphy et al., 2019; Maron et al., 2019). We show that this simple scheme can drastically increase the performance of a GNN, reaching state-of-the-art performance on a variety of different datasets. Despite its higher-order nature, our proposed network architecture is still very efficient to train and causes only marginal additional costs in terms of memory and execution time.
Graph neural networks operate on graph representations of molecules $G = (V, E)$, where nodes represent atoms and edges are defined by a predefined structure or by connecting atoms that lie within a certain cutoff distance. Given atom features $\mathbf{x}_v$ and edge features $\mathbf{e}_{v,w}$, a GNN iteratively updates node embeddings $\mathbf{h}_v^{(t)}$ in layer $t$ by aggregating localized information via the parametrized functions

$$\mathbf{h}_v^{(t)} = \mathrm{UPDATE}^{(t)}\Big(\mathbf{h}_v^{(t-1)},\, \mathrm{AGGREGATE}^{(t)}\big(\{\!\!\{(\mathbf{h}_v^{(t-1)}, \mathbf{h}_w^{(t-1)}, \mathbf{e}_{v,w}) : w \in \mathcal{N}(v)\}\!\!\}\big)\Big),$$

where $\{\!\!\{\cdot\}\!\!\}$ denotes a multiset and $\mathcal{N}(v)$ defines the neighborhood set of node $v$ (Gilmer et al., 2017). After $T$ layers, a graph representation is obtained via global aggregation of $\{\mathbf{h}_v^{(T)}\}_{v \in V}$, e.g., summation or averaging.
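As a minimal illustration of this scheme (a generic sketch, not the authors' implementation; the function names are placeholders), one message passing layer with sum aggregation can be written in plain Python:

```python
# Sketch of one neural message passing layer with sum aggregation.
# msg_fn and upd_fn stand in for the learned MESSAGE/UPDATE functions.

def message_passing_layer(h, edges, edge_attr, msg_fn, upd_fn):
    """h: dict node -> feature vector (list of floats);
    edges: list of directed (source, target) pairs;
    edge_attr: dict (source, target) -> edge feature."""
    new_h = {}
    for v in h:
        # Aggregate messages over the multiset of neighbors N(v).
        agg = None
        for (a, b) in edges:
            if b == v:
                m = msg_fn(h[v], h[a], edge_attr[(a, b)])
                agg = m if agg is None else [x + y for x, y in zip(agg, m)]
        if agg is None:
            agg = [0.0] * len(h[v])  # isolated node: zero message
        new_h[v] = upd_fn(h[v], agg)
    return new_h
```

Stacking $T$ such layers and sum- or mean-pooling the final embeddings yields the graph-level representation described above.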
Many existing GNNs can be expressed using this neural message passing scheme (Kipf & Welling, 2017; Veličković et al., 2018). The GIN operator (GIN-E in case edge features are present) (Xu et al., 2019; Hu et al., 2020b) defines its most expressive form, showing high similarity to the popular WL test (Weisfeiler & Lehman, 1968) while being able to operate on continuous node and edge features.
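Concretely, GIN (Xu et al., 2019) updates a node via $\mathbf{h}_v \leftarrow \mathrm{MLP}\big((1+\epsilon)\,\mathbf{h}_v + \sum_{w \in \mathcal{N}(v)} \mathbf{h}_w\big)$. A toy sketch with scalar features (illustrative only; `mlp` is an arbitrary callable standing in for the learned network):

```python
def gin_layer(h, adj, eps, mlp):
    """One GIN layer on scalar node features.
    h: dict node -> float, adj: dict node -> list of neighbors."""
    return {v: mlp((1 + eps) * h[v] + sum(h[w] for w in adj[v])) for v in h}
```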
It has been shown that GNNs are unable to distinguish certain molecules when operating on the molecular graph or using limited cutoff distances, e.g., a Cyclohexane molecule and two Cyclopropane molecules (Xu et al., 2019; Klicpera et al., 2020). These restrictions mostly stem from the fact that GNNs are not capable of detecting cycles (Loukas, 2020), since they are unable to maintain information about which vertex in their receptive field contributed what to the aggregated information (Hy et al., 2018). In this section, we present a simple hierarchical scheme that overcomes this restriction and strengthens the GNN's performance with minimal computational overhead.
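This limitation can be made concrete in a few lines of code: running 1-WL color refinement on a single 6-cycle (Cyclohexane's carbon skeleton) and on two disjoint 3-cycles (two Cyclopropanes) yields identical color histograms at every iteration, since both graphs are 2-regular; hence any GNN bounded by 1-WL assigns both the same representation. This sketch is for illustration only:

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """1-WL color refinement; returns the multiset of final colors."""
    colors = {v: 0 for v in adj}  # uniform initial colors (all carbons)
    for _ in range(rounds):
        # Signature = own color plus sorted multiset of neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[w] for w in adj[v])))
                for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: relabel[sigs[v]] for v in adj}
    return Counter(colors.values())

def cycle(n, offset=0):
    """Adjacency list of an n-cycle on nodes offset..offset+n-1."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n]
            for i in range(n)}

hexane = cycle(6)                                   # one 6-ring
two_propanes = {**cycle(3), **cycle(3, offset=3)}   # two disjoint 3-rings
# Both graphs are 2-regular, so 1-WL never separates them:
assert wl_histogram(hexane) == wl_histogram(two_propanes)
```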
Our method involves learning on two molecular graph representations simultaneously in an end-to-end fashion: the original graph representation and its associated junction tree. The junction tree representation encodes the tree structure of molecules and defines how clusters (singletons, bonds, rings, bridged compounds) are mutually connected, while the graph structure captures its more fine-grained connectivity (Jin et al., 2018). We briefly revisit how junction trees are obtained from molecular graphs before describing our method in detail.
Given a graph $G = (V, E)$, a tree decomposition maps $G$ into a junction tree $\mathcal{T} = (\mathcal{C}, \mathcal{E}_{\mathcal{T}})$ with node set $\mathcal{C} = \{C_1, \ldots, C_k\}$, $C_i \subseteq V$ for all $C_i \in \mathcal{C}$, and edge set $\mathcal{E}_{\mathcal{T}}$ so that:

1. $\bigcup_i C_i = V$ and $\bigcup_i E(C_i) = E$, where $E(C_i)$ represents the edge set of the induced subgraph $G[C_i]$,
2. $C_i \cap C_j \subseteq C_m$ for all clusters $C_i$, $C_j$, $C_m$ with connections $C_i \leftrightarrow C_m$ and $C_m \leftrightarrow C_j$ in $\mathcal{T}$.

The assignment of atoms to clusters is given by the matrix $\mathbf{S} \in \{0, 1\}^{|V| \times k}$ with $\mathbf{S}_{v,c} = 1$ if and only if $v \in C_c$.
We closely follow the tree decomposition algorithm of related works (Rarey & Dixon, 1998; Jin et al., 2018). We first group all simple cycles and all edges that do not belong to any cycle into clusters in $\mathcal{C}$. Two rings are merged together if they share more than two overlapping atoms (bridged compounds). For atoms lying inside more than three clusters, we add the intersecting atom as a singleton cluster. A cluster graph is constructed by adding edges between all intersecting clusters, and the final junction tree is then given as one of its spanning trees. Figure 1 visualizes how clusters are formed on an exemplary molecule. For each cluster, we additionally hold its respective category (singleton, bond, ring, bridged compound) as a one-hot encoding.
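The core of the clustering step can be sketched in pure Python (a simplified illustration, not the authors' implementation: it recovers ring clusters as 2-edge-connected components and bond clusters as bridges, omitting ring merging, singleton insertion, and the spanning-tree construction):

```python
def find_bridges(adj):
    """Tarjan's bridge-finding: returns edges not lying on any cycle."""
    disc, low, bridges, time = {}, {}, set(), [0]
    def dfs(v, parent):
        disc[v] = low[v] = time[0]; time[0] += 1
        for w in adj[v]:
            if w not in disc:
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] > disc[v]:          # no back-edge over (v, w)
                    bridges.add(frozenset((v, w)))
            elif w != parent:
                low[v] = min(low[v], disc[w])
    for v in adj:
        if v not in disc:
            dfs(v, None)
    return bridges

def clusters(adj):
    """Ring clusters = connected components of non-bridge edges;
    bond clusters = the bridges themselves."""
    bridges = find_bridges(adj)
    ring_adj = {v: [w for w in adj[v] if frozenset((v, w)) not in bridges]
                for v in adj}
    seen, rings = set(), []
    for v in adj:
        if v not in seen and ring_adj[v]:
            stack, comp = [v], set()
            while stack:
                u = stack.pop()
                if u in comp:
                    continue
                comp.add(u); seen.add(u)
                stack.extend(ring_adj[u])
            rings.append(comp)
    return rings, [set(b) for b in bridges]
```

For methylcyclohexane's carbon skeleton (a 6-ring with one pendant atom), this yields one ring cluster of six atoms and one bond cluster for the pendant edge.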
Our method is able to extend any GNN model for molecular property prediction by making use of intra-message passing in and inter-message passing to a complementary junction tree representation. Here, instead of using a single GNN operating on the molecular graph, we make use of two GNN models: one operating on the original graph $G$ and one operating on its associated junction tree $\mathcal{T}$, each passing intra-messages to their respective neighbors. We further enhance this scheme by making use of inter-message passing: Let $\mathbf{h}_v$ and $\mathbf{h}_C$ denote the intermediate representations of nodes $v \in V$ and clusters $C \in \mathcal{C}$, respectively. Then, we enhance both representations by an additional coarse-to-fine information flow from $\mathcal{T}$ to $G$

$$\mathbf{h}_v \leftarrow \mathbf{h}_v + \sigma\Big(\mathbf{W}_1 \sum_{C \,:\, v \in C} \mathbf{h}_C\Big)$$

and a reverse fine-to-coarse information flow from $G$ to $\mathcal{T}$

$$\mathbf{h}_C \leftarrow \mathbf{h}_C + \sigma\Big(\mathbf{W}_2 \sum_{v \in C} \mathbf{h}_v\Big),$$

with $\mathbf{W}_1$, $\mathbf{W}_2$ denoting trainable weights and $\sigma$ being a non-linearity. This leads to a hierarchical variant of message passing for learning on molecular graphs, similar to the ones applied in computer vision (Ronneberger et al., 2015; Newell et al., 2016; Lin et al., 2017). Furthermore, each atom is able to know about its cluster assignment, and, more importantly, which other nodes are part of the same cluster. Specifically, this leads to an increased expressivity of GNNs. For example, the popular example of a Cyclohexane molecule and two Cyclopropane molecules (a single ring and two disconnected rings) (Klicpera et al., 2020) is distinguishable by our scheme, since the junction tree representations are distinguishable by the most expressive GNN.
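Using the cluster-assignment matrix $\mathbf{S}$, both information flows reduce to dense matrix products. A NumPy sketch (the weight names `W1`, `W2` and the exact placement of the non-linearity are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def inter_message_passing(H_V, H_C, S, W1, W2):
    """H_V: |V| x d atom embeddings, H_C: k x d cluster embeddings,
    S: |V| x k binary assignment matrix (S[v, c] = 1 iff atom v is in cluster c)."""
    H_V_new = H_V + relu((S @ H_C) @ W1)    # coarse-to-fine: clusters -> atoms
    H_C_new = H_C + relu((S.T @ H_V) @ W2)  # fine-to-coarse: atoms -> clusters
    return H_V_new, H_C_new
```

In practice `S` is sparse, so both products cost time linear in the number of atom-cluster memberships, which explains the small overhead of the scheme.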
The readout of the model is then given via

$$\mathbf{x} = \sum_{v \in V} \mathbf{h}_v \,\Vert\, \sum_{C \in \mathcal{C}} \mathbf{h}_C,$$

with $\Vert$ denoting the concatenation operator. A high-level overview of our method is visualized in Figure 2.
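This readout can be sketched as sum pooling over both representations followed by concatenation (a minimal NumPy sketch; the trailing prediction head `mlp` is an assumed placeholder):

```python
import numpy as np

def readout(H_V, H_C, mlp=lambda x: x):
    """Concatenate sum-pooled atom and cluster embeddings,
    then apply an (optional) prediction head."""
    return mlp(np.concatenate([H_V.sum(axis=0), H_C.sum(axis=0)]))
```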
We briefly review some of the related work and their relation to our proposed approach.
Instead of using hand-crafted representations (Bartók et al., 2013), recent advancements in deep graph learning rely on an end-to-end learning of representations which has quickly led to major breakthroughs in machine learning on molecular graphs (Duvenaud et al., 2015; Gilmer et al., 2017; Schütt et al., 2017; Jørgensen et al., 2018; Unke & Meuwly, 2019; Chen et al., 2019). Most of these works are especially designed for learning on the molecular geometry. Here, earlier models (Schütt et al., 2017; Gilmer et al., 2017; Jørgensen et al., 2018; Unke & Meuwly, 2019; Chen et al., 2019) fulfill rotational invariance constraints by relying on interatomic distances, while recent models employ more expressive equivariant models. For example, DimeNet (Klicpera et al., 2020) deploys directional message passing between node triplets to also model angular potentials. Another line of work breaks symmetries by taking permutations of nodes into account (Murphy et al., 2019; Hy et al., 2018; Albooyeh et al., 2019). Recently, it has been shown that strategies for pre-training models on molecular graphs can effectively increase their performance for certain downstream tasks (Hu et al., 2020b). Our approach fits nicely into these lines of work since it also increases the expressiveness of GNNs while being orthogonal to further advancements in this field.
So far, junction trees have solely been used for molecule generation based on a coarse-to-fine generation procedure (Jin et al., 2018, 2019). In contrast to the generation of SMILES strings (Gómez-Bombarelli et al., 2018), this allows the model to enforce chemical validity while generating molecules significantly faster than the node-per-node generation procedure applied in autoregressive methods (You et al., 2018).
The idea of inter-message passing between graphs has already been heavily investigated in practice, mostly in the fields of deep graph matching (Wang et al., 2018; Li et al., 2019; Fey et al., 2020) and graph pooling (Ying et al., 2018; Gao & Ji, 2019). For graph pooling, most works focus on learning a coarsened version of the input graph. However, due to being learned, the coarsened graphs are unable to strengthen the expressiveness of GNNs by design. For example, DiffPool (Ying et al., 2018) always maps the atoms of two disconnected rings to the same cluster, while the pooling approach of Gao & Ji (2019) either keeps or removes all atoms inside those rings (since their node embeddings are shared). The approach that comes closest to ours involves inter-message passing to a “virtual” node that is connected to all atoms (Gilmer et al., 2017; Hu et al., 2020a). Our approach can be seen as a simple yet effective extension to this procedure.
We evaluate our proposed architecture on the ZINC dataset (Kusner et al., 2017) and a subset of datasets stemming from the MoleculeNet benchmark collection (Wu et al., 2018). For all experiments, we make use of the GIN-E operator for learning on the molecular graph (Hu et al., 2020b) and the GIN operator (Xu et al., 2019) for learning on the associated junction tree. GIN-E includes edge features (e.g., bond type, bond stereochemistry) by simply adding them to the incoming node features. All models were trained with the Adam optimizer (Kingma & Ba, 2015). Our implementation is written in PyTorch (Paszke et al., 2019) and utilizes the PyTorch Geometric (Fey & Lenssen, 2019) library. Our source code is available under https://github.com/rusty1s/himp-gnn.
The ZINC dataset (Kusner et al., 2017) contains about 250,000 molecular graphs and was introduced in Dwivedi et al. (2020) as a benchmark for evaluating GNN performance (using a subset of 10,000 training graphs). Here, the task is to regress the constrained solubility of a molecule. While this is a fairly simple task that can be exactly computed in a short amount of time, it can nonetheless reveal the capabilities of different neural architectures. We compare ourselves to all the baselines presented in Dwivedi et al. (2020), and additionally report results of a GIN-E baseline that does not make use of any additional junction tree information. Furthermore, we also perform experiments on the full dataset.
Table 1: Mean Absolute Error (MAE) on ZINC (10k) and ZINC (Full).
As shown in Table 1, our method is able to significantly outperform all competing methods. In comparison to GIN-E, its best performing competitor, the additional junction tree extension reduces the error by about 40–60%.
Following Murphy et al. (2019), we evaluate our model on the HIV, MUV and Tox21 datasets from the MoleculeNet benchmark collection (Wu et al., 2018), using an 80%/10%/10% random split. Here, the task is to predict certain molecular properties (cast as binary labels), e.g., whether a molecule inhibits HIV replication or not. We compare ourselves to the neural graph fingerprint (NGF) operator (Duvenaud et al., 2015) and its relational pooling variant RP-NGF (Murphy et al., 2019), as well as our own GIN-E baseline.
As the results in Table 2 indicate, our method beats both NGF and GIN-E in test performance. Although RP-NGF is able to distinguish any graph structure by considering permutations of nodes, our approach leads to overall better generalization despite its simplicity.
We also test the performance of our model on the newly introduced datasets ogbg-molhiv and ogbg-molpcba from the OGB benchmark dataset suite (Hu et al., 2020a), which are adopted from MoleculeNet and enhanced by a more challenging and standardized scaffold splitting procedure. We closely follow the experimental protocol of Hu et al. (2020a) and report ROC-AUC and PRC-AUC for ogbg-molhiv and ogbg-molpcba, respectively. We compare ourselves to three variants that do not make use of additional junction tree information, namely GCN-E, GatedGCN-E and GIN-E (Kipf & Welling, 2017; Bresson & Laurent, 2017; Dwivedi et al., 2020; Hu et al., 2020b, a).
Results are presented in Table 3. As one can see, our approach is able to outperform all of its competitors. Interestingly, our model achieves its best results in combination with a small number of layers, making its runtime and memory requirements on par with those of the baselines, which use a larger number of layers. This can be explained by the fact that the additional coarse-to-fine information flow enlarges the receptive field of the GNN, and therefore obviates the need to stack a multitude of layers.
We introduced an end-to-end architecture for molecular property prediction that utilizes inter-message passing between graph representations of different hierarchy. Our proposed method can be used as a plug-and-play extension to strengthen the capabilities of a GNN operating on molecular graphs with little to no overhead. In future works, we are interested in studying how the proposed approach can be applied to other domains as well, e.g., social networks.
Junction tree variational autoencoder for molecular graph generation. In ICML, 2018.
Stacked hourglass networks for human pose estimation. In ECCV, 2016.
PyTorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.
SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In NIPS, 2017.
PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. Journal of Chemical Theory and Computation, 15(6):3678–3693, 2019.
Cross-lingual knowledge graph alignment via graph convolutional networks. In EMNLP, 2018.