
Physical Pooling Functions in Graph Neural Networks for Molecular Property Prediction

by Artur M. Schweidtmann, et al.

Graph neural networks (GNNs) are emerging in chemical engineering for the end-to-end learning of physicochemical properties based on molecular graphs. A key element of GNNs is the pooling function which combines atom feature vectors into molecular fingerprints. Most previous works use a standard pooling function to predict a variety of properties. However, unsuitable pooling functions can lead to unphysical GNNs that poorly generalize. We compare and select meaningful GNN pooling methods based on physical knowledge about the learned properties. The impact of physical pooling functions is demonstrated with molecular properties calculated from quantum mechanical computations. We also compare our results to the recent set2set pooling approach. We recommend using sum pooling for the prediction of properties that depend on molecular size and compare pooling functions for properties that are molecular size-independent. Overall, we show that the use of physical pooling functions significantly enhances generalization.



Keywords: graph convolutional neural networks, pooling function, physics-informed machine learning, property prediction

1 Introduction

Graph neural networks (GNNs) are emerging for end-to-end learning of molecular properties [Kearnes.2016, Niepert2016, Hamilton2017, Duvenaud.2015] in a broad variety of applications including chemical engineering [Schweidtmann2020GNNIgnitionQuality, li2021introducing, rittig2022activity, rittig2022octaneFuelDesign], (quantum) chemistry [Gilmer.05.04.2017, Yang.2019, Schutt.2017, Wu.2018, back2019convolutional] and the prediction of physical [Coley.2017] and crystal properties [Chen.2019, xie2018crystal]. Although GNNs are flexible models for end-to-end learning, we show that their pooling function needs to be carefully selected because unsuitable choices can lead to unphysical GNNs that are more prone to overfitting. GNNs take molecular graphs as inputs and represent atoms by nodes and bonds by edges. In addition, atoms and bonds are characterized by corresponding node and edge feature vectors. Most commonly, GNN architectures are based on message passing neural networks (MPNNs) [Gilmer.05.04.2017]. In MPNNs, the node feature vectors are updated through a series of message passing steps between neighboring nodes. Each message passing step corresponds to one graph convolutional layer of the GNN. Then, the resulting node feature vectors are combined into a molecular fingerprint vector through pooling. This molecular fingerprint is finally mapped to the molecular properties of interest by feedforward artificial neural networks (ANNs). We show that the selection of the pooling function, which combines the feature vectors of all atoms into the molecular fingerprint, is critical for the GNN’s performance.

The vast majority of previous works do not select pooling functions based on physical understanding. In the previous literature, common pooling functions are mean, sum, and max pooling [Wu.2021]. Most previous works use sum pooling [Xu.01.10.2018, Coley.2017, Yang.2019, Lu.2019], while a few use mean pooling [morritfey.19, Shindo.8312019]. Moreover, the same pooling function is typically used for a range of different properties. We argue that this can lead to unphysical GNNs and result in unnecessary errors. An illustrative example is the molecular weight, which is given by the sum of the atom weights. In this case, using mean pooling in a standard GNN would lead to an unphysical architecture that cannot learn the correct underlying physics because the molecular weight cannot be computed as an average of atom weights. In contrast, selecting the sum pooling function according to the underlying physics enables the GNN to learn a meaningful model.
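The molecular weight argument can be checked with a few lines of arithmetic. The following is a minimal sketch, not a GNN: it assumes that each atom contributes exactly its (rounded) standard atomic weight, and the helper names are hypothetical. It shows that ethene and propene, which share the same C:H ratio, have the same mean atom weight but different sums.

```python
import math

# Rounded standard atomic weights in g/mol, used for illustration only.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011}


def atom_weights(formula):
    """Expand a {element: count} formula into a flat list of atom weights."""
    return [ATOMIC_WEIGHT[el] for el, n in formula.items() for _ in range(n)]


def sum_pool(values):
    return sum(values)


def mean_pool(values):
    return sum(values) / len(values)


ethene = {"C": 2, "H": 4}
propene = {"C": 3, "H": 6}

for name, formula in (("ethene", ethene), ("propene", propene)):
    w = atom_weights(formula)
    # The sum grows with molecular size; the mean does not.
    print(name, round(sum_pool(w), 3), round(mean_pool(w), 3))
```

Under these assumptions, any model that sees only the mean-pooled value cannot tell the two molecules apart, whereas the sum-pooled value is the molecular weight itself.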
Some researchers circumvent the issue of selecting pooling functions by introducing flexible models for pooling such as set2set [Vinyals.11192015], DiffPool [Ying.2018], or SortPool [Zhang.2018]. For example, the set2set approach employs a long short-term memory (LSTM) architecture designed for unordered and size-variant input sets [Vinyals.11192015]. The authors of DiffPool propose a hierarchical GNN structure that progressively coarsens the input graph in each layer by aggregating clusters of nodes until a single graph representation is obtained [Ying.2018]. SortPool arranges the learned node representations in a consistently ordered tensor, which is then truncated or extended to a user-defined fixed size [Zhang.2018]. Similarly, GNNs with a large number of convolutional layers combine node information through convolutions and thus reduce the importance of pooling functions [morritfey.19]. Advanced pooling methods have also been applied in molecular property prediction. For instance, Gilmer et al. (2017) [Gilmer.05.04.2017] applied the set2set method for learning various molecular properties from the QM9 data set [Ruddigkeit.2012, Ramakrishnan.2014] and achieved state-of-the-art accuracies on all target properties compared to other GNN models at the time of publication. However, the additional flexibility typically results in larger data requirements, higher model variance, and the risk of overfitting. In other words, the selection of physical pooling functions over flexible model architectures for pooling can be understood as enforcing a hybrid model structure, which is known to reduce the data demand [Psichogios.1992, fiedler2008local, schweidtmann2021machine].
A few recent studies emphasize the importance of the choice of pooling functions for property prediction. Xu et al. (2018) [Xu.01.10.2018] examine sum, mean, and max pooling and conclude that sum pooling is more powerful than mean and max pooling since it can better distinguish different graph structures. Pronobis et al. (2018) [Pronobis.2018] state that decomposition of molecules into atom-wise contributions combined with a property-suitable pooling function works better for “extensive properties”. Other works use property-specific pooling functions, where mean/set2set or sum pooling is applied to “intensive” or “extensive” properties, respectively [Schutt.12112018, Schutt.2018, Gubaev.2018, Ye.12162019, Liu.2021]. Overall, the physical selection of pooling functions is somewhat contradictory in the previous literature and the terms “intensive” or “extensive” have been used colloquially and not in their thermodynamic sense. Also, a comparison of different pooling functions on the prediction and generalization capabilities of GNNs against a physical background has not been conducted yet, hence there is no guide for selecting suitable pooling functions based on physical knowledge in the literature.
We evaluate GNN pooling functions against the underlying physical nature of the learned properties. We analyze the impact of physical pooling functions on molecular properties learned from the common QM9 data set [Ruddigkeit.2012, Ramakrishnan.2014] and demonstrate their superior performance.

2 Materials and methods

We first describe the graph representation of molecules and then briefly introduce the general GNN architecture used for property prediction. Finally, we provide physical insight into the learned properties and use the insight to design physical GNN architectures.

2.1 Molecular graph

Molecules can be described as molecular graphs with nodes representing atoms and edges representing bonds. Each atom is described by a feature vector containing atom information, e.g., atom mass or orbital hybridization. Similarly, each bond is described by a bond feature vector that contains information on the bond type, e.g., single or double bond. Commonly, for organic molecules the hydrogen (H) atoms are omitted and replaced by the hydrogen count as a node feature [Todeschini.2000]; this reduces the complexity of the molecular graphs and therefore the data demand.
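As a concrete illustration of this representation, the sketch below encodes ethanol as a heavy-atom graph in which hydrogens appear only as a per-node count. The class name `MolecularGraph` and the feature keys are hypothetical choices for this example and are not taken from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class MolecularGraph:
    """Heavy-atom graph: hydrogens are folded into a per-node hydrogen count."""
    nodes: list  # one feature dict per heavy atom
    edges: list  # (i, j, bond_feature_dict), treated as undirected

    def neighbors(self, i):
        """Return the sorted indices of all atoms bonded to atom i."""
        out = []
        for a, b, _ in self.edges:
            if a == i:
                out.append(b)
            elif b == i:
                out.append(a)
        return sorted(out)


# Ethanol, CH3-CH2-OH, written without explicit hydrogen nodes.
ethanol = MolecularGraph(
    nodes=[
        {"element": "C", "h_count": 3},
        {"element": "C", "h_count": 2},
        {"element": "O", "h_count": 1},
    ],
    edges=[
        (0, 1, {"bond_type": "single"}),
        (1, 2, {"bond_type": "single"}),
    ],
)

total_hydrogens = sum(node["h_count"] for node in ethanol.nodes)
print(ethanol.neighbors(1), total_hydrogens)
```

The graph has three nodes instead of nine atoms, while the hydrogen counts preserve the full composition.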

2.2 Graph neural network

GNNs exhibit two phases [Gilmer.05.04.2017] as shown in Figure 1: (i) message passing phase and (ii) readout phase.

Fig. 1: Illustration of the GNN structure highlighting the message passing and readout phases.

To initialize the message passing phase, each node $v$ is assigned a state vector $h_v^{0}$ initialized by the respective node feature vector [Gilmer.05.04.2017]. Then, the state vectors $h_v^{l}$ of the nodes in layer $l$ are updated with information from their neighboring nodes $w \in N(v)$ along the edges $e_{vw}$:

$$h_v^{l} = U_l\left(h_v^{l-1}, \sum_{w \in N(v)} M_l\left(h_v^{l-1}, h_w^{l-1}, e_{vw}\right)\right),$$

where $U_l$ and $M_l$ respectively denote the state update function and the message function in layer $l$. This message passing procedure is repeated $L$ times until each node state vector includes information about its local environment. This iterative message passing corresponds to the stacking of graph convolutional layers.
We consider a standard GNN including edge features in the message passing phase, also known as 1-GNN [morritfey.19, Hamilton.17.09.2017]. The 1-GNN uses the following message passing function:

$$h_v^{l} = \sigma\left(W^{l} h_v^{l-1} + \sum_{w \in N(v)} \phi^{l}(e_{vw})\, h_w^{l-1}\right),$$

where $\sigma$ indicates an activation function, $W^{l}$ denotes a parameter matrix, and $\phi^{l}$ denotes a feedforward ANN mapping the respective feature vectors of the edges $e_{vw}$ connecting node $v$ with its neighbors $w$ to a parameter matrix, referred to as edge feature network.
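One message passing step of this kind can be sketched in plain Python. This is a simplified illustration under stated assumptions: the edge feature network is replaced by a hypothetical lookup `edge_matrix(bond_type)` that returns a fixed matrix, ReLU is assumed as the activation, and all weights are hand-picked rather than learned.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, x) for x in vector]


def message_passing_layer(states, edges, w_self, edge_matrix):
    """Update every node state from its own state and its neighbors' states.

    states:      list of node state vectors
    edges:       list of (i, j, bond_type), treated as undirected
    w_self:      matrix applied to the node's own state
    edge_matrix: callable bond_type -> matrix applied to a neighbor's state
    """
    dim_out = len(w_self)
    aggregated = [[0.0] * dim_out for _ in states]
    for i, j, bond_type in edges:
        m = edge_matrix(bond_type)
        # Send a message in both directions along the undirected edge.
        for dst, src in ((i, j), (j, i)):
            msg = matvec(m, states[src])
            aggregated[dst] = [a + b for a, b in zip(aggregated[dst], msg)]
    return [
        relu([s + a for s, a in zip(matvec(w_self, h), agg)])
        for h, agg in zip(states, aggregated)
    ]


# Three-atom chain with one-dimensional states and illustrative weights.
states = [[1.0], [2.0], [3.0]]
edges = [(0, 1, "single"), (1, 2, "single")]
new_states = message_passing_layer(
    states, edges, w_self=[[1.0]], edge_matrix=lambda bond_type: [[0.5]]
)
print(new_states)
```

After one step, each state mixes in its direct neighbors; repeating the step widens the neighborhood each node has information about.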

In the readout phase, the final state vectors of the nodes $h_v^{L}$ are combined into a graph state vector $h_G$ by a pooling function. The pooling function is necessary for molecular property prediction because the number of atoms usually differs between molecules. This leads to a varying number of atom feature vectors that need to be combined into the molecular fingerprint. Thus, the pooling function combines a varying number of final node state vectors into a single graph state vector. In the context of molecular property prediction, the literature commonly refers to $h_G$ as the molecular fingerprint. This molecular fingerprint is finally fed into a feedforward ANN for the prediction of molecular properties.
The molecular fingerprint is given by the pooling function, which depends on the final state vectors of the nodes $h_v^{L}$ with $v \in V$:

$$h_G = \mathrm{pool}\left(\left\{h_v^{L} \mid v \in V\right\}\right).$$

Common choices for $\mathrm{pool}$ are the sum, mean, and max functions. An alternative pooling function is the set2set method [Vinyals.11192015], which can capture more complex relationships between different atomic contributions [Gilmer.05.04.2017, Schutt.2018] by employing a long short-term memory (LSTM) model [Vinyals.11192015]. After $T$ steps of the following iterative computation, the molecular fingerprint is obtained by $h_G = q_T^{*}$ with

$$q_t = \mathrm{LSTM}\left(q_{t-1}^{*}\right),$$
$$a_{v,t} = \frac{\exp\left(h_v^{L} \cdot q_t\right)}{\sum_{w \in V} \exp\left(h_w^{L} \cdot q_t\right)},$$
$$r_t = \sum_{v \in V} a_{v,t}\, h_v^{L},$$
$$q_t^{*} = q_t \oplus r_t,$$

where $q_t$ is a query vector for iteration $t$ providing information about the previous attention readout vector from the memories, $a_{v,t}$ is an attention weight obtained by normalizing the attention of a node with the softmax function, $r_t$ is the attention readout, similar to the simple pooling method with sum, and $q_t^{*}$ is a concatenation ($\oplus$) of the current query vector and the attention readout. The vector $q^{*}$ is initialized at $t = 0$ by $q_0^{*} = 0$.
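The simple pooling functions and the attention readout can be sketched as follows. This is an illustrative simplification: the LSTM that produces the query in set2set is omitted and the query is passed in directly, and the dot product is assumed as the attention score. All function names are hypothetical.

```python
import math


def sum_pool(states):
    """Element-wise sum over all node state vectors."""
    return [sum(col) for col in zip(*states)]


def mean_pool(states):
    """Element-wise mean over all node state vectors."""
    return [sum(col) / len(states) for col in zip(*states)]


def max_pool(states):
    """Element-wise maximum over all node state vectors."""
    return [max(col) for col in zip(*states)]


def attention_readout(states, query):
    """Softmax-weighted sum of node states for a given query vector."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    return [sum(w * s[k] for w, s in zip(weights, states)) for k in range(dim)]


node_states = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
print(sum_pool(node_states), mean_pool(node_states), max_pool(node_states))
print(attention_readout(node_states, query=[0.0, 0.0]))
```

With a zero query all attention weights are equal, so the attention readout reduces to the mean of the node states; a non-zero query shifts the weight towards the nodes that align with it.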
Recent GNNs incorporate physical knowledge into message passing. In recent years, multiple MPNN architectures have been proposed that integrate physical knowledge into the message passing scheme, e.g., SchNet [Schutt.2018], PhysNet [Unke.2019], DimeNet [Klicpera.06.03.2020], and MXMNet [zhang2020molecular]. This includes the incorporation of directional information, such as interatomic distances and angles between atom pairs, into the message function modeling the interactions of atoms. We consider MXMNet, which utilizes physics-driven message passing while preserving computational efficiency [zhang2020molecular]. Within MXMNet, two message passing schemes are applied. In a global message passing scheme, information is exchanged between atoms within a global cutoff distance. Further, a local message passing scheme exchanges information between atoms within a local cutoff distance. This local cutoff distance represents the connectivity of atoms that are connected by chemical bonds. The architecture further enables the transfer of information between atom representations in the global and local message passing by including a cross-layer mapping. In the readout step, the learned atom-wise representations are subsequently pooled by the sum operator for molecular property prediction.

2.3 Physical insight

The prediction of molecular properties by decomposing molecules into atomic contributions has a long history in chemical research. According to Bonchev [Bonchev.1991], the first investigations into properties of molecules with additive characteristics in terms of atomic contributions were carried out in the 1850s. Later, quantitative structure-property relationship (QSPR) and group additivity methods were developed based on the additive character of atoms or functional groups within a molecule [Katritzky.1995, Benson.1969, Gani.1991, JOBACK.1987]. Yet, not every molecular property exhibits purely additive effects.
In thermodynamics, macroscopic properties are categorized as intensive or extensive [QuantitiesUnitsSymbols]. A system property is extensive if it scales linearly with the mass (and as such the “extent”) of the system; examples are the mass or volume. In contrast, intensive properties do not change with the system mass. Note that thermodynamicists sometimes also distinguish between intensive (e.g., temperature and pressure) and specific (extensive quantity divided by volume, e.g., density) properties [stephan2013thermodynamik], but we do not make this distinction. We transfer these concepts to molecules and distinguish between molecular size-independent and molecular size-dependent properties. Molecular size-dependent properties scale with the number of atoms in a molecule. For example, the molecular weight is determined by how many atoms of which type are present in a molecule. In contrast, molecular size-independent properties do not scale with the number of atoms in a molecule, e.g., the energy of the highest occupied molecular orbital (HOMO) [Schutt.2018]. Moreover, some properties of a substance, e.g., activity or toxicity, are mostly influenced by certain functional groups or structural fragments. Note that this dependency on molecular size does not necessarily correspond to the formal definition of intensive or extensive properties in a thermodynamic sense because the former is considered at a microscopic atom-based molecular level, not at a macroscopic mass-based system level. For example, the molar enthalpy with the unit J/mol is an intensive property. On a molecular level, however, the enthalpy of atomization, also with the unit J/mol, is a molecular size-dependent property describing the amount of energy needed to break up a molecule into all of its single atoms at room temperature and fixed pressure [Gilmer.05.04.2017].
Schütt et al. [Schutt.2018] consider the QM9 properties dipole moment ($\mu$), isotropic polarizability ($\alpha$), electronic spatial extent ($\langle R^2 \rangle$), zero point vibrational energy (ZPVE), heat capacity at 298.15 K ($c_v$), atomization energy at 0 K ($U_0$), atomization energy at 298.15 K ($U$), enthalpy of atomization at 298.15 K ($H$), and free energy of atomization at 298.15 K ($G$) as “extensive”. Other properties are the energy of the highest occupied molecular orbital ($\epsilon_{\mathrm{HOMO}}$), the energy of the lowest unoccupied molecular orbital ($\epsilon_{\mathrm{LUMO}}$), and the HOMO-LUMO gap ($\Delta\epsilon$). For some of these properties, physical dependencies on molecular size are known. For example, Miller and Savchik [Miller79] developed, already in 1979, a semi-empirical approach based on theoretical calculations for the prediction of isotropic polarizabilities as a sum of atomistic contributions that depend on their hybridization states. Each nonlinear molecule has $3N-6$ vibrational degrees of freedom ($3N-5$ for linear ones), $N$ being its number of atoms. Each degree of freedom has a ZPVE proportional to its frequency $\nu = \frac{1}{2\pi}\sqrt{k/\mu_{\mathrm{red}}}$. Here, $\mu_{\mathrm{red}}$ is the reduced mass of the parts of a molecule that vibrate with respect to each other and $k$ is the force constant of this vibration. Hence, ZPVE is a molecular size-dependent property to first order, but the frequencies of vibrations that include large fractions of a molecule decrease with increasing molecular size. This effect should be learned by the GNN. Similar relations apply to the heat capacity, enthalpy, and entropy contributions, with the minor complication that terms dependent on the molecular mass (translation) and the moment of inertia (rotation) arise for some of the contributions [Atkins2011QM]. Atomization energies and enthalpies essentially include sums of contributions of all bonds, which may be non-local in the case of conjugated bonds, and are hence molecular size-dependent properties as well. The electronic spatial extent is determined mainly by the shapes of the orbitals in very small molecules and is closely related to the radius of gyration for large molecules, which may even depend on the solvent for large molecules such as polymers and may scale with a fractal exponent in this case.
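The counting argument for the ZPVE can be made concrete with a short calculation. The sketch below assumes the harmonic approximation, in which each vibrational mode contributes half of Planck's constant times its frequency; the function names are hypothetical and the wavenumbers for water are approximate textbook values used only for illustration.

```python
PLANCK_H = 6.62607015e-34    # J s
LIGHT_SPEED = 2.99792458e10  # cm/s, so that wavenumbers can be given in cm^-1
EV_IN_J = 1.602176634e-19    # J per eV


def vibrational_dof(n_atoms, linear=False):
    """Number of vibrational degrees of freedom of a molecule."""
    if n_atoms < 2:
        return 0
    return 3 * n_atoms - (5 if linear else 6)


def harmonic_zpve_ev(wavenumbers_cm):
    """Harmonic ZPVE in eV: half a vibrational quantum per mode."""
    return 0.5 * PLANCK_H * LIGHT_SPEED * sum(wavenumbers_cm) / EV_IN_J


# Water: 3 atoms, nonlinear, hence 3 modes (approximate wavenumbers in cm^-1).
water_modes = [1595.0, 3657.0, 3756.0]
assert len(water_modes) == vibrational_dof(3)
print(vibrational_dof(3), vibrational_dof(3, linear=True))
print(round(harmonic_zpve_ev(water_modes), 3), "eV")
```

Because the number of terms in the sum grows linearly with the number of atoms, the ZPVE grows with molecular size to first order, which is what sum pooling can represent directly.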
The dipole moment ($\mu$) is a particularly interesting property. Even though the molecular size formally enters the dipole moment equation, in most molecules local functional groups determine $\mu$, and depending on their orientation they can even weaken each other. In particular, one or a few strongly polar groups dominate the dipole moment, which can then be written as the vector sum of the individual dipole moments. Thus, we classify the dipole moment as molecular size-independent. The prediction of the dipole moment is expected to be challenging for conventional GNNs because long-range orientational relations between the polar groups may need to be learned by a model, e.g., for describing the difference between the polar ortho- and the nonpolar para-benzoquinone.
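The orientation effect follows directly from vector addition. The sketch below uses an idealized planar picture with two equal group dipoles; the angles of 60 and 180 degrees are assumed stand-ins for an ortho-like and a para-like arrangement, not computed geometries, and the unit magnitude is arbitrary.

```python
import math


def resultant_dipole(magnitude, angle_deg):
    """Magnitude of the vector sum of two equal dipoles separated by an angle."""
    angle = math.radians(angle_deg)
    x = magnitude + magnitude * math.cos(angle)
    y = magnitude * math.sin(angle)
    return math.hypot(x, y)


for label, angle in (("ortho-like", 60.0), ("para-like", 180.0)):
    print(label, round(resultant_dipole(1.0, angle), 6))
```

Both arrangements contain the same two groups, so any readout that ignores their relative orientation would assign them the same value, even though one resultant is large and the other vanishes.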

Similarly, the relations are complex for the energies of the HOMO, the LUMO, and their difference (i.e., the HOMO-LUMO gap). Depending on the type of molecule, these orbitals may be quite localized to a certain functional group and thus independent of molecular size in some molecules. However, they may also be delocalized in other molecules and thus dependent on molecular size for small and medium-sized molecules but converge to a limit for large molecules, as can be seen, e.g., from the Hückel model for conjugated double bonds [Atkins2011QM]. Hence, these properties may be particularly challenging for a GNN model.
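The size dependence in the Hückel picture can be illustrated numerically. The sketch below assumes the simple Hückel model of a linear polyene with n carbon atoms and no bond alternation, for which the orbital energies have a closed form; the parameter values `alpha=0.0` and `beta=-1.0` are arbitrary illustrative units.

```python
import math


def huckel_levels(n, alpha=0.0, beta=-1.0):
    """Sorted pi-orbital energies of a linear chain of n atoms."""
    return sorted(
        alpha + 2.0 * beta * math.cos(k * math.pi / (n + 1))
        for k in range(1, n + 1)
    )


def homo_lumo_gap(n, alpha=0.0, beta=-1.0):
    """HOMO-LUMO gap for an even n with n pi electrons (two per level)."""
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of atoms")
    levels = huckel_levels(n, alpha, beta)
    homo = levels[n // 2 - 1]
    lumo = levels[n // 2]
    return lumo - homo


for n in (2, 4, 8, 16, 64):
    print(n, round(homo_lumo_gap(n), 4))
```

In this idealized model the gap shrinks steadily with chain length and the frontier orbital energies approach a limiting value, so the property depends on size for short chains but saturates for long ones.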

3 Results and discussion

In order to demonstrate the relevance of the pooling function in the readout phase, two case studies are conducted and discussed below. First, we consider the illustrative prediction of the molecular weight. Then, we consider the prediction of twelve quantum mechanical properties collected in the QM9 data set.

3.1 Hyperparameters and implementation

Our implementations are based on the models in PyTorch Geometric developed by Fey & Lenssen [Fey.362019]. For our case study, we combine each of the mean, sum, max, and set2set pooling functions with the 1-GNN. The hyperparameters of the models are selected based on our experience from our previous work on predicting fuel properties [Schweidtmann2020GNNIgnitionQuality]. The molecular graphs have the following features encoded as one-hot vectors: (node) atom type, is aromatic, is in ring, hybridization (e.g., sp²), and hydrogen count; (edge) bond type, conjugated, and stereo. The 1-GNN comprises three graph convolutional layers with a hidden dimension of 64. To map the molecular fingerprint to the property of interest, we use multilayer perceptrons (MLPs). The MLPs comprise four layers with #1: 64, #2: 32, #3: 16, #4: 1 neurons when mean, sum, or max pooling is used. When set2set pooling is used, the MLP layers have #1: 128, #2: 64, #3: 32, #4: 1 neurons because the output vector of the set2set method is twice its input size. For the set2set method, we set the number of processing steps to 3.
We additionally test mean, sum, and max pooling with MXMNet. We use the implementation and default hyperparameters, with batch size 128 and global cutoff distance 5, as provided by the authors of MXMNet [zhang2020molecular], cf. [zhang2020github].

3.2 Illustrative case study: Molecular weight

To illustrate the importance of physical pooling functions on a simple example, we learn the molecular weight of alkanes. To compose the data set, we obtain about 2,300 alkanes with up to 60 C-atoms from the PubChem database [Kim.2016] and compute their molecular weight using RDKit [rdkit].
For illustration, we split this case study into two steps. First, 1-GNNs with sum, mean, and max pooling functions are trained, validated, and tested on corresponding data sets of alkanes with up to 30 C-atoms. Second, the trained GNNs are tested against an external data set containing alkanes with more than 30 C-atoms. Thus, the generalization capabilities of the 1-GNNs are tested against extrapolated data.
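The two-step setup can be sketched with closed-form molecular weights. The source computes the weights with RDKit; the snippet below instead assumes acyclic alkanes with the formula C_nH_{2n+2} and rounded atomic weights, and the function names and the example carbon counts are hypothetical.

```python
ATOMIC_WEIGHT_C = 12.011  # g/mol, rounded
ATOMIC_WEIGHT_H = 1.008   # g/mol, rounded


def alkane_molecular_weight(n_carbon):
    """Molecular weight of an acyclic alkane C_n H_(2n+2) in g/mol."""
    return n_carbon * ATOMIC_WEIGHT_C + (2 * n_carbon + 2) * ATOMIC_WEIGHT_H


def split_by_size(carbon_counts, max_internal_carbons=30):
    """Separate the internal data (training, validation, test) from the
    external extrapolation set by the number of carbon atoms."""
    internal = [n for n in carbon_counts if n <= max_internal_carbons]
    external = [n for n in carbon_counts if n > max_internal_carbons]
    return internal, external


internal, external = split_by_size([1, 8, 30, 35, 60])
print(internal, external)
print(round(alkane_molecular_weight(8), 3))
```

The external set contains only molecules larger than anything seen during training, so an error measured on it reflects extrapolation in molecular size.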

The training of the 1-GNNs is repeated ten times for each pooling function with a maximum number of 500 epochs. The initial data set with up to 30 C-atoms is randomly split into 80% training, 10% validation, and 10% test sets for each run. Since we consider alkanes, we choose the attributes of the nodes in the molecular graph to include the hydrogen count only and do not use edge attributes, hence we replace the edge feature network in the message passing of the 1-GNN by a (learned) parameter matrix that is the same for all edges.

Fig. 2: Mean absolute error in g/mol on the test sets for the 1-GNN with different pooling functions, namely sum, mean, and max: (a) test data set of alkanes with up to 30 C-atoms, (b) external data set of alkanes with 35 up to 60 C-atoms. Results are for ten independent training runs, each with a maximum of 500 epochs.

Figure 2 shows the test set performance of the 1-GNN with different pooling functions. Figure 2 (a) illustrates the test results for the data set of alkanes with up to 30 C-atoms. As expected, sum pooling leads to the best performance on the test data set, as it captures the molecular size-dependent character of the molecular weight. In contrast, mean and max pooling lead to larger average mean absolute errors.
Figure 2 (b) shows the mean absolute error on the external test data set of alkanes with more than 30 C-atoms. Here as well, the 1-GNN with sum pooling yields the lowest average error, whereas the 1-GNNs with mean and max pooling lead to considerably larger average errors.
The results clearly show that the sum pooling function, which was selected based on our physical insight, performs better than the unphysical mean and max pooling functions for the prediction of the molecular weight. In particular, sum pooling significantly outperforms the unphysical pooling functions when extrapolating the model. These results support our theoretical expectation that GNNs with unphysical pooling functions are more prone to overfitting. Notably, the mean absolute error of the GNN with mean pooling is much smaller on the test set of alkanes with up to 30 C-atoms than on the test set of alkanes with more than 30 C-atoms. This result indicates that selecting pooling functions based on the performance on a standard validation or test set could also be error-prone.
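One way to see why mean pooling struggles in extrapolation is to look at what the pooled value can still encode. The sketch below is not the trained GNN: it assumes linear alkanes on a heavy-atom graph and an idealized per-node contribution equal to the weight of the carbon atom plus its attached hydrogens. The helper names are hypothetical.

```python
def n_alkane_hydrogen_counts(n_carbon):
    """Hydrogen count per carbon node of a linear alkane."""
    if n_carbon == 1:
        return [4]
    return [3] + [2] * (n_carbon - 2) + [3]


def node_weight(h_count):
    """Idealized per-node contribution: one carbon plus its hydrogens (g/mol)."""
    return 12.011 + 1.008 * h_count


def pooled_weights(n_carbon):
    """Return the sum-pooled and mean-pooled node contributions."""
    contributions = [node_weight(h) for h in n_alkane_hydrogen_counts(n_carbon)]
    total = sum(contributions)
    return total, total / len(contributions)


for n in (10, 30, 35, 60):
    total, mean = pooled_weights(n)
    print(n, round(total, 3), round(mean, 4))
```

Under these assumptions the sum equals the molecular weight for every chain length, while the mean approaches the constant contribution of a CH2 group, so long chains become nearly indistinguishable after mean pooling.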

3.3 Physicochemical properties

We analyze the importance of physical pooling functions for a variety of relevant properties. In addition, we compare our results to a more complex set2set readout function and explore the MXMNet [zhang2020molecular] architecture.
We use the QM9 data set to train our models [Ruddigkeit.2012, Ramakrishnan.2014]. The experimental setup is twofold. First, the 1-GNNs with sum, mean, max, and set2set pooling functions are trained, validated, and tested on randomly selected subsets of the whole QM9 data set. This approach assesses the interpolation capabilities of the respective pooling functions. Second, we train, validate, and test the 1-GNN and MXMNet on the QM9 data excluding molecules with exactly 9 heavy atoms. These models are then tested against molecules with 9 heavy atoms from the QM9 data set. This approach is chosen to assess the generalization capabilities of the models in terms of extrapolation ability. For each training run, the data set is randomly split into 80% training, 10% validation, and 10% test sets. The training is stopped after 300 and 900 epochs for the 1-GNN and MXMNet, respectively. The mean absolute error for the test set is reported based on the epoch with the lowest validation error.
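The data split and the reporting rule can be sketched in a few lines. This is an illustrative outline rather than the authors' training script: the function names, the fixed seed, and the example error values are hypothetical.

```python
import random


def split_indices(n_samples, seed=0):
    """Random 80/10/10 split into training, validation, and test indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = (8 * n_samples) // 10
    n_val = n_samples // 10
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def report_at_best_validation(val_errors, test_errors):
    """Return (epoch, test error) at the epoch with the lowest validation error."""
    if not val_errors or len(val_errors) != len(test_errors):
        raise ValueError("need one validation and one test error per epoch")
    best_epoch = min(range(len(val_errors)), key=val_errors.__getitem__)
    return best_epoch, test_errors[best_epoch]


train, val, test = split_indices(100)
print(len(train), len(val), len(test))
print(report_at_best_validation([0.5, 0.3, 0.4], [0.6, 0.35, 0.3]))
```

Selecting the epoch by validation error keeps the test set out of the model selection, even if a later epoch happens to have a lower test error.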

3.3.1 Interpolation

The results for testing sum, mean, max, and set2set pooling function on the whole QM9 data set are summarized in Table 1. Overall, the results indicate that physically meaningful pooling functions lead to favorable performances on the QM9 data set for interpolation. It can be observed that for all molecular size-dependent properties, the 1-GNN with sum pooling significantly outperforms the 1-GNNs with mean and max pooling.

For the molecular size-independent properties, we do not observe a superior performance of one pooling function; all pooling functions result in similar accuracies.

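Category      Target                                  Unit   sum     mean    max     set2set
m. size-dep.  isotropic polarizability                       0.301   0.469   0.482   0.583
m. size-dep.  electronic spatial extent                      21.5    25.0    24.1    24.1
m. size-dep.  ZPVE                                    meV    9.39    24.16   26.99   21.96
m. size-dep.  heat capacity (298.15 K)                       0.149   0.203   0.197   0.197
m. size-dep.  atomization energy (0 K)                eV     0.117   0.345   0.357   0.437
m. size-dep.  atomization energy (298.15 K)           eV     0.117   0.363   0.366   0.386
m. size-dep.  enthalpy of atomization (298.15 K)      eV     0.123   0.329   0.383   0.390
m. size-dep.  free energy of atomization (298.15 K)   eV     0.112   0.293   0.328   0.313
m. size-ind.  dipole moment                           Debye  0.452   0.456   0.449   0.472
m. size-ind.  HOMO                                    meV    92.8    93.1    92.6    133.7
m. size-ind.  LUMO                                    meV    93.1    93.7    93.8    93.7
m. size-ind.  HOMO-LUMO gap                           eV     0.1322  0.1336  0.1283  0.1323

Table 1: Mean absolute errors averaged over three training runs for testing the 1-GNN with different pooling functions (sum, mean, max, set2set) against QM9 target properties. Properties are categorized into molecular size-independent (“m. size-ind.”) and molecular size-dependent (“m. size-dep.”) according to [Schutt.2018]. The lowest error in each row indicates the best pooling function.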

3.3.2 Generalization with 1-GNN architecture

In order to analyze the generalization capability of the pooling functions on the QM9 data set, we train the 1-GNNs only on molecules with up to 8 heavy atoms, i.e., a maximum number of 8 C, N, O, F atoms. Then, we test the prediction accuracy on an internal test set, i.e., containing molecules with up to 8 heavy atoms, and also on a data set with the remaining QM9 molecules that have exactly 9 heavy atoms. The latter test set therefore tests the extrapolation capability of the derived GNNs. The results of the extrapolation are summarized in Table 2.
The interpolation performance of the models (indicated in black in Table 2) is similar to that of the previous models trained on the whole QM9 data set (cf. Table 1). Notably, the absolute errors increase for some properties, e.g., atomization energy, as the training set is much smaller. The QM9 data set contains about 108,000 molecules with 9 heavy atoms and about 22,000 molecules with 1 to 8 heavy atoms.
For extrapolation (indicated in blue in Table 2), we find that sum pooling performs much better compared to mean, max, and set2set pooling on all tested molecular size-dependent properties. For the molecular size-independent properties we observe that the sum pooling does not outperform the other pooling functions anymore. Rather, mean and max pooling perform slightly better compared to sum pooling. Notably, the set2set method does not improve the accuracy compared to sum, mean, and max pooling for any property.

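Target                                  Test set          sum     mean    max     set2set
isotropic polarizability                interp. (black)   0.302   0.816   0.727   1.164
isotropic polarizability                extrap. (blue)    1.385   8.445   8.654   7.835
electronic spatial extent               interp. (black)   17.9    24.9    23.3    29.4
electronic spatial extent               extrap. (blue)    83.6    236.0   229.5   227.0
ZPVE                                    interp. (black)   0.0123  0.0561  0.0544  0.0547
ZPVE                                    extrap. (blue)    0.0288  0.3661  0.3127  0.4508
heat capacity (298.15 K)                interp. (black)   0.158   0.312   0.291   0.309
heat capacity (298.15 K)                extrap. (blue)    0.650   3.509   3.360   3.129
atomization energy (0 K)                interp. (black)   0.180   0.816   0.766   0.773
atomization energy (0 K)                extrap. (blue)    1.365   8.082   8.100   9.613
atomization energy (298.15 K)           interp. (black)   0.171   0.759   0.757   0.631
atomization energy (298.15 K)           extrap. (blue)    1.239   8.368   8.117   8.220
enthalpy of atomization (298.15 K)      interp. (black)   0.181   0.724   0.802   0.706
enthalpy of atomization (298.15 K)      extrap. (blue)    1.197   8.275   8.125   10.424
free energy of atomization (298.15 K)   interp. (black)   0.167   0.690   0.705   0.757
free energy of atomization (298.15 K)   extrap. (blue)    1.142   7.671   7.363   8.751
dipole moment                           interp. (black)   0.469   0.465   0.454   0.501
dipole moment                           extrap. (blue)    0.588   0.572   0.569   0.611
HOMO                                    interp. (black)   0.110   0.109   0.112   0.127
HOMO                                    extrap. (blue)    0.142   0.140   0.142   0.158
LUMO                                    interp. (black)   0.116   0.114   0.110   0.118
LUMO                                    extrap. (blue)    0.179   0.171   0.167   0.173
HOMO-LUMO gap                           interp. (black)   0.161   0.156   0.150   0.156
HOMO-LUMO gap                           extrap. (blue)    0.226   0.219   0.208   0.219

Table 2: Mean absolute errors averaged over three independent training runs for testing the 1-GNN with different pooling functions (sum, mean, max, and set2set) against: (black, rows labeled “interp.”) the QM9 data set excluding molecules with 9 heavy atoms, (blue, rows labeled “extrap.”) only molecules with 9 heavy atoms of the QM9 data set. The lowest error in each row indicates the best pooling function. Units are equivalent to those in Table 1.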

3.3.3 Generalization with MXMNet architecture

We also analyze the influence of the pooling function on the MXMNet GNN model [zhang2020molecular]. The MXMNet model reached state-of-the-art performance on several prediction tasks in QM9 [zhang2020molecular]. MXMNet includes directional information in its message passing process. For the readout step, sum pooling is applied in the original MXMNet model. We compare MXMNet performance with three different pooling functions: sum, mean, and max. Similar to the 1-GNN, we test MXMNet trained on molecules of QM9 with up to 8 heavy atoms against an internal test set and an external test set, i.e., extrapolating to molecules with 9 heavy atoms. The results are summarized in Table 3.
The interpolation performance of the MXMNet (indicated in black in Table 3) compares highly favorably with that of the 1-GNN architecture, as expected. The MXMNet with sum pooling outperforms the other pooling approaches for all molecular size-dependent properties. This is in agreement with our expectations and previous observations on the 1-GNN architecture. For the molecular size-independent properties, we observe that sum, mean, and max pooling perform very similarly. Notably, the extrapolation performance of the MXMNet architecture (indicated in blue in Table 3) also significantly exceeds that of the 1-GNN architecture on nearly all properties.
The generalization results follow the same pattern as our previous observations. Again, we observe a significant advantage of sum pooling for all molecular size-dependent properties. For the molecular size-independent properties, we obtain similar performance of the pooling functions for the LUMO. For the dipole moment, sum pooling performs only slightly better than mean and max pooling. For the HOMO and the HOMO-LUMO gap, however, mean and max pooling outperform sum pooling by a factor of more than 3. This demonstrates that sum pooling can also promote overfitting and thus prevent generalization in the case of size-independent properties. Notably, the extrapolation error of the MXMNet is much larger for the unphysical pooling function than the extrapolation error of the simpler 1-GNN with the same pooling function. This indicates that the selection of physical pooling functions could be more important for more complex models.

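Target                                  Test set          sum      mean     max
isotropic polarizability                interp. (black)   0.0781   0.1331   0.1347
isotropic polarizability                extrap. (blue)    0.1887   1.2853   0.4569
electronic spatial extent               interp. (black)   1.78     2.55     2.77
electronic spatial extent               extrap. (blue)    104.84   199.26   182.34
ZPVE                                    interp. (black)   0.00144  0.00370  0.00278
ZPVE                                    extrap. (blue)    0.00226  0.04492  0.00807
heat capacity (298.15 K)                interp. (black)   0.0325   0.0503   0.0515
heat capacity (298.15 K)                extrap. (blue)    0.0891   2.2484   0.6094
atomization energy (0 K)                interp. (black)   0.0111   0.0728   0.0710
atomization energy (0 K)                extrap. (blue)    0.0265   1.4095   0.5047
atomization energy (298.15 K)           interp. (black)   0.0114   0.0729   0.0720
atomization energy (298.15 K)           extrap. (blue)    0.0265   1.2072   0.5025
enthalpy of atomization (298.15 K)      interp. (black)   0.0115   0.0723   0.0724
enthalpy of atomization (298.15 K)      extrap. (blue)    0.0265   1.5855   0.5039
free energy of atomization (298.15 K)   interp. (black)   0.0122   0.0662   0.0677
free energy of atomization (298.15 K)   extrap. (blue)    0.0271   2.1216   0.5009
dipole moment                           interp. (black)   0.0892   0.0981   0.1203
dipole moment                           extrap. (blue)    0.1551   0.1708   0.1997
HOMO                                    interp. (black)   0.0516   0.0460   0.0550
HOMO                                    extrap. (blue)    0.2273   0.0687   0.0743
LUMO                                    interp. (black)   0.0366   0.0384   0.0449
LUMO                                    extrap. (blue)    0.0700   0.0707   0.0816
HOMO-LUMO gap                           interp. (black)   0.0766   0.0729   0.0783
HOMO-LUMO gap                           extrap. (blue)    0.3896   0.1192   0.1263

Table 3: Mean absolute errors averaged over three independent training runs for testing MXMNet with different pooling functions (sum, mean, max) against: (black, rows labeled “interp.”) the QM9 data set excluding molecules with 9 heavy atoms, (blue, rows labeled “extrap.”) only molecules with 9 heavy atoms of the QM9 data set. The lowest error in each row indicates the best pooling function. Units are equivalent to those in Table 1.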
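The factors quoted in the discussion can be recomputed from the extrapolation rows of Table 3. The sketch below copies three rows by hand; assigning the unlabeled rows to properties relies on the assumption that Table 3 follows the same row order as Table 1, and the dictionary and function names are hypothetical.

```python
# Extrapolation errors (molecules with 9 heavy atoms) copied from Table 3.
# The property labels assume the same row order as in Table 1.
TABLE3_EXTRAPOLATION = {
    "atomization energy (0 K)": {"sum": 0.0265, "mean": 1.4095, "max": 0.5047},
    "HOMO": {"sum": 0.2273, "mean": 0.0687, "max": 0.0743},
    "HOMO-LUMO gap": {"sum": 0.3896, "mean": 0.1192, "max": 0.1263},
}


def error_ratio(target, worse, better):
    """Ratio of the errors of two pooling functions for one target."""
    row = TABLE3_EXTRAPOLATION[target]
    return row[worse] / row[better]


for target in ("HOMO", "HOMO-LUMO gap"):
    print(target, round(error_ratio(target, "sum", "mean"), 2))
print(
    "atomization energy (0 K)",
    round(error_ratio("atomization energy (0 K)", "mean", "sum"), 1),
)
```

For the two size-independent targets the sum-pooled error is more than three times the mean-pooled error, while for the size-dependent atomization energy the relation is reversed and far more pronounced.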

4 Conclusion

GNNs have emerged as a promising deep learning technique for end-to-end molecular property prediction in chemical engineering. The selection of pooling functions in GNNs for property prediction should be based on physical knowledge because incorrect pooling functions can promote overfitting and weaken generalization. We identify the dependency of the learned property on the molecular size as the key criterion for the selection of the pooling function: When a property is molecular size-dependent, the sum pooling function should be used. When a property is molecular size-independent, sum pooling can lead to poor generalization. We recommend comparing sum, mean, and max pooling functions for size-independent properties.

Our computational results support this hypothesis showing that physical GNN architectures generalize better than unphysical architectures. In future research, the physical selection of pooling functions should always be considered when predicting molecular properties with GNNs.


Supported by the German Research Foundation (DFG) within the framework of the Excellence Strategy of the Federal Government and the Länder - Cluster of Excellence 2186 “The Fuel Science Center” (ID390919832). Simulations were performed with computing resources granted by RWTH Aachen University under projects thes0682 and rwth0731. MD received funding from the Helmholtz Association of German Research Centers.

Data and software availability

Our implementations are based on the models in PyTorch Geometric developed by Fey & Lenssen [Fey.362019]. The model implementation is available under the Eclipse Public License 2.0 (cf. [Schweidtmann2020GNNIgnitionQuality]). The MXMNet implementation is provided by the authors of MXMNet [zhang2020molecular], cf. [zhang2020github]. We use the QM9 data set to train our models [Ruddigkeit.2012, Ramakrishnan.2014].

Author contributions

Artur M. Schweidtmann: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Writing - Original Draft, Writing - Review & Editing, Visualization. Jan G. Rittig: Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization. Jana M. Weber: Methodology, Validation, Formal analysis, Writing - Review & Editing. Martin Grohe: Methodology, Validation, Formal analysis, Writing - Review & Editing. Manuel Dahmen: Methodology, Validation, Formal analysis, Writing - Review & Editing, Supervision. Kai Leonhard: Methodology, Validation, Formal analysis, Writing - Review & Editing. Alexander Mitsos: Methodology, Validation, Formal analysis, Resources, Writing - Review & Editing, Supervision, Project administration, Funding acquisition.