Geometric Graph Convolutional Neural Networks

09/11/2019 ∙ by Przemysław Spurek, et al. ∙ Jagiellonian University

Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remedy this issue, we propose Geometric Graph Convolutional Network (geo-GCN) which uses spatial features to efficiently learn from graphs that can be naturally located in space. Our contribution is threefold: we propose a GCN-inspired architecture which (i) leverages node positions, (ii) is a proper generalisation of both GCNs and Convolutional Neural Networks (CNNs), (iii) benefits from augmentation which further improves the performance and assures invariance with respect to the desired properties. Empirically, geo-GCN outperforms state-of-the-art graph-based methods on image classification and chemical tasks.






Introduction

Convolutional Neural Networks (CNNs) outperform humans on visual learning tasks, such as image classification [13], object detection [19] or image captioning [26]. They have also been successfully applied to text processing [11] and time series analysis [25]. Nevertheless, CNNs cannot be easily adapted to irregular entities, such as graphs, where data representation is not organised in a grid-like structure.

Graph Convolutional Networks (GCNs) attempt to mimic CNNs by operating on spatially close neighbors. Motivated by spectral graph theory, Kipf and Welling [12] use fixed weights determined by the adjacency matrix of a graph to aggregate labels of the neighbors. Veličković et al. [21] use an attention mechanism to learn the strength of these weights. In most cases, the design of new GCNs is based on empirical intuition and there has been little investigation of their theoretical properties [24]. In particular, there is no evident correspondence between classical CNNs and GCNs.

(a) Structural formula
(b) Example conformation
Figure 1: The graph representation of a compound (structural formula) is shown on the left. Vertices denote atoms, and undirected edges represent chemical bonds. On the right, the 3D view (conformation) of the same molecule is depicted along with its interactions with a target protein.

In many cases, graphs are coupled with a geometric structure. In medicinal chemistry, the three-dimensional structure of a chemical compound, called a molecular conformation, is essential in determining the activity of a drug towards a target protein (Figure 1). Similarly, in image processing tasks, pixels of an image are organised in a two-dimensional grid, which constitutes their geometric interpretation. However, standard GCNs do not take spatial positions of the nodes into account, which is a considerable difference between GCNs and CNNs. Moreover, in the case of images, geometric features make it possible to augment data with translations or rotations and significantly enlarge a given dataset, which is crucial when the number of examples is limited.

In this paper, we propose Geometric Graph Convolutional Networks (geo-GCN), a variant of GCNs, which is a proper generalisation of CNNs to the case of graphs. In contrast to existing GCNs, geo-GCN uses spatial features of nodes to aggregate information from the neighbors. On one hand, this geometric interpretation is useful to model many real examples of graphs such as graphs of chemical compounds. In this case, we are able to perform data augmentation by rotating a given graph in a spatial domain and, in consequence, improve network generalisation when the amount of data is limited. On the other hand, a single layer of geo-GCN can be parametrised so that it returns a result identical to a standard convolutional layer on grid-like objects, such as images (see Theorem 1).

The proposed method was evaluated on various datasets and compared with the state-of-the-art methods. We applied geo-GCN to classify images and incomplete images represented as graphs. We also tested the proposed method on chemical benchmark datasets. Experiments demonstrate that combining spatial information with data augmentation leads to more accurate predictions.

Our contributions can be summarised as follows:

  • We show how to use geometric features (spatial coordinates) in GCNs.

  • We prove that geo-GCN is a proper generalisation of GCNs and CNNs.

  • In contrast to existing approaches, geo-GCN makes it possible to perform graph augmentation, which further improves the performance of the model.

Figure 2: Intuition behind our approach. On the left, the result of applying a convolutional filter to the image $H$, which extracts the top-right neighbor. The positions grid represents spatial coordinates of the pixels; the neighbors are connected with an edge. An analogous convolution can be applied to a geometric graph representation, as shown on the right: ReLU applied to the linear transformation of the spatial features of the image graph (with $u=(1,1)^T$ and $b=-1$) makes it possible to select (and possibly modify) the top-right neighbor, see Example 1 for details.

Related work

The first instances of Graph Neural Networks were proposed in [7] and [18]. These authors designed recursive neural networks which iteratively propagate node labels until reaching a stable fixed point. Recursive graph neural networks were further developed by [15], who used gated recurrent units for learning graph representations.

Recent approaches for graph processing rely on adapting convolutional neural networks to the graph domain (graph convolutional networks, GCNs). The first class of methods is based on the spectral representation of graphs. The authors of [1], [9] and [3] defined spectral filters which operate on the graph spectrum. Kipf and Welling [12] significantly simplified this process by restricting the neighborhood to only first-order neighbors. In this approach, convolutional layers followed by non-linear activation functions were stacked to process the graph structure sequentially. This work was also extended to higher-order neighborhoods [27]. In [14], the notion of neighborhood in GCNs was generalized and a distance metric for graphs was learned by spectral methods. The authors of [22] showed that removing nonlinearities from GCNs further reduces their complexity without heavily affecting the performance. Unfortunately, spectral methods are domain-dependent, which means that GCNs trained on one graph cannot be trivially transferred to another graph with a different spectral structure.

The second variant of GCNs does not use the Laplacian basis to aggregate node neighbors but attempts to train a convolutional filter for this purpose. To deal with neighborhoods of varying size and to preserve the parameter sharing property of CNNs, [4] used a specific weight matrix for each node degree. Subsequent work [8] used a sampling strategy to extract a fixed-size neighborhood. The authors of [16] used spatial features to construct convolutional filters. In contrast to our approach, they transform geometric features using predefined Gaussian kernels and do not focus on generalizing classical CNNs. In [21], multi-head self-attention was used to train individual weights for each pair of nodes. To account for edge similarities, which appears natural in the chemical domain, the authors of [20] applied an attention mechanism for edges. In [6], graph and distance information were integrated in a single model, which achieved strong performance on molecular property prediction benchmarks. Moreover, not only graph distances but also three-dimensional atom coordinates are useful in molecular predictions, as was emphasized by [2], who introduced the 3DGCN architecture. They integrated a matrix of relative atom positions into the GCN architecture. However, 3DGCN is a chemistry-inspired model which does not aim to generalize CNNs.

While many variants of graph neural networks achieve impressive performance, their design is mostly based on empirical intuition and evaluation. The work of [24] investigates theoretical properties of neural networks operating on graphs. Based on graph isomorphism test, they formally analyze discriminative power of popular GNN variants [12], [8] and show that they cannot learn to distinguish certain simple graph structures. In a similar spirit, our geo-GCN is a theoretically justified generalization of classical CNNs to the case of graphs.

Geometric graph convolutions

In this section, we introduce geo-GCN. First, we recall a basic construction of standard GCNs. Next, we present the intuition behind our approach and formally introduce geo-GCN. Finally, we discuss practical advantages of geometric graph convolutions.

Let $G=(V,A)$ be a graph, where $V=\{v_1,\dots,v_n\}$ denotes a set of nodes (vertices) and $A=[a_{ij}]_{i,j=1}^{n}$ represents edges. We put $a_{ij}=1$ if $v_i$ and $v_j$ are connected by a directed edge and $a_{ij}=0$ if the edge is missing. Each node $v_i$ is represented by a $d$-dimensional feature vector $x_i\in\mathbb{R}^{d}$. Typically, graph convolutional neural networks transform these feature vectors over multiple subsequent layers to produce the final prediction.

Graph convolutions.

Let $H=[h_1,\dots,h_n]$ denote the matrix of node features being an input to a convolutional layer, where $h_i\in\mathbb{R}^{d_{in}}$ are column vectors. The dimension $d_{in}$ of $h_i$ is determined by the number of filters used in the previous layer. Clearly, $X=[x_1,\dots,x_n]$ is the input representation to the first layer.

A typical graph convolution is defined by combining two operations. For each node $v_i$, feature vectors of its neighbors $N_i=\{j : a_{ij}=1\}$ are first aggregated:

$$\bar{h}_i=\sum_{j\in N_i} u_{ij}\,h_j. \qquad (1)$$

The weights $u_{ij}$ are either trainable ([21] applied an attention mechanism) or determined by $A$ ([12] motivated their selection using spectral graph theory).

Next, a standard MLP is applied to transform the intermediate representation $\bar{H}=[\bar{h}_1,\dots,\bar{h}_n]$ into the final output of a given layer:

$$\mathrm{MLP}(\bar{H};W)=\mathrm{ReLU}(W^{T}\bar{H}+b), \qquad (2)$$

where a trainable weight matrix $W=[w_1,\dots,w_{d_{out}}]$ is defined by column vectors $w_j\in\mathbb{R}^{d_{in}}$ and $b$ is a bias term. The number of columns $d_{out}$ of $W$ determines the dimension of the output feature vectors.
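The two steps of a typical graph convolution, aggregation over neighbors followed by a dense transformation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `aggregate` and `dense_relu`, the dictionary layout of the edge weights, and the explicit bias term are all chosen here for readability.

```python
def relu(x):
    return max(0.0, x)


def aggregate(H, neighbors, weights):
    """Aggregation step: h_bar_i = sum over j in N_i of u_ij * h_j.

    H         -- list of n feature vectors (each a list of d_in floats)
    neighbors -- neighbors[i] lists the indices j of the neighbors of node i
    weights   -- dict mapping (i, j) to the scalar weight u_ij (hypothetical layout)
    """
    d_in = len(H[0])
    H_bar = []
    for i, N_i in enumerate(neighbors):
        h_bar = [0.0] * d_in
        for j in N_i:
            for c in range(d_in):
                h_bar[c] += weights[(i, j)] * H[j][c]
        H_bar.append(h_bar)
    return H_bar


def dense_relu(H_bar, W, bias):
    """Transformation step: ReLU(w_k . h_bar_i + bias_k) for every output unit k.

    W is a list of d_out weight vectors, each of length d_in.
    """
    return [[relu(sum(w_c * h_c for w_c, h_c in zip(w, h)) + b_k)
             for w, b_k in zip(W, bias)]
            for h in H_bar]


# Path graph 0 - 1 - 2 with scalar features and unit weights on every edge.
H = [[1.0], [2.0], [3.0]]
neighbors = [[1], [0, 2], [1]]
weights = {(i, j): 1.0 for i, N_i in enumerate(neighbors) for j in N_i}

H_bar = aggregate(H, neighbors, weights)
out = dense_relu(H_bar, W=[[1.0], [-1.0]], bias=[0.0, 0.5])
print(H_bar)  # [[2.0], [4.0], [2.0]]
print(out)    # [[2.0, 0.0], [4.0, 0.0], [2.0, 0.0]]
```

Note that the aggregation sees only which nodes are connected and how strongly; nothing in it depends on where the nodes are located.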

Figure 3: Intuition behind our approach with a more complex filter. The result of applying a convolutional filter to the image is presented on the left. This operation can be obtained from the image graph representation (on the right) by extracting two opposite corner values (with $u_1=(1,1)^T$, $b_1=-1$ and $u_2=(-1,-1)^T$, $b_2=-1$) and summing them, see Example 2 for details.

Intuition behind geometric graph convolutions.

Classical GCNs operate on the neighborhood given by the adjacency matrix. In some applications, nodes are additionally described by spatial coordinates. For example, the position of each pixel can be expressed as a pair of integers. Analogously, every conformation of a chemical compound is a 3-dimensional geometric graph, where each atom is located in space. The adjacency matrix cannot preserve all the information about the graph geometry. In particular, it is not possible to construct an analogue of classical convolution from the adjacency matrix and feature vectors alone. In our approach, we show how to include this spatial information in graph convolutions to construct a proper generalization of classical convolutions.

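To proceed further, we need to introduce notation concerning convolutions (in the case of images). For simplicity we consider only convolutions without pooling. In general, given a mask $M=[m_{i'j'}]_{i',j'\in\{-r,\dots,r\}}$, its result on the image $H=[h_{ij}]$ is given by

$$(M*H)_{ij}=\sum_{i'=-r}^{r}\sum_{j'=-r}^{r} m_{i'j'}\,h_{i+i',\,j+j'}.$$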

To present an intuition behind our approach, let us show how to mimic a classical linear convolution based on graph representation of the image.

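Example 1.

For simplicity, let us consider a linear convolution given by the mask

$$M=\begin{bmatrix}0&0&1\\0&0&0\\0&0&0\end{bmatrix}.$$

Observe that as the result of this convolution on the image $H$, every pixel is replaced by its upper-right neighbor, see Figure 2. Now we understand the image as a graph, where the neighborhood $N_p$ of the pixel with coordinates $p=(p_x,p_y)$ is given by the pixels with coordinates $q=(p_x+i,\,p_y+j)$ such that $i,j\in\{-1,0,1\}$.

Given a vector $u\in\mathbb{R}^2$ and a bias $b\in\mathbb{R}$ we can define the (intermediate) graph operation by

$$\bar{H}(u,b)_p=\sum_{q\in N_p}\mathrm{ReLU}\big(u^T(q-p)+b\big)\,H_q.$$

Consider now the case when $u=(1,1)^T$ and $b=-1$. One can easily observe that

$$\mathrm{ReLU}\big(u^T(q-p)+b\big)=\begin{cases}1, & \text{if } q-p=(1,1)^T,\\ 0, & \text{otherwise},\end{cases}$$

where $q\in N_p$.

Consequently, we obtain that $\bar{H}(u,b)_p=H_{p+(1,1)^T}$, which equals the result of the considered linear convolution.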
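The selection effect in Example 1 can be checked numerically. The sketch below assumes the filter parameters $u=(1,1)^T$ and $b=-1$, chosen here because they reproduce the behaviour the example describes, and evaluates the ReLU weight for all nine relative positions of a 3x3 neighborhood. The helper name `filter_weight` is hypothetical.

```python
def relu(x):
    return max(0.0, x)


def filter_weight(u, b, offset):
    """ReLU(u^T (q - p) + b) for a relative position offset = q - p."""
    return relu(u[0] * offset[0] + u[1] * offset[1] + b)


u, b = (1.0, 1.0), -1.0  # assumed filter: selects the upper-right neighbor

# All relative positions (dx, dy) in a 3x3 neighborhood, including (0, 0).
offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
weights = {o: filter_weight(u, b, o) for o in offsets}

selected = [o for o, w in weights.items() if w > 0]
print(selected)         # [(1, 1)]
print(weights[(1, 1)])  # 1.0
```

Every other offset gives $u^T(q-p)\le 1$, so adding the bias $-1$ pushes it to zero or below and the ReLU removes it.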

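Example 2.

Now, let us consider the mask, see Figure 3:

$$M=\begin{bmatrix}0&0&1\\0&0&0\\1&0&0\end{bmatrix}.$$

This convolution cannot be obtained from the graph representation using a single transformation as in the previous example.

To formulate this convolution, we define two intermediate operations for $i=1,2$:

$$\bar{H}(u_i,b_i)_p=\sum_{q\in N_p}\mathrm{ReLU}\big(u_i^T(q-p)+b_i\big)\,H_q,$$

where $N_p$ is the neighborhood of the pixel $p$, $u_1=(1,1)^T$, $b_1=-1$ and $u_2=(-1,-1)^T$, $b_2=-1$. The first operation extracts the upper-right corner, while the second one extracts the bottom-left corner, i.e.

$$\bar{H}(u_1,b_1)_p=H_{p+(1,1)^T},\qquad \bar{H}(u_2,b_2)_p=H_{p-(1,1)^T}.$$

Finally, we put

$$\bar{H}_p=\big[\bar{H}(u_1,b_1)_p,\ \bar{H}(u_2,b_2)_p\big]^T.$$

Making an additional linear transformation (analogous to (2) with $W=(1,1)^T$), we obtain:

$$W^T\bar{H}_p=H_{p+(1,1)^T}+H_{p-(1,1)^T}=(M*H)_p.$$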

As demonstrated in the above examples classical linear convolutions can be obtained from graphs by appropriate adaptation of (1) using spatial features. Based on this intuition, the precise formulation of geometric graph convolution is presented in the following paragraph. The complete proof that every linear convolution can be rewritten using geo-GCN is given in the next section.
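The two-filter construction of Example 2 can be verified on a small image. The sketch below makes several assumptions that are not taken from the paper: the image is stored as a dictionary from pixel coordinates to intensities, pixels outside the image count as zero, and the two filters are $(u_1,b_1)=((1,1),-1)$ and $(u_2,b_2)=((-1,-1),-1)$. It compares the summed geometric aggregations with a direct sliding-window evaluation of the two-corner mask.

```python
def relu(x):
    return max(0.0, x)


def geo_aggregate(image, u, b):
    """Sum over the 3x3 neighborhood of ReLU(u . (q - p) + b) * image[q]."""
    out = {}
    for (x, y) in image:
        total = 0.0
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                q = (x + dx, y + dy)
                if q in image:  # pixels outside the image are simply absent
                    total += relu(u[0] * dx + u[1] * dy + b) * image[q]
        out[(x, y)] = total
    return out


def classical_conv(image, mask):
    """Sliding-window evaluation: sum of mask[offset] * image[p + offset]."""
    return {(x, y): sum(c * image.get((x + dx, y + dy), 0.0)
                        for (dx, dy), c in mask.items())
            for (x, y) in image}


# 3x3 image with intensities 1..9; x grows to the right, y grows upwards.
image = {(x, y): float(3 * y + x + 1) for x in range(3) for y in range(3)}

# Mask with ones in the upper-right and bottom-left corners.
mask = {(1, 1): 1.0, (-1, -1): 1.0}

h1 = geo_aggregate(image, (1.0, 1.0), -1.0)    # extracts the upper-right neighbor
h2 = geo_aggregate(image, (-1.0, -1.0), -1.0)  # extracts the bottom-left neighbor
combined = {p: h1[p] + h2[p] for p in image}   # linear map with weights (1, 1)

print(combined == classical_conv(image, mask))  # True
print(combined[(1, 1)])                         # 10.0
```

The final summation plays the role of the additional linear transformation: each filter isolates one corner, and the weights combine them.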

Geometric graph convolutions.

To formalize the above intuition, we define our geometric graph convolutions. We assume that each node $v_i$ is additionally identified with its coordinates $p_i\in\mathbb{R}^{t}$. In contrast to standard features $x_i$, we will not change $p_i$ across layers, but only use them to construct a better graph representation. For this purpose, we replace (1) by:

$$\bar{h}_i(u,b)=\sum_{j\in N_i}\mathrm{ReLU}\big(u^T(p_j-p_i)+b\big)\,h_j, \qquad (3)$$

where $u\in\mathbb{R}^{t}$ and $b\in\mathbb{R}$ are trainable. The pair $(u,b)$ plays the role of a convolutional filter which operates on the neighborhood of $v_i$. The relative positions in the neighborhood are transformed using a linear operation combined with the non-linear ReLU function. The resulting scalar is used to weigh the feature vectors in a neighborhood.

By analogy with classical convolution, this transformation can be extended to multiple filters (as in Example 2). Let $U=[u_1,\dots,u_k]$ and $b=[b_1,\dots,b_k]$ define $k$ filters. The intermediate representation $\bar{h}_i$ is a vector defined by:

$$\bar{h}_i=\big[\bar{h}_i(u_1,b_1),\dots,\bar{h}_i(u_k,b_k)\big].$$

Finally, we apply the MLP transformation in the same manner as in (2) to transform these feature vectors.
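A geometric graph convolution layer with several filters can be sketched as follows. This is an illustrative reading of the description above, not the official implementation: it assumes that each filter weighs neighbor features by the scalar ReLU of a linear function of the relative position, and that the per-filter results are concatenated before the dense step. The names `geo_aggregate` and `dense_relu` are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def geo_aggregate(H, P, neighbors, U, b):
    """For each node i and filter k, accumulate
    ReLU(u_k . (p_j - p_i) + b_k) * h_j over j in N_i,
    then concatenate the k results (concatenation is an assumption)."""
    H_bar = []
    for i, N_i in enumerate(neighbors):
        h_bar = []
        for u_k, b_k in zip(U, b):
            acc = [0.0] * len(H[i])
            for j in N_i:
                rel = [pj - pi for pj, pi in zip(P[j], P[i])]
                scale = relu(sum(a * r for a, r in zip(u_k, rel)) + b_k)
                for c, h_jc in enumerate(H[j]):
                    acc[c] += scale * h_jc
            h_bar.extend(acc)
        H_bar.append(h_bar)
    return H_bar


def dense_relu(H_bar, W, bias):
    """Dense step: ReLU(w_k . h_bar_i + bias_k) for each output unit k."""
    return [[relu(sum(w_c * h_c for w_c, h_c in zip(w, h)) + b_k)
             for w, b_k in zip(W, bias)]
            for h in H_bar]


# Three nodes on a diagonal line; node 0 sits between nodes 1 and 2.
H = [[5.0], [7.0], [11.0]]
P = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0)]
neighbors = [[1, 2], [0], [0]]
U = [(1.0, 1.0), (-1.0, -1.0)]  # two filters looking in opposite directions
b = [-1.0, -1.0]

H_bar = geo_aggregate(H, P, neighbors, U, b)
out = dense_relu(H_bar, W=[[1.0, 1.0]], bias=[0.0])
print(H_bar[0], out[0])  # [7.0, 11.0] [18.0]
```

With all positions set to the same point, every neighbor receives the same weight ReLU(b_k), so the layer falls back to an unweighted neighborhood sum.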

Practical consequences.

In practice, the amount of training data is usually too small to provide sufficient generalization. To overcome this problem, one can perform data augmentation to produce more representative examples. In computer vision, data augmentation is straightforward and relies on rotating or translating the image. Nevertheless, in the case of classical graph structures, an analogous procedure is difficult to apply. This is a serious problem in medicinal chemistry, where the goal is to predict biological activity based only on a small number of verified compounds. The introduction of spatial features and our geometric graph convolutions allow us to perform data augmentation in a natural way, which is not possible using only the adjacency matrix.

The formula (3) is invariant to the translation of spatial features, but its value depends on rotation of graph. In consequence, the rotation of the geometrical graph leads to different values of (3). Since in most domains the rotation does not affect the interpretation of object described by such graph (e.g. rotation does not change the chemical compound although one particular orientation may be useful when considering binding affinity, i.e. how well a given compound binds to the target protein), we can use this property to produce more instances of the same graph. This reasoning is exactly the same as in the classical view of image processing.
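The invariance and augmentation argument can be illustrated with a short sketch. It assumes weights of the form ReLU(u . (p_j - p_i) + b) on each edge and, for brevity, uses 2D coordinates; a molecule would need a random 3D rotation instead. The helper names (`edge_weights`, `translate`, `rotate2d`, `random_rotation`) are hypothetical.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def edge_weights(P, edges, u, b):
    """ReLU(u . (p_j - p_i) + b) for every directed edge (i, j)."""
    return [relu(sum(a * (pj - pi) for a, pi, pj in zip(u, P[i], P[j])) + b)
            for i, j in edges]


def translate(P, shift):
    return [tuple(c + s for c, s in zip(p, shift)) for p in P]


def rotate2d(P, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in P]


def random_rotation(P, rng):
    """Augmentation step: rotate the whole graph by a random angle."""
    return rotate2d(P, rng.uniform(0.0, 2.0 * math.pi))


P = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
edges = [(0, 1), (0, 2)]
u, b = (1.0, 1.0), -1.0

base = edge_weights(P, edges, u, b)
shifted = edge_weights(translate(P, (10.0, -3.0)), edges, u, b)
turned = edge_weights(rotate2d(P, math.pi / 2), edges, u, b)

print(base)     # [1.0, 0.0]
print(shifted)  # [1.0, 0.0], since relative positions are unchanged
print([round(w, 6) for w in turned])  # [0.0, 1.0]

augmented = random_rotation(P, random.Random(0))  # one extra training view
```

Translation leaves the weights untouched because only differences of positions enter the formula, while rotation changes which neighbor the filter responds to. Feeding randomly rotated copies during training is what encourages the network to become insensitive to orientation.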

In addition, chemical compounds can be represented in many conformations. In a molecule, single bonds can rotate freely. Each molecule seeks to reach minimum energy, and thus some conformations are more likely to be found in nature than others. Because there are multiple stable conformations, augmentation helps to learn only meaningful spatial relations. In some tasks, conformations may be included in the dataset, e.g. in binding affinity prediction active conformations are those formed inside the binding pocket of a protein (see Figure 1(b)). Such a conformation can be discovered experimentally, e.g., through crystallization.

Theoretical Analysis

As shown above, introducing geometric features makes the processing of graphs similar to the way of image processing. In this part, we make this statement even more evident. Namely, we formally prove that our geometric graph convolutions generalise classical convolutions used in the case of images. In other words, we show that the appropriate parametrisation of geometric graph convolutions leads to the classical convolutions.

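Theorem 1.

Let $M=[m_{i'j'}]_{i',j'\in\{-r,\dots,r\}}$ be a given convolutional mask, and let $n=(2r+1)^2$ (number of elements of $M$). Then there exist $u\in\mathbb{R}^2$, $b_1,\dots,b_n\in\mathbb{R}$ and $w\in\mathbb{R}^n$ such that

$$M*H=\sum_{i=1}^{n} w_i\,\bar{H}(u,b_i).$$

Proof. Let $P$ denote all possible positions in the mask $M$, i.e. $P=\{(i,j) : i,j\in\{-r,\dots,r\}\}$.

Let $v\in\mathbb{R}^2$ denote an arbitrary vector which is not orthogonal to any element from $\{p-q : p,q\in P,\ p\neq q\}$. Then

$$v^Tp\neq v^Tq \quad \text{for all } p,q\in P,\ p\neq q.$$

Consequently, we may order the elements of $P$ so that $v^Tp_1>v^Tp_2>\dots>v^Tp_n$. Let $M_i$ denote the convolutional mask which has value one at the position $p_i$, and zero otherwise.

Now we can choose arbitrary $b_1,\dots,b_n$ such that

$$v^Tp_1>b_1>v^Tp_2>b_2>\dots>v^Tp_n>b_n,$$

for example one may take

$$b_i=\tfrac{1}{2}\big(v^Tp_i+v^Tp_{i+1}\big)\ \text{ for } i<n,\qquad b_n=v^Tp_n-1.$$

Then observe that

$$\bar{H}(v,-b_1)=(v^Tp_1-b_1)\,M_1*H,$$

and generally for every $k=1,\dots,n$ we get

$$\bar{H}(v,-b_k)=\sum_{i=1}^{k}(v^Tp_i-b_k)\,M_i*H,$$

where all the coefficients in the above sum are strictly positive.

Hence $M_1*H=\frac{1}{v^Tp_1-b_1}\,\bar{H}(v,-b_1)$, and we obtain recursively that

$$M_k*H=\frac{1}{v^Tp_k-b_k}\Big(\bar{H}(v,-b_k)-\sum_{i=1}^{k-1}(v^Tp_i-b_k)\,M_i*H\Big),$$

which trivially implies that every convolution $M_k*H$ can be obtained as a linear combination of $\bar{H}(v,-b_1),\dots,\bar{H}(v,-b_k)$.

Since an arbitrary convolution is given by $M=\sum_{i=1}^{n}m_iM_i$, where $m_i$ is the value of $M$ at the position $p_i$, we obtain the assertion of the theorem. ∎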
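The constructive argument of the proof can be traced numerically for a 3x3 mask. The sketch below is an illustration under assumptions: it fixes the direction $v=(1,3)$ (chosen here because it gives nine distinct projections on the offsets $\{-1,0,1\}^2$), places one threshold between each pair of consecutive projections, and solves the resulting triangular system by back substitution. The names `decompose_mask` and `effective_mask` are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def decompose_mask(mask, v=(1.0, 3.0)):
    """Return thresholds t_k and weights w_k such that, for every offset p,
    sum_k w_k * ReLU(v . p - t_k) equals mask[p]."""
    def dot(p):
        return v[0] * p[0] + v[1] * p[1]

    # Order the mask positions by decreasing projection onto v.
    positions = sorted(mask, key=dot, reverse=True)
    proj = [dot(p) for p in positions]
    if any(a <= c for a, c in zip(proj, proj[1:])):
        raise ValueError("v must give pairwise distinct projections")

    n = len(positions)
    # One threshold strictly between consecutive projections, one below the last.
    thresholds = [(proj[k] + proj[k + 1]) / 2.0 for k in range(n - 1)]
    thresholds.append(proj[-1] - 1.0)

    # Filter k is active at position i only when k >= i, so the system
    # is triangular and can be solved from the last position upwards.
    weights = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(weights[k] * (proj[i] - thresholds[k]) for k in range(i + 1, n))
        weights[i] = (mask[positions[i]] - rest) / (proj[i] - thresholds[i])
    return thresholds, weights


def effective_mask(thresholds, weights, offsets, v=(1.0, 3.0)):
    """Mask realised by the weighted sum of the ReLU filters."""
    return {p: sum(w * relu(v[0] * p[0] + v[1] * p[1] - t)
                   for t, w in zip(thresholds, weights))
            for p in offsets}


# An arbitrary 3x3 mask indexed by offsets (dx, dy).
mask = {(dx, dy): float(2 * dx - dy * dy + 3)
        for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

thresholds, weights = decompose_mask(mask)
rebuilt = effective_mask(thresholds, weights, mask)
print(all(abs(rebuilt[p] - mask[p]) < 1e-9 for p in mask))  # True
```

All nine filters share the same direction $v$ and differ only in their bias, which mirrors the statement that a single vector and $n$ biases are sufficient.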

On the other hand, if we put all spatial features to 0, then (3) reduces to:

$$\bar{h}_i=\sum_{j\in N_i}\mathrm{ReLU}(b)\,h_j.$$

This gives a vanilla graph convolution, where the aggregation over neighbors does not contain parameters. We can also use a different $b_{ij}$ for each pair of neighbors, which makes it possible to mimic many types of graph convolutions.


Experiments

We verified our model on graphs with a natural geometric interpretation. We took into account graphs constructed from images as well as graphs of chemical compounds.

Image graph classification

In the first experiment, we consider the well-known MNIST dataset. We represent the images as graphs in two ways following [16]. In the first case, each node corresponds to a pixel from the original image, making a regular grid with connections between adjacent pixels. The node has 2-dimensional location, and it is characterized by a 1-dimensional pixel intensity. In the second variant, nodes are constructed from an irregular grid consisting of 75 superpixels. In the latter case, the edges are determined by spatial relations between nodes using k-nearest neighbors.
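The two graph constructions described above can be sketched in plain Python. The details are assumptions made for illustration: the grid variant connects each pixel to its 8 surrounding pixels by default (4-connectivity is available through a flag, since the exact connectivity is not stated here), and the k-nearest-neighbor variant uses squared Euclidean distance between node positions. The function names are hypothetical.

```python
def grid_graph(image, diagonal=True):
    """Turn an image (list of rows) into node positions, features and neighbors.

    Each pixel becomes a node with a 2D position (col, row) and a
    1-dimensional feature holding its intensity.
    """
    index, positions, features = {}, [], []
    for row, values in enumerate(image):
        for col, value in enumerate(values):
            index[(col, row)] = len(positions)
            positions.append((col, row))
            features.append([value])
    steps = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             if (dx, dy) != (0, 0) and (diagonal or dx == 0 or dy == 0)]
    neighbors = [[index[(x + dx, y + dy)] for dx, dy in steps
                  if (x + dx, y + dy) in index]
                 for (x, y) in positions]
    return positions, features, neighbors


def knn_graph(positions, k):
    """Connect every node to its k nearest nodes (squared Euclidean distance)."""
    neighbors = []
    for i, p in enumerate(positions):
        others = [j for j in range(len(positions)) if j != i]
        others.sort(key=lambda j: sum((a - c) ** 2 for a, c in zip(p, positions[j])))
        neighbors.append(others[:k])
    return neighbors


positions, features, neighbors = grid_graph([[0.1, 0.2], [0.3, 0.4]])
print([len(n) for n in neighbors])                       # [3, 3, 3, 3]
print(knn_graph([(0.0,), (1.0,), (3.0,), (7.0,)], k=1))  # [[1], [0], [1], [2]]
```

For the superpixel variant, `positions` would hold superpixel centroids instead of pixel coordinates, and `knn_graph` would supply the edges.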

We tune the hyperparameters of geo-GCN using a random search with a fixed budget of 100 trials, see supplementary material for details. We compare our method with the results reported in the literature by state-of-the-art methods used to process geometrical shapes: ChebNet [3], MoNet [16], and SplineCNN [5].

The results presented in Table 1 show that geo-GCN outperforms comparable methods on both variants of the MNIST dataset. Its performance is slightly better than that of SplineCNN, which reports state-of-the-art results on this task.

Method       Grid      Superpixels
ChebNet      99.14%    75.62%
MoNet        99.19%    91.11%
SplineCNN    99.22%    95.22%
geo-GCN      99.36%    95.95%
Table 1: Classification accuracy on two graph representations of MNIST.

Incomplete image classification

Graph representation of images can be useful to describe images with missing regions. In this case, each visible pixel represents a node which is connected with its visible neighbors. Unobserved pixels are not represented in this graph.
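Building a graph from an incomplete image can be sketched as follows. This is a minimal illustration, assuming a square missing patch given by its top-left corner and side length, and 8-connectivity between visible pixels; the function name `visible_graph` and its arguments are hypothetical.

```python
def visible_graph(image, patch_origin, patch_size):
    """Build a graph from the visible pixels only.

    image        -- list of rows of pixel intensities
    patch_origin -- (col, row) of the top-left corner of the missing square
    patch_size   -- side length of the missing square
    Pixels inside the patch produce no node and therefore no edges.
    """
    px, py = patch_origin

    def missing(col, row):
        return px <= col < px + patch_size and py <= row < py + patch_size

    index, positions, features = {}, [], []
    for row, values in enumerate(image):
        for col, value in enumerate(values):
            if not missing(col, row):
                index[(col, row)] = len(positions)
                positions.append((col, row))
                features.append([value])

    steps = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    neighbors = [[index[(x + dx, y + dy)] for dx, dy in steps
                  if (x + dx, y + dy) in index]
                 for (x, y) in positions]
    return positions, features, neighbors


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
positions, features, neighbors = visible_graph(image, patch_origin=(1, 1), patch_size=2)
print(len(positions))     # 12
print(len(neighbors[0]))  # 2
```

No value is ever invented for the hidden region: the corner pixel at (0, 0) simply keeps two neighbors instead of three because its diagonal neighbor is missing.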

For the evaluation, we considered MNIST dataset, where a square patch of the size 13x13 was removed from each image. The location of the patch was uniformly sampled for each image. For a comparison, we used imputation methods, which fill missing regions at preprocessing stage. Imputations were created using:

  • mean: Missing features were replaced with mean values of those features computed for all (incomplete) training samples.

  • k-nn: Missing attributes were filled with mean values of those features computed from the k nearest training samples (we used k = 5). Neighborhood was measured using Euclidean distance in the subspace of observed features.

  • mice: This method fills absent pixels in an iterative process using Multiple Imputation by Chained Equations (mice), where several imputations are drawn from the conditional distribution of data by Markov chain Monte Carlo techniques.
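The first two imputation baselines can be sketched in a few lines. This is an illustrative simplification rather than the exact procedure used in the experiments: missing values are marked with `None`, the k-nn variant searches among the other rows of the same list, and distances are averaged over the features observed in both rows. The mice baseline is omitted because it requires an iterative statistical model.

```python
def mean_impute(rows):
    """Replace each missing value by the mean of that feature over observed entries."""
    n_feat = len(rows[0])
    means = []
    for c in range(n_feat):
        observed = [r[c] for r in rows if r[c] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[c] if r[c] is None else r[c] for c in range(n_feat)]
            for r in rows]


def knn_impute(rows, k=5):
    """Fill each missing value with the mean over the k nearest rows that observe it.

    Distance is the mean squared difference over features observed in both rows.
    """
    filled = []
    for i, r in enumerate(rows):
        missing = [c for c, x in enumerate(r) if x is None]
        if not missing:
            filled.append(list(r))
            continue
        observed = [c for c, x in enumerate(r) if x is not None]

        def dist(other):
            shared = [c for c in observed if other[c] is not None]
            if not shared:
                return float("inf")
            return sum((r[c] - other[c]) ** 2 for c in shared) / len(shared)

        ranked = sorted((o for j, o in enumerate(rows) if j != i), key=dist)
        new_row = list(r)
        for c in missing:
            donors = [o[c] for o in ranked if o[c] is not None][:k]
            new_row[c] = sum(donors) / len(donors) if donors else 0.0
        filled.append(new_row)
    return filled


rows = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [100.0, 500.0]]
print(mean_impute(rows)[1])      # [2.0, 180.0]
print(knn_impute(rows, k=2)[1])  # [2.0, 20.0]
```

The toy data shows why the two baselines can differ: the mean is pulled towards the outlier row, while the k-nn estimate uses only the two closest rows.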

Completed MNIST images were processed by fully connected and convolutional neural networks. For complete MNIST images (no missing data), these networks obtained 98.79% and 99.34% of classification accuracy, respectively.

Method            Accuracy
FCNet + mean      87.59%
FCNet + k-NN      87.10%
FCNet + mice      88.59%
ConvNet + mean    90.95%
ConvNet + k-NN    90.67%
ConvNet + mice    92.10%
geo-GCN           92.40%
Table 2: Classification accuracy of graph representations of incomplete MNIST images.

The results presented in Table 2 show that geo-GCN gives better accuracy than all imputation methods combined with either type of neural network. This result is notable because geo-GCN does not use any additional information concerning missing regions. It suggests that it is better to leave unobserved features missing than to complete them with inappropriate values, which is common practice.

Learning from molecules

In the next experiment, we use chemical tasks to evaluate our model. We chose 3 datasets from MoleculeNet [23], which is a benchmark for molecule-related tasks. Blood-Brain Barrier Permeability (BBBP) is a binary classification task of predicting whether or not a given compound is able to pass through the barrier between blood and the brain, allowing the drug to impact the central nervous system. The ability of a molecule to penetrate this border depends on many different properties such as lipophilicity, molecule size, and its flexibility. The other two datasets, ESOL and FreeSolv, are solubility prediction tasks with continuous targets.

Figure 4: Comparison of different augmentation strategies on three chemical datasets. Dataset denotes a variant without information about positions (pure GCN). Positions were predicted with the UFF method. In the conformation variant multiple conformations were precalculated and then sampled during training. Rotation augmentation randomly rotates molecules in batches. For the first bar-plot higher is better, for the second and the third lower is better.

None of the three datasets contains atom positions, so only the graph representation of a compound can be obtained. However, the three-dimensional shape of a molecule can be predicted using energy minimization, which is fairly easy to do, especially for small compounds. We run the universal force field (UFF) method from the RDKit package to predict atom positions. Because our method uses absolute positions, and chemical compounds do not have one canonical orientation, the positional data can be augmented with random rotations. We also run UFF several times (up to 30) to augment the data, as this procedure is not deterministic.

To evaluate our model against methods proposed by MoleculeNet, we split the datasets into train, validation, and test subsets. The splits follow the MoleculeNet recommendation: the ESOL and FreeSolv datasets are split at random, and the BBBP data is split with a scaffold split, which prevents similar structures from being put into different sets. This way an algorithm cannot memorize the structures highly correlated with labels, but needs to learn more general compound features. We run a random search for all models, testing 100 hyperparameter sets for each of them. All runs are repeated 3 times. The tuned hyperparameters of all tested methods are shown in the supplementary materials.
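The idea behind a scaffold split, keeping all compounds that share a scaffold in the same subset, can be sketched as a group-aware split. The sketch is an assumption-laden illustration and not the MoleculeNet code: the scaffold keys are hypothetical precomputed strings (computing real scaffolds would require a chemistry toolkit), the 80/10/10 proportions are chosen here for the example, and larger groups are assigned first.

```python
from collections import defaultdict


def group_split(keys, frac_train=0.8, frac_valid=0.1):
    """Split sample indices so that samples sharing a key stay together.

    keys -- one group key per sample (e.g. a precomputed scaffold string)
    Groups are processed from largest to smallest and placed in the first
    subset that still has room; whatever does not fit goes to the test set.
    """
    groups = defaultdict(list)
    for idx, key in enumerate(keys):
        groups[key].append(idx)
    ordered = sorted(groups.values(), key=len, reverse=True)

    n = len(keys)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Hypothetical scaffold labels for ten compounds.
keys = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
train, valid, test = group_split(keys)
print(train, valid, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8] [9]
```

Because whole groups move together, no scaffold appears in more than one subset, which is the property that makes this kind of split harder than a random one.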

We benchmark our approach against popular chemistry models: graph-based models (Graph Convolution [4], Weave Model [10], and Message Passing Neural Network [6]) as well as classical methods such as random forest (RF) and SVM, which often perform very well in chemical tasks where datasets tend to be small (e.g. FreeSolv has only 513 compounds in its training set). Neither RF nor SVM operates on graphs; instead, they use calculated feature vectors which describe a molecule. In our comparison, ECFP [17] was used for this purpose. In addition, EAGCN [20] is included in the experiment as a method that utilizes edge attributes together with the graph structure. As for our method, we show results with train- and test-time augmentation of the data carried out in the manner described above. For all datasets, we observe slight improvements with the augmented data. In order to investigate the impact of positional features, we also enrich the atom representation of the classical graph convolutional network with our predicted atom positions and apply the same augmentation procedure. We name this enriched architecture pos-GCN and include it in the comparison.

Method     BBBP             ESOL             FreeSolv
SVM        0.603 ± 0.000    0.493 ± 0.000    0.391 ± 0.000
RF         0.551 ± 0.005    0.533 ± 0.003    0.550 ± 0.004
GC         0.690 ± 0.015    0.334 ± 0.017    0.336 ± 0.043
Weave      0.703 ± 0.012    0.389 ± 0.045    0.403 ± 0.035
MPNN       0.700 ± 0.019    0.303 ± 0.012    0.299 ± 0.038
EAGCN      0.664 ± 0.007    0.459 ± 0.019    0.410 ± 0.014
pos-GCN    0.696 ± 0.008    0.301 ± 0.011    0.278 ± 0.024
geo-GCN    0.743 ± 0.004    0.270 ± 0.005    0.299 ± 0.033
Table 3: Performance on three chemical datasets measured with ROC AUC for BBBP and RMSE for ESOL and FreeSolv datasets. Best mean results and intervals overlapping with them are bolded. For the first column higher is better, for the second and the third lower is better.

The results presented in Table 3 show that on the FreeSolv dataset our method matches the result of MPNN, which is the best performing baseline for this task. On the two other datasets, geo-GCN outperforms all tested models by a significant margin. Based on the pos-GCN scores, we notice that including positional features consistently improves the performance of the graph convolutional model across all tasks, and on the smallest dataset, FreeSolv, pos-GCN even surpasses the score of MPNN. Nevertheless, learning from bigger datasets requires a better way of managing positional data, as can be seen on the ESOL and BBBP datasets, where pos-GCN performs significantly worse than geo-GCN but still better than vanilla GC.

Figure 5: ROC AUC scores measured on the BBBP dataset for different data augmentation strategies. The amount of augmentation increases from left to right.

Ablation study of the data augmentation

We also studied the effect of data augmentation on the geo-GCN performance. First, we examined how removing predicted positions, and thus setting all positional vectors to zero in Equation 3, affects the scores achieved by our model on chemical tasks. The results are depicted in Figure 4. It clearly shows that even predicted node coordinates improve the performance of the method. On the same plot we also show the outcome of augmenting the data with random rotations and 30 predicted molecule conformations, which were calculated as described in the previous subsection. As expected, the best performing model uses all types of position augmentation.

Finally, the impact of various levels of augmentation was studied. For this purpose we precalculated 20 molecular conformations on the BBBP dataset using the universal force field method and used these predictions to augment the dataset. To test the importance of conformation variety, in each run we increased the number of available conformations to sample from. The results are presented in Figure 5. One can see that including a larger number of conformations helps the model to achieve better results. Also, the curve flattens out after a few conformations, which may be caused by the limited flexibility of small compounds and the high similarity of the predicted shapes.


Conclusion

We proposed geo-GCN, which is a general model for processing graph-structured data with spatial features. Node positions are integrated into our convolution operation to create a layer which generalizes both GCNs and CNNs. In contrast to the majority of other approaches, our method can effectively use added information about location to construct self-taught feature masking, which can be augmented to achieve invariance of desired properties. Furthermore, we provide a theoretical analysis of our geometric graph convolutions. Experiments confirm strong performance of our method.


  • [1] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: Related work.
  • [2] H. Cho, I. Choi, et al. (2018) Three-dimensionally embedded graph convolutional network (3dgcn) for molecule interpretation. arXiv preprint arXiv:1811.09794. Cited by: Related work.
  • [3] M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852. Cited by: Related work, Image graph classification.
  • [4] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pp. 2224–2232. Cited by: Related work, Learning from molecules.
  • [5] M. Fey, J. Eric Lenssen, F. Weichert, and H. Müller (2018)

    SplineCNN: fast geometric deep learning with continuous b-spline kernels


    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    pp. 869–877. Cited by: Image graph classification.
  • [6] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017) Neural message passing for quantum chemistry. In

    Proceedings of the 34th International Conference on Machine Learning-Volume 70

    pp. 1263–1272. Cited by: Related work, Learning from molecules.
  • [7] M. Gori, G. Monfardini, and F. Scarselli (2005) A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., Vol. 2, pp. 729–734. Cited by: Related work.
  • [8] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034. Cited by: Related work, Related work.
  • [9] M. Henaff, J. Bruna, and Y. LeCun (2015) Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163. Cited by: Related work.
  • [10] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley (2016) Molecular graph convolutions: moving beyond fingerprints. Journal of computer-aided molecular design 30 (8), pp. 595–608. Cited by: Learning from molecules.
  • [11] Y. Kim (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882. Cited by: Introduction.
  • [12] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: Introduction, Related work, Related work, Graph convolutions..
  • [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: Introduction.
  • [14] R. Li, S. Wang, F. Zhu, and J. Huang (2018) Adaptive graph convolutional neural networks. In

    Thirty-Second AAAI Conference on Artificial Intelligence

    Cited by: Related work.
  • [15] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: Related work.
  • [16] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein (2017) Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5115–5124. Cited by: Related work, Image graph classification, Image graph classification.
  • [17] D. Rogers and M. Hahn (2010) Extended-connectivity fingerprints. Journal of chemical information and modeling 50 (5), pp. 742–754. Cited by: Learning from molecules.
  • [18] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: Related work.
  • [19] S. S. Seferbekov, V. Iglovikov, A. Buslaev, and A. Shvets (2018) Feature pyramid network for multi-class land segmentation. In CVPR Workshops, pp. 272–275. Cited by: Introduction.
  • [20] C. Shang, Q. Liu, K. Chen, J. Sun, J. Lu, J. Yi, and J. Bi (2018) Edge attention-based multi-relational graph convolutional networks. arXiv preprint arXiv:1802.04944. Cited by: Related work, Learning from molecules.
  • [21] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: Introduction, Related work, Graph convolutions.
  • [22] F. Wu, T. Zhang, A. H. d. Souza Jr, C. Fifty, T. Yu, and K. Q. Weinberger (2019) Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153. Cited by: Related work.
  • [23] Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande (2018) MoleculeNet: a benchmark for molecular machine learning. Chemical science 9 (2), pp. 513–530. Cited by: Learning from molecules.
  • [24] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2018) How powerful are graph neural networks? arXiv preprint arXiv:1810.00826. Cited by: Introduction, Related work.
  • [25] J. Yang, M. N. Nguyen, P. P. San, X. L. Li, and S. Krishnaswamy (2015) Deep convolutional neural networks on multichannel time series for human activity recognition. In Twenty-Fourth International Joint Conference on Artificial Intelligence. Cited by: Introduction.
  • [26] L. Yang, K. Tang, J. Yang, and L. Li (2017) Dense captioning with joint inference and visual context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2193–2202. Cited by: Introduction.
  • [27] Z. Zhou and X. Li (2017) Convolution on graph: a high-order and adaptive approach. arXiv preprint arXiv:1706.09916. Cited by: Related work.

Appendix A Experimental details

This section lists all hyperparameter ranges used during the random search in our experiments.

Table 4 presents the geo-GCN hyperparameter ranges that were used in all our experiments.

batch size 16, 32, 64, 128
learning rate 0.01, 0.005, 0.001, 0.0005, 0.0001
model dropout 0.0, 0.1, 0.2, 0.3
layers number 1, 2, 4, 6, 8
model dim 16, 32, 64, 128, 256, 512
model dim 8, 16, 32, 64
use cluster pooling True, False
Table 4: geo-GCN hyperparameter ranges
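
As an illustration of a random search over the ranges in Table 4, the following minimal Python sketch draws configurations from the listed values. The names `SEARCH_SPACE` and `sample_configs` are hypothetical and are not taken from the official implementation. Uniform, independent sampling of each hyperparameter is an assumption, and only the first of the two rows labelled "model dim" is included because the source does not disambiguate them.

```python
import random

# Ranges copied from Table 4. The table lists "model dim" twice; only the
# first row is used here because the meaning of the second row is unclear.
SEARCH_SPACE = {
    "batch_size": [16, 32, 64, 128],
    "learning_rate": [0.01, 0.005, 0.001, 0.0005, 0.0001],
    "model_dropout": [0.0, 0.1, 0.2, 0.3],
    "layers_number": [1, 2, 4, 6, 8],
    "model_dim": [16, 32, 64, 128, 256, 512],
    "use_cluster_pooling": [True, False],
}


def sample_configs(space, n_trials, seed=0):
    """Draw n_trials configurations, picking each value uniformly at random.

    Assumption: hyperparameters are sampled independently of each other.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_trials)
    ]


configs = sample_configs(SEARCH_SPACE, n_trials=5)
for config in configs:
    print(config)
```

Each sampled dictionary would then be passed to a training run, and the configuration with the best validation score would be kept.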

Chemistry experiments

Below we list the hyperparameter ranges used in the chemistry experiments.

C 0.25, 0.4375, 0.625, 0.8125, 1., 1.1875, 1.375, 1.5625, 1.75, 1.9375, 2.125, 2.3125, 2.5, 2.6875, 2.875, 3.0625, 3.25, 3.4375, 3.625, 3.8125, 4.
gamma 0.0125, 0.021875, 0.03125, 0.040625, 0.05, 0.059375, 0.06875, 0.078125, 0.0875, 0.096875, 0.10625, 0.115625, 0.125, 0.134375, 0.14375, 0.153125, 0.1625, 0.171875, 0.18125, 0.190625, 0.2
Table 5: SVM hyperparameter ranges

estimators number 125, 218, 312, 406, 500, 593, 687, 781, 875, 968, 1062, 1156, 1250, 1343, 1437, 1531, 1625, 1718, 1812, 1906, 2000
Table 6: RF hyperparameter ranges
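
The SVM and RF values in Tables 5 and 6 appear to be evenly spaced grids of 21 points: C spans 0.25 to 4, gamma spans 0.0125 to 0.2, and the estimators number spans 125 to 2000 with fractional values truncated to integers. The sketch below regenerates the three grids under that assumption. The helper `linspace` and the variable names are hypothetical stand-ins written in pure Python, not the code used in the experiments.

```python
def linspace(start, stop, num):
    """Return num evenly spaced floats from start to stop, both inclusive."""
    step = (stop - start) / (num - 1)
    return [start + step * i for i in range(num)]


# SVM regularisation strength C: 21 points from 0.25 to 4 (step 0.1875).
c_grid = linspace(0.25, 4.0, 21)

# SVM kernel coefficient gamma: 21 points from 0.0125 to 0.2,
# rounded to six decimals to hide floating-point noise.
gamma_grid = [round(g, 6) for g in linspace(0.0125, 0.2, 21)]

# RF estimators number: 21 points from 125 to 2000, truncated to integers
# (for example 218.75 becomes 218).
n_estimators_grid = [int(n) for n in linspace(125, 2000, 21)]

print(c_grid)
print(gamma_grid)
print(n_estimators_grid)
```

Describing each range by its endpoints and number of points is more compact than listing all 21 values, and it makes the truncation in the estimators row explicit.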
batch size 64, 128, 256
learning rate 0.002, 0.001, 0.0005
filters number 64, 128, 192, 256
fully connected nodes number 128, 256, 512
Table 7: GC hyperparameter ranges

batch size 16, 32, 64, 128
epochs number 20, 40, 60, 80, 100
learning rate 0.002, 0.001, 0.00075, 0.0005
graph features number 32, 64, 96, 128, 256
pair features number 14
Table 8: Weave hyperparameter ranges

batch size 8, 16, 32, 64
epochs number 25, 50, 75, 100
learning rate 0.002, 0.001, 0.00075, 0.0005
T 1, 2, 3, 4, 5
M 2, 3, 4, 5, 6
Table 9: MPNN hyperparameter ranges

batch size 16, 32, 64, 128, 256, 512
EAGCN structure ’concate’, ’weighted’
epochs number 100, 500, 1000
learning rate 0.01, 0.005, 0.001, 0.0005, 0.0001
dropout 0.0, 0.1, 0.3
weight decay 0.0, 0.001, 0.01, 0.0001
sgc1 1 30, 60
sgc1 2 5, 10, 15, 20, 30
sgc1 3 5, 10, 15, 20, 30
sgc1 4 5, 10, 15, 20, 30
sgc1 5 5, 10, 15, 20, 30
sgc2 1 30, 60
sgc2 2 5, 10, 15, 20, 30
sgc2 3 5, 10, 15, 20, 30
sgc2 4 5, 10, 15, 20, 30
sgc2 5 5, 10, 15, 20, 30
den1 12, 32, 64
den2 12, 32, 64
Table 10: EAGCN hyperparameter ranges

Missing data experiments

Below we list the hyperparameter ranges used in the missing data experiment.

batch size 16, 32, 64, 128
learning rate 0.0001, 0.0005, 0.001, 0.005
layers dimensionality 64, 128, 256, 512
layers number 2, 3, 4, 5
Table 11: Fully Connected Network hyperparameter ranges