Graph Convolution: A High-Order and Adaptive Approach

by   Zhenpeng Zhou, et al.
Stanford University

In this paper, we present a novel convolutional neural network framework for graph modeling, introducing two new modules specially designed for graph-structured data: the k-th order convolution operator and the adaptive filtering module. Importantly, our High-order and Adaptive Graph Convolutional Network (HA-GCN) framework is a general-purpose architecture that fits both node-centric and graph-centric applications, as well as graph generative models. We conducted extensive experiments to demonstrate the advantages of our framework. In particular, HA-GCN outperforms the state-of-the-art models on node classification and molecule property prediction tasks. It also generates 32% more real molecules on the molecule generation task, both of which will significantly benefit real-world applications such as material design and drug screening.


1 Introduction

Convolutional neural networks (CNNs) have achieved great success in tasks ranging from computer vision [Huang:2016wa] and speech recognition [Zhang:2017up] to natural language processing [Conneau:2017to]. CNNs provide an efficient and effective architecture for learning meaningful representations of images and text. In recent years, researchers have strived to extend the convolution operator and develop CNN architectures for graphs, which can have more complicated structures than images. Graph convolutional networks are usually applied to the following two kinds of learning tasks:

  • Node centric: prediction tasks related to the nodes in a graph. Graph convolutional networks usually handle these by outputting a feature vector for each node in the graph, which meaningfully reflects the node’s properties and neighborhood structure. For example, in social networks, the vectors can be used for tasks like node classification and link prediction. This is closely related to node representation learning.

  • Graph centric: prediction tasks related to whole graphs. For example, in the context of chemistry, a molecule can be viewed as a graph with atoms as nodes and bonds as edges. Graph convolutional networks are constructed to encode the molecules meaningfully in terms of their physical and chemical properties. These tasks are key to many real-world applications such as material design and drug screening. In this context, graph convolutional networks usually encode the whole graph and use the encodings for graph prediction tasks.

Early efforts on designing neural networks for graphs date back to the works of Gori et al. and Scarselli et al. [gori2005new, scarselli2009graph], in which they built sequential or recurrent network architectures for graph-structured data. The studies of Bruna et al. [bruna2013spectral], Edwards et al. [Edwards:2016vy] and Defferrard et al. [defferrard2016convolutional] further developed the idea of spectral filtering/convolution, which operates on the graph spectrum. Henaff et al. [henaff2015deep] extended graph convolutional networks to large-scale datasets such as ImageNet object recognition, text categorization, and bioinformatics. Meanwhile, Niepert et al. [niepert2016learning] proposed PATCHY-SAN, which defines operations of node sequence selection, neighborhood assembly, and graph normalization. Atwood et al. [Atwood:2016wq] presented the diffusion-convolutional neural network (DCNN) model for graph-structured data. As we will show later, these models successfully made CNNs work under graph settings, but they still lack careful consideration of the particular characteristics of graph structures in the network design.

There are also several recently published results on conducting graph convolutions dynamically. Jia et al. [Jia:2016wt] proposed the Dynamic Filter Network, where filters are generated dynamically conditioned on the input features. Simonovsky et al. [Simonovsky:2017tv] extended that idea to graphs, using edge-conditioned dynamic weights for graph convolutions. The work of Verma et al. [Verma:2017tb] determines the shape of filters as a function of the features in previous network layers. Manessi et al. [Manessi:2017wp] proposed a model to learn temporal information from graphs whose structure changes over time. Li et al. [Li:2017ud] proposed a general and flexible graph convolutional network (EGCN) to deal with data of diverse or undefined dimensions. All of these studies found one way or another to utilize the graph data dynamically, and our work can be seen as a further endeavor with the introduction of the adaptive convolution module.

In addition, a great amount of research on graph-centric tasks concentrates on the application of molecule fingerprints. Molecule fingerprinting refers to a quantitative encoding of molecules that can be used for molecule property summarization and prediction. Prior to the use of CNNs, graph kernels dominated many learning and prediction tasks for (molecule) graphs [kondor2002diffusion, shervashidze2009efficient, shervashidze2011weisfeiler]. The paper of Duvenaud et al. [duvenaud2015convolutional] first introduced CNNs for encoding molecules, and Kearnes et al. [kearnes2016molecular] further improved the results. Gilmer et al. [Gilmer:2017tl] defined Message Passing Neural Networks (MPNNs) for molecules to reformulate existing models into a single common framework with a message passing interpretation. It will be shown later that our work improves the state-of-the-art performance on molecule-relevant tasks with a network architecture that better captures the properties of molecule graphs.

In this work, we propose a novel graph convolutional network architecture named High-order and Adaptive Graph Convolutional Network (HA-GCN). The work most closely related to ours is the graph convolutional network (GCN) [kipf2016semi], in which the convolution operator only reaches one-hop neighbors. Our high-order operator provides an efficient design of convolution that reaches $k$-hop neighbors. Furthermore, we introduce an adaptive filtering module that adjusts the weights of convolution operators dynamically based on the local graph connections and node features. Compared with the work of Li et al. [li2015gated], which introduced the modern idea of LSTM into graph settings, our adaptive module can be interpreted as a graph realization of the attention mechanism proposed in [xu2015show]. Most importantly, unlike previous graph networks designed for either node-centric or graph-centric tasks, our HA-GCN framework is general-purpose and capable of fulfilling both. Additionally, we constructed a graph generative model with HA-GCN for the task of molecule generation, achieving a significant improvement over the state-of-the-art model.

Our contribution is two-fold:

  • We introduced two new modules for graph-structured data and built a novel graph convolutional network framework of HA-GCN.

  • We developed a general-purpose architecture that can be applied to node-centric prediction, graph-centric prediction and graph generative modeling. Our architecture achieved state-of-the-art performance uniformly on all the tasks.

The rest of the paper is organized as follows: first we provide some preliminaries for the graph model and a brief discussion of several frameworks of graph convolutional networks. Then we introduce the key ideas of high-order convolution operator and adaptive filtering module. Furthermore, we present our framework of HA-GCN and several experiments to demonstrate its performance. Finally, we summarize the scope of HA-GCN applications and point out the potential future directions.

2 Preliminaries

2.1 The Graph Model

(Footnote 1: In this paper, we use the term “graph” to refer to the graph/network structure of data and “network” for the architecture of machine learning models.)

In this subsection, we provide the preliminaries and notation for the graph model. A graph $G$ is denoted as a pair $(V, E)$, with $V = \{v_1, \dots, v_n\}$ the set of nodes (vertices) and $E$ the set of edges. We do not distinguish between undirected and directed graphs in the notation, since our framework works for both cases. Each graph can be represented by an $n$-by-$n$ adjacency matrix $A$, where $A_{ij} = 1$ if there is an edge from $v_i$ to $v_j$ and $A_{ij} = 0$ otherwise. Based on the adjacency matrix, we define a distance function $d(v_i, v_j)$ to represent the graph distance from $v_i$ to $v_j$ (the minimum length of paths connecting $v_i$ and $v_j$). Additionally, we assume that each node $v_i$ is associated with a feature vector $X_i \in \mathbb{R}^m$, and we use $X \in \mathbb{R}^{n \times m}$ to denote the feature matrix whose $i$-th row is $X_i$.
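The graph model above can be illustrated with a minimal sketch. The helper names `adjacency_matrix` and `graph_distance`, the 0-based node indices, and the toy path graph are assumptions made for this example and do not come from the paper; the distance is computed with a standard breadth-first search.

```python
from collections import deque


def adjacency_matrix(n, edges, directed=False):
    """Build an n-by-n 0/1 adjacency matrix from a list of (i, j) edges."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        if not directed:
            A[j][i] = 1  # undirected graphs give a symmetric matrix
    return A


def graph_distance(A, src):
    """Breadth-first search: minimum path length from src to every node.

    Nodes that cannot be reached from src keep the distance None.
    """
    dist = [None] * len(A)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(A[u]):
            if connected and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical toy data: the path graph 0 - 1 - 2 - 3.
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
distances_from_0 = graph_distance(A, 0)
print(distances_from_0)  # [0, 1, 2, 3]
```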

2.2 Graph Convolutional Networks (GCNs)

In this subsection, we briefly review several GCN structures from previous works to provide some intuition for the design of convolution on graphs. To begin with, the convolution operator at a specific node $v_i$ in a graph can be generally expressed as

$$L_{\mathrm{conv}}(i) = \sum_{j \in N_i} w_{ij} X_j + b_i .$$

Here $X_j$ is the input feature for node $v_j$, $b_i$ is the bias term and $w_{ij}$ is the weight, which can be non-stationary and vary with respect to $(i, j)$. The set $N_i$ defines the scope of convolution. For traditional applications, the CNN architecture is usually designed for a low-dimensional grid with the same connection pattern for every node. For example, images can be viewed as two-dimensional grids (for each of the RGB channels, or the gray-scale channel), and the underlying graph is formed by connecting adjacent pixels. Then $N_i$ can be simply defined as a fixed-size block or window around pixel $i$.

In the more general graph settings, one can define $N_i$ as the set of nodes that are adjacent to $v_i$. For example, the core of the fingerprint (FP) convolution operator in the work of Duvenaud et al. [duvenaud2015convolutional] is to compute the average over neighbors, i.e. $w_{ij} = 1/|N_i|$ for all $j \in N_i$. With the help of the adjacency matrix $A$ we can write the operator as

$$L_{\mathrm{FP}} = A X. \tag{1}$$

The multiplication of $A$ and the feature matrix $X$ results in a feature averaging over neighbor nodes (up to normalization by the node degree). One step further, the node-GCN paper [kipf2016semi] applied linear weighting and a non-linear transformation in addition to the averaging:

$$L_{\mathrm{GCN}} = \sigma(A X W). \tag{2}$$

The weight matrix $W$ and the function $\sigma(\cdot)$ perform a linear and a non-linear transformation on the features, respectively.
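The two one-hop operators reviewed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the adjacency matrix is row-normalized explicitly so that the product really is an average, ReLU is picked as the non-linearity, and the function names and toy data are hypothetical rather than taken from the cited papers.

```python
def matmul(P, Q):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def row_normalize(A):
    """Divide each row by its sum so that multiplying by X averages over neighbors."""
    out = []
    for row in A:
        s = sum(row)
        out.append([a / s if s else 0.0 for a in row])
    return out


def relu(M):
    return [[max(0.0, v) for v in row] for row in M]


def fp_average(A, X):
    """Neighbor averaging: each output row is the mean feature of the node's neighbors."""
    return matmul(row_normalize(A), X)


def gcn_layer(A, X, W):
    """Averaging followed by a linear map W and a non-linearity (ReLU assumed here)."""
    return relu(matmul(fp_average(A, X), W))


# Hypothetical toy data: path graph 0 - 1 - 2 with two features per node.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
W = [[1.0], [-1.0]]

averaged = fp_average(A, X)
activated = gcn_layer(A, X, W)
print(averaged)   # [[0.0, 2.0], [2.0, 0.0], [0.0, 2.0]]
print(activated)  # [[0.0], [2.0], [0.0]]
```

Node 1 has two neighbors, so its averaged feature is the mean of the features of nodes 0 and 2; the end nodes simply copy the feature of node 1.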

The papers of Bruna et al. [bruna2013spectral] and Defferrard et al. [defferrard2016convolutional] took a different approach by conducting convolution on the spectrum of the graph Laplacian. Let $\mathcal{L}$ be the graph Laplacian and $\mathcal{L} = U \Lambda U^\top$ its orthogonal decomposition (where $U$ is an orthogonal matrix and $\Lambda$ is a diagonal matrix). Instead of appending the weight matrix as in (2), the spectral convolution considers a parameterized convolution operator on the spectrum $\Lambda$. Precisely,

$$L_{\mathrm{spectral}} = U g_\theta(\Lambda) U^\top X. \tag{3}$$

Here the function $g_\theta(\cdot)$ is a polynomial function which is applied element-wise to the diagonal matrix $\Lambda$.

When discussing the advantage of spectral convolution, the authors mentioned that a $k$-th order polynomial choice of $g_\theta$ is exactly $k$-localized on the graph, which means the convolution reaches as far as the $k$-hop neighbors. Compared to the one-hop neighbor averaging in (1) and (2), this allows faster information propagation over the graph. However, the choice of polynomial does not give an exact convolution operator for $k$-hop neighbors, as not all neighbors are equally weighted as assumed in convolution. This motivates the proposal of our high-order convolution operator. Another problem with all of those convolution operators is that they use fixed convolution weights, which are invariant across graphs. Therefore they can hardly capture the differences between the locations where the convolution operation happens. This motivates the design of our adaptive module, which takes both the local features and the graph structures into account.
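The localization claim can be made explicit with a short derivation. The notation is assumed for this sketch: $\mathcal{L} = U \Lambda U^\top$ is the orthogonal decomposition of the graph Laplacian, $g_\theta$ is a polynomial of order $k$ with coefficients $\theta_0, \dots, \theta_k$, and $d(u, v)$ is the graph distance.

```latex
\begin{aligned}
U\, g_\theta(\Lambda)\, U^\top
  &= U \Big( \sum_{j=0}^{k} \theta_j \Lambda^j \Big) U^\top
   = \sum_{j=0}^{k} \theta_j \, (U \Lambda U^\top)^j
   = \sum_{j=0}^{k} \theta_j \, \mathcal{L}^j ,
   \qquad \text{since } U^\top U = I, \\
(\mathcal{L}^j)_{uv} &= 0 \quad \text{whenever } d(u, v) > j .
\end{aligned}
```

The second line holds because $\mathcal{L}$ is non-zero only on the diagonal and between adjacent nodes, so the operator as a whole only mixes features of nodes at most $k$ hops apart. The coefficients $\theta_j$ are shared by all node pairs and combined with the entries of $\mathcal{L}^j$, which is why the resulting weights over the neighborhood are neither uniform nor freely learnable per neighbor.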

3 High-Order and Adaptive Graph Convolutional Network (HA-GCN)

3.1 $k$-th Order Graph Convolution

We begin with the definition of the $k$-hop ($k$-th order) neighborhood: $N_i^{(k)} = \{ v_j \in V : d(v_i, v_j) \le k \}$ for node $v_i$. In fact, the exact $k$-hop connectivity can be obtained by the multiplication of the adjacency matrix $A$, as formally stated in the following proposition.

Proposition 1.

Let $A$ be the adjacency matrix of a graph $G$. Then the $(i, j)$ entry of its $k$-th power $A^k$ is the number of $k$-hop paths (walks of length $k$) from $v_i$ to $v_j$.
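Proposition 1 can be checked numerically on a small graph. The sketch below is only an illustration: the helper names and the toy path graph are assumptions of this example, and the matrix power is computed by repeated multiplication.

```python
def matmul(P, Q):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def matrix_power(A, k):
    """Compute A^k by repeated multiplication, starting from the identity."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result


# Hypothetical toy data: the path graph 0 - 1 - 2.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A2 = matrix_power(A, 2)
print(A2)  # [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
```

The entry `A2[1][1] == 2` counts the two length-2 walks 1-0-1 and 1-2-1, and `A2[0][2] == 1` counts the single walk 0-1-2.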

With this proposition, we can define a $k$-th order convolution operator as follows:

$$\tilde{L}^{(k)}_{\mathrm{gconv}} = (W_k \circ \tilde{A}^k) X + B_k, \tag{4}$$

where

$$\tilde{A}^k = \min\{A^k + I, 1\}. \tag{5}$$

Here $\circ$ and $\min$ refer to element-wise matrix product and minimum respectively. $W_k$ is the weight matrix while $B_k$ is the bias matrix. $\tilde{A}^k$ is obtained by clipping $A^k + I$ to $1$. The addition of the identity matrix $I$ to $A^k$ creates a self loop for each node in the graph. The clipping is motivated by the fact that if the matrix $A^k$ has elements larger than one, clipping those values to $1$ leads exactly to the convolution over the $k$-hop neighborhood. The input of the operator is the adjacency matrix $A$ and the feature matrix $X$. Its output has the same dimension as $X$. As the name suggests, the convolution takes the feature vectors of a node’s $k$-hop neighbors as input and outputs a weighted average of them.

The operator in (4) implements our idea of $k$-th order convolution on a graph, which is the convolution with kernel size $k$ in the conventional terminology of CNNs. On one hand, it can be viewed as an efficient high-order generalization of the first-order graph convolution in (2). On the other hand, this operator is closely related to the graph spectral convolution in (3), as the $k$-th order polynomial on the graph spectrum can also be regarded as an operation within the scope of the $k$-hop neighborhood.
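A minimal sketch of the high-order operator follows. It assumes that the clipped matrix is the element-wise minimum of A^k + I and 1, and that the operator masks an n-by-n weight matrix with it before multiplying by the features and adding a bias. The function names, the all-ones weights and the toy graph are hypothetical choices for this example, not the authors' implementation.

```python
def matmul(P, Q):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def matrix_power(A, k):
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result


def clipped_power(A, k):
    """Element-wise min(A^k + I, 1): 1 where a length-k walk exists or i == j."""
    Ak = matrix_power(A, k)
    n = len(A)
    return [[min(Ak[i][j] + (1 if i == j else 0), 1) for j in range(n)] for i in range(n)]


def kth_order_conv(A, X, W, B, k):
    """(W masked element-wise by the clipped power) times X, plus the bias B."""
    mask = clipped_power(A, k)
    masked_w = [[w * a for w, a in zip(w_row, a_row)] for w_row, a_row in zip(W, mask)]
    out = matmul(masked_w, X)
    return [[v + b for v, b in zip(o_row, b_row)] for o_row, b_row in zip(out, B)]


# Hypothetical toy data: path graph 0 - 1 - 2, one feature per node.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0], [10.0], [100.0]]
W = [[1.0] * 3 for _ in range(3)]  # all-ones weights, so the output is a masked sum
B = [[0.0] for _ in range(3)]

mask2 = clipped_power(A, 2)
out2 = kth_order_conv(A, X, W, B, k=2)
print(mask2)  # [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
print(out2)   # [[101.0], [10.0], [101.0]]
```

In this toy case the second-order mask connects node 0 with itself and node 2 but not with node 1, because no walk of length exactly two joins nodes 0 and 1. Combining several orders, as the framework does later, covers the intermediate hops.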

3.2 Adaptive Filtering Module

Based on the operator (4), we now introduce an adaptive filtering module for graph convolution. It filters the convolution weights according to the features and the neighborhood connections of a specific node. Take the molecule graph in chemistry as an example: benzene rings are more important than alkyl chains when predicting the properties of molecules. As a result, we desire larger convolution weights for neighboring atoms on benzene rings than for those on alkyl chains. Without the adaptive module, graph convolutions are spatially invariant and fail to work as desired. The introduction of adaptive filters allows the network to find the convolution target adaptively and to better capture local disparities.

The idea of adaptive filtering comes from the attention mechanism [xu2015show], which chooses the pixels of interest adaptively while generating the corresponding words in the output sequence. It can also be viewed as a variant of the gates that optionally let information through in the Long Short-Term Memory (LSTM) network [hochreiter1997long]. Technically, our adaptive filter is a nonlinear operator $g$ on the weight matrix $W_k$, i.e.

$$\tilde{W}_k = g \circ W_k, \tag{6}$$

where $\circ$ denotes element-wise matrix product. In fact, the operator $g$ is determined jointly by $\tilde{A}^k$ and $X$, reflecting both node features and graph connections,

$$g = f_{\mathrm{adp}}(\tilde{A}^k, X).$$

We consider two candidates for the function $f_{\mathrm{adp}}$:

$$f_{\mathrm{adp/prod}} = \mathrm{sigmoid}(\tilde{A}^k X Q), \tag{7}$$

$$f_{\mathrm{adp/lin}} = \mathrm{sigmoid}([\tilde{A}^k, X]\, Q). \tag{8}$$

Here and hereafter, $[\cdot, \cdot]$ refers to matrix concatenation. The first operator considers the interaction of node features and graph connections via an inner product of $\tilde{A}^k$ and $X$, while the second one does so via a linear transformation. In practice, we find that the linear adaptive filter (8) achieves better performance than the product one (7) on almost all tasks. Therefore, we adopt the linear one and report its performance in the experiment section. The adaptive filters are designed for a weighted selection of nodes, so a sigmoid non-linearity is applied to squash their values into $(0, 1)$ and push them toward a binary selection. The parameter matrix $Q$ aligns the output dimension of $f_{\mathrm{adp}}$ with that of the matrix $W_k$. Unlike existing designs of dynamic filters, which generate the weights solely from node or edge features, our adaptive filtering module provides a more thorough consideration by taking both node features and graph connections into account.
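The linear adaptive filter can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the filter is a sigmoid applied to the concatenation of the clipped adjacency power and the features, multiplied on the right by a parameter matrix Q of shape (n + m)-by-n so that the result matches the n-by-n weight matrix. The function names and toy values are hypothetical.

```python
import math


def matmul(P, Q):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def linear_adaptive_filter(A_k, X, Q):
    """g = sigmoid([A_k, X] Q); the orientation and shape of Q are assumptions."""
    concat = [list(a_row) + list(x_row) for a_row, x_row in zip(A_k, X)]  # n x (n + m)
    return [[sigmoid(v) for v in row] for row in matmul(concat, Q)]      # n x n


def apply_filter(g, W):
    """Adaptive weights: element-wise product of the filter g and the weights W."""
    return [[gv * wv for gv, wv in zip(g_row, w_row)] for g_row, w_row in zip(g, W)]


# Hypothetical toy data: n = 2 nodes, m = 1 feature.
A_k = [[1, 1], [1, 1]]
X = [[0.5], [-0.5]]
Q = [[0.0, 0.0] for _ in range(3)]  # zero parameters give g = 0.5 everywhere
W = [[2.0, 4.0], [6.0, 8.0]]

g = linear_adaptive_filter(A_k, X, Q)
W_adaptive = apply_filter(g, W)
print(g)           # [[0.5, 0.5], [0.5, 0.5]]
print(W_adaptive)  # [[1.0, 2.0], [3.0, 4.0]]
```

With trained parameters the entries of `g` would move toward 0 or 1, so the element-wise product acts as a soft selection of which neighbors contribute to the convolution.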

3.3 The Framework of HA-GCN

In this subsection, we present the framework of HA-GCN and demonstrate how it can be applied to various tasks. By adding the adaptive module (6) into the high-order convolution operator (4), we define the HA operator:

$$\tilde{L}^{(k)}_{\mathrm{HA}} = (\tilde{W}_k \circ \tilde{A}^k) X + B_k. \tag{9}$$

Figure 1: The illustration of (a) high-order convolution and adaptive filtering module and (b) the whole graph convolutional network.

Figure 1 gives a visualization of the operators and the HA-GCN framework. Figure 1(a) illustrates the operator for a single node with $K = 2$: the bottom layer of the adaptive filter $g$ is applied to the weight matrices $W_1$ and $W_2$ to obtain the adaptive weights $\tilde{W}_1$ and $\tilde{W}_2$ (illustrated by the orange and green lines); the second layer brings the adaptive weights and the corresponding adjacency matrices together for convolution. Figure 1(b) emphasizes the fact that the convolution is applied to each node in the graph, in a layer-by-layer manner. It is important to notice that the high-order operator and adaptive filtering module (HA operators) can be used together with other neural network architectures/operations like fully-connected layers, pooling layers, and non-linear transformations. In this paper, we name the graph convolutional network architecture built with our HA operator HA-GCN.

After all layers of convolution, the features from different orders of convolution are concatenated together:

$$L^{(K)}_{\mathrm{HA}} = [\tilde{L}^{(1)}_{\mathrm{HA}}, \dots, \tilde{L}^{(K)}_{\mathrm{HA}}]. \tag{10}$$

The framework of HA-GCN takes a feature matrix $X$ of shape $n \times m$ ($n$ is the number of nodes in the graph and $m$ is the dimension of node features) and outputs a matrix of shape $n \times (Km)$, resulting in an increase of the feature dimension by a factor of $K$. Now, we elaborate on how to apply HA-GCN to various tasks.
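The concatenation step and the resulting change of shape can be sketched as below. The helper name `concat_orders` and the toy per-order outputs are assumptions made for illustration only.

```python
def concat_orders(outputs):
    """Concatenate K per-order outputs (each n x m) column-wise into n x (K * m)."""
    n = len(outputs[0])
    return [sum((out[i] for out in outputs), []) for i in range(n)]


# Hypothetical outputs of a first-order and a second-order convolution (n = 2, m = 2).
order1 = [[1.0, 2.0], [3.0, 4.0]]
order2 = [[5.0, 6.0], [7.0, 8.0]]

combined = concat_orders([order1, order2])
print(combined)                          # [[1.0, 2.0, 5.0, 6.0], [3.0, 4.0, 7.0, 8.0]]
print(len(combined), len(combined[0]))   # 2 4
```

Each node keeps one row, and the feature dimension grows from m to K times m, here from 2 to 4.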

Node-centric prediction: After the graph convolutions in HA-GCN, each node is associated with a feature vector. The feature vectors can be used for tasks of node-centric classification or regression. It is also closely related to the graph (network) representation learning [perozzi2014deepwalk, grover2016node2vec], which refers to the procedure of learning a feature vector for each individual in a complicated system. Under node-centric settings, it means to learn a vector for each node in the graph that meaningfully reflects the local graph structure around that node. Our HA-GCN also outputs a vector for each node in the graph. In this sense, HA-GCN can be viewed as a supervised graph representation learning framework.

Graph-centric prediction: To handle graphs of different sizes, the input adjacency matrix $A$ and feature matrix $X$ are padded with zeros on the bottom right. Here we point out a subtle difference between node-centric and graph-centric tasks: under node-centric settings, the dataset is a single network, with part of the nodes’ labels/values used as the training set and the others as validation and test sets, while under graph-centric settings, the dataset is a set of graphs (possibly of different sizes), divided into training/validation/test sets. HA-GCN works for both cases, and the number of parameters in an HA convolutional layer is $O(n^2)$, with $n$ being the size of the graph (or the maximum size of the graphs). As will be demonstrated later in the experiment section, HA-GCN is more prone to over-fitting under node-centric settings than under graph-centric settings.
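The zero padding used for graphs of different sizes can be sketched as follows. The function name `pad_graph` and the toy graph are hypothetical; the sketch assumes that the adjacency matrix is padded with extra zero rows and columns, and the feature matrix with extra zero rows, up to a common maximum size.

```python
def pad_graph(A, X, n_max):
    """Zero-pad an n x n adjacency matrix and an n x m feature matrix to n_max nodes."""
    n = len(A)
    if n_max < n:
        raise ValueError("n_max must be at least the number of nodes")
    m = len(X[0])
    extra = n_max - n
    A_pad = [list(row) + [0] * extra for row in A] + [[0] * n_max for _ in range(extra)]
    X_pad = [list(row) for row in X] + [[0.0] * m for _ in range(extra)]
    return A_pad, X_pad


# Hypothetical two-node graph padded to three nodes.
A = [[0, 1], [1, 0]]
X = [[1.0, 2.0], [3.0, 4.0]]
A_pad, X_pad = pad_graph(A, X, n_max=3)
print(A_pad)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(X_pad)  # [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
```

The padded nodes have no edges and zero features, so every graph in a batch can share weight matrices of the same n_max-by-n_max shape.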

Graph generative modeling: The task of graph generative modeling refers to learning a probabilistic model from a set of training graphs, from which we can sample graphs that have not been seen before but still have structures similar to those of the training graphs. With the advent of the variational auto-encoder [kingma2013auto] and the adversarial auto-encoder [makhzani2015adversarial], graph convolutional networks can be made suitable for generative modeling in addition to discriminative modeling.

An auto-encoder always consists of two parts: an encoder and a decoder. The encoder maps the input data $x$ to an encoding vector $z$ and the decoder maps from $z$ back to $x$. We call the encoding space the latent or hidden space. To make it a generative model, we usually assume a probabilistic distribution (for example a Gaussian distribution) over the latent space. Here we consider the use of HA-GCN as the encoder for graph generative modeling. Given the length of the paper, we skip the technical discussion of the auto-encoder model here and defer more details about the HA-GCN auto-encoder architecture to the experiment section. As an application, graph generative models allow us to create a continuous representation of molecules and generate new chemical structures by searching the latent space, which can be used to guide the process of material design or drug screening.

4 Experiments

4.1 Node-centric learning

First, we considered a node-centric task of supervised document classification in citation graphs. The datasets [Sen:2008wi] comprise three citation graphs, where each graph contains bag-of-words feature vectors for every document and a list of citation links between documents. We treated the citation links as (undirected) edges and constructed a binary and symmetric adjacency matrix $A$. Each document has a class label and the goal is to predict the class label from the document features and the citation graph. The statistics of the datasets are as reported in [Sen:2008wi].

Dataset Nodes Edges Classes Features
Citeseer 3,327 4,732 6 3,703
Cora 2,708 5,429 7 1,433
Pubmed 19,717 44,338 3 500

Training and Architecture: We used the same GCN network structure as Kipf et al. [kipf2016semi], except that their first-order graph convolutional layer is replaced with our HA layer. Here and hereafter, we use gcn_{1,...,k} to denote a graph convolutional layer with orders up to $k$, and fc$l$ to denote a fully connected layer with $l$ hidden units.

Name Architectures
GCN gcn_{1}-fc128-gcn_{1}-fc1-softmax
gcn_{1,2} gcn_{1,2}-fc128-gcn_{1,2}-fc1-softmax
adp_gcn_{1,2} adp_gcn_{1,2}-fc128-adp_gcn_{1,2}-fc1-softmax

Method Citeseer Cora Pubmed
l1_logistic 0.653 0.701 0.693
l2_logistic 0.672 0.724 0.685
DeepWalk 0.631 0.746 0.712
Planetoid 0.724 0.832 0.844
GCN 0.776 0.889 0.839
gcn_{1,2} 0.788 0.901 0.851
adp_gcn_{1,2} 0.765 0.862 0.840
Table 1: The accuracy of node classification results. l1_logistic and l2_logistic stand for $\ell_1$- and $\ell_2$-regularized logistic regression, DeepWalk refers to the algorithm by Perozzi et al. [perozzi2014deepwalk], Planetoid refers to the algorithm by Yang et al. [Yang:2016ts], and GCN refers to the graph convolutional neural network by Kipf et al. [kipf2016semi]. All the models are implemented with the open-source code on GitHub.

To compare the performance of different models, we randomly divided the dataset into training/validation/test sets with a ratio of and reported the prediction accuracy on test set in Table 1. The hyper-parameters are (dropout rate), (L2 regularization), and (hidden units). From the perspective of node representation learning, the first three models are unsupervised but the last four are (semi-)supervised. This explains why the latter have better performance. With our second-order HA graph convolution, the information from two-hop neighbors can be utilized, resulting in an accuracy gain of about 0.012 (roughly one percentage point) over GCN on each dataset in Table 1. However, the adaptive module fails to further improve the accuracy. This is because the adaptive filter is designed to generate different filter weights for different graphs, whereas each node-centric task has only one graph, whose convolution weights can be learned directly. Therefore, the adaptive module becomes redundant in this node-centric setting.
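The size of the gain from the second-order convolution can be read directly off Table 1. The short calculation below copies the GCN and gcn_{1,2} accuracies from that table; the dictionary layout and variable names are just an illustration.

```python
# Accuracies copied from Table 1 (GCN vs. the second-order gcn_{1,2}).
table1 = {
    "Citeseer": {"GCN": 0.776, "gcn_{1,2}": 0.788},
    "Cora": {"GCN": 0.889, "gcn_{1,2}": 0.901},
    "Pubmed": {"GCN": 0.839, "gcn_{1,2}": 0.851},
}

# Absolute accuracy gain per dataset, rounded to the precision of the table.
gains = {name: round(acc["gcn_{1,2}"] - acc["GCN"], 3) for name, acc in table1.items()}
print(gains)  # {'Citeseer': 0.012, 'Cora': 0.012, 'Pubmed': 0.012}
```

The absolute gain is the same on all three datasets, a little over one percentage point.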

4.2 Graph-centric learning

In this experiment, we demonstrated the performance of HA-GCN on prediction tasks for molecule graphs. The goal is to predict a molecule’s properties based on its molecule graph. We used the same datasets as described in Duvenaud et al. [duvenaud2015convolutional] and evaluated the following three properties:

  • Solubility: The aqueous solubility of 1144 molecules by [delaney2004esol].

  • Drug efficacy: The half-maximal effective concentration (EC50) in vitro of 10,000 molecules against a sulfide-resistant strain of P. falciparum, the parasite that causes malaria, as measured by [gamo2010thousands].

  • Organic photovoltaic efficiency: The Harvard Clean Energy Project [hachmann2011harvard] uses expensive DFT simulations to estimate the photovoltaic efficiency of 30,000 organic molecules.

Following the same process described in Duvenaud et al. [duvenaud2015convolutional], we first used RDKit [landrum2006rdkit] to convert the SMILES [weininger1988smiles] representation of molecules into graphs, which treats hydrogen atoms implicitly. Each node in the graph corresponds to an atom and is assigned an initial feature vector. The features concatenate a one-hot encoding of the atom’s element, its degree, the number of attached hydrogen atoms, the implicit valence, and an aromaticity indicator.

Training and Architecture: The following network architectures are used for comparison. l1_gcn and l2_gcn refer to convolutional networks with one and two graph convolutional layer(s), respectively. To compare the performance of different models, we report the root mean square errors (RMSEs) in Table 2.

Name Architectures
l1_gcn gcn_{1,2,3}-ReLU-fc64-ReLU-fc16-ReLU-fc1
l1_adp_gcn adp_gcn_{1,2,3}-ReLU-fc64-ReLU-fc16-ReLU-fc1
l2_gcn [gcn_{1,2,3}-ReLU]*2-fc64-ReLU-fc16-ReLU-fc1
l2_adp_gcn [adp_gcn_{1,2,3}-ReLU]*2-fc64-ReLU-fc16-ReLU-fc1

Model  Solubility  Drug efficacy  Photovoltaic efficiency
NFP 0.52 1.16 1.43
MGC 0.46 1.07 1.10
node-GCN 0.54 1.14 1.45
l1_gcn 0.61 1.20 1.54
l1_adp_gcn 0.50 1.17 1.24
l2_gcn 0.56 1.09 1.35
l2_adp_gcn 0.38 1.07 1.08

Table 2: Prediction RMSEs: NFP refers to neural fingerprint [duvenaud2015convolutional] and MGC refers to molecular graph convolution [kearnes2016molecular]. Their performances are taken from the original papers. The node-GCN refers to the graph convolutional network [kipf2016semi] and is implemented with the open-source code provided by the authors.

The model node-GCN is indeed a first-order HA-GCN without adaptive filtering module. From the comparison between node-GCN, l1_gcn and l2_gcn, we can see the effectiveness of our high-order convolution operator. Also, the networks with adaptive modules have a uniformly better performance than their counterparts without the module, which demonstrates its advantage.
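For reference, the metric reported in Table 2 is the root mean square error between measured and predicted property values. A minimal sketch is given below; the function name and the toy numbers are hypothetical and are not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical targets and predictions: errors of 3 and 4 give sqrt((9 + 16) / 2).
score = rmse([0.0, 0.0], [3.0, 4.0])
print(round(score, 4))  # 3.5355
```

Lower values are better, which is how the rows of Table 2 should be compared.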

4.3 Graph Generative Modeling

In this experiment, we considered the task of graph generative modeling with HA-GCN auto-encoder. The network architectures are

Name Architectures
gcn_encoder gcn_{1,2,3}-ReLU-fc64-ReLU-fc16
gcn_decoder fc16-fc64-dconv-ReLU

where gcn_{} and fc are defined as before, and dconv is defined as .

We implemented HA-GCN as the encoder for both the variational auto-encoder (VAE) and the adversarial auto-encoder (AAE). As stated before, graph generative models can be used to guide molecule synthesis, and the model performance is evaluated based on the proportion of valid molecules among all the newly sampled molecules. We compared our HA-GCN generative model with the state-of-the-art RNN model of the Grammar Variational Autoencoder (RNN-GVAE) [Kusner:2017tv]. We closely followed the experiment setup of the graph variational autoencoder by Kipf et al. [Kipf:2016ul], with a training data of SMILES molecules [weininger1988smiles] extracted randomly from the ZINC database by Gómez-Bombarelli et al. [GomezBombarelli:2016vk]. Then encodings were drawn from a normal distribution in the latent space, and decoded to generate molecules. The HA-GCN-AAE model got % valid molecules, and HA-GCN-VAE got %, while the RNN-GVAE got %. Here we achieved a significant gain in performance with HA-GCN as the encoder for graph generative modeling.

5 Visualization of the HA-GCN

5.1 Visualization of the Convolution Weights

Figure 2: Visualization of the convolution weights.

We visualized the convolution weights in (4) of the HA-GCN trained for the task of photovoltaic efficiency prediction. The convolutional weight matrices are plotted in Figure 2, where the darkness of a block corresponds to the weight value of the corresponding node. We make the following observations from the plots. First, the weight matrices have symmetrical patterns, which is due to the symmetry of the adjacency matrix $A$ (or $\tilde{A}^k$). Second, as the order of convolution increases, the weights on the central nodes increase as well. Our explanation is that as the order of convolution increases, there are more nodes in the receptive field, and the weight increments on the central nodes balance off the effect of having more nodes within the scope of convolution. Third, for weights of convolution orders larger than one, we observe many off-diagonal blocks having large values (with dark blue color), showing the necessity of introducing the high-order convolution.

Figure 3: Visualization of the filter weights. The atoms highlighted in red are the randomly selected central nodes for convolution; the blue color on the atoms indicates the filter weights, with darker blue meaning larger weight.

5.2 Visualization of Adaptive Filters

The adaptive filters in (6), learned from graph connections and node features, are visualized in Figure 3. The atoms highlighted in red are the randomly selected central nodes for convolution, and the blue circles on atoms indicate the filter weights, with darker blue meaning larger weight. We make the following observations. First, the adaptive filter weights are almost binarized, which means that the filters are capable of selecting nodes for convolution adaptively based on the features and connectivity. Second, for almost all molecules in Figure 3, the selected atoms lie on aromatic rings, which agrees with the chemical intuition that aromatic rings are more important than alkyl chains for predicting organic photovoltaic efficiency. Another interesting observation is that the adaptive filter automatically learned the ortho-para rule in chemistry, which states that for the benzene ring, the functional groups next to (ortho) and opposite of (para) a specific atom have a greater influence on the properties of that atom than the functional groups on other sites. For example, in the molecule in Figure 3, row 2, column 1, the atoms which are next to and opposite of the central atom are selected over the other atoms on the six-membered ring.

6 Conclusion

In this work, we developed the HA-GCN graph convolutional network architecture with two new convolution modules specially designed for graph-structured data. With experiments showing the effectiveness of those modules, we advocate considering them in the design of graph convolutional network architectures. For future work, on one hand, we believe that more effort is still needed on designing convolutional networks with careful thought given to the underlying (global and local) graph structure. On the other hand, we are currently conducting experiments on automatic chemical design to further demonstrate the practical value of our framework.