Function Space Pooling For Graph Convolutional Networks

05/15/2019 · by Padraig Corcoran, et al.

Convolutional layers in graph neural networks are a fundamental type of layer which output a representation or embedding of each graph vertex. The representation typically encodes information about the vertex in question and its neighbourhood. If one wishes to perform a graph centric task such as graph classification, the set of vertex representations must be integrated or pooled to form a graph representation. We propose a novel pooling method which transforms a set of vertex representations into a function space representation. Experimental results demonstrate that the proposed method outperforms the standard pooling methods of computing the sum and mean of vertex representations.


1 Introduction

Many types of real world data, such as social networks, collections of documents and chemical structures, are naturally represented as graphs. Consequently there exists great potential for the application of machine learning to graphs. Given the great success of neural networks or deep learning in the analysis of images, there has recently been much research considering the application or generalization of neural networks to graphs. In many cases this has resulted in state of the art performance (Wu et al., 2019).

The graph convolutional network is a neural network architecture commonly applied to graphs. This architecture consists of a sequence of convolutional layers where each layer iteratively updates a representation or embedding of each vertex. This update is achieved through the application of an operation which considers the current representation of each vertex plus the current representations of its adjacent neighbours (Gilmer et al., 2017). The output of a sequence of convolutional layers is a representation of each vertex which encodes properties of the vertex in question and of the vertices in its neighbourhood.

If one wishes to perform a vertex centric task such as vertex classification, then one may operate directly on the set of vertex representations output from a sequence of convolutional layers. However, if one wishes to perform a graph centric task such as graph classification, then the set of vertex representations must somehow be integrated or pooled to form a graph representation. Pooling represents a challenging problem because there exists no vertex ordering and different graphs may have different numbers of vertices. Commonly employed pooling methods include computing the mean or sum of vertex representations. However these simple pooling methods are not complete invariants, in the sense that many different sets of vertex representations may result in the same graph representation, leading to weak discrimination power (Xu et al., 2018). To overcome this issue and increase discrimination power a number of authors have proposed more sophisticated pooling methods. For example, Ying et al. (2018) proposed a pooling method which performs a hierarchical clustering of vertex representations.
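To make the weak discrimination power of these simple methods concrete, the following minimal Python sketch (an illustration of ours, not taken from the paper) constructs two distinct sets of vertex representations which mean pooling maps to the same graph representation.

```python
import numpy as np

# Two distinct sets of 2-dimensional vertex representations (one row per vertex).
X1 = np.array([[0.0, 0.0], [2.0, 2.0]])
X2 = np.array([[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]])

# Mean pooling collapses both sets to the identical graph representation [1, 1].
print(np.allclose(X1.mean(axis=0), X2.mean(axis=0)))  # True
```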

In this article we propose a novel pooling method which maps a set of vertex representations to a function space representation which forms a vector space. This method is parameterized by a single learnable parameter which controls the discrimination power of the method. This makes the method applicable to both finer and coarser classification tasks which require greater and less discrimination power respectively. The proposed pooling method is inspired by related methods in the field of applied topology which map sets of points in $\mathbb{R}^2$ to a function space representation (Adams et al., 2017).

The layout of this paper is as follows. Section 2 reviews related work on pooling methods. Section 3 describes the proposed pooling method. Section 4 presents an evaluation of this method. Finally, Section 5 draws some conclusions from this work.

2 Related Pooling Methods

The simplest and most commonly used pooling methods involve computing basic summary statistics, such as the mean and sum, of vertex representations (Duvenaud et al., 2015). To improve discrimination power more sophisticated pooling methods have been proposed. The SortPooling method first sorts the vertices with respect to their structural roles in the graph (Wu et al., 2019). The vertex representations corresponding to the first $k$ vertices in this order are then used as input to a traditional one dimensional convolutional network, where the value $k$ is a fixed hyper-parameter in the model. Set2set is a general approach for embedding a set in a manner which is invariant to element order (Vinyals et al., 2015), and Gilmer et al. (2017) proposed to use this method to perform pooling. Ying et al. (2018) proposed a pooling method which performs a hierarchical clustering of vertex representations. Kearnes et al. (2016) proposed a pooling method based on fuzzy histograms. This method has similarities to that proposed in this article but is formulated in terms of fuzzy theory as opposed to function spaces, and the method proposed in this article is in turn distinct. All of the above pooling methods are supervised; Bai et al. (2019) proposed a pooling method which is unsupervised.

3 Function Space Pooling

Figure 1: A set $X$ of vertex representations output from a sequence of convolutional layers is displayed in (a), where each element is represented by a red dot. The result of applying the map $s$ to this set is the set $s(X)$ displayed in (b). The result of applying the map $\rho$ to $s(X)$ with a smaller value of the parameter $\sigma$ is the function displayed in (c). The result of applying the map $\rho$ to $s(X)$ with a larger value of $\sigma$ is the function displayed in (d).

Let $G = (V, E)$ be a graph where $V$ and $E$ are the corresponding sets of vertices and edges respectively. Let $X$ be the set of vertex representations output from a sequence of convolutional layers applied to $G$. We assume that each vertex representation is an element of $\mathbb{R}^n$. The proposed pooling method takes $X$ as input and returns a function. That is, the method is a map from the space of sets to the space of functions. It contains two steps which we now describe in turn.

The set of vertex representations $X$ is an object in the category of sets, which we denote $\mathbf{Set}$. Let $s : \mathbb{R}^n \to I^n$ be the $n$-dimensional sigmoid function defined in Equation 1, where $I = [0, 1]$ and $I^n$ is the $n$-dimensional interval. In the first step of the proposed pooling method we apply the $n$-dimensional sigmoid elementwise to $X$ to give a map $\mathbf{Set} \to \mathbf{Set}$. To illustrate this map consider Figure 1(a), which displays an example set $X$ containing three elements in $\mathbb{R}^2$. The result of applying the map $s$ to this set is illustrated in Figure 1(b).

$$s(x) = \left( \frac{1}{1 + e^{-x_1}}, \dots, \frac{1}{1 + e^{-x_n}} \right) \qquad (1)$$
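As a concrete illustration of this first step, the following Python sketch (our own illustration; the variable names are hypothetical) applies the elementwise sigmoid of Equation 1 to a set of vertex representations in $\mathbb{R}^2$, mapping each coordinate into the unit interval.

```python
import numpy as np

def sigmoid(X):
    """Elementwise n-dimensional sigmoid of Equation 1, mapping R^n into I^n."""
    return 1.0 / (1.0 + np.exp(-X))

# An example set of three vertex representations in R^2 (one row per element).
X = np.array([[-2.0, 0.5], [1.0, 3.0], [0.0, -1.0]])
print(sigmoid(X))  # every coordinate now lies in (0, 1)
```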

Let $g$ be a probability distribution on $I^n$. For the purposes of this work we use the $n$-dimensional Gaussian distribution defined in Equation 2 with mean $\mu$ and variance $\sigma^2$.

$$g_{\mu, \sigma}(z) = \frac{1}{(2 \pi \sigma^2)^{n/2}} \exp\left( - \frac{\lVert z - \mu \rVert^2}{2 \sigma^2} \right) \qquad (2)$$

In the second step of the proposed pooling method we apply a map $\rho : \mathbf{Set} \to L^p(I^n)$ to $s(X)$. Here $L^p(I^n)$ is the vector space of functions on $I^n$ for which the $p$-th power of the absolute value is Lebesgue integrable, equipped with the $L^p$-norm defined in Equation 3 (Christensen, 2010). Note that function addition and subtraction are performed pointwise.

$$\lVert f \rVert_p = \left( \int_{I^n} \lvert f(z) \rvert^p \, dz \right)^{1/p} \qquad (3)$$
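The following Python sketch (our own illustration; the Riemann-sum integration over a regular grid is an assumption consistent with the discretization described later in this section) evaluates the Gaussian of Equation 2 and approximates the $L^p$ norm of Equation 3 on the unit square $I^2$.

```python
import numpy as np

def gaussian(z, mu, sigma):
    """Isotropic n-dimensional Gaussian of Equation 2 with mean mu, variance sigma^2."""
    n = len(mu)
    norm = (2.0 * np.pi * sigma**2) ** (n / 2.0)
    return np.exp(-np.sum((z - mu)**2, axis=-1) / (2.0 * sigma**2)) / norm

def lp_norm(f_vals, cell_volume, p=2):
    """Approximate the L^p norm of Equation 3 by a Riemann sum over grid samples."""
    return (np.sum(np.abs(f_vals)**p) * cell_volume) ** (1.0 / p)

# Sample the Gaussian on a regular k x k grid over the unit square I^2.
k = 50
axis = np.linspace(0.0, 1.0, k)
grid = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)
f_vals = gaussian(grid, mu=np.array([0.5, 0.5]), sigma=0.1)
print(lp_norm(f_vals, cell_volume=(1.0 / k) ** 2))
```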

The function resulting from the map $\rho$ is defined in Definition 1. To illustrate this map consider again the example set illustrated in Figure 1(a). Figure 1(c) displays the function resulting from applying the map $\rho$ to this set with a given value of the parameter $\sigma$.

Definition 1.

For $Y = s(X)$ the corresponding function representation $\rho(Y) \in L^p(I^n)$ is defined in Equation 4.

$$\rho(Y)(z) = \sum_{y \in Y} g_{y, \sigma}(z) \qquad (4)$$

The space $L^p(I^n)$, and in turn the function representation $\rho(Y)$, is infinite dimensional. That is, there are an infinite number of elements in the domain of $\rho(Y)$. We approximate this function as a finite dimensional vector by discretizing the function domain using a regular grid. For example, the image in Figure 1(c) corresponds to a discretization of the function domain using such a grid.
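Putting the two steps together, the following Python sketch (a reconstruction under the assumptions above, not the author's reference implementation) computes the discretized function space representation of Equation 4: each sigmoid-mapped vertex representation contributes a Gaussian bump, and the resulting function is sampled on a regular grid with $k$ elements per dimension to give a $k^n$-dimensional vector.

```python
import numpy as np

def function_space_pool(X, sigma=0.1, k=20):
    """Pool a set of vertex representations in R^n into a k^n-dimensional vector.

    Step 1: the elementwise sigmoid (Equation 1) maps each row of X into I^n.
    Step 2: a Gaussian (Equation 2) centred at each mapped point is summed and
            the resulting function (Equation 4) is sampled on a regular grid.
    """
    n = X.shape[1]
    Y = 1.0 / (1.0 + np.exp(-X))                                 # step 1
    axes = [np.linspace(0.0, 1.0, k)] * n
    grid = np.stack(np.meshgrid(*axes), axis=-1).reshape(-1, n)  # (k**n, n)
    norm = (2.0 * np.pi * sigma**2) ** (n / 2.0)
    # Squared distances between every grid point and every mapped representation.
    d2 = np.sum((grid[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2)).sum(axis=1) / norm     # step 2

X = np.array([[-2.0, 0.5], [1.0, 3.0], [0.0, -1.0]])
print(function_space_pool(X).shape)  # (400,) since n = 2 and k = 20
```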

The proposed pooling method is parameterized by the variance parameter $\sigma$ in the probability distribution of Equation 2, where this parameter takes values in the range $(0, \infty)$. As the value of $\sigma$ approaches $0$ the probability distribution approaches a Dirac delta centred at its mean. On the other hand, as the value of $\sigma$ approaches $\infty$ the probability distribution approaches a uniform function on the domain $I^n$. For example, Figures 1(c) and 1(d) display the functions resulting from applying the map $\rho$ to the set in Figure 1(b) with a smaller and a larger value of $\sigma$ respectively.

The parameter $\sigma$ may be interpreted in a couple of ways. As the value of $\sigma$ approaches $0$ the function representation approaches a complete invariant. That is, distinct sets map to distinct functions, where the distance between these functions as defined by the norm in Equation 3 is greater than zero. On the other hand, as $\sigma$ approaches $\infty$ the distance between these functions reduces. An alternative interpretation of the parameter is as follows. As mentioned above, as the value of $\sigma$ approaches $\infty$ the probability distribution approaches a uniform function on the domain $I^n$. In this case the proposed pooling method differs from computing the sum of the elements of $s(X)$ by a multiplicative constant only.

4 Results

To evaluate the performance of the proposed pooling method we considered the task of graph classification. The layout of this section is as follows. Section 4.1 describes the datasets considered. Section 4.2 describes the feed-forward network architecture used in all experiments. Section 4.3 describes the optimization method used to optimize the network parameters. Finally, Section 4.4 presents the classification accuracy achieved by the proposed pooling method relative to two benchmark methods.

4.1 Datasets

The first dataset considered was the MUTAG dataset, which consists of 188 chemical compounds where the classification problem is binary and concerns predicting whether a chemical compound is mutagenic or not (Debnath et al., 1991). Each chemical compound is represented as a graph in which each vertex is assigned one of a fixed number of distinct types.

The second dataset considered was the PROTEINS dataset, which consists of proteins where the classification task is binary and concerns predicting whether a protein is an enzyme or not (Borgwardt et al., 2005). Each protein is represented as a graph in which each vertex is assigned one of a fixed number of distinct types. Both of the datasets considered are commonly used to evaluate graph neural networks (Fey and Lenssen, 2019).

4.2 Network Architecture

The feed-forward network architecture used consists of the following six layers. The first two layers are convolutional layers; a number of studies have found that two convolutional layers empirically give the best performance (Kipf and Welling, 2016). The third layer is a fully connected linear layer. The fourth layer is the pooling method used. The fifth layer is another fully connected linear layer. The final layer is a softmax function.

The convolutional layers used are similar to the GraphSAGE convolutional layers (Hamilton et al., 2017). Let $H^i$ denote the matrix containing the vertex representations in the $i$th convolutional layer, where each matrix row corresponds to the representation of an individual vertex. Let $\cdot$ denote matrix multiplication and CONCAT denote horizontal matrix concatenation. The $i$th convolutional layer is implemented using Equation 5, where $W^i$ and $b^i$ are the corresponding weights and biases respectively, $A$ is the graph adjacency matrix and $\phi$ is an elementwise nonlinearity. The weights $W^i$ form a matrix of dimension $2 d_i \times d_{i+1}$, where $d_i$ is the dimension of the $i$th layer. The biases $b^i$ form a vector of dimension $d_{i+1}$. The dimension $d_0$ of the input layer is equal to the number of vertex types since a one-hot encoding was used. The dimensions $d_1$ and $d_2$ of the two convolutional layers were set to equal values.

$$H^{i+1} = \phi\left( \text{CONCAT}(H^i, A \cdot H^i) \cdot W^i + b^i \right) \qquad (5)$$
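A minimal NumPy sketch of one such layer is given below. The use of the adjacency matrix $A$ to aggregate neighbour representations and the choice of ReLU for the nonlinearity $\phi$ are assumptions on our part, consistent with GraphSAGE-style layers but not confirmed by the text above.

```python
import numpy as np

def conv_layer(H, A, W, b):
    """One convolutional layer in the style of Equation 5.

    H: (num_vertices, d_i) vertex representations of layer i.
    A: (num_vertices, num_vertices) adjacency matrix.
    W: (2 * d_i, d_{i+1}) weights;  b: (d_{i+1},) biases.
    """
    H_agg = A @ H                            # aggregate adjacent neighbours
    Z = np.concatenate([H, H_agg], axis=1)   # CONCAT: (num_vertices, 2 * d_i)
    return np.maximum(Z @ W + b, 0.0)        # assumed ReLU nonlinearity

# Toy example: a path graph on three vertices with one-hot vertex types.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H0 = np.array([[1., 0.], [0., 1.], [1., 0.]])   # d_0 = 2 vertex types
rng = np.random.default_rng(0)
W0, b0 = rng.normal(size=(4, 8)), np.zeros(8)
print(conv_layer(H0, A, W0, b0).shape)  # (3, 8), so d_1 = 8 here
```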

The output of the first linear layer is the input to the pooling method. Therefore the multi-dimensional interval $I^n$ corresponding to the domain of the function in Definition 1 has dimension $n$ equal to the output dimension of the first linear layer, while the output dimension of the second linear layer equals the number of classes. We approximate this function as a finite dimensional vector by discretizing the function domain using a regular grid with $k$ elements in each dimension. This gives a finite dimensional vector of size $k^n$.
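To make the shape bookkeeping concrete, the following PyTorch sketch (entirely our own reconstruction; the ReLU nonlinearities, adjacency-based aggregation and all default dimensions are hypothetical) assembles the six layers of Section 4.2 for a single input graph.

```python
import math
import torch

class FSPoolNet(torch.nn.Module):
    """Sketch of the six-layer architecture: conv, conv, linear, pool, linear, softmax."""

    def __init__(self, num_types, d_conv=16, n=2, k=20, num_classes=2):
        super().__init__()
        self.w1 = torch.nn.Linear(2 * num_types, d_conv)    # conv layer 1
        self.w2 = torch.nn.Linear(2 * d_conv, d_conv)       # conv layer 2
        self.lin1 = torch.nn.Linear(d_conv, n)              # linear layer before pooling
        self.lin2 = torch.nn.Linear(k ** n, num_classes)    # linear layer after pooling
        self.sigma = torch.nn.Parameter(torch.tensor(0.1))  # learnable pooling parameter
        axes = [torch.linspace(0.0, 1.0, k)] * n
        grid = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1)
        self.register_buffer("grid", grid.reshape(-1, n))   # regular grid on I^n

    def forward(self, H, A):
        H = torch.relu(self.w1(torch.cat([H, A @ H], dim=1)))  # Equation 5
        H = torch.relu(self.w2(torch.cat([H, A @ H], dim=1)))
        Y = torch.sigmoid(self.lin1(H))                        # Equation 1
        d2 = ((self.grid[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        n = Y.shape[1]
        norm = (2 * math.pi * self.sigma ** 2) ** (n / 2)
        f = torch.exp(-d2 / (2 * self.sigma ** 2)).sum(1) / norm  # Equation 4
        return self.lin2(f)                                   # logits for the softmax
```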

4.3 Optimization

The model parameters to be optimized in the architecture of Section 4.2 are the weights and biases of the convolutional and linear layers plus the parameter $\sigma$ of the pooling method. For the loss function, a cross entropy loss term plus an $l_2$ regularization term was used, with a fixed regularization weight in all experiments. The Adam optimization algorithm was used to optimize all model parameters with a fixed learning rate, and in all experiments optimization was performed for a fixed number of epochs.
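The following sketch (assuming a PyTorch implementation consistent with the FSPoolNet sketch above; the hyperparameter defaults are placeholders, since the specific settings are not given here) illustrates how the cross entropy loss, $l_2$ regularization and Adam optimizer fit together. Note that the pooling parameter $\sigma$ is optimized alongside the weights and biases simply because it is registered as a model parameter.

```python
import torch

def train(model, dataset, reg_weight=1e-4, lr=1e-3, epochs=100):
    """Cross entropy loss plus l_2 regularization, optimized with Adam.

    The hyperparameter defaults are placeholders, not the paper's settings.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for H, A, label in dataset:            # one graph per step; label: shape (1,)
            optimizer.zero_grad()
            logits = model(H, A).unsqueeze(0)  # (1, num_classes)
            reg = sum(p.pow(2).sum() for p in model.parameters())
            loss = loss_fn(logits, label) + reg_weight * reg
            loss.backward()
            optimizer.step()
```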

Pooling Method    MUTAG        PROTEINS
Sum               65.6 ± 13    60.1 ± 18
Mean              78.1 ± 18    57.7 ± 16
Function Space    83.3 ± 11    72.8 ± 19
Table 1: For each of the MUTAG and PROTEINS datasets, the mean and standard deviation of classification accuracy over 10-fold cross validation for each pooling method are displayed.

4.4 Classification Accuracy

The proposed pooling method was benchmarked against the methods of computing the mean and sum of vertex representations. As discussed in Section 2, these are among the most commonly used pooling methods. For each benchmark pooling method the corresponding network architecture was identical to that described in Section 4.2, with the exception that the pooling layer was replaced and the dimension of the preceding linear layer was adjusted accordingly. For both datasets considered we computed the mean accuracy of 10-fold cross validation for each pooling method; the results of this analysis are displayed in Table 1. For both datasets, the proposed pooling method outperformed both benchmark methods.

5 Conclusions

We propose a novel pooling method for convolutional layers in graph neural networks which involves computing a function space representation of the set of vertex representations. Experimental results demonstrate that the proposed method outperforms the commonly employed pooling methods of computing the mean and sum of vertex representations.

References