Spectral-based Graph Convolutional Network for Directed Graphs

07/21/2019 · Yi Ma, et al. · Tianjin University, Tencent

Graph convolutional networks (GCNs) have become the most popular approach for graph data nowadays because of their powerful ability to extract features from graphs. GCN approaches fall into two categories, spectral-based and spatial-based. As the earliest convolutional networks for graph data, spectral-based GCNs have achieved impressive results in many graph-related analytics tasks. However, spectral-based models cannot directly work on directed graphs. In this paper, we propose an improved spectral-based GCN for directed graphs that leverages a redefined Laplacian to improve its propagation model. Our approach can work directly on directed graph data in semi-supervised node classification tasks. Experiments on a number of directed graph datasets demonstrate that our approach outperforms state-of-the-art methods.


1 Introduction

In recent years, deep learning has achieved great success in many fields such as image classification, video processing and speech recognition. The data in these tasks is usually represented in the Euclidean space. However, there are many applications where data comes from non-Euclidean domains and is represented as graphs. This kind of data is known as graph data. A graph data structure consists of a finite set of vertices (also called nodes), together with a set of unordered pairs of these vertices for an undirected graph, or a set of ordered pairs for a directed graph. These pairs are known as edges. Using the information in graph data, we can capture the interdependence among instances (nodes), such as citation relationships in a paper network, friendships in a social network and interactions in a molecular network. For instance, in a paper citation network, papers are linked to each other via citations and can be classified into different areas. Graph data is very complex because of its irregularity. This complexity means that some important operations of deep learning are not applicable in non-Euclidean domains. For example, convolutional neural networks (CNNs) cannot use a convolution kernel of the same size to convolve graph data with such a complex structure.

To handle the complexity of graph data, there have been many studies designing new models for graph data inspired by convolutional networks, recurrent networks, and deep autoencoders. These models, which incorporate neural architectures, are known as graph neural networks. According to Wu et al. wu2019comprehensive, graph neural networks are categorized into graph convolutional networks, graph attention networks velivckovic2017graph zhang2018gaan, graph autoencoders Kipf2016Variational wang2017mgae, graph generative networks de2018molgan li2018learning and graph spatial-temporal networks li2017diffusion. Among these graph neural networks, graph convolutional networks (GCNs) are the most important ones and form the foundation of other graph neural network models. One of the earliest works on GCNs is presented in Bruna et al. (2013), which develops a variant of graph convolution bruna2013spectral. Since then, there have been many works improving graph convolutional networks kipf2016semi defferrard2016convolutional henaff2015deep li2018adaptive levie2017cayleynets. These GCN approaches fall into two categories. One category is spatial-based: these approaches perform the convolution directly in the graph domain by aggregating information from neighbor nodes. The other category is spectral-based: these approaches define graph convolution based on spectral graph theory, from the perspective of graph signal processing. Although spectral-based methods have a higher computational cost than spatial-based ones, they have a more powerful ability to extract features from graph data.

As the earliest convolutional networks for graph data, spectral-based models have achieved impressive results in many graph-related analytics tasks. However, spectral-based models are limited to working only on undirected graphs kipf2016semi. Therefore, the only way to apply spectral-based models to directed graphs is to relax directed graphs into undirected ones, which fails to represent the actual structure of directed graphs. Some researchers combine recurrent models and spectral-based GCNs to process temporal directed graphs pareja2019evolvegcn, but they do not focus on the GCN's own structure. To the best of our knowledge, we are the first to improve the propagation model of the spectral-based GCN layer so that it can be applied to directed graphs.

In this paper, we use a definition of the Laplacian matrix for directed graphs chung2005laplacians to derive the propagation model's mathematical representation. We use eigendecomposition and Chebyshev polynomials to approximate the representation of the directed Laplacian matrix and obtain our propagation model. We then use this propagation model to design our spectral-based GCN for directed graphs. Our approach works well on different directed graph datasets in semi-supervised node classification tasks and achieves better performance than state-of-the-art spectral-based and spatial-based GCN methods.

The remainder of this paper is organized as follows: Section 2 introduces the theoretical motivation of classic spectral-based GCNs; Section 3 presents the mathematical representation of Laplacians for directed graphs and the models we construct in our method; Section 4 describes the details of our experiments on semi-supervised classification tasks; concluding discussions and remarks are provided in Section 5 and Section 6.

2 Preliminaries

Spectral-based GCNs are based on the Laplacian matrix. For an undirected graph, suppose $A$ is the adjacency matrix of the graph and $D$ is the diagonal matrix of node degrees, $D_{ii} = \sum_j A_{ij}$. The graph Laplacian matrix is defined as $D - A$, and its normalized form is defined as $L = I_N - D^{-1/2} A D^{-1/2}$, which is a matrix representation of a graph in graph theory and can be used to find many useful properties of a graph. $L$ is symmetric and positive-semidefinite. With these properties, the normalized Laplacian matrix can be factored as $L = U \Lambda U^T$, where $U$ is the matrix of eigenvectors ordered by eigenvalues and $\Lambda$ is the diagonal matrix of eigenvalues.
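As a concrete illustration of these definitions, here is a minimal NumPy sketch (the toy 3-node graph is ours, not from the paper) that builds the normalized Laplacian and its eigendecomposition:

```python
import numpy as np

# Toy undirected graph: adjacency matrix A and diagonal degree matrix D.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
D = np.diag(A.sum(axis=1))

L_comb = D - A                                         # combinatorial Laplacian D - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt       # normalized Laplacian

# Symmetric positive-semidefinite matrix: factor L = U diag(lambda) U^T.
eigvals, U = np.linalg.eigh(L)
assert np.all(eigvals >= -1e-10)                       # eigenvalues are non-negative
```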

Spectral Graph Convolutions

The spectral graph convolution operation is defined in the Fourier domain by computing the eigendecomposition of the graph Laplacian.

$x \in \mathbb{R}^N$ is the feature vector of the graph's nodes. The graph Fourier transform applied to $x$ is defined as $\mathcal{F}(x) = U^T x$. The Fourier transform projects the input graph signal into the orthogonal space, which is equivalent to representing an arbitrary feature vector defined on the graph as a linear combination of the eigenvectors of the Laplacian matrix. The inverse graph Fourier transform is defined as $\mathcal{F}^{-1}(\hat{x}) = U \hat{x}$, where $\hat{x}$ is the output obtained from $x$ through the graph Fourier transform. Applying the Convolution Theorem wiki:xxx to the graph Fourier transform, the spectral convolution on graphs is defined as the multiplication of a signal $x$ with a filter $g$ in the Fourier domain:

$x \star_G g = \mathcal{F}^{-1}\left(\mathcal{F}(x) \odot \mathcal{F}(g)\right) = U\left(U^T x \odot U^T g\right)$    (1)

where $\star_G$ represents the graph convolution operation and $\odot$ represents the Hadamard product. For two matrices $A$ and $B$ of the same dimension $m \times n$, the Hadamard product $A \odot B$ is a matrix of the same dimension as the operands, with elements given by $(A \odot B)_{ij} = A_{ij} B_{ij}$. By defining the filter as $g_\theta = \mathrm{diag}(U^T g)$, Equation 1 can be simplified as

$x \star_G g_\theta = U g_\theta U^T x$    (2)

Here we can understand $g_\theta$ as a function of the eigenvalues of $L$, i.e. $g_\theta(\Lambda)$.
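A small sketch of Equations 1 and 2 in NumPy, assuming $U$ has been obtained from the eigendecomposition above (the helper names are ours, not from the paper):

```python
import numpy as np

def graph_fourier(U, x):
    """Graph Fourier transform: project the signal x onto the Laplacian eigenbasis."""
    return U.T @ x

def inverse_graph_fourier(U, x_hat):
    """Inverse graph Fourier transform: recombine the spectral coefficients."""
    return U @ x_hat

def spectral_filter(U, x, g_theta_diag):
    """Equation 2: filter x in the spectral domain, U * g_theta(Lambda) * U^T * x."""
    return U @ (g_theta_diag * (U.T @ x))

# Usage with U from the previous sketch:
# x = np.random.randn(U.shape[0])
# y = spectral_filter(U, x, g_theta_diag=np.ones(U.shape[0]))  # identity filter returns x
```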

Chebyshev Spectral GCN

As we can see, multiplication with the eigenvector matrix $U$ in Equation 2 is computationally expensive. To solve this problem, Defferrard et al. defferrard2016convolutional propose ChebNet, which uses Chebyshev polynomials of the diagonal matrix of eigenvalues to approximate $g_\theta(\Lambda)$. ChebNet parametrizes $g_\theta(\Lambda)$ as a $K$-th order polynomial of $\Lambda$:

$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{\Lambda})$    (3)

where $\tilde{\Lambda} = \frac{2}{\lambda_{max}} \Lambda - I_N$ and $\lambda_{max}$ denotes the largest eigenvalue of $L$. The Chebyshev polynomials are defined recursively by $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$ with $T_0(x) = 1$ and $T_1(x) = x$. The convolution of a signal $x$ with a filter $g_{\theta'}$ now becomes:

$x \star_G g_{\theta'} \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{L})\, x$    (4)

where $\tilde{L} = \frac{2}{\lambda_{max}} L - I_N$. $\tilde{L}$ represents a rescaling of the graph Laplacian that maps the eigenvalues from $[0, \lambda_{max}]$ to $[-1, 1]$, since the Chebyshev polynomials form an orthogonal basis on $[-1, 1]$.
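The recursion makes the filtering computable without an explicit eigendecomposition. A hedged NumPy sketch of Equation 4 (the coefficient handling and function name are ours):

```python
import numpy as np

def chebyshev_filter(L, x, theta, lambda_max=2.0):
    """Approximate spectral filtering with a K-th order Chebyshev expansion (Eq. 4).

    L: normalized graph Laplacian (N x N); x: node signal, shape (N,) or (N, C);
    theta: sequence of K+1 Chebyshev coefficients theta'_0, ..., theta'_K.
    """
    N = L.shape[0]
    L_tilde = (2.0 / lambda_max) * L - np.eye(N)       # rescale eigenvalues to [-1, 1]

    T_prev, T_curr = x, L_tilde @ x                    # T_0(L~)x = x, T_1(L~)x = L~ x
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2 * (L_tilde @ T_curr) - T_prev       # T_k = 2 L~ T_{k-1} - T_{k-2}
        out = out + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return out
```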

First-order ChebNet (1stChebNet)

Kipf et al. kipf2016semi propose a first-order approximation of ChebNet which assumes $K = 1$ and $\lambda_{max} = 2$ to obtain a linear function. Equation 4 simplifies to:

$x \star_G g_{\theta'} \approx \theta'_0 x + \theta'_1 (L - I_N) x = \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x$    (5)

Further assuming $\theta = \theta'_0 = -\theta'_1$, the definition of the graph convolution becomes

$x \star_G g_\theta \approx \theta \left(I_N + D^{-1/2} A D^{-1/2}\right) x$    (6)

Because $I_N + D^{-1/2} A D^{-1/2}$ has eigenvalues in the range $[0, 2]$, it may lead to exploding or vanishing gradients when used in a deep neural network model. To alleviate this problem, Kipf et al. kipf2016semi use a renormalization trick $I_N + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, with $\tilde{A} = A + I_N$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. This is a further simplification and, in practice, it amounts to adding a self-loop to each node. Finally, we can generalize this definition to the graph convolution layer:

$Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta$    (7)

where $X \in \mathbb{R}^{N \times C}$ is the signal with a $C$-dimensional feature vector for every node, $\Theta \in \mathbb{R}^{C \times F}$ is a matrix of filter parameters and $Z \in \mathbb{R}^{N \times F}$ is the convolved result. The graph convolution defined in this form is localized in space and connects the spectral-based methods with the spatial-based ones.
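A minimal sketch of the renormalized propagation rule in Equation 7, using dense NumPy arrays for clarity (real implementations typically use sparse matrices):

```python
import numpy as np

def gcn_propagation(A, X, Theta):
    """One 1stChebNet layer: Z = D~^{-1/2} A~ D~^{-1/2} X Theta (Equation 7)."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                            # renormalization trick: add self-loops
    d_tilde = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt          # symmetric normalized adjacency
    return A_hat @ X @ Theta                           # propagate, then linearly transform
```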

However, the above derivation is based on the premise that the Laplacian matrix represents an undirected graph. As a result, these spectral-based models are limited to working only on undirected graphs kipf2016semi. The only way to handle directed edges is to relax directed graphs into undirected ones, which fails to represent the actual structure of directed graphs. To address this problem, we propose our spectral-based GCN method for directed graphs in the following section.

3 Method

Existing spectral-based GCN methods cannot directly work on directed graphs, but their powerful ability to extract features from graphs is impressive. We expect that utilizing this ability of spectral-based GCNs can improve the performance of our method. Besides, designing a spectral-based GCN is important for filling the gap in processing directed graphs. Motivated by these observations, we design a spectral-based GCN method for directed graphs in our work.

In this section, we first give the definition of the Laplacians for directed graphs chung2005laplacians, which is fundamental to spectral-based GCNs. We then give the approximation of localized spectral filters on directed graphs using Chebyshev polynomials of the diagonal matrix of the Laplacian's eigenvalues. Finally, we describe the models we use in our experiments.

3.1 Laplacians for directed graphs

Eigenvalues and eigenvectors are closely related to almost all major invariants of a graph, linking one extremal property to another. They play a central role in the fundamental understanding of graphs in spectral graph theory chung1997spectral. The eigenvalues and eigenvectors of the Laplacian matrix provide very useful information about a graph. In a graph Laplacian, if two vertices are connected by an edge with a large weight, the values of an eigenvector at those locations are likely to be similar. The eigenvectors associated with larger eigenvalues oscillate more rapidly and are more likely to have dissimilar values on vertices connected by an edge with high weight. In addition, the Laplacian matrix is a symmetric positive-semidefinite matrix and its eigenvectors form an orthogonal basis of the $n$-dimensional space, so it is convenient to perform the graph Fourier transform and the inverse graph Fourier transform in practice, as described in Section 2. Based on the discussion above, the Laplacian matrix can represent the properties of graphs well and the graph Laplacian eigenvectors can be used as the filtering bases of a GCN. In order to deduce the principal properties and structure of a graph from its graph spectrum, we choose the Laplacian matrix for directed graphs as the foundation of our method.

Suppose $G = (V, E)$ is a directed graph with vertex set $V$ and edge set $E$. For a directed edge $(u, v)$ in $E$, we say that there is an edge from $u$ to $v$, or that $u$ has an out-neighbor $v$. The number of out-neighbors of $u$ is the out-degree of $u$, denoted by $d_{out}(u)$. Using the same representation as in Section 2, we can define $D_{out}$ as the out-degree matrix of a directed graph, where $A$ is the adjacency matrix (or weight matrix for a weighted directed graph) of the directed graph. If there is a path in each direction between each pair of vertices of the graph $G$, then the directed graph is called strongly connected.

Transition Probability Matrix

Assume $P$ is a transition probability matrix, where $P(u, v)$ denotes the probability of moving from vertex $u$ to vertex $v$. For a given directed graph $G$, the transition probability matrix is defined as

$P(u, v) = \frac{1}{d_{out}(u)}$ if $(u, v) \in E$, and $P(u, v) = 0$ otherwise.    (8)

For a weighted directed graph with edge weights $w(u, v)$, a transition probability matrix can be defined as being proportional to the corresponding weights, and formally we have

$P(u, v) = \frac{w(u, v)}{\sum_{z} w(u, z)}$    (9)

An unweighted directed graph is just the special case in which each weight $w(u, v)$ takes value 1 or 0. In practice, the transition probability matrix can be represented by

$P = D_{out}^{-1} A$    (10)
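In matrix form this is a simple row normalization of the adjacency matrix. A one-function sketch (assuming every out-degree is positive, e.g. a strongly connected graph):

```python
import numpy as np

def transition_matrix(A):
    """Row-normalize the (weighted) adjacency matrix: P = D_out^{-1} A (Equation 10).

    Assumes every node has at least one out-edge, so each out-degree is positive.
    """
    d_out = A.sum(axis=1)
    return A / d_out[:, None]
```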

Perron Vector

The Perron-Frobenius Theorem horn2012matrix states that an irreducible matrix with non-negative entries has a unique left eigenvector with all entries positive. This can be translated into the language of directed graphs. Let $\rho$ denote the eigenvalue associated with the all-positive left eigenvector of the transition probability matrix $P$. For a strongly connected directed graph, $P$ has a unique left eigenvector $\phi$ with $\phi(v) > 0$ for all $v$ and

$\phi P = \rho \phi$    (11)

where $\phi$ is a row vector. According to the Perron-Frobenius Theorem, we have $\rho = 1$ and all other eigenvalues of $P$ have absolute value at most 1. Then we normalize and choose $\phi$ so that

$\sum_{v} \phi(v) = 1$    (12)

We call $\phi$ the Perron vector of $P$. For a strongly connected graph, $\phi$ is a stationary distribution. Define $\Phi = \mathrm{diag}(\phi(1), \ldots, \phi(n))$. Using $\Phi$, we establish the Laplacians for directed graphs in the following paragraph.
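One way to compute the Perron vector in practice is to take the left eigenvector of $P$ associated with eigenvalue 1; the sketch below does this with a dense eigen-solver (this is our illustrative choice, not necessarily the authors' implementation):

```python
import numpy as np

def perron_vector(P):
    """Left eigenvector of P for eigenvalue 1, normalized so its entries sum to 1.

    Assumes P is the transition matrix of a strongly connected directed graph,
    so by the Perron-Frobenius theorem this eigenvector is unique and positive.
    """
    eigvals, eigvecs = np.linalg.eig(P.T)          # right eigenvectors of P^T = left of P
    idx = np.argmin(np.abs(eigvals - 1.0))         # pick the eigenvalue closest to 1
    phi = np.abs(np.real(eigvecs[:, idx]))         # Perron vector is entrywise positive
    return phi / phi.sum()                         # normalize: sum_v phi(v) = 1
```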

Definition of Directed Laplacian

As described in Section 2, for undirected graphs we have the definition $L = D - A$, from which we can further derive the normalized form

$\mathcal{L} = I - D^{-1/2} A D^{-1/2}$    (13)

Now we generalize this definition from undirected graphs to directed graphs. The most important problem is that the adjacency matrix (and hence $P$) is not symmetric for a directed graph. So we use the following definition to guarantee that the normalized Laplacian is symmetric:

$\mathcal{L} = I - \frac{1}{2}\left(\Phi^{1/2} P \Phi^{-1/2} + \Phi^{-1/2} P^T \Phi^{1/2}\right)$    (14)
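Since diagonal matrices are symmetric, the second term inside the parentheses of Equation 14 is just the transpose of the first, which gives a compact sketch (function name is ours):

```python
import numpy as np

def directed_laplacian(P, phi):
    """Symmetric Laplacian of a strongly connected directed graph (Equation 14):
    L = I - 0.5 * (Phi^{1/2} P Phi^{-1/2} + Phi^{-1/2} P^T Phi^{1/2}).
    """
    n = P.shape[0]
    phi_sqrt = np.sqrt(phi)
    S = np.diag(phi_sqrt) @ P @ np.diag(1.0 / phi_sqrt)   # Phi^{1/2} P Phi^{-1/2}
    return np.eye(n) - 0.5 * (S + S.T)                     # second term equals S^T
```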

3.2 Spectral GCN for Directed Graph

As the Laplacian defined in Equation 14 is symmetric, we can calculate its eigendecomposition to build the spectral filter. We then approximate this filter using Chebyshev polynomials and restrict it to first order, as demonstrated in Section 2. Finally, we can derive the definition of the directed graph convolution layer:

$Z = \frac{1}{2}\left(\tilde{\Phi}^{1/2} \tilde{P} \tilde{\Phi}^{-1/2} + \tilde{\Phi}^{-1/2} \tilde{P}^T \tilde{\Phi}^{1/2}\right) X \Theta$    (15)

where the adjacency matrix (weight matrix) used in this definition to derive $\tilde{P}$ and $\tilde{\Phi}$ has a self-loop added for each node. That is, $\tilde{A} = A + I_N$, $\tilde{P} = \tilde{D}_{out}^{-1} \tilde{A}$, and $\tilde{\Phi}$ is calculated based on $\tilde{P}$. $X \in \mathbb{R}^{N \times C}$ is the matrix with a $C$-dimensional feature vector for every node, $\Theta \in \mathbb{R}^{C \times F}$ is a matrix of filter parameters, and $Z \in \mathbb{R}^{N \times F}$ is the convolved result.
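Putting the pieces together, a self-contained sketch of one layer as we read Equation 15 (not the authors' released code; it assumes the input graph is strongly connected):

```python
import numpy as np

def dgcn_propagation(A, X, Theta):
    """One directed graph convolution layer (a sketch of Equation 15)."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                                # add a self-loop to every node
    P_tilde = A_tilde / A_tilde.sum(axis=1)[:, None]       # P~ = D~_out^{-1} A~

    # Perron vector of P~: left eigenvector for eigenvalue 1, positive, summing to 1.
    eigvals, eigvecs = np.linalg.eig(P_tilde.T)
    phi = np.abs(np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))]))
    phi /= phi.sum()

    phi_sqrt = np.sqrt(phi)
    S = np.diag(phi_sqrt) @ P_tilde @ np.diag(1.0 / phi_sqrt)
    A_hat = 0.5 * (S + S.T)                                # symmetric propagation matrix
    return A_hat @ X @ Theta
```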

We now have the propagation model for the directed graph convolution of our method, DGCN (Directed Graph Convolutional Network). The details of the DGCN propagation model are shown in Figure 1. The symbols in this figure have the same meaning as defined in Equation 15. Edge information and node information are obtained from the input. The edge index and edge weight represent an edge and its weight in the graph after processing by the DGCN propagation model.

Figure 1: Details of DGCN propagation model.

3.3 Models

After introducing the propagation model, we design training models to solve semi-supervised node classification for directed graphs. In the pre-processing step, we calculate $\hat{A} = \frac{1}{2}\left(\tilde{\Phi}^{1/2} \tilde{P} \tilde{\Phi}^{-1/2} + \tilde{\Phi}^{-1/2} \tilde{P}^T \tilde{\Phi}^{1/2}\right)$. Based on the conclusions in Section 3.2, we can naturally design models with multiple layers. Here we give a two-layer DGCN as an example:

$Z = \mathrm{softmax}\left(\hat{A}\,\mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right)$    (16)

where $X$ is the matrix of node feature vectors. Note that $X$ does not contain the structural information presented in $\hat{A}$, such as links between pages in a Wikipedia network. The neural network weights $W^{(0)}$ and $W^{(1)}$ are trained using gradient descent. In Equation 16, $W^{(0)}$ is an input-to-hidden weight matrix and $W^{(1)}$ is a hidden-to-output weight matrix. The softmax activation function, $\mathrm{softmax}(x_i) = \frac{\exp(x_i)}{\sum_i \exp(x_i)}$, is applied row-wise. We evaluate the cross-entropy loss over all labeled examples:

$\mathcal{L} = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$    (17)

where $Y$ denotes the labels and $\mathcal{Y}_L$ is the set of node indices that have labels. We also use dropout to reduce overfitting in our graph convolutional network.
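For illustration, a hedged PyTorch sketch of the two-layer model in Equation 16 and the masked loss in Equation 17, with $\hat{A}$ precomputed as a dense tensor (layer sizes and names are ours):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerDGCN(nn.Module):
    """Two-layer model of Equation 16; A_hat is precomputed as in Section 3.2."""

    def __init__(self, in_dim, hidden_dim, num_classes, dropout=0.5):
        super().__init__()
        self.W0 = nn.Linear(in_dim, hidden_dim, bias=False)    # input-to-hidden weights
        self.W1 = nn.Linear(hidden_dim, num_classes, bias=False)  # hidden-to-output weights
        self.dropout = dropout

    def forward(self, A_hat, X):
        H = F.relu(A_hat @ self.W0(X))
        H = F.dropout(H, p=self.dropout, training=self.training)
        return A_hat @ self.W1(H)        # logits; softmax is folded into the loss below

# Cross-entropy over labeled nodes only (Equation 17):
# loss = F.cross_entropy(logits[labeled_idx], labels[labeled_idx])
```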

Considering semi-supervised classification tasks of different difficulty levels, we design two models in our experiments: one is a two-layer model and the other is a three-layer model. We use only two- and three-layer models to avoid the overfitting that comes with the growing number of parameters of deeper models, as described in kipf2016semi. Figure 2 shows the architectures of our models. Each hidden layer in the graph convolutional network is a DGCN propagation model.

Figure 2: Architectures of our models.

4 Experiments

We test our models on semi-supervised node classification tasks on four different datasets. All the datasets in our experiments can be obtained from open sources. These datasets have different graph structures and belong to different kinds of networks (citation networks, hyperlink networks and email networks), which ensures that the assessments based on these datasets are comprehensive and objective.

4.1 Datasets

Dataset statistics are summarized in Table 1. We report the number of total nodes and edges of each dataset. The nodes belong to different classes and we give the number of these classes. The numbers of nodes and edges of the largest strongly connected component (LSCC) are also shown in this table. For all the datasets, we calculate the strongly connected component of the graphs and process the graphs into the edgelist format. The details of each dataset are given as follows.

Blogs

A directed network of hyperlinks among a large set of U.S. political weblogs from before the 2004 election Adamic:2005:PBU:1134271.1134277 . It includes blog political affiliation as metadata. Links between blogs were automatically extracted from a crawl of the front page of the blog. In addition, the authors drew on various sources (blog directories, and incoming and outgoing links and posts around the time of the 2004 presidential election) and classified 758 blogs as left-leaning and the remaining 732 as right-leaning.

Wikipedia

The hyperlink network of Wikipedia pages on editorial norms bradi16 , in 2015. Nodes are Wikipedia entries, and two entries are linked by a directed edge if one hyperlinks to the other. Editorial norms cover content creation, interactions between users, and formal administrative structure among users and admins. Metadata includes page information such as creation date, number of edits, page views and so on. The number of norm categories is also given.

Email

The network was generated using email data from a large European research institution snapnets. We have anonymized information about all incoming and outgoing email between members of the research institution. There is an edge $(u, v)$ in the network if person $u$ sent person $v$ at least one email. The emails only represent communication between institution members. The dataset also contains ground-truth community memberships of the nodes: each individual belongs to exactly one of 42 departments at the research institution.

Cora-cite

Citations among papers indexed by CORA, an early computer science research paper search engine, from 1998 konect:2017:subelj_cora. Nodes in the CORA citation network represent scientific papers. If a paper $u$ cites a paper $v$ that is also in this dataset, then a directed edge connects $u$ to $v$. Papers not in the dataset are excluded. The papers are manually divided into 10 different computer science areas according to each paper's description.

Dataset Nodes Edges Nodes of LSCC Edges of LSCC Classes
Blogs 1490 19090 793 15783 2
Wikipedia 1976 17235 1345 14601 10
Email 1005 25571 803 27429 42
Cora-cite 23166 91500 3991 18007 10
Table 1: Datasets

4.2 Set-up

We follow the experimental setup in kipf2016semi. In pre-processing, we calculate the largest strongly connected component of each dataset. For simple tasks (e.g., datasets with at most 10 classes), we use a two-layer model. For complicated tasks (e.g., the Email dataset has more than 40 classes and fewer than 1000 nodes), we use a three-layer model to better extract graph features. We use these two models for our four datasets. We train the models using about 10% of the nodes of the graph in each dataset, following the settings of existing works kipf2016semi, and use the remaining 90% of the nodes as the test set to evaluate prediction accuracy. For the node features, we concatenate a one-hot encoding of each node in the graph with the original features from the datasets. In practice, we implement our method using PyTorch and PyTorch Geometric (a geometric deep learning extension library for PyTorch) fey2019fast. The code to reproduce our experiments will be published if our paper is accepted.
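The feature construction described above can be sketched as follows (a hypothetical helper, not the authors' code):

```python
import numpy as np

def build_features(num_nodes, original_features=None):
    """Concatenate a one-hot encoding of each node with any original node features.

    If the dataset provides no features, the one-hot identity matrix alone is used.
    """
    one_hot = np.eye(num_nodes)
    if original_features is None:
        return one_hot
    return np.concatenate([one_hot, original_features], axis=1)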

4.3 Baselines

We compare with several state-of-the-art baseline methods, including a spatial-based method morris2018weisfeiler, spectral-based methods kipf2016semi defferrard2016convolutional and a method incorporating the attention mechanism velivckovic2017graph. The first is the classic spectral-based 1stChebNet (GCN) kipf2016semi, one of the best spectral-based GCNs according to kipf2016semi. The second is the Chebyshev spectral graph convolution (ChebConv) defferrard2016convolutional. The third is the graph attention network (GAT) velivckovic2017graph, which leverages masked self-attentional layers to address the shortcomings of classic GCN methods. The fourth is the higher-order graph neural network (GraphConv) morris2018weisfeiler, which can take higher-order graph structures at multiple scales into account. For this method, we choose the mean function to aggregate node features, as described in their paper.

4.4 Results

The classification accuracy on the test sets is summarized in Table 2. We trained and tested our models on the datasets with different splits of the train and test sets. We report the mean accuracy and confidence interval of 20 runs with random weight initializations. For the Blogs, Wikipedia and Cora-cite datasets, we use the two-layer model. For the Email dataset, we use the three-layer model. For a given dataset, all methods use a training model with the same architecture and parameters; the only difference is the propagation model of the convolution layer.

As we can see in Table 2, our method outperforms the four baselines on the four different datasets. The reason that our method achieves better performance may be described as follows: our method makes use of the Laplacian designed for directed graphs, which has a stronger ability to capture the connections between nodes of the network and to extract features from directed graphs.

The performance of all the methods is relatively poor on the Cora-cite dataset, and we believe there are three reasons. First, the Cora-cite dataset has 3991 nodes but only 18007 edges, so it is a difficult classification task. Second, the dataset has no node features, so we have to construct a one-hot encoding of each node in the graph as the node features. Third, the classes of this dataset are manually divided into 10 areas according to each paper's description, which may cause some deviation from the ground truth.

Method Blogs Wikipedia Email Cora-cite
GCN
GraphConv
GAT
ChebConv
DGCN (Ours)
Table 2: Results of classification accuracy on test sets with 95% confidence level (in percent)

5 Discussion

As demonstrated in the previous sections, our method for semi-supervised node classification of directed graphs outperforms several state-of-the-art methods. However, our method does have some limitations. First, the computational cost of our model increases with the graph size, because our method needs to compute the left eigenvector (the Perron vector) of the transition probability matrix. A practical way to reduce the computational cost is to implement the matrix product using the Coordinate Format (COO format), but when parallelizing or scaling to large graphs, the computational cost of our spectral-based method is still a problem. Second, our method has to handle the whole graph at once, so the memory requirement of the spectral-based GCN method is very high. Approximations of large and densely connected graphs could be very helpful here, as described in kipf2016semi. Third, our method is based on the premise that the input directed graph of our DGCN model is strongly connected. Accordingly, we have to calculate the largest strongly connected component of each dataset, which can cause some nodes to be removed from the original graph.

6 Conclusion and Future Work

In this paper, we propose a novel method to design the propagation model of the spectral-based GCN layer so that it adapts to directed graphs. Experiments on a number of directed network datasets suggest that our method can work directly on directed graphs in semi-supervised node classification tasks. Our method outperforms several state-of-the-art baseline methods, including spatial-based methods, spectral-based methods and methods incorporating the attention mechanism.

In the future, there are several potential improvements and extensions to our work. For example, overcoming the practical problems described in Section 5, to reduce the computing cost and to handle graphs in batches, will be a challenge for future work. We also believe it is feasible to combine other techniques, such as the attention mechanism, with our method to improve performance on more datasets. In addition, combining GCNs for directed graphs with reinforcement learning in multi-agent systems may be an attractive direction.

References