I Introduction
Graph convolutional neural networks (GCNNs) are gaining momentum as a promising tool for addressing a variety of classification and regression tasks for data that live in irregular spaces, such as network data [1]. With the aim to replicate the success of traditional CNNs, GCNNs play a fundamental role in semi-supervised learning on graphs and in classifying different graph signals (i.e., values indexed by the nodes of the graph).
A central aspect of GCNNs is the extension of the convolution operation to graph signals. The seminal work [2] defined convolution as the pointwise multiplication in the graph spectral domain between the projected graph signal in this domain and the transfer function of a learnable filter. Such an approach finds solid grounds in graph signal processing (GSP) theory [3], where the graph spectrum plays the role of the Fourier basis for graph signals. Subsequently, several works exploited this connection and proposed computationally lighter GCNN models. In particular, [4, 5, 6, 7] used the so-called polynomial graph filters [3], while [8, 9] relied on graph filters having a rational transfer function [10]. Differently, [11] designed architectures by using the node-variant (NV) graph filters [12], while [13] introduced MIMO approaches to learn from multiple features.
While the above works introduce techniques that extend CNNs to graphs, they are mostly based on (well-motivated) analogies with classical neural networks. Such a strategy, however, has its own limitations when it comes to extending these methods towards more involved ones that better exploit the graph.
The main aim of this paper is to formulate a general framework that unifies state-of-the-art GCNN architectures, facilitating their comparison, showing their limitations, and highlighting their potential. To this end, it explores the so-called edge-variant (EV) graph filters [14, 15]. The EV graph filter is a local, linear, and finite-order recursion in the node domain where each node weighs, in each iteration, the information in its neighborhood differently. Therefore, it represents the most general linear and local operation that a node can perform: gather information from all neighbors and weigh each of them differently. This local property makes the EV a computationally efficient candidate (only local information is exchanged) for capturing detail at the node connection level.
Nevertheless, in its general form, the number of learnable parameters of the EV graph filter depends on the number of graph edges. To tackle the latter, we first show how state-of-the-art solutions fall under the EV recursion and how they impose parsimonious models on the learnable parameters, rendering their number independent of the graph dimensions. Then, we exploit these insights to provide guidelines for designing a variety of novel architectures in the spirit of the EV recursion whose number of parameters is independent of the graph dimensions.
In a nutshell, the contributions of this paper are:
- To formulate a general framework for GCNNs through EV graph filters.
- To show how state-of-the-art approaches are specific parameterizations of this EV recursion.
- To present rigorous design guidelines for GCNNs based on the EV recursion that preserve locality and whose number of parameters is independent of the graph dimension.
- To introduce one such new architecture and show its superior performance in graph signal classification tasks.
II Background
II-A Graphs and graph filters
Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a weighted graph with vertex set $\mathcal{V}$ of cardinality $|\mathcal{V}| = N$ and edge set $\mathcal{E}$, of cardinality $|\mathcal{E}| = M$, composed of ordered pairs $(i,j)$ iff there is an edge between nodes $i$ and $j$. For each node $i$, we define the neighborhood set $\mathcal{N}_i$ as the set of nodes connected to $i$. The sparsity of the edge set of $\mathcal{G}$ is represented by an $N \times N$ matrix $\mathbf{S}$, named the graph shift operator matrix, where $[\mathbf{S}]_{ij} \neq 0$ if $(j,i) \in \mathcal{E}$ or $i = j$. Candidates for $\mathbf{S}$ are the graph adjacency matrix, the graph Laplacian matrix (undirected graphs), or any of their normalizations. For generality, in the sequel we will focus on directed graphs.
Along with $\mathcal{G}$, consider a set of signal values (features) $\mathbf{x} = [x_1, \ldots, x_N]^\top$ in which component $x_i$ resides on node $i$. By exploiting the coupling between the graph $\mathcal{G}$ and the graph signal $\mathbf{x}$, it is possible to compute a graph harmonic analysis for $\mathbf{x}$ similar to the one performed for temporal and image signals. Specifically, given the eigendecomposition $\mathbf{S} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^{-1}$, the graph Fourier transform (GFT) of $\mathbf{x}$ is $\hat{\mathbf{x}} = \mathbf{V}^{-1} \mathbf{x}$. Likewise, the inverse transform is $\mathbf{x} = \mathbf{V} \hat{\mathbf{x}}$. Here, $\mathbf{V}$ contains along its columns the oscillating modes of the graph and $\hat{\mathbf{x}}$ are the respective Fourier coefficients. The eigenvalues $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ represent the spectral support for $\mathcal{G}$ and are commonly referred to as the graph frequencies [3].

Given the Fourier expansion, we can now filter $\mathbf{x}$ directly in the spectral domain. That is, for $\hat{h}(\lambda)$ being the filter spectral response (transfer function), the filter output $\mathbf{y}$ is computed as the convolution (pointwise multiplication in the spectral domain) between the filter transfer function and the GFT of $\mathbf{x}$. By means of the inverse GFT, the vertex domain output becomes
$\mathbf{y} = \mathbf{V}\,\hat{h}(\boldsymbol{\Lambda})\,\mathbf{V}^{-1}\mathbf{x}.$  (1)
However, (1) is not local, since in computing the output $y_i$, node $i$ needs access to the graph signal of non-neighboring nodes. To account for the locality, we can define the filtering operation directly in the vertex domain as the aggregation of neighboring information. The node output for an order-one local filter is
$y_i = \sum_{j \in \mathcal{N}_i \cup \{i\}} \phi_{ij}\, x_j,$  (2)
where the scalar parameter $\phi_{ij}$ weighs the information of the neighboring node $j$. Nevertheless, this direct vertex domain definition does not enjoy a spectral analysis, limiting the connection with the convolution operation.
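As a concrete sketch of the spectral filtering in (1) (illustrative only; the function name and example graph are ours, not from the paper), the code below filters entirely in the spectral domain. Note that it requires the full eigendecomposition of $\mathbf{S}$, so every node implicitly touches the whole graph, in contrast with the local rule in (2).

```python
import numpy as np

def spectral_filter(S, x, h):
    """Graph filtering in the spectral domain as in (1): take the GFT
    of x, multiply pointwise by the transfer function evaluated at the
    graph frequencies, and return to the vertex domain."""
    lam, V = np.linalg.eig(S)       # S = V diag(lam) V^{-1}
    x_hat = np.linalg.inv(V) @ x    # GFT of x
    y_hat = h(lam) * x_hat          # pointwise multiplication
    return np.real(V @ y_hat)       # inverse GFT

# Example: a 5-node undirected cycle and a smoothing-like response.
N = 5
S = np.zeros((N, N))
for i in range(N):
    S[i, (i + 1) % N] = S[(i + 1) % N, i] = 1.0
x = np.arange(N, dtype=float)
y = spectral_filter(S, x, lambda lam: 1.0 / (1.0 + np.abs(lam)))
```

A quick sanity check of the construction: the all-ones transfer function must reproduce the input signal exactly.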
One way to link the spectral and the vertex domain filtering is to consider the polynomial graph filters [3] with output
$\mathbf{y} = \sum_{k=0}^{K} \phi_k\, \mathbf{S}^k \mathbf{x}.$  (3)
Due to the locality of $\mathbf{S}$, the output $\mathbf{y}$ can be obtained in the vertex domain through local information exchange. The state $\mathbf{x}^{(0)} = \mathbf{x}$ is simply the graph signal, while $\mathbf{x}^{(1)} = \mathbf{S}\mathbf{x}$ consists of one-hop information exchange between adjacent nodes. The higher order states are computed recursively as $\mathbf{x}^{(k)} = \mathbf{S}\mathbf{x}^{(k-1)}$, i.e., by exchanging with the neighbors the previous intermediate state $\mathbf{x}^{(k-1)}$. Such an implementation amounts to a complexity of order $\mathcal{O}(MK)$. By means of the GFT, the filtering operation in (3) has the transfer function
$\hat{h}(\lambda) = \sum_{k=0}^{K} \phi_k\, \lambda^k.$  (4)
Therefore, we conclude that the output in (3) consists of the convolution between a graph filter with a polynomial transfer function and $\mathbf{x}$. This filter enjoys a local implementation and captures detail in a neighborhood of radius $K$ around each node.
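The local recursion described above can be sketched as follows (a minimal illustration; `polynomial_filter` is our name, not from the paper). Each loop iteration performs one more round of neighborhood exchange, so the cost grows with $K$ times the number of edges.

```python
import numpy as np

def polynomial_filter(S, x, phi):
    """Polynomial graph filter (3), y = sum_k phi[k] S^k x, computed
    with the local recursion x^(k) = S x^(k-1): each step exchanges
    only one-hop neighborhood information."""
    y = phi[0] * x
    state = x
    for phi_k in phi[1:]:
        state = S @ state          # one more hop of local exchange
        y = y + phi_k * state
    return y

# Example: a directed 4-node chain; the delta at node 0 diffuses down it.
S = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])
y = polynomial_filter(S, x, [1.0, 2.0, 3.0])   # -> [1, 2, 3, 0]
```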
II-B Edge-variant graph filters
The edge-variant graph filter is a finite-order recursion implemented in the vertex domain in a form similar to (2) [15]. Let $\boldsymbol{\Phi}_1, \ldots, \boldsymbol{\Phi}_K$ be a collection of $N \times N$ matrices that share the sparsity pattern of $\mathbf{S}$. The intermediate states of the EV filter are computed recursively as $\mathbf{x}^{(0)} = \mathbf{x}$; $\mathbf{x}^{(1)} = \boldsymbol{\Phi}_1 \mathbf{x}$; and
$\mathbf{x}^{(k)} = \boldsymbol{\Phi}_k\, \mathbf{x}^{(k-1)}, \qquad k = 2, \ldots, K.$  (5)
Since $\boldsymbol{\Phi}_1$ shares the support with $\mathbf{S}$, the state $\mathbf{x}^{(1)}$ accounts also for the scaling of $\mathbf{x}$ through the diagonal elements (i.e., each node $i$ scales its own signal $x_i$ with a different parameter $[\boldsymbol{\Phi}_1]_{ii}$). The higher order states for $k \geq 2$ are again obtained recursively, since the parameter matrices $\boldsymbol{\Phi}_k$ respect the graph connectivity. Put differently, each $\boldsymbol{\Phi}_k$ considers a different parameter for each edge and adds potential self-loops through its diagonal. Node $i$ then computes the $k$th order state as
$x_i^{(k)} = \sum_{j \in \mathcal{N}_i \cup \{i\}} [\boldsymbol{\Phi}_k]_{ij}\, x_j^{(k-1)}.$  (6)
By defining $\boldsymbol{\Phi}_{k:1} \triangleq \boldsymbol{\Phi}_k \boldsymbol{\Phi}_{k-1} \cdots \boldsymbol{\Phi}_1$ and putting back together all terms in (5), the output of an order-$K$ EV graph filter is
$\mathbf{y} = \sum_{k=1}^{K} \boldsymbol{\Phi}_{k:1}\, \mathbf{x}.$  (7)
The total number of parameters of the edge-variant graph filter is $K(M+N)$ (one parameter per edge plus one per node for each of the $K$ matrices), which is in general smaller than the $N^2$ parameters of an arbitrary linear transform. In computing the output in (7), the EV graph filter incurs an overall computational complexity of order $\mathcal{O}(K(M+N))$, which is similar to that of (3) since in general $M \geq N$, or we can consider an EV recursion without self-loops. The ability to capture local detail at the edge level and the reduced implementation complexity are leveraged next to define graph neural networks (GNNs) with a controlled number of parameters and a computational complexity matched to the sparsity pattern of the graph.
III Edge-variant graph neural networks
Consider a training set $\mathcal{T}$ composed of examples of inputs and output representations. A GNN leverages the underlying graph representation of the data and learns a model $f(\cdot)$ that minimizes some loss function on the training examples and generalizes well to unseen pairs.

To capture several levels of detail, the model $f(\cdot)$ is layered into a cascade of functions, each consisting of a succession of linear transforms and nonlinearities. Layer $l$ produces as output a collection of higher level signal features obtained by processing the features computed at the previous layer. The $g$th higher level feature is computed as
$\mathbf{x}_l^g = \sigma_l\Big( \sum_{f=1}^{F_{l-1}} \mathbf{H}_l^{gf}(\mathbf{S})\, \mathbf{x}_{l-1}^f \Big),$  (8)
where $\sigma_l(\cdot)$ represents the nonlinearity, which might be pointwise (e.g., the ReLU) or graph dependent [16], and $\mathbf{H}_l^{gf}(\mathbf{S})$ leverages the graph structure to relate the $f$th input feature to the $g$th output feature. The graph basically serves as a parameterization that reduces both the computational complexity and the number of parameters. GCNNs, in particular, consider $\mathbf{H}_l^{gf}(\mathbf{S})$ to be a graph filter that has a spectral interpretation such as (1) or (4). In the sequel, we consider $\mathbf{H}_l^{gf}(\mathbf{S})$ to be an EV graph filter,

$\mathbf{H}_l^{gf}(\mathbf{S}) = \sum_{k=1}^{K} \boldsymbol{\Phi}_{k:1}^{gf},$  (9)

and show that current approaches represent different parameterizations that induce the spectral convolution into the EV GNN.
III-A Properties of the edge-variant neural network layer
The EV layer enjoys three main properties. First, it does not require knowledge of $\mathbf{S}$, but only of its support. This is because, differently from current solutions, it learns from the training data a collection of parameter matrices $\{\boldsymbol{\Phi}_k\}$, where each of them acts as a different graph shift operator. Therefore, it represents a robust learning strategy for data residing on graphs whose edge weights are known up to some uncertainty, known only partially, or not known at all, such as biological networks [17].
Second, the computational complexity of each layer is linear in the graph parameters. For $F$ features per layer, the overall complexity of the EV layer is of order $\mathcal{O}(F^2 K (M+N))$, matching that of current state-of-the-art GCNN approaches.
Third, the number of parameters per layer is at most $F^2 K (M+N)$. The latter, although granting the EV the maximum degrees of freedom for a given topology, may often be a limitation for large graphs or when the training set is small. Our goal in the next section is, therefore, to show that the GCNN layers proposed in the literature are particular cases of (9). Establishing these relationships enables novel solutions that increase the descriptive power while preserving an efficient implementation complexity.

III-B Parametrizations
Polynomial GCNN. Several variants of GCNNs introduced in the literature use at each layer graph filters of the form
$\mathbf{H}(\mathbf{S}) = \sum_{k=0}^{K} \phi_k\, \mathbf{S}^k,$  (10)
where in the filter definition we omitted the layer and feature indices to simplify notation. This is the case for [6, 7], which use general polynomials, and for [4, 5], which consider Chebyshev polynomials.
These filters can be expressed in the form (9) by restricting the products of parameter matrices to $\boldsymbol{\Phi}_{k:1} = \phi_k \mathbf{S}^k$ for $k \geq 1$ and adding the term $\phi_0 \mathbf{I}_N$. In other words, the EV and the polynomial recursions represent two extremes for implementing graph filters locally. The EV recursion allows each node $i$ to learn, in each iteration, a different parameter $[\boldsymbol{\Phi}_k]_{ij}$ that reflects the importance of node $j$'s features to node $i$. The polynomial implementation instead forces all nodes to weigh the information of all neighbors with the same parameter $\phi_k$ within the $k$th iteration. However, this restriction makes the number of parameters independent of $N$ and $M$.
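The correspondence can be checked numerically. In this sketch (values and variable names ours), tying each EV matrix to a scalar multiple of $\mathbf{S}$ makes the EV recursion produce a polynomial filter whose $k$th coefficient is the cumulative product of the scalars; any desired polynomial coefficients are reachable by choosing the scalars accordingly.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
S = (rng.random((N, N)) < 0.4).astype(float)   # random directed support
x = rng.standard_normal(N)
c = [0.5, -0.3, 0.2]                           # per-iteration scalars

# Edge-variant recursion with scalar-tied matrices Phi_k = c_k * S.
y_ev = np.zeros(N)
state = x
for c_k in c:
    state = (c_k * S) @ state
    y_ev = y_ev + state

# Equivalent polynomial filter: sum_k (prod_{j<=k} c_j) S^k x.
y_poly = np.zeros(N)
coeff, Sk = 1.0, np.eye(N)
for c_k in c:
    coeff *= c_k
    Sk = S @ Sk
    y_poly = y_poly + coeff * (Sk @ x)
```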
This way of parameterizing the EV recursion now creates opportunities for proposing a myriad of intermediate solutions that extend (10) towards the edge-variant implementation (9). One such approach may be a recursion that, in addition to (10), considers dedicated parameters for the most critical edges (e.g., edges without which the graph becomes disconnected).
Remark 1: Along with the above works, [18, 19, 20] and [8, 9] also fall under the umbrella of polynomial filtering (10). Specifically, [18] considers single shifts on graphs using a learnable weight matrix as graph shift operator, [19] considers a Gaussian kernel to mix the neighboring node information, while [20] uses random walks. The works in [8, 9], although aiming to build a GCNN layer by using graph filters with a rational transfer function, approximate the matrix inverse involved in the vertex domain by a finite iterative algorithm. This finite approximation implicitly transforms these techniques into polynomial recursions whose order depends on the number of iterations (see also [10] for more detail).
Spectral GCNN. We here establish a link between the edge-variant recursion (9) and the spectral GCNN [2] to provide more insight into its convolutional behavior. The spectral GCNN exploits (1) and learns directly the filter transfer function $\hat{\mathbf{h}}$. To keep the number of parameters independent of $N$, $\hat{\mathbf{h}}$ is parameterized as
$\hat{\mathbf{h}} = \mathbf{B}\boldsymbol{\theta},$  (11)
where $\mathbf{B}$ is a prefixed kernel matrix (e.g., a spline kernel) and $\boldsymbol{\theta}$ are the learnable parameters. Therefore, the number of parameters for each layer is independent of $N$, while the computational complexity is of order $\mathcal{O}(N^2)$ per feature, required to compute the GFT of the features. Additionally, such an approach requires the eigendecomposition of $\mathbf{S}$ (of order $\mathcal{O}(N^3)$, computed once), and the learned filter does not capture the local detail around each vertex.
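Filtering with the parameterization (11) can be sketched as follows (function name and kernel choice are our illustrative assumptions): a fixed kernel matrix maps a few parameters to a full spectral response. Note that the eigendecomposition and GFT make this a global, not local, operation.

```python
import numpy as np

def spectral_gcnn_filter(S, x, B, theta):
    """Spectral GCNN filtering as in (11): the transfer function
    evaluated at the graph frequencies is h_hat = B @ theta for a fixed
    kernel matrix B, so only theta is learned. Assumes a symmetric S."""
    lam, V = np.linalg.eigh(S)      # orthonormal GFT basis
    h_hat = B @ theta               # spectral response per frequency
    return V @ (h_hat * (V.T @ x))  # filter in the GFT domain

# Example: identity kernel and all-ones parameters give the identity
# filter on a 3-node undirected path.
S = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
x = np.array([1.0, -2.0, 0.5])
y = spectral_gcnn_filter(S, x, np.eye(3), np.ones(3))   # -> x
```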
Nevertheless, this spectral interpretation is useful to understand the EV behavior. We can force the EV recursion (9) to have a spectral response by restricting all coefficient matrices to share the eigenvectors with $\mathbf{S}$, i.e., $\boldsymbol{\Phi}_k = \mathbf{V}\boldsymbol{\Lambda}_k\mathbf{V}^{-1}$ [15]. Then, the EV transfer function becomes

$\hat{h}(\boldsymbol{\Lambda}) = \sum_{k=1}^{K} \prod_{\kappa=1}^{k} \boldsymbol{\Lambda}_\kappa.$  (12)
Subsequently, let $\mathcal{I}$ be the index set defining the zero entries of $\mathbf{S}$. The fixed support condition for each $\boldsymbol{\Phi}_k$ is

$\boldsymbol{\Theta}_{\mathcal{I}}\, \mathrm{vec}(\boldsymbol{\Phi}_k) = \mathbf{0},$  (13)
where $\boldsymbol{\Theta}_{\mathcal{I}}$ is a selection matrix whose rows are those of the identity matrix indexed by $\mathcal{I}$, $\mathrm{vec}(\cdot)$ denotes the vectorization operation, and $\mathbf{0}$ is the vector of all zeros. From the properties of the $\mathrm{vec}(\cdot)$ operator, (13) becomes

$\boldsymbol{\Theta}_{\mathcal{I}} \big(\mathbf{V}^{-\top} \ast \mathbf{V}\big)\, \boldsymbol{\lambda}_k = \mathbf{0},$  (14)
where "$\ast$" denotes the Khatri-Rao product and $\boldsymbol{\lambda}_k$ is the vector composed of the diagonal elements of $\boldsymbol{\Lambda}_k$. Put differently, (14) implies

$\boldsymbol{\lambda}_k \in \mathrm{null}\big(\boldsymbol{\Theta}_{\mathcal{I}} (\mathbf{V}^{-\top} \ast \mathbf{V})\big).$  (15)
Finally, by considering $\mathbf{U}_{\mathcal{B}}$ as a basis that spans the nullspace of $\boldsymbol{\Theta}_{\mathcal{I}} (\mathbf{V}^{-\top} \ast \mathbf{V})$, we can expand $\boldsymbol{\lambda}_k$ as $\boldsymbol{\lambda}_k = \mathbf{U}_{\mathcal{B}}\mathbf{b}_k$ and write (12) as

$\hat{h}(\boldsymbol{\Lambda}) = \sum_{k=1}^{K} \prod_{\kappa=1}^{k} \mathrm{diag}\big(\mathbf{U}_{\mathcal{B}}\mathbf{b}_\kappa\big)$  (16)

for some basis expansion coefficients $\mathbf{b}_1, \ldots, \mathbf{b}_K$.
The eigendecomposition of $\mathbf{S}$ implicitly reduces the total number of parameters per filter from $K(M+N)$ to $K\,\mathrm{rank}(\mathbf{U}_{\mathcal{B}})$, with $\mathrm{rank}(\mathbf{U}_{\mathcal{B}}) \leq M+N$. That is, there is a subclass of the EV recursion that respects the operation of convolution but, differently from (11), captures local detail in the vertex domain and enjoys a linear implementation complexity. This subclass also has analogies with (11), which is obtained for $K = 1$ by identifying $\mathbf{U}_{\mathcal{B}}$ with the kernel matrix $\mathbf{B}$ and $\mathbf{b}_1$ with $\boldsymbol{\theta}$.
In general, we may conclude that the EV recursion implements a GNN layer that goes beyond convolution. Drawing analogies with linear system theory, the GCNN approaches behave as a linear time-invariant (now shift-invariant [3]) filter, while the EV graph filter behaves as a linear time-varying (now shift-varying; a different shift per $k$) filter that trades the convolutional interpretation for the ability to capture shift-varying detail.
Node-variant GCNN. The idea of proposing GNNs that extend beyond convolution is also considered in [11], which proposed an architecture having as graph filter the recursion
$\mathbf{H}(\mathbf{S}) = \sum_{k=0}^{K} \mathrm{diag}\big(\mathbf{C}\boldsymbol{\phi}^{(k)}\big)\,\mathbf{S}^k,$  (17)
where $\mathcal{P}$ is a set of privileged nodes (e.g., the nodes with the highest degree), $\mathbf{C} \in \{0,1\}^{N \times |\mathcal{P}|}$ is a tall binary matrix, and $\boldsymbol{\phi}^{(k)}$ is a vector of parameters for the nodes in $\mathcal{P}$. In short, (17) learns, for each shift $k$, different coefficients for the nodes in $\mathcal{P}$ and then maps them through $\mathbf{C}$ to the remaining nodes $\mathcal{V} \setminus \mathcal{P}$.
This filter is another way to restrict the EV degrees of freedom, parameterizing the products of coefficient matrices to $\boldsymbol{\Phi}_{k:1} = \mathrm{diag}(\mathbf{C}\boldsymbol{\phi}^{(k)})\,\mathbf{S}^k$ for $k \geq 1$ and adding the term $\mathrm{diag}(\mathbf{C}\boldsymbol{\phi}^{(0)})$. That is, (17) is an intermediate approach between the polynomial (10) and the EV (9) recursions and allows each node $i$ to learn, for each $k$, a different parameter that reflects the importance of its whole neighborhood to node $i$. The total number of parameters per filter is at most $(K+1)|\mathcal{P}|$, while the computational complexity is similar to that of (10).
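A sketch of the node-variant filter (names ours; we assume the form $\mathbf{y} = \sum_k \mathrm{diag}(\mathbf{C}\boldsymbol{\phi}^{(k)})\mathbf{S}^k\mathbf{x}$ described above):

```python
import numpy as np

def node_variant_filter(S, x, C, phis):
    """Node-variant graph filter in the spirit of (17):
    y = sum_k diag(C @ phis[k]) S^k x. Each privileged node has its own
    coefficient per shift k; the tall binary matrix C copies those
    coefficients to the remaining nodes."""
    y = np.zeros(S.shape[0])
    Skx = x.astype(float)
    for phi_k in phis:             # phi_k: one entry per privileged node
        y = y + (C @ phi_k) * Skx
        Skx = S @ Skx              # next shift of the signal
    return y

# Example: with C = I (every node privileged) and constant coefficient
# vectors, the filter collapses onto the polynomial filter (10).
S = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])
y = node_variant_filter(S, x, np.eye(4), [np.ones(4), 2.0 * np.ones(4)])
```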
This different way of parameterizing the EV recursion provides alternative choices for building new intermediate architectures that leverage the idea of privileged nodes while giving importance to edge-based detail. In the sequel, we propose one such extension that merges insights from the EV, the polynomial, and the NV architectures.
III-C Hybrid edge-variant neural network layer
The hybrid edgevariant (HEV) layer considers the linear operation in (8) to be a graph filter of the form
$\mathbf{H}(\mathbf{S}) = \sum_{k=1}^{K} \boldsymbol{\Psi}_{k:1}, \qquad \boldsymbol{\Psi}_k = \mathbf{D}_{\mathcal{P}}\boldsymbol{\Phi}_k + \big(\mathbf{I}_N - \mathbf{D}_{\mathcal{P}}\big)\,\phi_k \mathbf{S},$  (18)
where $\mathbf{D}_{\mathcal{P}}$ is a diagonal matrix whose $i$th diagonal element is one iff node $i$ belongs to the privileged set $\mathcal{P}$; $\{\boldsymbol{\Phi}_k\}$ are a collection of matrices whose $(i,j)$th element is nonzero iff $i \in \mathcal{P}$ and $j \in \mathcal{N}_i \cup \{i\}$; and $\{\phi_k\}$ are a collection of scalars. Put simply, recursion (18) allows the nodes in $\mathcal{P}$ to learn edge-varying parameters, while the nodes in $\mathcal{V} \setminus \mathcal{P}$ learn global parameters similar to (10).
This approach represents yet another intermediate architecture between the fully convolutional ones and the full EV. With $m$ the maximum number of neighbors among the nodes in $\mathcal{P}$, the overall number of parameters per filter is at most of order $K(|\mathcal{P}|(m+1)+1)$. Finally, the HEV implementation cost is of the same order as that of the EV filter.
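One plausible reading of the HEV recursion in code (a sketch under our assumptions about the exact form of (18); all names ours): rows of the effective shift belonging to privileged nodes use edge-specific weights, while the rest fall back to a global scalar applied to $\mathbf{S}$.

```python
import numpy as np

def hybrid_ev_filter(S, x, mask_P, Phi, phi):
    """Hybrid edge-variant filter in the spirit of (18): at iteration k,
    nodes with mask_P == 1 aggregate with the edge-varying weights in
    Phi[k], while the remaining nodes use the global scalar phi[k]
    applied to S, i.e. the k-th shift is D_P Phi_k + (I - D_P) phi_k S."""
    N = S.shape[0]
    D = np.diag(mask_P.astype(float))
    I = np.eye(N)
    y = np.zeros(N)
    state = x.astype(float)
    for Phi_k, phi_k in zip(Phi, phi):
        shift = D @ Phi_k + phi_k * ((I - D) @ S)
        state = shift @ state      # one hybrid local exchange
        y = y + state
    return y

# Example: 3-node undirected path; node 0 is privileged and weighs its
# neighbor with 5, while nodes 1 and 2 use the global coefficient 2.
S = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
Phi_1 = np.array([[0, 5, 0],
                  [0, 0, 0],
                  [0, 0, 0]], dtype=float)
y = hybrid_ev_filter(S, np.ones(3), np.array([1, 0, 0]), [Phi_1], [2.0])
```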
IV Numerical results
We compare the proposed edge-variant and hybrid edge-variant architectures with the spectral, polynomial, and node-variant alternatives on a source localization and an authorship attribution problem. For both experiments, we designed all architectures (except for the spectral GCNN) to have the same computational cost.
IV-A Source Localization
Setup. The goal of this experiment is to find out which community in a stochastic block model (SBM) graph is the source of a diffusion process by observing different diffused signals originated at different (unknown) communities at different (unknown) time instants. The graph is an undirected SBM graph whose nodes are divided equally into communities with given intra- and inter-community edge probabilities. The initial graph signal is a Kronecker delta centered at a source node, diffused over the graph for an unknown number of time instants. We generated the training set by selecting uniformly at random both the source community and the diffusion time. We then tested the different approaches on new samples and averaged the performance over different data and different graph realizations of the Monte-Carlo runs.

Models and results. We considered seven architectures, each composed of the cascade of a graph filtering layer with ReLU nonlinearity and a fully connected layer with softmax nonlinearity. The architectures are: a spectral GCNN (11) with a cubic spline kernel; a polynomial GCNN (10); two NV GNNs (17) with privileged nodes selected by maximum degree and by spectral proxies [21]; an EV GNN (9); and two HEV GNNs (18) with privileged nodes selected as in the NV case. We used the ADAM optimizer for training.
Table I shows the obtained results, where we see that, because of their increased capacity, the EV and the HEV outperform the other alternatives. We observe that the hybrid approaches better exploit the edge-varying part when the privileged set is composed of the nodes with the highest degree.
IV-B Authorship Attribution
Setup. In this experiment, we aim to classify whether a text excerpt belongs to Edgar Allan Poe or to a contemporary author. For each text excerpt, we built the graph from the word adjacency network (WAN) between function words, which act as nodes. These WANs serve as stylistic signatures for the author (see [22] for full details). Having fixed the WAN for Poe, we treat the frequency count of the function words as a graph signal.
In particular, we considered text excerpts by Poe and randomly split the dataset into training, validation, and testing texts. We summed the adjacency matrices of the WANs obtained from the training texts to get Poe's signature graph. We completed the training, validation, and test sets by adding randomly selected texts by contemporary authors. We set $\mathbf{S}$ to be the adjacency matrix of Poe's signature graph and averaged the performance over different data splits.
Models and results. We analyzed the same architectures as in the previous section, adjusting the number of output features and the recursion orders to this task. We used the same ADAM optimizer for training.
The results in Table II show that the hybrid approaches offer the best performance, highlighting the potential of solutions that consider both edge-dependent and global coefficients. In fact, the polynomial model with global coefficients suffers the most in this experiment.
Table I: Source localization results.
Model  Accuracy 

Spectral  26.89( 0.87)% 
Polynomial  74.55( 7.32)% 
Node Variant (NV) Degree  74.77( 7.77)% 
Node Variant (NV) S. Proxies  75.62( 8.19)% 
Edge Variant (EV)  85.47(10.77)% 
Hybrid EV (HEV) Degree  80.53(10.21)% 
Hybrid EV (HEV) S. Proxies  75.37( 8.20)% 
V Conclusion
We proposed a general framework that unifies state-of-the-art GCNN architectures into one recursion, named the edge-variant recursion. This unification highlights the different trade-offs between the number of parameters and the amount of local detail that each approach adopts. Moreover, it suggests rigorous ways to choose such trade-offs and come up with novel, ad hoc architectures for the problem at hand that are implemented locally in the vertex domain. We proposed one such extension, among many, and showed that it outperforms current solutions in graph signal classification tasks.
Table II: Authorship attribution results.
Model  Accuracy 

Spectral  88.88( 1.50)% 
Polynomial  79.88(15.31)% 
Node Variant (NV) Degree  88.88( 2.62)% 
Node Variant (NV) S. Proxies  86.12( 5.94)% 
Edge Variant (EV)  89.00( 2.11)% 
Hybrid EV (HEV) Degree  89.18( 1.99)% 
Hybrid EV (HEV) S. Proxies  90.00( 1.21)% 
References
 [1] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 61–80, Jan. 2009.
 [2] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and deep locally connected networks on graphs," arXiv:1312.6203v3 [cs.LG], 21 May 2014. [Online]. Available: http://arxiv.org/abs/1312.6203
 [3] A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst, "Graph signal processing: Overview, challenges, and applications," Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, 2018.
 [4] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in 5th Int. Conf. Learning Representations. Toulon, France, 24-26 Apr. 2017.
 [5] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Annu. Conf. Neural Inform. Process. Syst. 2016. Barcelona, Spain: NIPS Foundation, 5-10 Dec. 2016.
 [6] J. Du, S. Zhang, G. Wu, J. M. F. Moura, and S. Kar, “Topology adaptive graph convolutional networks,” arXiv:1710.10370v2 [cs.LG], 2 Nov. 2017. [Online]. Available: http://arxiv.org/abs/1710.10370v2
 [7] F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, “Convolutional neural network architectures for signals supported on graphs,” IEEE Trans. Signal Process., vol. 67, no. 4, pp. 1034–1049, Feb. 2019.
 [8] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, "CayleyNets: Graph convolutional neural networks with complex rational spectral filters," IEEE Trans. Signal Process., vol. 67, no. 1, pp. 97–107, Jan. 2019.
 [9] F. M. Bianchi, D. Grattarola, C. Alippi, and L. Livi, “Graph neural networks with convolutional ARMA filters,” Feb. 2019. [Online]. Available: http://arxiv.org/abs/1901.01343
 [10] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Autoregressive moving average graph filtering,” IEEE Trans. Signal Process., vol. 65, no. 2, pp. 274–288, Jan. 2017.
 [11] F. Gama, G. Leus, A. G. Marques, and A. Ribeiro, "Convolutional neural networks via node-varying graph filters," in 2018 IEEE Data Sci. Workshop. Lausanne, Switzerland: IEEE, 4-6 June 2018, pp. 1–5.
 [12] S. Segarra, A. G. Marques, and A. Ribeiro, “Optimal graphfilter design and applications to distributed linear network operators,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4117–4131, Aug. 2017.
 [13] F. Gama, A. G. Marques, A. Ribeiro, and G. Leus, “MIMO graph filters for convolutional networks,” in 19th IEEE Int. Workshop Signal Process. Advances in Wireless Commun. Kalamata, Greece: IEEE, June 2018.
 [14] M. Coutino, E. Isufi, and G. Leus, "Distributed edge-variant graph filters," in 2017 IEEE Int. Workshop Comput. Advances Multi-Sensor Adaptive Process. Curacao, Dutch Antilles: IEEE, 10-13 Dec. 2017.
 [15] ——, “Advances in distributed graph filtering,” arXiv:1808.03004v1 [eess.SP], 9 Aug. 2018. [Online]. Available: http://arxiv.org/abs/1808.03004

 [16] L. Ruiz, F. Gama, A. G. Marques, and A. Ribeiro, "Median activation functions for graph neural networks," in 44th IEEE Int. Conf. Acoust., Speech and Signal Process. Brighton, UK: IEEE, 12-17 May 2019.
 [17] B. Wang, A. Pourshafeie, M. Zitnik, J. Zhu, C. Bustamante, S. Batzoglou, and J. Leskovec, "Network enhancement as a general method to denoise weighted biological networks," Nature Communications, vol. 9, no. 3108, pp. 1–8, Aug. 2018.

 [18] M. Simonovsky and N. Komodakis, "Dynamic edge-conditioned filters in convolutional neural networks on graphs," in 2017 IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recognition. Honolulu, HI: IEEE, July 2017.
 [19] F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein, "Geometric deep learning on graphs and manifolds using mixture model CNNs," in 2017 IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recognition. Honolulu, HI: IEEE, July 2017.
 [20] J. Atwood and D. Towsley, "Diffusion-convolutional neural networks," in 30th Annu. Conf. Neural Inform. Process. Syst. Barcelona, Spain: NIPS Foundation, 5-10 Dec. 2016.
 [21] A. Anis, A. Gadde, and A. Ortega, “Efficient sampling set selection for bandlimited graph signals using graph spectral proxies,” IEEE Trans. Signal Process., vol. 64, no. 14, pp. 3775–3789, July 2016.
 [22] S. Segarra, M. Eisen, and A. Ribeiro, “Authorship attribution through function word adjacency networks,” IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5464–5478, Oct. 2015.