Graph convolutional neural networks (GCNNs) are gaining momentum as a promising tool for addressing a variety of classification and regression tasks for data that live in irregular spaces, such as network data [1]. With the aim of replicating the success of traditional CNNs, GCNNs play a fundamental role in semi-supervised learning on graphs and in classifying different graph signals (i.e., values indexed by the nodes of the graph).
A central aspect of GCNNs is the extension of the convolution operation to graph signals. The seminal work [2] defined convolution as the point-wise multiplication, in the graph spectral domain, between the projection of the graph signal onto this domain and the transfer function of a learnable filter. This approach is solidly grounded in graph signal processing (GSP) theory [3], where the graph spectrum plays the role of the Fourier basis for graph signals. Subsequently, several works exploited this connection and proposed computationally lighter GCNN models. In particular, [4, 5, 6, 7] used the so-called polynomial graph filters, while [8, 9] relied on graph filters having a rational transfer function [10]. Differently, [11] designed architectures by using node-variant (NV) graph filters [12], while [13] introduced MIMO approaches to learn from multiple features.
While the above works introduce techniques that extend CNNs to graphs, they are mostly based on (well-motivated) analogies with classical neural networks. This strategy, however, makes it difficult to extend these methods to more involved architectures that better exploit the graph.
The main aim of this paper is to formulate a general framework that unifies state-of-the-art GCNN architectures, thereby facilitating their comparison, showing their limitations, and highlighting their potential. To this end, it explores the so-called edge-variant (EV) graph filters [14, 15]. The EV graph filter is a local, linear, and finite-order recursion in the node domain where each node weighs differently, in each iteration, the information in its neighborhood. It therefore represents the most general linear and local operation that a node can perform: gathering information from all its neighbors and weighting each of them differently. This locality makes the EV filter a computationally efficient candidate (only local information is exchanged) for capturing detail at the node connection level.
Nevertheless, in its general form, the number of learnable parameters of the EV graph filter depends on the number of graph edges. To tackle this, we first show how state-of-the-art solutions fall under the EV recursion and how they impose parsimonious models on the learnable parameters, rendering their number independent of the graph dimensions. Then, we exploit these insights to provide guidelines for designing a variety of novel architectures in the spirit of the EV recursion whose number of parameters is independent of the graph dimensions.
In a nutshell, the contributions of this paper are: (i) to formulate a general framework for GCNNs through EV graph filters; (ii) to show how the state-of-the-art approaches are specific parameterizations of this EV recursion; (iii) to present rigorous design guidelines for GCNNs based on the EV recursion that preserve locality and whose number of parameters is independent of the graph dimension; and (iv) to introduce one such new architecture and show its superior performance for graph signal classification tasks.
II-A Graphs and graph filters
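Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a weighted graph with vertex set $\mathcal{V}$ of cardinality $N$ and edge set $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ composed of ordered pairs $(i, j)$ iff there is an edge between nodes $i$ and $j$. For each node $i$, we define the neighborhood set $\mathcal{N}_i$ as the set of nodes connected to $i$. The sparsity of the edge set of $\mathcal{G}$ is represented by an $N \times N$ matrix $\mathbf{S}$, named the graph shift operator matrix, where $[\mathbf{S}]_{ij}$ can be nonzero only if there is an edge between nodes $i$ and $j$ or $i = j$. Candidates for $\mathbf{S}$ are the graph adjacency matrix $\mathbf{A}$, the graph Laplacian matrix $\mathbf{L}$ (undirected graphs), or any of their normalizations. For generality, in the sequel, we will focus on directed graphs.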
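Along with $\mathcal{G}$, consider a set of signal values (features) $\mathbf{x} = [x_1, \ldots, x_N]^\top$ in which component $x_i$ resides on node $i$. By exploiting the coupling between the graph $\mathcal{G}$ and the graph signal $\mathbf{x}$, it is possible to carry out a graph harmonic analysis for $\mathbf{x}$ similar to the one performed for temporal and image signals. Specifically, given the eigendecomposition $\mathbf{S} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{-1}$, the graph Fourier transform (GFT) of $\mathbf{x}$ is $\hat{\mathbf{x}} = \mathbf{U}^{-1}\mathbf{x}$. Likewise, the inverse transform is $\mathbf{x} = \mathbf{U}\hat{\mathbf{x}}$. Here, $\mathbf{U}$ contains along the columns the oscillating modes of the graph and $\hat{\mathbf{x}}$ are the respective Fourier coefficients. The eigenvalues $\lambda_1, \ldots, \lambda_N$ on the diagonal of $\boldsymbol{\Lambda}$ represent the spectral support for $\hat{\mathbf{x}}$ and are commonly referred to as the graph frequencies.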
Given the Fourier expansion, we can now filter $\mathbf{x}$ directly in the spectral domain. That is, for $h(\boldsymbol{\Lambda})$ being the filter spectral response (transfer function), the filter output is computed as the convolution (pointwise multiplication in the spectral domain) between the filter transfer function and the GFT of $\mathbf{x}$. By means of the inverse GFT, the vertex domain output becomes
$$\mathbf{y} = \mathbf{U}\,h(\boldsymbol{\Lambda})\,\mathbf{U}^{-1}\mathbf{x}. \tag{1}$$
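As a minimal numerical sketch of the spectral filtering just described, the snippet below computes the GFT, multiplies pointwise by a transfer function, and transforms back. The 5-node cycle graph, the signal, and the transfer function `h_example` are hypothetical choices made for illustration; the graph is taken undirected so that the eigenvector matrix is orthogonal and its inverse is its transpose.

```python
import numpy as np

# Hypothetical undirected 5-node cycle; its adjacency matrix is the shift operator S.
S = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)

# Eigendecomposition S = U diag(lam) U^{-1}; S is symmetric here, so U^{-1} = U^T.
lam, U = np.linalg.eigh(S)

def gft(x):
    """Graph Fourier transform: coefficients of x on the eigenvectors of S."""
    return U.T @ x

def igft(x_hat):
    """Inverse graph Fourier transform."""
    return U @ x_hat

def spectral_filter(x, h):
    """Pointwise multiplication of h(lam) with the GFT of x, then inverse GFT."""
    return igft(h(lam) * gft(x))

# Hypothetical signal concentrated on node 0 and hypothetical transfer function.
x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
h_example = lambda l: np.exp(-0.5 * (l.max() - l))
y = spectral_filter(x, h_example)
```

In this example the output is nonzero at nodes 2 and 3, which are not neighbors of node 0, so computing it requires information from beyond the one-hop neighborhood.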
However, (1) is not local, since in computing the output $y_i$, node $i$ needs access to the graph signal of non-neighboring nodes. To account for locality, we can define the filtering operation directly in the vertex domain as the aggregation of neighboring information. The node output for an order one local filter is
$$y_i = \sum_{j \in \mathcal{N}_i} \phi_{ij}\,x_j, \tag{2}$$
where the scalar parameter $\phi_{ij}$ weighs the information of the neighboring node $j$. Nevertheless, this direct vertex domain definition does not lend itself to a spectral analysis, which limits its connection with the convolution operation.
One way to link the spectral and the vertex domain filtering is to consider the polynomial graph filters with output
$$\mathbf{y} = \sum_{k=0}^{K} \phi_k\,\mathbf{S}^k\mathbf{x}. \tag{3}$$
Due to the locality of $\mathbf{S}$, the terms $\mathbf{z}^{(k)} = \mathbf{S}^k\mathbf{x}$ can be obtained in the vertex domain through local information exchange. The state $\mathbf{z}^{(0)} = \mathbf{x}$ is simply the graph signal, while $\mathbf{z}^{(1)} = \mathbf{S}\mathbf{x}$ consists of one-hop information exchange between adjacent nodes. The higher order states are computed recursively as $\mathbf{z}^{(k)} = \mathbf{S}\mathbf{z}^{(k-1)}$, i.e., by exchanging with the neighbors the previous intermediate state $\mathbf{z}^{(k-1)}$. Such an implementation amounts to a complexity of order $\mathcal{O}(K|\mathcal{E}|)$. By means of the GFT, the filtering operation in (3) has the transfer function
$$h(\boldsymbol{\Lambda}) = \sum_{k=0}^{K} \phi_k\,\boldsymbol{\Lambda}^k. \tag{4}$$
Therefore, we conclude that the output in (3) consists of the convolution between a graph filter with a polynomial transfer function and $\mathbf{x}$. This filter enjoys a local implementation and captures detail in a neighborhood of radius $K$ from the node.
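The recursive implementation of a polynomial graph filter, and its agreement with the polynomial transfer function in the spectral domain, can be checked numerically. The following is a sketch under assumptions: the path graph, the coefficients `phi`, and the input signal are hypothetical, and the graph is undirected so that `numpy.linalg.eigh` applies.

```python
import numpy as np

def polynomial_filter(S, x, phi):
    """Compute y = sum_k phi[k] * S^k x using only repeated shifts z <- S z."""
    z = x.copy()                 # z^(0) = x
    y = phi[0] * z
    for phi_k in phi[1:]:
        z = S @ z                # z^(k) = S z^(k-1): one exchange with neighbors
        y = y + phi_k * z
    return y

# Hypothetical undirected path graph on 5 nodes and hypothetical coefficients.
N = 5
S = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
phi = np.array([0.5, 0.3, 0.2])  # order K = 2
x = np.zeros(N)
x[0] = 1.0                       # signal concentrated on node 0

y_vertex = polynomial_filter(S, x, phi)

# Same output through the spectral domain with h(lam) = sum_k phi[k] * lam^k.
lam, U = np.linalg.eigh(S)
h = sum(p * lam**k for k, p in enumerate(phi))
y_spectral = U @ (h * (U.T @ x))
```

Both routes give the same output, and with $K = 2$ the response to the impulse at node 0 vanishes at nodes 3 and 4, which lie more than two hops away.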
II-B Edge-variant graph filters
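The edge-variant graph filter is a finite order recursion implemented in the vertex domain in a form similar to (2). Let $\boldsymbol{\Phi}^{(0)}, \ldots, \boldsymbol{\Phi}^{(K)}$ be a collection of $K+1$ matrices that share the sparsity pattern of $\mathbf{I}_N + \mathbf{S}$. The intermediate states of the EV filter are computed recursively as $\mathbf{z}^{(0)} = \boldsymbol{\Phi}^{(0)}\mathbf{x}$; $\mathbf{z}^{(1)} = \boldsymbol{\Phi}^{(1)}\mathbf{z}^{(0)}$; and
$$\mathbf{z}^{(k)} = \boldsymbol{\Phi}^{(k)}\mathbf{z}^{(k-1)}. \tag{5}$$
Since $\boldsymbol{\Phi}^{(0)}$ shares the support with $\mathbf{I}_N + \mathbf{S}$, the state $\mathbf{z}^{(0)}$ accounts also for the scaling of $\mathbf{x}$ through the diagonal elements (i.e., each node $i$ scales its own signal with a different parameter $\phi_{ii}^{(0)}$). The higher order states $\mathbf{z}^{(k)}$ for $k \geq 1$ are again obtained recursively since the parameter matrices respect the graph connectivity. Put differently, each $\boldsymbol{\Phi}^{(k)}$ considers a different parameter for each edge and adds potential self-loops through its diagonal entries $\phi_{ii}^{(k)}$. Node $i$ then computes the $k$th order state as
$$z_i^{(k)} = \sum_{j \in \mathcal{N}_i \cup \{i\}} \phi_{ij}^{(k)}\,z_j^{(k-1)}. \tag{6}$$
By defining $\boldsymbol{\Phi}^{(k:0)} = \boldsymbol{\Phi}^{(k)}\boldsymbol{\Phi}^{(k-1)}\cdots\boldsymbol{\Phi}^{(0)}$ and putting back together all terms in (5), the output of an order $K$ EV graph filter is
$$\mathbf{y} = \sum_{k=0}^{K} \mathbf{z}^{(k)} = \sum_{k=0}^{K} \boldsymbol{\Phi}^{(k:0)}\mathbf{x}. \tag{7}$$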
The total number of parameters of the edge-variant graph filter of order $K$ is $(K+1)(|\mathcal{E}| + N)$, which is in general smaller than the $N^2$ parameters of an arbitrary linear transform. In computing the output in (7), the EV graph filter incurs an overall computational complexity of order $\mathcal{O}(K(|\mathcal{E}| + N))$, which is similar to that of (3) since in general $|\mathcal{E}| \geq N$ or we can consider an EV recursion without self-loops.
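The EV recursion can be sketched in a few lines. The snippet assumes the reading in which the output is the sum of the intermediate states, with the first state obtained by applying the first parameter matrix to the input; the path graph, the random parameter values, and the names `ev_filter` and `Phis` are hypothetical.

```python
import numpy as np

def ev_filter(Phis, x):
    """Edge-variant filter: y = sum_k z^(k), z^(0) = Phis[0] x, z^(k) = Phis[k] z^(k-1)."""
    z = Phis[0] @ x
    y = z.copy()
    for Phi_k in Phis[1:]:
        z = Phi_k @ z            # each node combines its neighbors' previous states
        y = y + z
    return y

rng = np.random.default_rng(0)
N, K = 20, 2
# Hypothetical undirected path graph; only its support matters for the EV filter.
S = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
support = (S + np.eye(N)) != 0

# K + 1 parameter matrices with random values restricted to the support of I + S.
Phis = [rng.standard_normal((N, N)) * support for _ in range(K + 1)]

x = np.zeros(N)
x[0] = 1.0
y = ev_filter(Phis, x)

# One parameter per nonzero entry of the support, for each of the K + 1 matrices.
num_params = (K + 1) * int(support.sum())
```

For this graph, `num_params` is 174 against 400 entries of a dense 20-by-20 transform. In an undirected graph stored this way each edge contributes two nonzero entries to the support.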
The ability to capture local detail at the edge level and the reduced implementation complexity are leveraged next to define graph neural networks (GNNs) with a controlled number of parameters and a computational complexity matched to the sparsity pattern of the graph.
III Edge-variant graph neural networks
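Consider a training set $\mathcal{T} = \{(\mathbf{x}, \mathbf{y})\}$ composed of $|\mathcal{T}|$ examples of inputs $\mathbf{x}$ and output representations $\mathbf{y}$. A GNN leverages the underlying graph representation of the data and learns a model $\mathcal{F}(\cdot)$ such that the estimate $\hat{\mathbf{y}} = \mathcal{F}(\mathbf{x})$ minimizes some loss function $\mathcal{L}(\hat{\mathbf{y}}, \mathbf{y})$ for $(\mathbf{x}, \mathbf{y}) \in \mathcal{T}$ and generalizes well for $(\mathbf{x}, \mathbf{y}) \notin \mathcal{T}$.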
To capture several levels of detail, the model is layered into a cascade of $L$ functions, each consisting of a succession of linear transforms and nonlinearities. Layer $l$ produces as output a collection of $F_l$ higher level signal features obtained through processing the $F_{l-1}$ features computed at the previous layer. The $g$th higher level feature is computed as
$$\mathbf{x}_l^{g} = \sigma_l\Big(\sum_{f=1}^{F_{l-1}} \mathbf{H}_l^{fg}\,\mathbf{x}_{l-1}^{f}\Big), \tag{8}$$
where $\sigma_l(\cdot)$ represents the nonlinearity, which might be point-wise (e.g., ReLU) or graph dependent [16], and $\mathbf{H}_l^{fg}$ leverages the graph structure to relate the $f$th input feature $\mathbf{x}_{l-1}^{f}$ to the $g$th output feature $\mathbf{x}_l^{g}$.
The graph essentially serves as a parameterization to reduce both the computational complexity and the number of parameters. GCNNs, in particular, consider the linear operator in (8) to be a graph filter that has a spectral interpretation such as (1) or (4). In the sequel, we consider this operator to be an EV graph filter and show that current approaches represent different parameterizations to induce the spectral convolution into the EV GNN.
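A single layer of the form (8) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the linear operators are taken to be polynomial graph filters with one set of taps per input-output feature pair, the nonlinearity is a ReLU, and the names `gnn_layer`, `X_in`, and `Phi` as well as all dimensions are hypothetical.

```python
import numpy as np

def gnn_layer(S, X_in, Phi):
    """One graph filtering layer followed by a ReLU.

    S:    (N, N) graph shift operator.
    X_in: (N, F_in) input features, one column per feature.
    Phi:  (K + 1, F_in, F_out) filter taps; Phi[k, f, g] weighs S^k applied to
          input feature f in the computation of output feature g.
    """
    Z = X_in                      # S^0 X_in
    out = Z @ Phi[0]
    for k in range(1, Phi.shape[0]):
        Z = S @ Z                 # shift all features once more
        out = out + Z @ Phi[k]    # mix input features into output features
    return np.maximum(out, 0.0)   # point-wise ReLU

rng = np.random.default_rng(0)
N, F_in, F_out, K = 6, 2, 3, 2
# Hypothetical undirected path graph as shift operator.
S = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
X_in = rng.standard_normal((N, F_in))
Phi = rng.standard_normal((K + 1, F_in, F_out))

X_out = gnn_layer(S, X_in, Phi)   # shape (N, F_out)
```

Swapping the inner linear operator for an edge-variant recursion changes only how the shifted features are produced; the summation over input features and the nonlinearity stay the same.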
III-A Properties of the edge-variant neural network layer
First, the EV layer does not require knowledge of $\mathbf{S}$, but only of its support. This is because, differently from current solutions, it learns from the training data a collection of parameter matrices $\boldsymbol{\Phi}^{(k)}$, each of which acts as a different graph shift operator. Therefore, it represents a robust learning strategy for data residing on graphs whose edge weights are known up to some uncertainty, known only partially, or not known at all, such as biological networks [17].
Second, the computational complexity of each layer is linear in the graph parameters. By setting, , the overall complexity of the EV layer is of order matching that of current state-of-the-art GCNN approaches.
Third, the number of parameters per layer is, at most
. The latter, although allowing the EV to have the maximum degrees of freedom given a topology, may often be a limitation for large graphs or when the training set is small. Our goal in the next section is, therefore, to show that GCNN layers proposed in the literature are particular cases of (9). Establishing these relationships allows the proposal of novel solutions that increase the descriptive power while preserving an efficient implementation complexity.
Polynomial GCNN. Several variants of GCNNs introduced in the literature use at each layer graph filters of the form
$$\mathbf{H}(\mathbf{S}) = \sum_{k=0}^{K} \phi_k\,\mathbf{S}^k. \tag{10}$$
These filters can be expressed in the form (9) by restricting the parameter matrices to $\boldsymbol{\Phi}^{(k:0)} = \boldsymbol{\Phi}^{(k)}\cdots\boldsymbol{\Phi}^{(0)} = \phi_k\mathbf{S}^k$ for $k = 1, \ldots, K$ and $\boldsymbol{\Phi}^{(0)} = \phi_0\mathbf{I}_N$. In other words, the EV and the polynomial recursions represent two extremes to implement graph filters locally. The EV recursion allows each node $i$ to learn, for each iteration $k$, a different parameter $\phi_{ij}^{(k)}$ that reflects the importance of node $j$'s features to node $i$. The polynomial implementation instead forces all nodes to weigh the information of all neighbors with the same parameter $\phi_k$ within the $k$th iteration. However, this restriction makes the number of parameters independent from $N$ and $|\mathcal{E}|$.
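The statement that a polynomial filter is a restricted EV recursion can be verified numerically. The sketch below assumes the EV output is the sum of the running products of the parameter matrices applied to the input, and it uses one particular factorization (an assumption of this example that requires every coefficient except possibly the last to be nonzero): the first matrix is a scaled identity and each later matrix is the shift operator scaled by the ratio of consecutive coefficients. Graph, coefficients, and signal are hypothetical.

```python
import numpy as np

def ev_output(Phis, x):
    """y = sum_k Phis[k] ... Phis[0] x, accumulated through the recursion on the states."""
    z = x
    y = np.zeros_like(x)
    for Phi_k in Phis:
        z = Phi_k @ z
        y = y + z
    return y

N = 6
# Hypothetical undirected path graph and hypothetical nonzero coefficients.
S = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
phi = np.array([0.5, 0.3, 0.2])

# Phi^(0) = phi_0 I and Phi^(k) = (phi_k / phi_{k-1}) S, so that the product
# Phi^(k) ... Phi^(0) telescopes to phi_k S^k.
Phis = [phi[0] * np.eye(N)]
Phis += [(phi[k] / phi[k - 1]) * S for k in range(1, len(phi))]

rng = np.random.default_rng(0)
x = rng.standard_normal(N)

y_ev = ev_output(Phis, x)
y_poly = sum(phi[k] * (np.linalg.matrix_power(S, k) @ x) for k in range(len(phi)))
```

Here the EV recursion carries only three free scalars, one per iteration, instead of one per edge and iteration.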
This way of parameterizing the EV recursion now creates opportunities for proposing a myriad of intermediate solutions that extend (10) towards the edge-variant implementation (9). One such approach may be a recursion that, in addition to (10), also considers parameters for the most critical edges (e.g., edges without which the graph becomes disconnected).
Remark 1: Along with the above works, [18, 19, 20] and [8, 9] also fall under the polynomial filtering (10). Specifically, [18] considers single shifts on graphs using a learnable weight matrix as graph shift operator, [19] considers a Gaussian kernel to mix the neighboring node information, while [20] uses random walks. The works in [8, 9], although aiming to build a GCNN layer by using graph filters with a rational transfer function, approximate the required matrix inverse in the vertex domain by a finite iterative algorithm. This finite approximation implicitly transforms these techniques into polynomial recursions whose order depends on the number of iterations (see also  for more detail).
Spectral GCNN. We here establish a link between the edge-variant recursion (9) and the spectral GCNN [2] to provide more insight into its convolutional behavior. The spectral GCNN exploits (1) and learns directly the filter transfer function $h(\boldsymbol{\Lambda})$. To keep the number of parameters independent from $N$, $h(\boldsymbol{\Lambda})$ is parameterized as
$$h(\boldsymbol{\Lambda}) = \text{diag}(\mathbf{B}\boldsymbol{\theta}), \tag{11}$$
where $\mathbf{B}$ is a prefixed kernel matrix and $\boldsymbol{\theta}$ are the learnable parameters. Therefore, the number of parameters for each layer is at most , while the computational complexity is of order $\mathcal{O}(N^2)$ per feature, required to compute the GFT of the features. Additionally, such an approach requires the eigendecomposition of $\mathbf{S}$ (order $\mathcal{O}(N^3)$, to be computed once) and the learned $h(\boldsymbol{\Lambda})$ does not capture the local detail around each vertex.
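Nevertheless, this spectral interpretation is useful to understand the EV behavior. We can force the EV recursion (9) to have a spectral response by restricting all coefficient matrices to share the eigenvectors with $\mathbf{S}$, i.e., $\boldsymbol{\Phi}^{(k)} = \mathbf{U}\boldsymbol{\Lambda}^{(k)}\mathbf{U}^{-1}$ with $\boldsymbol{\Lambda}^{(k)}$ diagonal. Then, the EV transfer function becomes
$$h(\boldsymbol{\Lambda}) = \sum_{k=0}^{K} \prod_{k'=0}^{k} \boldsymbol{\Lambda}^{(k')}. \tag{12}$$
Subsequently, let $\mathcal{I}$ be the index set defining the zero entries of $\mathbf{S} + \mathbf{I}_N$. The fixed support condition for each $\boldsymbol{\Phi}^{(k)}$ is
$$\mathbf{C}_{\mathcal{I}}\,\text{vec}\big(\boldsymbol{\Phi}^{(k)}\big) = \mathbf{C}_{\mathcal{I}}\,\text{vec}\big(\mathbf{U}\boldsymbol{\Lambda}^{(k)}\mathbf{U}^{-1}\big) = \mathbf{0}, \tag{13}$$
where $\mathbf{C}_{\mathcal{I}}$ is a selection matrix whose rows are those of $\mathbf{I}_{N^2}$ indexed by $\mathcal{I}$, $\text{vec}(\cdot)$ denotes the vectorization operation, and $\mathbf{0}$ is the vector of all zeros. From the properties of the $\text{vec}(\cdot)$ operator, (13) becomes
$$\mathbf{C}_{\mathcal{I}}\big(\mathbf{U}^{-\top} * \mathbf{U}\big)\boldsymbol{\lambda}^{(k)} = \mathbf{0}, \tag{14}$$
where “*” denotes the Khatri-Rao product and $\boldsymbol{\lambda}^{(k)}$ is the vector composed of the diagonal elements of $\boldsymbol{\Lambda}^{(k)}$. Put differently, (14) implies
$$\boldsymbol{\lambda}^{(k)} \in \text{null}\big(\mathbf{C}_{\mathcal{I}}(\mathbf{U}^{-\top} * \mathbf{U})\big). \tag{15}$$
Finally, by considering $\mathbf{B}$ as a basis that spans the nullspace of $\mathbf{C}_{\mathcal{I}}(\mathbf{U}^{-\top} * \mathbf{U})$, we can expand $\boldsymbol{\lambda}^{(k)}$ as $\boldsymbol{\lambda}^{(k)} = \mathbf{B}\boldsymbol{\mu}^{(k)}$ and write (12) as
$$h(\boldsymbol{\Lambda}) = \sum_{k=0}^{K} \prod_{k'=0}^{k} \text{diag}\big(\mathbf{B}\boldsymbol{\mu}^{(k')}\big) \tag{16}$$
for some basis expansion coefficients $\boldsymbol{\mu}^{(k)}$.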
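The nullspace construction can be illustrated numerically. The sketch below is a minimal example under assumptions: it uses a hypothetical undirected path graph, so the eigenvector matrix is orthogonal and the Khatri-Rao factor reduces to the column-wise Kronecker product of the eigenvector matrix with itself; the nullspace basis is obtained from a singular value decomposition with a hypothetical tolerance.

```python
import numpy as np

N = 6
# Hypothetical undirected path graph as shift operator.
S = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
lam, U = np.linalg.eigh(S)        # S = U diag(lam) U^T

# vec(U diag(d) U^T) = KR d, where column n of KR is kron(U[:, n], U[:, n]).
KR = np.stack([np.kron(U[:, n], U[:, n]) for n in range(N)], axis=1)  # (N^2, N)

# Select the rows that correspond to zero entries of S + I (fixed support condition).
zero_mask = (S + np.eye(N)) == 0
M = KR[np.flatnonzero(zero_mask.ravel())]

# Basis of the nullspace of M from the right singular vectors with ~zero singular value.
_, sing, Vt = np.linalg.svd(M)
B = Vt[sing < 1e-10].T            # (N, nullspace dimension)

# Any eigenvalue vector B @ mu yields a matrix with the eigenvectors of S
# whose entries outside the support of S + I vanish.
rng = np.random.default_rng(0)
mu = rng.standard_normal(B.shape[1])
Phi = U @ np.diag(B @ mu) @ U.T
off_support = np.abs(Phi[zero_mask]).max()
```

The nullspace always contains the all-ones vector (which gives the identity matrix) and the eigenvalue vector of the shift operator (which gives the shift operator itself), so it has dimension at least two whenever the eigenvalues are not all equal.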
The eigendecomposition of implicitly reduces the total number of layer parameters from to rank with rank. That is, there is a subclass of the EV recursion that respects the operation of convolution, but, differently from (11), it captures local detail in the vertex domain and enjoys a linear implementation complexity. This subclass has also analogies with (11), which is obtained by setting , , and .
In general, we may conclude that the EV recursion implements a GNN layer that goes beyond convolution. Drawing analogies with linear system theory, the GCNN approaches behave as a linear time-invariant (now shift-invariant) filter, while the EV graph filter behaves as a linear time-varying (now shift-varying; a different shift per iteration) filter that trades the convolutional interpretation for the ability to capture time-varying (now shift-varying) detail.
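Node-variant GCNN. The idea of proposing GNNs that extend convolution is also considered in [11], which proposed an architecture having as graph filter the recursion
$$\mathbf{H}(\mathbf{S}) = \sum_{k=0}^{K} \text{diag}\big(\mathbf{C}_{\mathcal{B}}\boldsymbol{\phi}_{\mathcal{B}}^{(k)}\big)\mathbf{S}^k, \tag{17}$$
where $\mathcal{B}$ is a set of privileged nodes (e.g., the nodes with the highest degree), $\mathbf{C}_{\mathcal{B}} \in \{0, 1\}^{N \times |\mathcal{B}|}$ is a tall binary matrix, and $\boldsymbol{\phi}_{\mathcal{B}}^{(k)}$ is a vector of parameters for the nodes in $\mathcal{B}$. In short, (17) learns, for each shift $k$, $|\mathcal{B}|$ different coefficients for the nodes in $\mathcal{B}$ and then maps them through $\mathbf{C}_{\mathcal{B}}$ to the remaining nodes $\mathcal{V} \setminus \mathcal{B}$.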
This filter is another way to restrict the EV degrees of freedom, which parameterizes the coefficient matrices to and for . That is, (17) is an intermediate approach between the polynomial (10) and the EV (9) recursions and allows each node to learn, for each , a different parameter that reflects the importance of all its neighborhood to node . The total number of parameters per layer is at most while the computational complexity is similar to that of (10).
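A node-variant filter of this kind can be sketched as follows, assuming the form in which each power of the shift operator is scaled node-wise by coefficients copied from a set of privileged nodes. The choice of privileged nodes, the rule that assigns each remaining node to a privileged one, and the names `nv_filter`, `C_B`, and `phi_B` are all hypothetical.

```python
import numpy as np

def nv_filter(S, x, C_B, phi_B):
    """Node-variant filter: y = sum_k diag(C_B @ phi_B[k]) S^k x.

    C_B:   (N, |B|) binary matrix mapping privileged-node coefficients to all nodes.
    phi_B: (K + 1, |B|) coefficients, one row per shift.
    """
    z = x.copy()
    y = (C_B @ phi_B[0]) * z
    for k in range(1, phi_B.shape[0]):
        z = S @ z
        y = y + (C_B @ phi_B[k]) * z   # node-dependent scaling of the k-th shift
    return y

N, K = 6, 2
# Hypothetical undirected path graph.
S = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)

# Hypothetical privileged set B = {1, 4}; nodes 0-2 copy node 1, nodes 3-5 copy node 4.
C_B = np.zeros((N, 2))
C_B[[0, 1, 2], 0] = 1.0
C_B[[3, 4, 5], 1] = 1.0

rng = np.random.default_rng(0)
phi_B = rng.standard_normal((K + 1, 2))
x = rng.standard_normal(N)
y = nv_filter(S, x, C_B, phi_B)
```

When every privileged node carries the same coefficient for a given shift, the filter collapses to the polynomial filter, which is a quick sanity check on an implementation.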
This different way of parameterizing the EV recursion provides alternative choices to build new intermediate architectures that leverage the idea of privileged nodes while giving importance to the edge-based detail. In the sequel, we propose one such extension that merges insights from the EV, the polynomial, and the NV architectures.
III-C Hybrid edge-variant neural network layer
The hybrid edge-variant (HEV) layer considers the linear operation in (8) to be a graph filter of the form
where is a diagonal matrix whose th diagonal element iff node belongs to the privileged set ; are a collection of matrices whose th element iff and ; and are a collection of scalars. Put simply, recursion (18) allows nodes in to learn node-varying parameters for and edge-varying parameters for , while the nodes in learn global parameters similar to (10).
This approach represents yet another intermediate architecture between the full convolutional ones and the full EV. By setting as the maximum number of neighbors for the nodes in , the overall number of parameters per layer is at most . Finally, the HEV implementation cost is of order .
IV Numerical results
We compare the proposed edge-variant and hybrid edge-variant architectures with the spectral, polynomial, and the node-variant alternatives on a source localization and an author attribution problem. For both experiments, we designed all architectures (except for the spectral GCNN) to have the same computational cost.
IV-A Source Localization
Setup. The goal of this experiment is to find out which community in a stochastic block model (SBM) graph is the source of a diffusion process by observing different diffused signals originated at different (unknown) communities at different (unknown) time instants. is an undirected SBM graph of nodes divided equally into
communities with respective intra- and inter-community edge probabilities ofand . The initial graph signal is a Kronecker delta centered at node and its realization at time is with . We generated the training set comprising samples by selecting uniformly at random both and . We then tested the different approaches on new samples and averaged the performance over different data and different graph realizations for an overall of Monte-Carlo runs.
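A data-generation routine in the spirit of this setup might look as follows. Every numerical value (number of nodes and communities, edge probabilities, maximum diffusion time) is a hypothetical placeholder rather than the value used in the experiment. The shift operator is assumed to be the adjacency matrix normalized by its largest absolute eigenvalue, the source node is assumed to be drawn uniformly inside the source community, and the signal at a given time is assumed to be the corresponding power of the shift operator applied to the initial delta.

```python
import numpy as np

def sbm_adjacency(n_nodes, n_comm, p_in, p_out, rng):
    """Undirected SBM adjacency with equal-size communities (n_nodes divisible by n_comm)."""
    labels = np.repeat(np.arange(n_comm), n_nodes // n_comm)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n_nodes, n_nodes)) < probs, k=1)  # no self-loops
    return (upper | upper.T).astype(float), labels

def diffused_sample(S, labels, n_comm, t_max, rng):
    """Return a diffused delta and the index of its source community (the class label)."""
    c = int(rng.integers(n_comm))                       # source community
    source = rng.choice(np.flatnonzero(labels == c))    # assumed: random node in it
    t = int(rng.integers(t_max + 1))                    # unknown diffusion time
    x0 = np.zeros(S.shape[0])
    x0[source] = 1.0                                    # Kronecker delta at the source
    return np.linalg.matrix_power(S, t) @ x0, c

rng = np.random.default_rng(0)
# Hypothetical sizes and probabilities, chosen only for illustration.
A, labels = sbm_adjacency(n_nodes=50, n_comm=5, p_in=0.8, p_out=0.2, rng=rng)
S = A / np.abs(np.linalg.eigvalsh(A)).max()             # assumed normalization
x_t, c = diffused_sample(S, labels, n_comm=5, t_max=10, rng=rng)
```

Repeating the last call produces as many labeled samples as needed for training and testing.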
Models and results. We considered seven architectures each of them composed of the cascade of a graph filtering layer with ReLU nonlinearity and a fully connected layer with softmax nonlinearity. The architectures are: a spectral GCNN (11) with being a cubic spline kernel and ; a polynomial GCNN (10) of order ; two NV GNNs (17) of order and privileged nodes selected by maximum degree and spectral proxies ; an EV GNN (9) of order ; and two HEV GNNs (18) of order and privileged nodes selected similarly to the NV case. We used the ADAM optimizer with a learning rate and decaying factors and run over epochs with batches of size .
Table I shows the obtained results. Because of their increased capacity, the EV and the HEV outperform the other alternatives. We also observe that the hybrid approaches better exploit the edge-varying part when the privileged set is composed of the nodes with the highest degree.
IV-B Authorship Attribution
Setup. In this experiment, we aim to classify whether a text excerpt belongs to Edgar Allan Poe or to any other contemporary author. For each text excerpt, we built the graph from the word adjacency network (WAN) between function words that act as nodes. These WANs serve as stylistic signatures for the author (see [22] for full details). Having fixed the WAN for Poe, we treat the frequency count of the function words as a graph signal.
In particular, we considered text excerpts by Poe and randomly split the dataset into training, validation, and testing texts. We summed the adjacency matrices of the WANs obtained from the training texts to get Poe’s signature graph. We completed the training, validation, and test sets by adding respectively other , , and randomly selected texts by contemporary authors. We set to be the adjacency matrix of Poe’s signature graph and averaged the performance over different data splits.
Models and results. We analyzed the same architectures as in the previous section but set the number of output features to , the recursion orders to , for the spectral GCNN, and . We used the same ADAM optimizer for training now over epochs with batch sizes of samples.
The results in Table II show that the hybrid approaches offer the best performance, highlighting the potential of solutions that consider both edge-dependent and global coefficients. In fact, the polynomial model with global coefficients suffers the most in this experiment.
|Node Variant (NV) Degree||74.77( 7.77)%|
|Node Variant (NV) S. Proxies||75.62( 8.19)%|
|Edge Variant (EV)||85.47(10.77)%|
|Hybrid EV (HEV) Degree||80.53(10.21)%|
|Hybrid EV (HEV) S. Proxies||75.37( 8.20)%|
We proposed a general framework that unifies state-of-the-art GCNN architectures into one recursion, named the edge-variant recursion. This unification highlights the different tradeoffs between the number of parameters and the amount of local detail that each approach adopts. Moreover, it provides rigorous ways to choose among these tradeoffs and to design novel, ad hoc architectures for the problem at hand that are implemented locally in the vertex domain. We proposed one of many possible extensions and showed that it outperforms current solutions for graph signal classification tasks.
|Node Variant (NV) Degree||88.88( 2.62)%|
|Node Variant (NV) S. Proxies||86.12( 5.94)%|
|Edge Variant (EV)||89.00( 2.11)%|
|Hybrid EV (HEV) Degree||89.18( 1.99)%|
|Hybrid EV (HEV) S. Proxies||90.00( 1.21)%|
- [1] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 61–80, Jan. 2009.
- [2] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and deep locally connected networks on graphs,” arXiv:1312.6203v3 [cs.LG], 21 May 2014. [Online]. Available: http://arxiv.org/abs/1312.6203
- [3] A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, 2018.
- [4] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in 5th Int. Conf. Learning Representations. Toulon, France: Assoc. Comput. Linguistics, 24-26 Apr. 2017.
- [5] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Annu. Conf. Neural Inform. Process. Syst. 2016. Barcelona, Spain: NIPS Foundation, 5-10 Dec. 2016.
- [6] J. Du, S. Zhang, G. Wu, J. M. F. Moura, and S. Kar, “Topology adaptive graph convolutional networks,” arXiv:1710.10370v2 [cs.LG], 2 Nov. 2017. [Online]. Available: http://arxiv.org/abs/1710.10370v2
- [7] F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, “Convolutional neural network architectures for signals supported on graphs,” IEEE Trans. Signal Process., vol. 67, no. 4, pp. 1034–1049, Feb. 2019.
- [8] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, “CayleyNets: Graph convolutional neural networks with complex rational spectral filters,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 97–107, Jan. 2019.
- [9] F. M. Bianchi, D. Grattarola, C. Alippi, and L. Livi, “Graph neural networks with convolutional ARMA filters,” Feb. 2019. [Online]. Available: http://arxiv.org/abs/1901.01343
- [10] E. Isufi, A. Loukas, A. Simonetto, and G. Leus, “Autoregressive moving average graph filtering,” IEEE Trans. Signal Process., vol. 65, no. 2, pp. 274–288, Jan. 2017.
- [11] F. Gama, G. Leus, A. G. Marques, and A. Ribeiro, “Convolutional neural networks via node-varying graph filters,” in 2018 IEEE Data Sci. Workshop. Lausanne, Switzerland: IEEE, 4-6 June 2018, pp. 1–5.
- [12] S. Segarra, A. G. Marques, and A. Ribeiro, “Optimal graph-filter design and applications to distributed linear network operators,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4117–4131, Aug. 2017.
- [13] F. Gama, A. G. Marques, A. Ribeiro, and G. Leus, “MIMO graph filters for convolutional networks,” in 19th IEEE Int. Workshop Signal Process. Advances in Wireless Commun. Kalamata, Greece: IEEE, June 2018.
- [14] M. Coutino, E. Isufi, and G. Leus, “Distributed edge-variant graph filters,” in 2017 IEEE Int. Workshop Comput. Advances Multi-Sensor Adaptive Process. Curacao, Dutch Antilles: IEEE, 10-13 Dec. 2017.
- [15] M. Coutino, E. Isufi, and G. Leus, “Advances in distributed graph filtering,” arXiv:1808.03004v1 [eess.SP], 9 Aug. 2018. [Online]. Available: http://arxiv.org/abs/1808.03004
- [16] L. Ruiz, F. Gama, A. G. Marques, and A. Ribeiro, “Median activation functions for graph neural networks,” in 44th IEEE Int. Conf. Acoust., Speech and Signal Process. Brighton, UK: IEEE, 12-17 May 2019.
- [17] B. Wang, A. Pourshafeie, M. Zitnik, J. Zhu, C. Bustamante, S. Batzoglou, and J. Leskovec, “Network enhancement as a general method to denoise weighted biological networks,” Nature Communications, vol. 9, no. 3108, pp. 1–8, Aug. 2018.
- [18] M. Simonovsky and N. Komodakis, “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” in
- [19] F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein, “Geometric deep learning on graphs and manifolds using mixture model CNNs,” in 2017 IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recognition. Honolulu, HI: IEEE, July 2017.
- [20] J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in 30th Annu. Conf. Neural Inform. Process. Syst. Barcelona, Spain: NIPS Foundation, 5-10 Dec. 2016.
- [21] A. Anis, A. Gadde, and A. Ortega, “Efficient sampling set selection for bandlimited graph signals using graph spectral proxies,” IEEE Trans. Signal Process., vol. 64, no. 14, pp. 3775–3789, July 2016.
- [22] S. Segarra, M. Eisen, and A. Ribeiro, “Authorship attribution through function word adjacency networks,” IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5464–5478, Oct. 2015.