on geometric deep learning emphasizes the need to develop deep learning tools for such datasets and even more importantly to understand the mathematical properties of these tools, in particular, their invariances.
They also mention two types of problems that may be addressed by such tools. The first problem is signal analysis on graphs with applications such as classification, prediction and inference on graphs. The second problem is learning the graph structure with applications such as graph clustering and graph matching.
Several recent works address the first problem [5, 6, 7, 8]. In these works, the filters of the networks are designed to be parametric functions of graph operators, such as the graph adjacency and Laplacian, and the parameters of those functions have to be trained.
The second problem is often explored with random graphs generated according to two common models: Erdős–Rényi, which is used for graph matching, and the Stochastic Block Model (SBM), which is used for community detection. Some recent graph neural networks have obtained state-of-the-art performance for graph matching  and community detection [10, 11] with synthetic data generated from the respective graph models. As above, the filters in these works are parametric functions of either the graph adjacency or Laplacian, where the parameters are trained.
Despite the impressive progress in developing graph neural networks for solving these two problems, the performance of these methods is poorly understood. Of main interest is their invariance or stability to basic signal and graph manipulations. In the Euclidean case, the stability of a convolutional neural network  to rigid transformations and deformations is best understood in view of the scattering transform 
[13]. The scattering transform has a multilayer structure and uses wavelet filters to propagate signals. It can be viewed as a convolutional neural network in which no training is required to design the filters; training is only required for the classifier applied to the transformed data. Nevertheless, there is freedom in the selection and design of the wavelets. The scattering transform is approximately invariant to translation and rotation. More precisely, under strong assumptions on the wavelet and scaling functions and as the coarsest scale approaches infinity, the scattering transform becomes invariant to translations and rotations. Moreover, it is Lipschitz continuous with respect to smooth deformations. These properties are shown in [13] for signals in $L^2(\mathbb{R}^d)$ and $L^2(G)$, where $G$ is a compact Lie group.
It is interesting to note that the design of filters in existing graph neural networks is related to the design of wavelets on graphs in the signal processing literature. Indeed, the construction of wavelets on graphs uses special operators on graphs, such as the graph adjacency and Laplacian. As mentioned above, these operators are commonly used in graph neural networks. The earliest works on graph wavelets [14, 15] apply the normalized graph Laplacian to define diffusion wavelets on graphs and use them to study multiresolution decompositions of graph signals. Hammond et al. use the unnormalized graph Laplacian to define analogous graph wavelets and study properties of these wavelets such as reconstructibility and locality. One can easily construct a graph scattering transform by using any of these wavelets. A main question is whether this scattering transform enjoys the desired invariance and stability properties.
In this work, we use a special instance of the graph wavelets of  to form a graph scattering network and establish its covariance and approximate invariance to permutations and stability to graph manipulations. We also demonstrate the practical effectiveness of this transform in solving the two types of problems discussed above.
The rest of the paper is organized as follows. The scattering transform on graphs is defined in Section 2. Section 3 shows that the full scattering transform preserves the energy of the input signal. This section also provides an absolute bound on the energy decay rate of components of the transform at each layer. Section 4 proves the permutation covariance and approximate invariance of the graph scattering transform. It also briefly discusses previously suggested candidates for the notion of translation or localization on graphs and the possible covariance and approximate invariance of the scattering transform with respect to them. Furthermore, it clarifies why some special permutations are good substitutes for Euclidean rigid transformations. Section 5 establishes the stability of the scattering transform with respect to graph manipulations. Section 6 demonstrates competitive performance of the proposed graph neural network in solving the two types of problems.
2 Wavelet graph convolutional neural network
2.1 Wavelets on graphs
We review the wavelet construction of Hammond et al. and adapt it to our setting. Our general theory applies to what we call simple graphs, that is, weighted, undirected and connected graphs with no self-loops. We remark that we could also address self-loops, but for simplicity we exclude them. Throughout the paper we fix an arbitrary simple graph $G$ with $N$ vertices. We also consistently use uppercase boldface letters to denote matrices and lowercase boldface letters to denote vectors or vector-valued functions.
The weight matrix $\mathbf{W}$ of $G$ is an $N \times N$ symmetric matrix with zero diagonal, whose entry $W_{ij}$ denotes the weight assigned to the edge $(i,j)$ of $G$. The degree matrix $\mathbf{D}$ of $G$ is an $N \times N$ diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$. The (unnormalized) Laplacian of $G$ is the matrix $\mathbf{L} = \mathbf{D} - \mathbf{W}$. The eigenvalues of $\mathbf{L}$ are non-negative and the smallest one is $0$. Since the graph is connected, the eigenspace of $0$ (that is, the kernel of $\mathbf{L}$) has dimension one. It is spanned by a vector with equal nonzero entries for all vertices. This vector represents a signal of the lowest possible “frequency”.
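These facts can be checked numerically. The sketch below uses an assumed toy graph (not one from the paper) to build $\mathbf{W}$, $\mathbf{D}$ and $\mathbf{L}$ and verify that the smallest eigenvalue is $0$ with a constant eigenvector:

```python
import numpy as np

# Toy weighted, undirected, connected graph on 4 vertices (an assumed example):
# edges (1,2), (1,3), (2,3), (3,4), all with weight 1.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(W.sum(axis=1))   # degree matrix: D_ii = sum_j W_ij
L = D - W                    # unnormalized graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)   # L is symmetric, so eigh applies
u0 = eigvecs[:, 0]                     # eigenvector of the smallest eigenvalue
print(abs(eigvals[0]) < 1e-10)         # smallest eigenvalue is 0 -> True
print(np.allclose(u0, u0[0]))          # u0 has equal entries -> True
```

Since the graph is connected, the kernel of $\mathbf{L}$ is exactly the span of the constant vector, which the last check confirms.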
The graph Laplacian is symmetric and can be represented as $\mathbf{L} = \sum_{k=0}^{N-1} \lambda_k \mathbf{u}_k \mathbf{u}_k^*$, where $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{N-1}$ are the eigenvalues of $\mathbf{L}$, $\mathbf{u}_0, \ldots, \mathbf{u}_{N-1}$ are the corresponding orthonormal eigenvectors, and $^*$ denotes the conjugate transpose. We remark that the phases of the eigenvectors of $\mathbf{L}$ and their order within any eigenspace of dimension larger than 1 can be arbitrarily chosen without affecting our theory for the graph scattering transform formulated below.
Let $\mathbf{f}$ be a graph signal. Note that in our setting we can regard $\mathbf{f}$ as a vector, and without further specification we shall consider $\mathbf{f} \in \mathbb{C}^N$. We define the Fourier transform $\hat{\mathbf{f}}$ by $\hat{\mathbf{f}}(k) = \langle \mathbf{f}, \mathbf{u}_k \rangle$, $0 \le k \le N-1$, and the inverse Fourier transform by $\mathbf{f} = \sum_{k=0}^{N-1} \hat{\mathbf{f}}(k)\, \mathbf{u}_k$. Let $\odot$ denote the Hadamard product, that is, for $\mathbf{x}$, $\mathbf{y} \in \mathbb{C}^N$, $(\mathbf{x} \odot \mathbf{y})(k) = \mathbf{x}(k)\, \mathbf{y}(k)$. Define the convolution of $\mathbf{f}$ and $\mathbf{g}$ in $\mathbb{C}^N$ as the inverse Fourier transform of $\hat{\mathbf{f}} \odot \hat{\mathbf{g}}$, that is, $\mathbf{f} * \mathbf{g} = \sum_{k=0}^{N-1} \hat{\mathbf{f}}(k)\, \hat{\mathbf{g}}(k)\, \mathbf{u}_k$. When emphasizing the dependence of $*$ on the graph $G$, we denote it by $*_G$.
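These spectral definitions can be sketched directly in code; the 4-cycle graph below is an assumed toy example, and the normalization follows the inner-product convention stated above:

```python
import numpy as np

# Graph Fourier transform and convolution on an assumed toy graph (a 4-cycle).
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)      # columns of U are the eigenvectors u_k

def gft(f):                     # Fourier transform: f_hat(k) = <f, u_k>
    return U.conj().T @ f

def igft(f_hat):                # inverse transform: sum_k f_hat(k) u_k
    return U @ f_hat

def gconv(f, g):                # inverse transform of f_hat ⊙ g_hat
    return igft(gft(f) * gft(g))

rng = np.random.default_rng(0)
f, g = rng.standard_normal(4), rng.standard_normal(4)
print(np.allclose(igft(gft(f)), f))           # Fourier inversion -> True
print(np.allclose(gconv(f, g), gconv(g, f)))  # convolution commutes -> True
```

Since the convolution is a pointwise product in the spectral domain, it is commutative, exactly as in the Euclidean case.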
Euclidean wavelets use shift and scale in Euclidean space. For signals defined on graphs, which are discrete, the notions of translation and dilation need to be defined in the spectral domain. Hammond et al. view $[0, \infty)$ as the spectral domain, since it contains the eigenvalues of $\mathbf{L}$. Their procedure assumes a scaling function $\phi$ and a wavelet function $\psi$ [17, 18] with corresponding Fourier transforms $\hat{\phi}$ and $\hat{\psi}$. They pose minimal assumptions on $\hat{\phi}$ and $\hat{\psi}$. In our construction, we consider dyadic wavelets, that is, wavelets whose Fourier transforms are dilated by powers of two: $\hat{\psi}_j(\lambda) = \hat{\psi}(2^{-j}\lambda)$.
Also, we fix a scale $2^J$ of coarsest resolution and assume that $\phi$ and $\psi$ can be constructed from a multiresolution analysis.
Note that $\phi$ and $\psi$ are both in $L^2(\mathbb{R})$. The graph wavelet coefficients of $\mathbf{f}$ are defined by $\boldsymbol{\psi}_j * \mathbf{f} = \sum_{k=0}^{N-1} \hat{\psi}(2^{-j}\lambda_k)\, \hat{\mathbf{f}}(k)\, \mathbf{u}_k$.
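A minimal sketch of such spectral graph wavelet coefficients follows. The band-pass kernel $g(x) = x e^{1-x}$ is an assumed choice for illustration (not the paper's); it satisfies $g(0) = 0$, so the wavelet coefficients of a constant signal vanish:

```python
import numpy as np

# Spectral graph wavelet coefficients on an assumed 4-cycle toy graph.
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)

def wavelet_coeffs(f, j):
    """psi_j * f = sum_k g(2^{-j} lam_k) f_hat(k) u_k, realized as a filter matrix."""
    g = lambda x: x * np.exp(1.0 - x)          # assumed band-pass kernel, g(0) = 0
    H = U @ np.diag(g(2.0 ** (-j) * lam)) @ U.T
    return H @ f

const = np.ones(4)
print(np.allclose(wavelet_coeffs(const, 1), 0.0))   # constant signal -> zero coefficients
```

The vanishing on constants reflects the band-pass nature of the wavelet: the lowest-frequency content is instead captured by the scaling function.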
2.2 Scattering on graphs
Our construction of convolutional neural networks on graphs is inspired by Mallat’s scattering network [13]. As a feature extractor, the scattering transform defined in [13] is translation and rotation invariant when the coarsest scale approaches infinity and the wavelet and scaling functions satisfy some strong admissibility conditions. It is also Lipschitz continuous with respect to small deformations. The neural network representation of the scattering transform is the scattering network. It has been successfully used in image and audio classification problems.
We form a scattering network on graphs in a similar way, while using the graph wavelets defined above and the following definitions. A path $p = (j_1, \ldots, j_m)$ is an ordering of $m$ scales of wavelets. The length of the path $p$ is $m$. The length of an empty path is zero. For a path $p$ as above and a scale $j$, we define the path $p + j$ as $(j_1, \ldots, j_m, j)$. For a vector $\mathbf{x} \in \mathbb{C}^N$ we denote by $|\mathbf{x}|$ its entrywise absolute value and note that the vectors $\mathbf{x}$ and $|\mathbf{x}|$ have the same norm.
For a scale $j$, the one-step propagator $\mathbf{U}[j]$ is defined by $\mathbf{U}[j]\mathbf{f} = |\boldsymbol{\psi}_j * \mathbf{f}|$.
For a path $p = (j_1, \ldots, j_m)$, the scattering propagator $\mathbf{U}[p]$ is defined by $\mathbf{U}[p] = \mathbf{U}[j_m] \cdots \mathbf{U}[j_1]$.
For the empty path, we define $\mathbf{U}[\emptyset]\mathbf{f} = \mathbf{f}$. We note that for any path $p$ and any scale $j$, $\mathbf{U}[p + j] = \mathbf{U}[j]\, \mathbf{U}[p]$.
The windowed scattering transform for a path $p$ is defined by $\mathbf{S}[p]\mathbf{f} = \boldsymbol{\phi}_J * \mathbf{U}[p]\mathbf{f}$, where $\boldsymbol{\phi}_J$ denotes the graph scaling function at the coarsest scale $2^J$.
Let $\Lambda^m$ denote the set of all paths of length $m$. The collection of all paths of finite length is denoted by $\Lambda$. The scattering propagator and the scattering transform with respect to $\Lambda$, which we denote by $\mathbf{U}$ and $\mathbf{S}$ respectively, are defined as $\mathbf{U}\mathbf{f} = \{\mathbf{U}[p]\mathbf{f}\}_{p \in \Lambda}$ and $\mathbf{S}\mathbf{f} = \{\mathbf{S}[p]\mathbf{f}\}_{p \in \Lambda}$.
When we emphasize the dependence of the scattering propagator and transform on the graph $G$, we denote them by $\mathbf{U}_G$ and $\mathbf{S}_G$ respectively. A naturally defined norm on $\mathbf{U}\mathbf{f}$ and $\mathbf{S}\mathbf{f}$ is given by $\|\mathbf{U}\mathbf{f}\|^2 = \sum_{p \in \Lambda} \|\mathbf{U}[p]\mathbf{f}\|_2^2$ and $\|\mathbf{S}\mathbf{f}\|^2 = \sum_{p \in \Lambda} \|\mathbf{S}[p]\mathbf{f}\|_2^2$, where $\|\cdot\|_2$ denotes the $\ell^2$-norm on $\mathbb{C}^N$.
In the terminology of deep learning, the scattering transform acts as a convolutional neural network on graphs. At the $m$-th layer, where $m \ge 0$, the propagated signals are $\{\mathbf{U}[p]\mathbf{f}\}_{p \in \Lambda^m}$ and the extracted features are $\{\mathbf{S}[p]\mathbf{f}\}_{p \in \Lambda^m}$. This network is illustrated in Figure 1.
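To make the layered structure concrete, here is a minimal sketch of such a network. The spectral kernels (low-pass $e^{-\lambda}$, band-pass $\lambda e^{1-\lambda}$), the scale range and the depth are assumed choices for illustration, not the paper's exact construction:

```python
import numpy as np

# Toy graph scattering sketch on an assumed 4-cycle graph.
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)
spec = lambda h: U @ np.diag(h) @ U.T          # spectral filter from kernel values

def scattering(f, J=2, depth=2):
    phi = spec(np.exp(-lam))                   # windowing low-pass (assumed kernel)
    psis = [spec(2.0 ** -j * lam * np.exp(1 - 2.0 ** -j * lam))
            for j in range(J + 1)]             # band-pass filters at dyadic scales
    feats, layer = [phi @ f], [f]              # S[empty]f and U[empty]f
    for _ in range(depth):
        layer = [np.abs(psi @ u) for u in layer for psi in psis]  # U[p + j]f
        feats += [phi @ u for u in layer]      # S[p]f for every path in the layer
    return np.concatenate(feats)

f = np.array([1.0, 0.0, 0.0, 0.0])
out = scattering(f)
# paths: 1 (empty) + 3 (length 1) + 9 (length 2), each feature of length N = 4
assert out.shape == (13 * 4,)
```

Only the classifier applied to `out` would require training; the filters themselves are fixed by the wavelet design, as described above.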
In a similar way, we can define the scattering transform for matrices of signals on graphs. Let $\mathbf{F} = [\mathbf{f}_1, \ldots, \mathbf{f}_d] \in \mathbb{C}^{N \times d}$, where for each $1 \le k \le d$, $\mathbf{f}_k$ is a complex signal of length $N$ on the same underlying graph. We define $\mathbf{S}\mathbf{F}$ by applying $\mathbf{S}$ to each column of $\mathbf{F}$, with $\|\mathbf{S}\mathbf{F}\|^2 = \sum_{k=1}^{d} \|\mathbf{S}\mathbf{f}_k\|^2$.
Note that the corresponding norm of $\mathbf{F}$ is the Frobenius norm of the matrix $\mathbf{F}$. Here and throughout the rest of the paper we denote by $\|\mathbf{A}\|_F$ the Frobenius norm of a matrix $\mathbf{A}$.
3 Energy preservation
We discuss the preservation of the energy of a given signal by the scattering transform. The signal is either $\mathbf{f} \in \mathbb{C}^N$ with the energy $\|\mathbf{f}\|_2^2$ or $\mathbf{F} \in \mathbb{C}^{N \times d}$ with the energy $\|\mathbf{F}\|_F^2$. We first formulate our main result.
Theorem 3.1. The scattering transform is norm preserving. That is, for $\mathbf{f} \in \mathbb{C}^N$ or $\mathbf{F} \in \mathbb{C}^{N \times d}$, $\|\mathbf{S}_G\mathbf{f}\| = \|\mathbf{f}\|_2$ and $\|\mathbf{S}_G\mathbf{F}\| = \|\mathbf{F}\|_F$.
The analog of Theorem 3.1 in the Euclidean case appears in [13, Theorem 2.6]. However, the proof is different for the graph case. One basic observation analogous to the Euclidean case is the following.
Proposition 3.2. For $\mathbf{f} \in \mathbb{C}^N$ and $m \ge 0$, $\sum_{p \in \Lambda^m} \|\mathbf{U}[p]\mathbf{f}\|_2^2 = \sum_{p \in \Lambda^{m+1}} \|\mathbf{U}[p]\mathbf{f}\|_2^2 + \sum_{p \in \Lambda^m} \|\mathbf{S}[p]\mathbf{f}\|_2^2$.
This proposition can be rephrased as follows: the propagated energy at the $m$-th layer splits into the propagated energy at the next layer and the output energy at the current layer. In order to conclude Theorem 3.1 from Proposition 3.2, we quantify the decay rate of the propagated energy, which may be of independent interest. A fast decay rate means that few layers are sufficient to extract most of the energy of the signal. We define the decay rate of the scattering transform at a given layer as follows.
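The energy-split identity can be checked numerically with an exact Parseval filter bank. The particular bank below, built from telescoping differences of $e^{-\lambda/2^j}$, is an assumed construction (not the paper's) whose squared spectral responses sum to one, so the identity holds at every layer:

```python
import numpy as np

# Assumed exact Parseval bank: phi_hat^2 = exp(-lam),
# psi_hat_j^2 = exp(-lam/2^j) - exp(-lam/2^{j-1}) for j = 1..J,
# plus a top band 1 - exp(-lam/2^J); the squares telescope to 1.
W = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])   # toy triangle graph
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)
spec = lambda h2: U @ np.diag(np.sqrt(np.maximum(h2, 0))) @ U.T

J = 3
phi = spec(np.exp(-lam))
bands = [spec(np.exp(-lam / 2**j) - np.exp(-lam / 2**(j - 1)))
         for j in range(1, J + 1)]
bands.append(spec(1 - np.exp(-lam / 2**J)))

f = np.array([1.0, -2.0, 0.5])
layer = [f]
for _ in range(3):
    out_energy = sum(np.sum((phi @ u) ** 2) for u in layer)
    layer_energy = sum(np.sum(u ** 2) for u in layer)
    nxt = [np.abs(b @ u) for u in layer for b in bands]
    prop_energy = sum(np.sum(v ** 2) for v in nxt)
    # propagated energy at this layer = output energy + propagated energy at next
    assert np.isclose(layer_energy, out_energy + prop_energy)
    layer = nxt
```

The pointwise absolute value preserves norms, so the split depends only on the Parseval property of the filter bank, mirroring the proof sketch below.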
For $\mathbf{f} \in \mathbb{C}^N$, $m \ge 1$ and $0 \le r \le 1$, the energy decay rate of $\mathbf{U}\mathbf{f}$ at the $m$-th layer is $r$ if $\sum_{p \in \Lambda^{m+1}} \|\mathbf{U}[p]\mathbf{f}\|_2^2 \le (1 - r) \sum_{p \in \Lambda^m} \|\mathbf{U}[p]\mathbf{f}\|_2^2$.
In practice, different choices of graph and scale lead to different energy decay rates. Nevertheless, we establish the following generic result that applies to all graph scattering transforms under the construction in Section 2.2.
Proposition 3.3. The scattering transform has an energy decay rate of at least $1/2$ at all layers but the first one. This is the sharpest generic decay rate, though a better one can be obtained under additional assumptions on $\hat{\phi}$, $\hat{\psi}$, $G$ and $\mathbf{f}$.
Note that in the Euclidean domain, no such generic result exists. Therefore, one has to choose the wavelets very carefully (see the admissibility condition in [13, Theorem 2.6]). Numerical results illustrating the energy decay in the Euclidean domain are given in . Furthermore, theoretical rates are provided in  and , where  introduces additional assumptions on the smoothness of input signals and the bandwidth of filters and  studies time-frequency frames instead of wavelets.
We demonstrate this energy decay for 100 randomly selected images from the MNIST database. A graph that represents a grid of pixels shared by these images is used. Details of the graph and the dataset are described in Section 6.1. The figure reports the box plots of the cumulative percentage of the output energy of the scattering transform for the first four layers and the 100 input images. That is, at layer $m$ the cumulative percentage for an image $\mathbf{f}$ is $100 \cdot \sum_{i=0}^{m} \sum_{p \in \Lambda^i} \|\mathbf{S}[p]\mathbf{f}\|_2^2 \,/\, \|\mathbf{f}\|_2^2$. We see that at the third layer, the scattering transform already extracts almost all of the energy of the signal. Therefore, in practice we can estimate the graph scattering transform with a small number of layers, which is also evident in practice for the Euclidean scattering transform.
3.1 Proof of Proposition 3.2
Replacing $p$ with $p + j$, summing over all paths $p$ of length $m$ and all scales $j$, and applying (14) yields
3.2 Proof of Proposition 3.3
Furthermore, we claim that
An improvement of this decay rate is possible if and only if one may strengthen the single inequality in (25) and the second inequality in (26). We show that these inequalities can be equalities in special cases, and thus the stated generic decay rate is sharp. We first note that equality occurs in the inequality of (25) if, for example, $\hat{\mathbf{f}}$ is an indicator function. Equality occurs in the second inequality when $\hat{\mathbf{f}}$ has exactly two non-zero elements. These two cases can be simultaneously satisfied. We comment that certain choices of $\hat{\phi}$, $\hat{\psi}$, $G$ and $\mathbf{f}$ imply different inequalities with stronger decay rates.
3.3 Proof of Theorem 3.1
We write (21) as
and sum over $m \ge 0$, while recalling that $\mathbf{U}[\emptyset]\mathbf{f} = \mathbf{f}$, to obtain that
4 Permutation covariance and invariance
When applying a transformation to a graph signal, it is natural to expect that relabeling the graph vertices and the corresponding signal indices before applying the transformation has the same effect as relabeling the corresponding indices after applying the transformation. More precisely, let $\pi \in S_N$ be a permutation, where $S_N$ denotes the symmetric group on $N$ letters; then it is natural to ask whether
In deep learning, the property expressed in (32) is referred to as covariance to permutations. On the other hand, invariance to permutations means that
Ideally, a graph-based classifier should not be sensitive to “graph-consistent relabeling” of the signal coordinates. The analog of this ideal requirement in the Euclidean setting is that a classifier of signals defined on $\mathbb{R}^d$ should not be sensitive to their rigid transformations. In the case of classifying graph signals by first applying a feature-extracting transformation and then a standard classifier, this ideal requirement translates to permutation invariance of the initial transformation. However, permutation invariance is a very strong property that often contradicts the necessary permutation covariance. We show here that the scattering transform is permutation covariant, and that if the scaling function is sufficiently smooth and $J$ approaches infinity, then it becomes permutation invariant.
We first exemplify the basic notions of covariance and invariance in Section 4.1. Section 4.2 reviews the few existing results on permutation covariance and invariance of graph neural networks and then presents our results for the scattering network. Section 4.3 explains why some permutations are natural generalizations of rigid transformations and then discusses previous broad generalizations of the notion of “graph translation” and their possible covariance and invariance properties. Sections 4.4 and 4.5 prove the main results formulated in Section 4.2.
4.1 Basic examples of graph permutations, covariance and invariance
For demonstration, we focus on the graph depicted in Figure 2(a). In this graph, each drawn edge has weight one and thus the double edge between the first two nodes has the total weight 2. The weight matrix of the graph is
The signal is depicted on the graph with different colors corresponding to different values. The following permutation is applied to the graph in Figure 2(b):
Figure 2(c) applies the permutation both to the signal and the graph.
An example of a transformation can be the exchange of the signal values at the two vertices connected by the edge of weight 2. This transformation is independent of the labeling of the graph and is thus permutation covariant. This can be formally verified as follows, while using for simplicity the permutation defined in (35). The transformation applied in the original graph swaps the first two entries of a signal, while the transformation applied in the permuted graph swaps the second and the fourth entries (the second claim is obvious from Figure 2(b)). One can readily check that applying the transformation after the permutation indeed agrees with applying the permutation after the transformation.
Another example is the transformation $\mathbf{f} \mapsto \mathbf{W}\mathbf{f}$, where $\mathbf{W}$ is the weight matrix in (34). This transformation is also independent of the labeling of the graph and thus permutation covariant. This property can be formally verified as follows: identifying $\pi$ with its $N \times N$ permutation matrix, the weight matrix of the relabeled graph is $\pi \mathbf{W} \pi^{T}$, and hence $(\pi \mathbf{W} \pi^{T})(\pi \mathbf{f}) = \pi (\mathbf{W} \mathbf{f})$.
Similarly, $\mathbf{f} \mapsto \mathbf{L}\mathbf{f}$, where $\mathbf{L}$ is the graph Laplacian, is permutation covariant.
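The covariance of $\mathbf{f} \mapsto \mathbf{W}\mathbf{f}$ can also be verified numerically on an assumed random toy graph:

```python
import numpy as np

# Check: relabeling first and then filtering equals filtering and then relabeling.
rng = np.random.default_rng(1)
N = 6
A = rng.random((N, N))
W = np.triu(A, 1) + np.triu(A, 1).T   # random symmetric weights, zero diagonal
f = rng.standard_normal(N)

perm = rng.permutation(N)
P = np.eye(N)[perm]                   # permutation matrix: (P f)_i = f_{perm(i)}
W_pi = P @ W @ P.T                    # weight matrix of the relabeled graph

print(np.allclose(W_pi @ (P @ f), P @ (W @ f)))   # covariance -> True
```

The identity holds because $\mathbf{P}^{T}\mathbf{P} = \mathbf{I}$, so conjugating the operator by the permutation exactly compensates for relabeling the signal.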
The above three examples of permutation covariant transformations are not permutation invariant. An example of a transformation that is permutation invariant, but not permutation covariant, maps the signal to the signal obtained by zeroing out its values at three vertices with unique graph properties (e.g., the vertices connected by at least two edges). Clearly the output is not affected by a permutation of the input signal and graph, and the transformation is thus permutation invariant. On the other hand, zeroing out three specified signal coordinates, instead of three such vertices, violates permutation covariance.
The latter example demonstrates in a very simplistic way the value of invariance for classification. Indeed, assume that there are two types of signals, with low and high values, and that a classifier tries to distinguish between the two classes according to the first coordinate of the transformed signal by checking whether it is larger than a certain threshold. Then this procedure can distinguish the two types of signals without being confused by signal relabeling. Permutation covariance does not play any role in this simplistic setting, since the classifier only considers the first coordinate of the transformed signal and ignores the rest.
4.2 Permutation covariance and invariance of graph neural networks
The recent works of Gilmer et al.  and Kondor et al.  discuss permutation covariance and invariance for composition schemes on graphs, where message passing is a special case. Composition schemes are covariant to permutations since they do not depend on any labeling of the graph vertices. Moreover, if the aggregation function of the composition scheme is invariant to permutations, so is the whole scheme [24, Proposition 2]. However, aggregation leads to loss of local information, which might weaken the performance of the scheme.
Methods based on graph operators, such as the graph adjacency, weight or Laplacian matrices, are not invariant to permutations (see the demonstration in Section 4.1). Nevertheless, the scattering transform is approximately permutation invariant when the wavelet scaling function is sufficiently smooth. Furthermore, when $J$ approaches infinity it becomes invariant to permutations. We first formulate its permutation covariance and then its approximate permutation invariance.
Proposition 4.1. Let $G$ be a simple graph and let $\mathbf{S}_G$ be the graph scattering transform with respect to $G$. For any $\pi \in S_N$ and $\mathbf{f} \in \mathbb{C}^N$, $\mathbf{S}_{\pi(G)}\, \pi \mathbf{f} = \pi\, \mathbf{S}_G \mathbf{f}$.
Theorem 4.2. Let $G$ be a simple graph and let $\mathbf{S}_G$ be the graph scattering transform with respect to $G$. Assume that the Fourier transform $\hat{\phi}$ of the scaling function of $\mathbf{S}_G$ decays sufficiently fast. Then for any $\pi \in S_N$ and $\mathbf{f} \in \mathbb{C}^N$, the deviation from invariance, $\|\mathbf{S}_{\pi(G)}\, \pi\mathbf{f} - \mathbf{S}_G \mathbf{f}\|$, is controlled by a bound that vanishes as $J$ grows.
In particular, the scattering transform is invariant as $J$ approaches infinity. The result also holds if $\mathbf{f}$ is replaced with $\mathbf{F} \in \mathbb{C}^{N \times d}$ and the Euclidean norm is replaced with the Frobenius norm.
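The limiting invariance can be illustrated numerically with an assumed low-pass kernel $\hat{\phi}(\lambda) = e^{-\lambda}$: for very large $J$, the windowing filter essentially projects onto the constant eigenvector, so its output is unchanged when both the graph and the signal are relabeled:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
M = rng.random((N, N))
W = np.triu(M, 1) + np.triu(M, 1).T      # random connected weighted graph (assumed)
L = np.diag(W.sum(1)) - W

def lowpass(L, f, J):
    """Windowing filter with assumed kernel phi_hat(lam) = exp(-lam),
    applied at coarse scale: exp(-2^J * lam_k) in the spectral domain."""
    lam, U = np.linalg.eigh(L)
    return U @ np.diag(np.exp(-(2.0 ** J) * lam)) @ U.T @ f

f = rng.standard_normal(N)
P = np.eye(N)[rng.permutation(N)]
out = lowpass(L, f, J=30)
out_perm = lowpass(P @ L @ P.T, P @ f, J=30)    # relabeled graph and signal
print(np.allclose(out, np.full(N, f.mean())))   # ~ projection onto u0 -> True
print(np.allclose(out_perm, out))               # invariance in the limit -> True
```

Intuitively, as $J \to \infty$ the low-pass filter keeps only the $\lambda = 0$ component, i.e., a global average, which no relabeling can change.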
4.3 Generalized graph translations
Permutation invariance on graphs is an important notion, which is motivated by concrete applications [25, 26]. It can be seen as an analog of translation invariance in Euclidean domains, which is also essential for applications [13, 12]. A different line of research asks for the most natural notion of translation on a graph [3, 27]. We show here that very special permutations of signals on graphs naturally generalize the notion of translation or rigid transformation of a signal in a Euclidean domain. More precisely, there is a planar representation of the graph on which the permutation acts like a rigid transformation. However, in general, there are many permutations that act very differently from translations or rigid transformations in a Euclidean domain, though they still preserve the graph topology. Indeed, the underlying geometry of general graphs is richer than that of the Euclidean domain. We later discuss previously suggested generalized notions of “translations” on graphs and the possible covariance and invariance of a modified graph scattering transform with respect to these.
We first present two examples of permutations of graphs that can be viewed as Euclidean translations or rigid transformations. We later provide examples of permutations of the same graphs that are different from rigid transformations. The first example, demonstrated in Figure 3(a), shows a periodic lattice graph and signal with two values denoted by white and blue. Note that the periodic graph can be embedded in a torus, whereas the figure only shows the projection of its 25 vertices onto a $5 \times 5$ grid in the plane. The edges are not depicted in the figure, but they connect points to their four nearest neighbors on the torus. That is, including “periodic padding” for the $5 \times 5$ grid of vertices, each vertex in the plane is connected with its four nearest neighbors. For example, vertex 21 is connected with vertices 1, 16, 22 and 25. The graph signal obtains a non-zero constant value on the four vertices colored in blue (3, 4, 7 and 8) and is zero on the rest. Figure 3(b) demonstrates an application of a permutation to both the graph and the signal. At last, Figure 3(c) depicts the permuted graph and signal of Figure 3(b) with the indices rearranged so that the representation of the lattice in the plane is the same as that in Figure 3(a) (this is necessary since the lattice lives on the torus and may have more than one representation in the plane). The relation between the consistent representations of the original graph and signal in Figure 3(a) and the permuted ones in Figure 3(c) is obviously a translation. That is, graph and signal permutation in this example corresponds to translation. We remark that the fact that Figure 3(c) coincides with the description of the original pair is incidental for this particular example and does not occur in the next example.
Figure 4 depicts a different example, where a permutation of a graph signal can be viewed as a variant of a Euclidean rigid transformation. The graph and the signal are shown in Figure 4(a), where the signal is supported on the vertices marked in blue (indexed by 1, 2 and 3). Figure 4(b) demonstrates an application of a permutation to the graph and signal. Figure 4(c) shows a different representation of the permuted pair, which is consistent with the representation presented in Figure 4(a). The comparison between Figures 4(a) and 4(c) makes it clear that the graph and signal permutation corresponds to a Euclidean rigid transformation in the planar representation of the graph. At last, Figure 4(d) demonstrates that unlike the example in Figure 3(a), the rearrangement of the permuted graph is generally different from the original graph. Indeed, the subgraph associated with the blue values of the signal is not a triangle and thus the topology is different.
We remark that many permutations on graphs do not act like translations or rigid transformations. We demonstrate this claim using the graphs of the previous two examples. In Figure 5(a), we consider the same graph as in Figure 3(a), but with a different permutation. The difference in permutations can be noticed by comparing the second columns of the grids in Figures 3(b) and 5(b). We note that the rearrangement of the vertices in Figure 5(c) does not yield an analog of a Euclidean translation. The reason is that the rearranged vertices do not form a grid. To demonstrate this claim, note that in Figure 5(a), label 17 is connected to 22, but in Figure 5(b), and consequently in the rearranged representation in Figure 5(c), they are disconnected.
Figure 6(a) demonstrates a permutation that does not act like a rigid transformation with respect to the graph of Figure 4. Clearly, the rearranged graph and signal in Figure 6(c) have a different planar geometry. We remark that while the permutations demonstrated in Figures 5(a) and 6(a) do not preserve the planar geometry, they still preserve the topology of the graphs. Indeed, the notion of permutation invariance is richer than invariance to rigid transformations in the Euclidean domain.
In the signal processing community, some candidates were proposed for translating signals on graphs. Shuman et al. defined a “graph translation” (or, in retrospect, a graph localization procedure) $T_i$, $1 \le i \le N$, as follows:
They established useful localization properties of $T_i$, which justify a corresponding construction of a windowed graph Fourier transform. They also demonstrated the applicability of this tool for the Minnesota road network [3, Figure 7]. We remark that $T_i$ in their definition may not be well-defined. To make it well-defined, one needs to assume fixed choices of the phases of the eigenvectors $\mathbf{u}_k$ and that the algebraic multiplicities of all eigenvalues equal one.
Sandryhaila and Moura define a “shift” of a graph signal $\mathbf{f}$ by $\mathbf{W}\mathbf{f}$, where $\mathbf{W}$ is the weight matrix of the graph. This definition is motivated by the example of a directed cyclic graph, where an application of the weight matrix is equivalent to a shift by one vertex. Note that in this special case, the graph signal permutation advocated in this section also results in a vertex shift. We remark that it is unclear to us why this notion of shift is useful for general graphs.
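The directed-cycle claim can be checked directly (a small assumed sketch):

```python
import numpy as np

# On a directed cycle, applying the weight matrix moves the signal by one vertex.
N = 5
A = np.zeros((N, N))
for i in range(N):
    A[i, (i - 1) % N] = 1.0      # directed edge from vertex i-1 to vertex i

f = np.arange(N, dtype=float)
print(np.allclose(A @ f, np.roll(f, 1)))   # (A f)_i = f_{i-1 mod N} -> True
```

Applying the matrix $N$ times returns the signal to itself, exactly as a cyclic shift should.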
If one needs covariance and approximate invariance of a graph scattering transform with respect to the graph localization procedure $T_i$ defined above, then one may modify the nonlinearity of the scattering transform accordingly and redefine the corresponding propagators.
Therefore, the modified nonlinearity and the modified scattering transform are covariant with respect to the localization operator $T_i$. Similarly, by following the proof of Theorem 4.2, one can show that the modified scattering transform is approximately invariant to $T_i$ as long as its energy decay is sufficiently fast.
The scattering transform cannot be adjusted to be covariant and approximately invariant to the “shift” $\mathbf{W}\mathbf{f}$. The reason is that, unlike $T_i$, the operator $\mathbf{W}$ does not in general share the eigenvectors $\mathbf{u}_k$ of $\mathbf{L}$, and hence does not commute with the graph wavelet filters.
4.4 Proof of Proposition 4.1
We need to show that for each path $p \in \Lambda^m$, where $m \ge 0$, $\mathbf{S}_{\pi(G)}[p]\, \pi\mathbf{f} = \pi\, \mathbf{S}_G[p]\, \mathbf{f}$.
Note that the Laplacian of $\pi(G)$ is $\pi \mathbf{L} \pi^{T}$ (identifying $\pi$ with its permutation matrix), which has the same eigenvalues as $\mathbf{L}$ and has eigenvectors $\pi \mathbf{u}_k$, $0 \le k \le N-1$. Equation (9) then implies that the wavelet filters on $\pi(G)$ satisfy $\boldsymbol{\psi}_j *_{\pi(G)} \pi\mathbf{f} = \pi\, (\boldsymbol{\psi}_j *_G \mathbf{f})$.
Consequently, applying the absolute value pointwise, $\mathbf{U}_{\pi(G)}[j]\, \pi\mathbf{f} = \pi\, \mathbf{U}_G[j]\, \mathbf{f}$.
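The key spectral step can be verified numerically: identifying $\pi$ with a permutation matrix $\mathbf{P}$, the matrix $\mathbf{P}\mathbf{L}\mathbf{P}^{T}$ has the same eigenvalues as $\mathbf{L}$, with eigenvectors $\mathbf{P}\mathbf{u}_k$ (assumed random toy graph):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
M = rng.random((N, N))
W = np.triu(M, 1) + np.triu(M, 1).T          # random symmetric weight matrix
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)

P = np.eye(N)[rng.permutation(N)]            # permutation matrix
Lp = P @ L @ P.T                             # Laplacian of the relabeled graph

print(np.allclose(np.linalg.eigvalsh(Lp), lam))       # same spectrum -> True
ok = all(np.allclose(Lp @ (P @ U[:, k]), lam[k] * (P @ U[:, k])) for k in range(N))
print(ok)                                    # P u_k are eigenvectors -> True
```

Since the spectral filters are built only from the eigenvalues and the (relabeled) eigenvectors, the filters themselves conjugate by $\mathbf{P}$, which is exactly what the covariance argument uses.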