# Modelling Graph Errors: Towards Robust Graph Signal Processing

The first step for any graph signal processing (GSP) procedure is to learn the graph signal representation, i.e., to capture the dependence structure of the data in an adjacency matrix. Indeed, the adjacency matrix is typically not known a priori and has to be learned. However, it is learned with errors. Little, if any, attention has been paid to modelling such errors in the adjacency matrix or to studying their effects on GSP methods. Yet modelling errors in the adjacency matrix enables both studying the graph error effects in GSP and developing robust GSP algorithms. In this paper, we therefore introduce practically justifiable graph error models. We also study, both analytically and in terms of simulations, the graph error effect on the performance of GSP methods, using as examples the more traditional task of filtering graph signals in its different forms and the less known independent component analysis (ICA) of graph signals (graph decorrelation).

## I Introduction

In the classical signal processing setup, where digital signals are represented as time series or vectors of spatial measurements (for example, measurements by sensor arrays), it is assumed that each point of a discrete signal depends on the preceding or spatially close points of the signal. With the current advances in data collection and representation, when the data points may not be ordered temporally or spatially and the digital signal may no longer be accurately or even adequately represented by a simple time series structure, the classical signal processing tools are no longer applicable. Thus, graph signal processing (GSP) [1, 2, 3] has emerged as a new area of signal processing and data analysis, one of whose major aims is to generalize standard signal processing methods and concepts to the context of more complex signals represented on graphs of various structures. The graph corresponding to ordered data is then just a simple special case of a digital signal graph. Examples of digital signals represented on graphs include sensor networks, brain networks, gene regulatory networks, and social networks, to name just a few of the most frequently encountered [4, 5, 6, 7].

In GSP, each signal point is indexed by a node of a graph at which this signal point is measured, and edges between pairs of nodes in the graph indicate dependencies between the corresponding signal points. The underlying graph can then be fully specified by an adjacency matrix, denoted hereafter as $\mathbf{A}$, whose $(i,j)$th element $a_{ij}$ is nonzero if the $i$th and the $j$th nodes are connected, and the value $a_{ij}$ describes the strength of the relationship. Thus, the adjacency matrix can be viewed as the basis of GSP, on which, for example, the graph Fourier transform and graph filters [1] are built, often via the graph Laplacian matrix or other shift operators defined based on the adjacency matrix.¹

¹The GSP literature is developing fast and by now also covers sampling theory for graph signals [8, 9], stationarity theory for graph signals [10], and percolation of graph signals [11].

The process of obtaining the adjacency matrix may take different forms. There may exist physical connections between signal points known in advance, which directly yield a graph. At the other extreme, there may be no auxiliary information available about the connectedness of signal points, and the methods of finding an adjacency matrix are then entirely based on training data. Methods for estimating the graph Laplacian matrix in the latter case have been developed in [12, 13, 14]. Third, the choice of the adjacency matrix may be based on other variables, which are of no significance for describing particular values of the signal points and matter only for adjacency matrix learning. In both the second and the third cases, the process of adjacency matrix learning is necessarily stochastic. As a result, there are multiple sources of errors which lead to imperfect learning of the adjacency matrix. Even when the choice of the adjacency matrix seems obvious and is based on known physical connections between signal points, such a choice still may not be the best one.

Our focus in this paper is on studying the consequences of imperfect specification of the graph signal adjacency matrix for GSP tasks. The sources of errors in adjacency matrix learning may be related to, or completely irrelevant to, the sources of errors in the signal point measurements. However, even when the sources of errors in adjacency matrix learning and in signal point measurements are the same, the effects of an imperfectly learned adjacency matrix on GSP tasks are different from those caused by errors in signal point measurements. To the best of our knowledge, this topic has not been studied intensively so far, but there are some existing works which touch upon it. In [15], a new centrality measure was introduced and its robustness with respect to edge weight perturbations was considered. Autoregressive moving average graph filters and finite impulse response graph filters for time-varying graph signals were compared in [16]; among other scenarios, the case when some of the edges are missing was studied there. The eigendecomposition of the graph Laplacian was analyzed in [17] when small subsets of edges are added or deleted. In [18], a robust graph signal processing approach was built on the results in [17] and on the knowledge of edgewise probabilities of errors in the graph topology.

Naturally, the starting point for studying GSP performance under imperfect knowledge of the graph signal adjacency matrix is the modelling of graph errors. Just as errors in signal modelling in classical signal processing may lead to significant performance degradation [19], errors in the adjacency matrix knowledge may be expected to have a significant effect on GSP performance. The development of robust GSP methods, just like the development of robust signal processing methods [19], is then of great importance. This paper, to the best of our knowledge, is the first comprehensive attempt towards developing justifiable and sufficiently generic models for adjacency matrix mismatches. It also proceeds with the study of the effects of a mismatched adjacency matrix on the performance of some more traditional GSP applications, such as filtering of graph signals, as well as on the less known ICA of graph signals (graph decorrelation).

### I-A Contributions

Our main contributions are the following.²

²Our preliminary work on the topic has been reported in [20].

• We formulate graph error models for the adjacency matrix, which help to quantify the deviation from the true matrix using a few parameters. We consider adding and removing edges and perturbing the edge weights. First, we build models assuming equal probabilities of mislearning the edge status for each pair of nodes, but later we generalize to the case where there are several subsets of pairs such that within the groups the probabilities are equal, but they differ between the groups.

• We study the graph moving average signal model, and derive results for its covariance matrix and graph autocorrelation, both for a given true adjacency matrix and for one obtained from a graph error model.

• We illustrate what kinds of effects different types of errors in the adjacency matrix specification produce in filtering of graph signals and in ICA of graph signals (graph decorrelation). Both theoretical studies and simulation-based as well as real-data-based studies are performed.

### I-B Notation

We use boldface capital letters for matrices, boldface lowercase letters for vectors, and capital calligraphic letters for sets. The exceptions are $\mathbf{1}_N$, which is the $N$-dimensional vector of ones, the all-ones matrix $\mathbf{1}_{N\times N}$, and $\mathbf{1}_{\mathbf{A}}$, which is a matrix of the same size as $\mathbf{A}$ such that $[\mathbf{1}_{\mathbf{A}}]_{ij}=1$, if $a_{ij}\neq 0$, and $[\mathbf{1}_{\mathbf{A}}]_{ij}=0$, if $a_{ij}=0$. The matrix $\mathbf{I}_{N\times N}$ is the identity matrix. The notations $(\cdot)^\top$, $\odot$, $\|\cdot\|$, $\mathrm{tr}\{\cdot\}$, $\mathrm{P}\{\cdot\}$, $E\{\cdot\}$, and $\mathrm{var}\{\cdot\}$ stand for the transpose, Hadamard product, Euclidean norm of a vector, trace of a matrix, probability, mathematical expectation, and variance, respectively. The notation $\mathcal{N}(0,\sigma^2)$ stands for the Gaussian zero-mean distribution with variance $\sigma^2$.

### I-C Paper Organization

Graph error models are introduced for different types of graph signals in Section II. Section III is devoted to establishing a result which draws an analogy between the Gaussian distribution and the Erdös–Rényi model, using graphons as graph generating functions. Section IV studies the covariance matrix of the graph moving average (GMA) signal, including the case when the graph signal adjacency matrix is mismatched. GSP applications for which the effect of the adjacency matrix mismatch is studied in this paper are discussed in Section V, while Section VI is devoted to simulation-based and real-data-based studies of the adjacency matrix mismatch in the applications discussed in Section V. Section VII concludes the paper.

## II Graph Error Models

Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be a directed graph that represents the basis of a graph signal, where $\mathcal{V}$ is the set of $N$ nodes and $\mathcal{E}$ is the set of edges. Then the true but unknown adjacency matrix of the graph $\mathcal{G}$, denoted as $\mathbf{A}$, is an $N\times N$ matrix that satisfies the conditions $a_{ii}=0$ for $i=1,\dots,N$ and $a_{ij}\neq 0$ if and only if $(v_j,v_i)\in\mathcal{E}$, i.e., the $j$th node is connected to the $i$th node.

For developing our graph error models, we will use the Erdös–Rényi model, according to which a random graph is constructed by connecting nodes randomly with a constant probability $\epsilon$ [21]. The corresponding graph is denoted as $\mathcal{G}_\epsilon$ and its adjacency matrix is a random matrix $\Delta_\epsilon$ such that $[\Delta_\epsilon]_{ii}=0$ for all $i$, and $\mathrm{P}\{[\Delta_\epsilon]_{ij}=1\}=\epsilon$ and $\mathrm{P}\{[\Delta_\epsilon]_{ij}=0\}=1-\epsilon$ for $i\neq j$, where each element of the matrix is generated independently from the other elements.

An important characteristic of the Erdös–Rényi graph is that it does not allow for the formation of communities [22], and if applied on top of another graph, it will not change the essential structure of that graph, which can be described, for example, in terms of other kernel-based random graphs known as graphons [22, 23, 24]. Instead, it just disturbs the spectrum of the original graph. Whether the essential structures in a graph signal contaminated by an Erdös–Rényi graph can still be correctly captured then depends only on the probability value.

Because of the errors in adjacency matrix learning, the estimated adjacency matrix deviates from the true one. In the following subsections, we introduce graph error models which all share the same additive structure, i.e., have the form

$$\mathbf{W} = \mathbf{A} + \mathbf{E} \tag{1}$$

where $\mathbf{W}$ is the estimated adjacency matrix, $\mathbf{A}$ is the correct adjacency matrix, and $\mathbf{E}$ is an unknown error matrix which can be viewed as an analog of the additive error/distortion/noise component (applying not to the signal points, but to the signal structure) in traditional signal processing and time series analysis. Moreover, the Erdös–Rényi graph is the basic GSP error/distortion/noise model, analogous to the basic Gaussian noise model in traditional signal processing, as will be shown in Section III.

### II-A Graph Error Model for Signals on Unweighted Directed Graphs

Consider first unweighted graphs, for which the adjacency matrix becomes $\mathbf{A} = a\mathbf{1}_{\mathbf{A}}$, where $a$ is some constant weight. Assuming that the outcome of the graph signal adjacency matrix learning is accurate enough, that is, assuming that all or sufficiently many essential structures of the graph signal are captured correctly and that incorrect graph edge learning is equally probable for any edge in the graph, the learning errors can be described by the Erdös–Rényi model. Then the actually available learned adjacency matrix of a graph signal can be modelled as the following inaccurate version of $\mathbf{A}$

$$\mathbf{W} = \mathbf{A} + \Delta_\epsilon \odot (a\mathbf{1}_{N\times N} - 2\mathbf{A}). \tag{M1}$$

According to (M1), the true adjacency matrix of a graph signal is distorted because of imperfect learning, and this distortion follows the Erdös–Rényi model, where the level of distortion depends on a single parameter, the probability $\epsilon$. Since the Erdös–Rényi graph is applied on top of another graph, as a result of distortion an edge can be added with probability $\epsilon$ when there is no edge in the true graph, or an edge of the true graph can be removed with the same probability if there exists an edge in the true graph. This corresponds to flipping the value from $0$ to $a$ or from $a$ to $0$ in the adjacency matrix in the positions corresponding to value $1$ in the Erdös–Rényi adjacency matrix $\Delta_\epsilon$.
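As an illustration, model (M1) is straightforward to simulate. The following sketch (the helper name and NumPy conventions are ours, not from the paper) flips each off-diagonal entry of an unweighted adjacency matrix independently with probability $\epsilon$:

```python
import numpy as np

def apply_m1(A, eps, a=1.0, rng=None):
    """Distort an unweighted adjacency matrix A (entries 0 or a) with the
    Erdos-Renyi error model (M1): W = A + Delta_eps ⊙ (a·1 − 2A).
    Each off-diagonal entry is flipped 0 <-> a independently w.p. eps."""
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    Delta = (rng.random((N, N)) < eps).astype(float)  # Erdos-Renyi error graph
    np.fill_diagonal(Delta, 0.0)                      # no self-loops
    return A + Delta * (a * np.ones((N, N)) - 2.0 * A)
```

With $\epsilon = 0$ the learned matrix equals the true one, while with $\epsilon = 1$ every off-diagonal edge status is inverted.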

### II-B Generalization for Different Probabilities of Missed and Mislearned Edges

The basic model (M1) can be easily extended to the case where the probability of removing an edge from a graph which correctly captures a graph signal, denoted as $\epsilon_1$, is not the same as the probability of adding an edge which does not exist in the true graph, denoted as $\epsilon_2$. The corresponding inaccurately learned adjacency matrix of a graph signal can then be modelled as

$$\mathbf{W} = \mathbf{A} - \Delta_{\epsilon_1} \odot \mathbf{A} + \Delta_{\epsilon_2} \odot (a\mathbf{1}_{N\times N} - \mathbf{A}). \tag{M2}$$

This is a straightforward extension of the basic model (M1), which is useful for modelling the typical situation in graph structure learning where particular algorithms are more likely to miss an edge than to mislearn one, or vice versa [12]. Model (M2) can be interpreted as an application of two Erdös–Rényi graphs on top of the true graph, where one Erdös–Rényi graph can only add edges which do not exist in the true graph, while the other can only remove existing edges. It is easy to see that model (M2) is equivalent to model (M1) when $\epsilon_1=\epsilon_2=\epsilon$.
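A sketch of (M2) under the same conventions as before (hypothetical helper, not from the paper) differs from (M1) only in drawing two independent Erdös–Rényi error graphs:

```python
import numpy as np

def apply_m2(A, eps1, eps2, a=1.0, rng=None):
    """Model (M2): each existing edge is removed w.p. eps1 and each
    non-existing edge (weight a) is added w.p. eps2, independently."""
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    D1 = (rng.random((N, N)) < eps1).astype(float)   # removal error graph
    D2 = (rng.random((N, N)) < eps2).astype(float)   # addition error graph
    np.fill_diagonal(D1, 0.0)
    np.fill_diagonal(D2, 0.0)
    return A - D1 * A + D2 * (a * np.ones((N, N)) - A)
```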

### II-C Graph Error Model for Signals on Weighted Directed Graphs

Further, model (M2) can be extended to signals on weighted graphs. Let $\mathcal{S}$ be the set of nonzero elements of the true graph adjacency matrix $\mathbf{A}$. The inaccurately learned weighted adjacency matrix can then be modelled as

$$\mathbf{W} = \mathbf{A} + (\mathbf{1}_{N\times N} - \Delta_{\epsilon_1}) \odot \mathbf{1}_{\mathbf{A}} \odot \mathbf{\Sigma}_c - \Delta_{\epsilon_1} \odot \mathbf{A} + \Delta_{\epsilon_2} \odot \mathbf{B} \odot (\mathbf{1}_{N\times N} - \mathbf{1}_{\mathbf{A}}) \tag{M3}$$

where $\mathbf{B}$ is an $N\times N$ matrix whose elements are drawn from the set $\mathcal{S}$ with replacement and $\mathbf{\Sigma}_c$ is an $N\times N$ matrix whose elements are drawn from a zero-mean Gaussian distribution with variance $c\hat{\sigma}^2_{\mathcal{S}}$. Here $\hat{\sigma}^2_{\mathcal{S}}$ is the sample variance of $\mathcal{S}$ and $c$ is the variance multiplier.
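A minimal sketch of (M3) follows, with our own helper names and assuming the sample variance of $\mathcal{S}$ is the unbiased estimator:

```python
import numpy as np

def apply_m3(A, eps1, eps2, c=1.0, rng=None):
    """Weighted error model (M3): surviving edges get additive Gaussian
    weight noise, removed edges are zeroed, and spuriously added edges
    receive weights resampled (with replacement) from the true weights."""
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    ind_A = (A != 0).astype(float)                   # indicator matrix 1_A
    S = A[A != 0]                                    # nonzero weights of A
    var_S = S.var(ddof=1) if S.size > 1 else 0.0     # sample variance of S
    D1 = (rng.random((N, N)) < eps1).astype(float)
    D2 = (rng.random((N, N)) < eps2).astype(float)
    np.fill_diagonal(D1, 0.0)
    np.fill_diagonal(D2, 0.0)
    Sigma_c = rng.normal(0.0, np.sqrt(c * var_S), (N, N))
    B = rng.choice(S, size=(N, N)) if S.size else np.zeros((N, N))
    return A + (1.0 - D1) * ind_A * Sigma_c - D1 * A + D2 * B * (1.0 - ind_A)
```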

Models (M1)–(M3) can all be recast in terms of undirected graphs by defining lower triangular matrices $\Delta^{\mathrm{L}}_{\epsilon_1}$ and $\Delta^{\mathrm{L}}_{\epsilon_2}$ analogously to $\Delta_{\epsilon_1}$ and $\Delta_{\epsilon_2}$, and then replacing $\Delta_{\epsilon_1}$ and $\Delta_{\epsilon_2}$ by $\Delta^{\mathrm{L}}_{\epsilon_1}+(\Delta^{\mathrm{L}}_{\epsilon_1})^\top$ and $\Delta^{\mathrm{L}}_{\epsilon_2}+(\Delta^{\mathrm{L}}_{\epsilon_2})^\top$, respectively.

### II-D Generalized Graph Error Model

It is assumed in models (M1)–(M3) that mislearning the correct edge status is equally probable for all pairs of nodes (although the probabilities of adding a non-existing edge and removing an existing edge may differ). For some adjacency matrix estimation methods, this assumption might not hold strictly even if the pairwise connections were equally strong. To allow for differences in learning accuracy, caused by the structure of the graph or by differences in connectivity strength, we formulate the following generalized graph error model.

Consider an unweighted adjacency matrix $\mathbf{A}$. Define the matrices $\mathbf{D}_1,\dots,\mathbf{D}_K$ of size $N\times N$ satisfying $[\mathbf{D}_k]_{ij}\in\{0,1\}$ for all $i,j$ and $\sum_{k=1}^{K}\mathbf{D}_k = \mathbf{1}_{N\times N}$. Each matrix $\mathbf{D}_k$ presents the pairs of nodes, indicated by ones, for which the probabilities of mislearning the existence of the edges are equal. The probabilities of removing edges in the subsets are given by $\epsilon_{11},\dots,\epsilon_{K1}$, and the probabilities of adding edges are given by $\epsilon_{12},\dots,\epsilon_{K2}$. The graph error model specified by $\mathbf{D}_k$, $\epsilon_{k1}$, and $\epsilon_{k2}$, $k=1,\dots,K$, can then be written as

$$\mathbf{W} = \mathbf{A} - \sum_{k=1}^{K}\Delta_{\epsilon_{k1}} \odot \mathbf{D}_k \odot \mathbf{A} + \sum_{k=1}^{K}\Delta_{\epsilon_{k2}} \odot \mathbf{D}_k \odot (a\mathbf{1}_{N\times N} - \mathbf{A}). \tag{M4}$$

For comparison with the graph error models introduced above, in the Erdös–Rényi graph error model (M2), for example, $K=1$ and $\mathbf{D}_1=\mathbf{1}_{N\times N}$.

For further insight into the generalized graph error model (M4), note that in the stochastic block model [25] there is a partition of the nodes into communities $\mathcal{C}_1,\dots,\mathcal{C}_m$ with sizes $N_1,\dots,N_m$. Moreover, let us assume that the nodes are ordered so that the nodes of the first community come first, then those of the second community, and so on. The probability of an edge existing from a node of the $k$th community to a node of the $l$th community is denoted as $\epsilon_{kl}$. Let $\mathbf{C}_{kl}$ denote an $N\times N$ matrix whose $(i,j)$th element is one if the $i$th node belongs to the $k$th community and the $j$th node belongs to the $l$th community, and zero otherwise. Then, using the notation of (M4), we may for example take the matrices $\mathbf{D}_k$ to be the community-pair indicator matrices $\mathbf{C}_{kl}$, and we obtain a model in which the probability of mislearning an edge depends on which communities the start and end nodes belong to.

Another popular and practically important graph model is the planted partition model. It is obtained from the stochastic block model when all within-community edge probabilities are equal to one common value and all between-community edge probabilities are equal to another. In this case, a natural related graph error model is derived using model (M4) with one subset of pairs collecting the within-community pairs and the other collecting the between-community pairs. Therefore, model (M4) is quite a generic model for mismatches in the graph topology of graph signals. The number of parameters in model (M4) is, however, large. Thus, a significant amount of prior information is assumed in comparison to the basic Erdös–Rényi model (M1).
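The generalized model (M4) can be sketched as follows (helper names ours); the indicator matrices $\mathbf{D}_k$ partition the node pairs, and each subset uses its own pair of error probabilities:

```python
import numpy as np

def apply_m4(A, D_list, eps_rm, eps_add, a=1.0, rng=None):
    """Generalized model (M4): pairs indicated by D_k lose an existing edge
    w.p. eps_rm[k] and gain a non-existing edge w.p. eps_add[k]."""
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    W = A.astype(float).copy()
    for Dk, e1, e2 in zip(D_list, eps_rm, eps_add):
        R1 = (rng.random((N, N)) < e1).astype(float)
        R2 = (rng.random((N, N)) < e2).astype(float)
        np.fill_diagonal(R1, 0.0)
        np.fill_diagonal(R2, 0.0)
        W += -R1 * Dk * A + R2 * Dk * (a * np.ones((N, N)) - A)
    return W
```

With $K = 1$ and $\mathbf{D}_1$ the all-ones matrix this reduces to (M2).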

## III Analog of Central Limit Theorem for Graph Errors

In the previous section, the Erdös–Rényi graph was an important building block of the graph error models, which are stochastic in nature and are characterized by only a few parameters, such as the probabilities of adding a non-existing edge or removing an existing one. The fact that these models have only a few parameters is useful, as it simplifies the error modelling significantly. However, the use of the Erdös–Rényi model as the main block for describing mismatches in the graphical structure of graph signals needs to be further justified. In particular, regarding the basic nature of the Erdös–Rényi graph for modelling graph errors of graph signals, we would like to draw an analogy to the basic nature of additive Gaussian noise for modelling errors related to the imperfect fit between signal models and actual measured signal values. Just as the key object in noise modelling is the noise pdf, the key object for graph error models will be the graph function, or graphon.

### III-A Graphon

The term graphon was introduced in [26] in the context of graph sequence theory, and it refers to measurable and symmetric functions mapping from $[0,1]^2$ to $[0,1]$. Graphons have been used, for example, as graph limit objects [26, 24] and as building blocks in the kernel graph model [27]. Moreover, they have been used for spectral analysis of the adjacency matrix [22].

Let us denote the set of all graphons by $\mathcal{W}$ and its elements, i.e., graph functions, by $W$. Measurability is a technical assumption which allows integrating the function $W$, and symmetry implies that the edges are undirected. Limits of sequences of dense graphs (graphs in which each node is connected to a positive fraction of the other nodes), as the number of nodes tends to infinity, cannot be easily defined using graphs with a countable set of nodes since, for example, there is no uniform distribution on countably infinite sets. Hence, an uncountable cardinality of nodes is needed in order to have a natural graph sequence limit object. In this context a graphon is interpreted as a weighted graph with an uncountable set of nodes, with $W(x,y)$ giving the edge weight between nodes $x$ and $y$.

Another role of graphons, on which we focus in this section, is as a kernel in the kernel graph model, which consists of the triple $(N, W, \mu)$, where $N$ is the number of nodes, $W$ is a graphon (or kernel), and $\mu$ is a mapping from a probability space to the space $[0,1]$.

A realization of a kernel graph is obtained by generating values $x_1,\dots,x_N$ from $\mu$. Then the $i$th and the $j$th nodes are connected with probability $W(x_i,x_j)$, independently from the other pairs. Thus, the graphon can be seen here as a sort of density function for the graph structure. Typically $\mu$ is chosen to have the uniform density on $[0,1]$.

Some variants of the kernel graph model have been discussed in the literature. For example, $\mu$ can be derived through a random process or can be taken as a deterministic mapping that gives a fixed sequence of values $x_1,\dots,x_N$ with probability one. For example, the Erdös–Rényi model with probability parameter $\epsilon$, which is of interest here, is obtained by choosing the constant graphon $W(x,y)=\epsilon$ for all $x,y$ and for any choice of $\mu$.

The stochastic block model with fixed community sizes is given by a deterministic $\mu$ and a piecewise constant graphon $W$ taking the value $\epsilon_{kl}$ on the rectangle of $[0,1]^2$ corresponding to the $k$th and $l$th communities. When $\mu$ has the uniform distribution instead, we have a stochastic block model with random community sizes, the expected size of a community being proportional to the length of its interval.
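The kernel graph model described above can be sampled directly. The sketch below (our own code, using the constant Erdös–Rényi graphon as an example kernel) draws latent positions from the uniform $\mu$ and connects pairs independently:

```python
import numpy as np

def sample_kernel_graph(N, graphon, rng=None):
    """Draw an undirected graph from the kernel graph model (N, W, mu):
    x_i ~ Uniform[0,1]; nodes i and j are connected w.p. W(x_i, x_j)."""
    rng = np.random.default_rng(rng)
    x = rng.random(N)                       # latent node positions from mu
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            A[i, j] = A[j, i] = float(rng.random() < graphon(x[i], x[j]))
    return A

erdos_renyi = lambda x, y: 0.3              # constant graphon -> Erdos-Renyi
A = sample_kernel_graph(200, erdos_renyi, rng=0)
```

Any other graphon, such as the exponential one discussed below, can be passed in the same way.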

In the kernel graph model, the expected degree of a random node is

$$(N-1)\int_0^1\!\!\int_0^1 W(x,y)\,d\mu(x)\,d\mu(y)$$

and the range of expected degrees is

$$(N-1)\int_0^1 W(x,y)\,d\mu(y),\quad x\in[0,1].$$

While the stochastic block model allows for a very heterogeneous degree distribution in extreme setups, such a construction is quite cumbersome. Therefore, it is of interest to use continuous graphons which have only a few parameters, such as the exponential model [22], where the graphon is of the form

$$W(x,y) = e^{-(\beta_1(x+y)+\beta_0)}$$

with $\beta_0,\beta_1 \geq 0$. Parameter $\beta_0$ controls here the sparsity of the graph: the larger the value of $\beta_0$, the sparser the graph. Moreover, increasing the parameter $\beta_1$ makes the degree distribution more heterogeneous.

### III-B Graphons for Graph Error Models and Analog of Central Limit Theorem for Graph Errors

The kernel graph model for the undirected version of graph error model (M2) can be built in terms of the graphon formalism using a uniform $\mu$ and a piecewise constant graphon defined as follows: the graphon takes the value $\epsilon_1$ on the regions of $[0,1]^2$ corresponding to pairs of nodes that are connected in the true graph, and the value $\epsilon_2$ on the remaining regions.

The eigenvalues of large normalized and unweighted graph adjacency matrices can then be approximated in terms of the eigenvalues and eigenfunctions of the corresponding graphons [24]. The graphon's eigenvalues and eigenfunctions can be found by solving the equation

$$\lambda f(x) = \int_0^1 W(x,y) f(y)\,dy.$$

Due to symmetry, every eigenvalue is real-valued. Examples of solving the eigenvalue/eigenfunction problem in the cases of the Erdös–Rényi, stochastic block, and exponential models can be found in [22], where the behavior of the eigenvalues of finite graphs is also studied in terms of simulations. Thus, graphons have proved to be useful in the analysis of graph sequences, but they are also convenient for formulating other theoretical results. Here, we are rather interested in justifying the use of the Erdös–Rényi graphon as a basic model for graph errors in graph signals.

Indeed, the main reason for using Gaussian random variables as error terms meant to model the discrepancies between the analytic signal model and actual measurements is the central limit theorem. In essence, this theorem states that the distribution of the sum of independent and identically distributed random variables tends to the Gaussian distribution as the number of summands grows. The Erdös–Rényi graph plays a special role due to its simplicity. However, the importance of the Erdös–Rényi graph in GSP also follows from the fact that, similarly to the Gaussian distribution for measurement error modelling, the Erdös–Rényi graphon leads to a theorem that can be considered an analog of the central limit theorem in the case of modelling the graph structure errors of graph signals.

Let $\mathcal{W}_\epsilon$ denote the set of all graphons satisfying

$$\int_0^1\!\!\int_0^1 W(x,y)\,dx\,dy = \epsilon$$

where $0<\epsilon<1$. Then the following theorem is in order.

###### Theorem 1.

Let $W_1, W_2, \dots$ be random samples from $\mathcal{W}_\epsilon$ and let $c_1, c_2, \dots$ be a random sample from a distribution with bounded support and expected value $c$. Assume that $W_1, c_1, W_2, c_2, \dots$ are mutually independent. Then for all $(x,y)$, we almost surely have

$$\bar{W}(x,y) = \frac{1}{M}\sum_{i=1}^{M} c_i W_i(x,y) \to c\,\epsilon \quad \text{as } M\to\infty.$$

Proof: Consider the random variable $c_i W_i(x,y)$, where $x$ and $y$ are fixed and $W_i$ is picked randomly from $\mathcal{W}_\epsilon$. First notice that its distribution is the same for all pairs $(x,y)$, and therefore $E\{c_i W_i(x,y)\} = c\,\epsilon$. The result then follows by applying the law of large numbers.

Theorem 1 states that the average of many graphons converges to a constant, i.e., to the Erdös-Rényi graphon. The theorem thus further motivates the use of Erdös-Rényi graphs in the error model, when the summands are considered as noise factors which impede the estimation of the graph structure. In this sense, Theorem 1 is an analog of the central limit theorem.
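Theorem 1 is also easy to check numerically. The sketch below is our own construction: symmetric piecewise-constant graphons whose values average $\epsilon$ in expectation, and weights $c_i$ with mean one; the weighted average flattens toward the constant graphon $c\,\epsilon$:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, K, M = 0.3, 4, 4000                   # target density, grid size, #graphons

avg = np.zeros((K, K))
for _ in range(M):
    V = rng.uniform(0.0, 2.0 * eps, (K, K))
    V = (V + V.T) / 2.0                    # symmetric graphon values in [0, 2*eps]
    c_i = rng.uniform(0.5, 1.5)            # random weight with expected value c = 1
    avg += c_i * V
avg /= M

# avg is now nearly constant and close to c * eps = 0.3 everywhere
print(np.abs(avg - eps).max())
```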

## IV Graph Moving Average Signals

In the graph signal processing literature, methods are mostly applied to real data, or a signal with a desired graph frequency response is generated using the inverse graph Fourier transform. However, the moving average model for graph signals has been used at least in [7], where its properties were not studied, and in [28], where it was analyzed for undirected graphs. After recalling the definition of the graph moving average (GMA) model, we compute the covariance matrix of the GMA signal and study how it changes when the adjacency matrix is perturbed. Then the concept of the graph autocorrelation coefficient is discussed and analyzed in the case where the signal follows the GMA(1) model and the shift matrix in the graph autocorrelation comes from the graph error model (M2). The GMA model and the graph autocorrelation matrix, the multivariate counterpart of the coefficient, are later used in the paper for independent component analysis.

### IV-A GMA Model

The GMA signal model of order $m$, denoted hereafter as GMA($m$), is an extension to graph signals of the traditional time series moving average (MA) model. It is given as

$$\mathbf{z} = \mathbf{y} + \sum_{l=1}^{m}\theta_l \mathbf{A}^l \mathbf{y} \tag{2}$$

where $\mathbf{y} = [y_1,\dots,y_N]^\top$ with $y_1,\dots,y_N$ being mutually independent Gaussian random variables with zero mean and variance $\sigma_y^2$, and $\theta_1,\dots,\theta_m$ are the MA coefficients.

The same GMA model can be written in infinitely many different forms, for example, by changing the scales of the $\theta_l$'s and $\mathbf{A}$ accordingly; moreover, any GMA($m$) can be written as a GMA(1) with $\theta_1=1$ and adjacency matrix $\sum_{l=1}^m \theta_l\mathbf{A}^l$. However, even if the adjacency matrix in the GMA($m$) presentation is unweighted, the adjacency matrix in the corresponding GMA(1) presentation is most likely weighted.

Model (2) is a generalization of the traditional time series MA($m$) model, which is obtained when $\mathcal{G}$ is the cycle graph satisfying $a_{ij}=1$, if $j=i-1$, and $a_{ij}=0$, otherwise.

Let us now derive some statistics of the graph signal given by (2). For simplicity and analytical tractability of the later studies, we mostly limit our study here to the GMA(1) model given as

$$\mathbf{z} = \mathbf{y} + \theta\mathbf{A}\mathbf{y} = \tilde{\mathbf{A}}\mathbf{y} \tag{3}$$

where $\tilde{\mathbf{A}} \triangleq \mathbf{I}_{N\times N} + \theta\mathbf{A}$. The extension of some results to the generic model (2) is straightforward.

Note that the value of the $i$th node of the graph signal in (3) is given by

$$z_i = y_i + \theta\sum_{j\in\mathcal{N}_i} a_{ij}\, y_j \tag{4}$$

where $\mathcal{N}_i$ denotes the set of incoming neighbors of node $i$. Thus, if $\theta\neq 0$, two nodes are correlated if they are neighbors or if they have shared incoming neighbors.
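Generating a GMA(1) realization according to (3) takes one line of linear algebra; the helper name below is ours:

```python
import numpy as np

def gma1(A, theta, sigma_y=1.0, rng=None):
    """One realization of the GMA(1) signal z = (I + theta*A) y of eq. (3),
    with y drawn i.i.d. from N(0, sigma_y^2)."""
    rng = np.random.default_rng(rng)
    y = rng.normal(0.0, sigma_y, A.shape[0])
    return y + theta * (A @ y)
```

On the cycle graph this reproduces the classical MA(1) recursion $z_i = y_i + \theta y_{i-1}$.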

### IV-B Covariance Matrix of GMA Signals

We start by deriving the covariance matrix of the graph signal given by the GMA(1) model when there is no mismatch of the adjacency matrix, and then extend it to the cases of the general GMA($m$) model as well as a mismatched adjacency matrix.

The covariance matrix of the graph signal (3) as a function of $\tilde{\mathbf{A}}$ can be expressed as

$$\mathbf{C}_z(\tilde{\mathbf{A}}) \triangleq E\{\mathbf{z}\mathbf{z}^\top\} = E\{\tilde{\mathbf{A}}\mathbf{y}\mathbf{y}^\top\tilde{\mathbf{A}}^\top\} = \tilde{\mathbf{A}}\,E\{\mathbf{y}\mathbf{y}^\top\}\,\tilde{\mathbf{A}}^\top.$$

Using the fact that $y_1,\dots,y_N$ are uncorrelated, i.e., $E\{\mathbf{y}\mathbf{y}^\top\} = \sigma_y^2\mathbf{I}_{N\times N}$, and substituting the expression for $\tilde{\mathbf{A}}$, the covariance matrix can be written as a function of the adjacency matrix as

$$\mathbf{C}_z(\mathbf{A}) = \sigma_y^2\tilde{\mathbf{A}}\tilde{\mathbf{A}}^\top = \sigma_y^2\big(\mathbf{I}_{N\times N} + \theta(\mathbf{A}+\mathbf{A}^\top) + \theta^2\mathbf{A}\mathbf{A}^\top\big). \tag{5}$$

The covariance matrix for the general GMA($m$) model in (2) can be obtained straightforwardly by substituting the expression $\sum_{l=1}^m\theta_l\mathbf{A}^l$ instead of $\theta\mathbf{A}$ into (5).

It is important to note that for graph signals represented on unweighted graphs, that is, when the entries of $\mathbf{A}$ take only zero and one values, the diagonal elements of the matrix $\mathbf{A}\mathbf{A}^\top$ in (5) give the numbers of incoming neighbors of the nodes. Moreover, the $(i,j)$th element of $\mathbf{A}\mathbf{A}^\top$ for $i\neq j$, i.e., $[\mathbf{A}\mathbf{A}^\top]_{ij}$, is the number of mutual incoming neighbors of the $i$th and the $j$th nodes.
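Formula (5) can be verified by a quick Monte-Carlo experiment (our own sketch; a random Erdös–Rényi $\mathbf{A}$ is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
N, theta, sigma = 4, 0.5, 1.0
A = (rng.random((N, N)) < 0.4).astype(float)
np.fill_diagonal(A, 0.0)

# 50000 independent GMA(1) realizations, stacked as columns
Z = (np.eye(N) + theta * A) @ rng.normal(0.0, sigma, (N, 50_000))
C_hat = Z @ Z.T / Z.shape[1]               # sample covariance
C_theory = sigma**2 * (np.eye(N) + theta * (A + A.T) + theta**2 * A @ A.T)
print(np.abs(C_hat - C_theory).max())      # small for a large sample
```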

Now we derive the covariance matrix of the graph signal (3) with an inaccurately learned graph adjacency matrix modelled as in (1), by simply substituting the mismatched adjacency matrix $\mathbf{W}$ instead of the actual one $\mathbf{A}$. Then the covariance matrix as a function of $\mathbf{W}$ can be expressed as

$$\mathbf{C}_z(\mathbf{W}) = \sigma_y^2\big(\mathbf{I}_{N\times N} + \theta(\mathbf{W}+\mathbf{W}^\top) + \theta^2\mathbf{W}\mathbf{W}^\top\big). \tag{6}$$

Using (1), the covariance matrix (6) can now be rewritten as a function of the true adjacency matrix $\mathbf{A}$ and the mismatch matrix $\mathbf{E}$, which is assumed to be fixed, but can in general be modelled as in (M1)–(M4). Then we get the conditional covariance matrix

$$\begin{aligned}\mathbf{C}_z(\mathbf{A},\mathbf{E}) &= \mathbf{C}_z(\mathbf{A}) + \sigma_y^2\theta\big(\mathbf{E}+\mathbf{E}^\top + \theta(\mathbf{A}\mathbf{E}^\top + \mathbf{E}\mathbf{A}^\top + \mathbf{E}\mathbf{E}^\top)\big)\\ &= \mathbf{C}_z(\mathbf{A}) + \mathbf{C}_z(\mathbf{E}) + \sigma_y^2\big(\theta^2(\mathbf{A}\mathbf{E}^\top + \mathbf{E}\mathbf{A}^\top) - \mathbf{I}_{N\times N}\big).\end{aligned} \tag{7}$$

It can be seen from (7) that the covariance matrix of the graph signal (3) with an inaccurately learned adjacency matrix is not simply the sum of the covariance matrices corresponding to $\mathbf{A}$ and $\mathbf{E}$, but contains also the third, cross-correlation component $\mathbf{A}\mathbf{E}^\top + \mathbf{E}\mathbf{A}^\top$. In this cross-correlation term, the scaling factors are the signal variance $\sigma_y^2$ and the square of the MA(1) coefficient, $\theta^2$, while the term $\mathbf{A}\mathbf{E}^\top + \mathbf{E}\mathbf{A}^\top$ requires further explanation.

Example 1: Cross-correlation term for unweighted error graphs.

Let us consider as an example an unweighted graph error model (such as (M1) or (M2)) when the elements of $\mathbf{A}$ also take only zero and one values. Then the $i$th diagonal element of the matrix $\mathbf{A}\mathbf{E}^\top + \mathbf{E}\mathbf{A}^\top$, i.e., $[\mathbf{A}\mathbf{E}^\top + \mathbf{E}\mathbf{A}^\top]_{ii}$, equals minus two times the number of erroneously removed incoming edges of the $i$th node. The $(i,j)$th (for $i\neq j$) off-diagonal element, i.e., $[\mathbf{A}\mathbf{E}^\top + \mathbf{E}\mathbf{A}^\top]_{ij}$, is obtained as follows. From the total number of incoming neighbors of both the $i$th and $j$th nodes in the mismatched adjacency matrix $\mathbf{W}$ which are also incoming neighbors of either the $i$th or the $j$th node in the true $\mathbf{A}$, we subtract the number of incoming neighbors of both the $i$th and $j$th nodes in $\mathbf{A}$ which are not incoming neighbors of both the $i$th and $j$th nodes in $\mathbf{W}$, and also subtract the number of incoming neighbors of both the $i$th and $j$th nodes in $\mathbf{A}$ which are incoming neighbors of neither the $i$th nor the $j$th node in $\mathbf{W}$.

### IV-C Graph Autocovariance of GMA Signals

In GSP tasks such as filtering or signal recovery from samples, smoothness with respect to the chosen shift matrix is a key assumption. Therefore, smoothness is also used in shift operator learning from training data, as, for example, in [13]. Graph autocorrelation is a measure of smoothness which we discuss here, since in the next section, in the context of independent component analysis, we deal with a related concept for multivariate signals, the graph autocorrelation matrix.

We start with the autocovariance of a graph signal. Let us first define the centering matrix $\mathbf{H} \triangleq \mathbf{I}_{N\times N} - \frac{1}{N}\mathbf{1}_N\mathbf{1}_N^\top$. The matrix $\mathbf{H}$ is symmetric and satisfies the property $\mathbf{H}\mathbf{H}=\mathbf{H}$. The graph signal autocovariance of lag $k$ with respect to $\mathbf{W}$ is then defined as

$$s_{z,k}(\mathbf{W}) \triangleq E\left\{\frac{1}{N-k}(\mathbf{H}\mathbf{z})^\top\mathbf{W}^k\mathbf{H}\mathbf{z}\right\} = \frac{1}{N-k}E\{\mathbf{z}^\top\mathbf{H}\mathbf{W}^k\mathbf{H}\mathbf{z}\}. \tag{8}$$

Substituting the graph signal model (3) into (8), we obtain

$$s_{z,k}(\mathbf{W}) = \frac{1}{N-k}E\{\mathbf{y}^\top\tilde{\mathbf{A}}^\top\mathbf{H}\mathbf{W}^k\mathbf{H}\tilde{\mathbf{A}}\mathbf{y}\} = \frac{1}{N-k}\mathrm{tr}\{\mathbf{H}\mathbf{W}^k\mathbf{H}\mathbf{C}_z(\mathbf{A})\} = \frac{\sigma_y^2}{N-k}\mathrm{tr}\{(\mathbf{H}+\theta\mathbf{A}^\top\mathbf{H})\mathbf{W}^k(\mathbf{H}+\theta\mathbf{H}\mathbf{A})\}$$

where the property $\mathbf{H}\mathbf{H}=\mathbf{H}$ has also been used. Thus, the graph autocovariance is a weighted sum of the covariances between the nodes of a graph signal.
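The closed form above can be implemented directly and cross-checked against the equivalent expression $\mathrm{tr}\{\mathbf{H}\mathbf{W}^k\mathbf{H}\mathbf{C}_z(\mathbf{A})\}/(N-k)$ (helper names ours):

```python
import numpy as np

def graph_autocov(A, W, theta, k=1, sigma_y=1.0):
    """Graph autocovariance s_{z,k}(W) of a GMA(1) signal via the trace
    formula sigma^2/(N-k) tr{(H + theta A^T H) W^k (H + theta H A)}."""
    N = A.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N            # centering matrix
    Wk = np.linalg.matrix_power(W, k)
    return sigma_y**2 / (N - k) * np.trace(
        (H + theta * A.T @ H) @ Wk @ (H + theta * H @ A))
```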

### IV-D Graph Autocorrelation of GMA Signals

The graph autocorrelation of lag $k$ with respect to the matrix $\mathbf{W}$ is given as

$$r_{z,k}(\mathbf{W}) = \frac{E\{(\mathbf{H}\mathbf{z})^\top\mathbf{W}^k\mathbf{H}\mathbf{z}\}}{\big(E\{\|\mathbf{H}\mathbf{z}\|^2\}\,E\{\|\mathbf{H}\mathbf{W}^k\mathbf{z}\|^2\}\big)^{1/2}}. \tag{9}$$

It is of interest to see how the graph autocorrelation (9) depends on the specific parameters of a random graph error model, for example, the parameters $\epsilon_1$ and $\epsilon_2$ of model (M2). The graph autocorrelation is clearly also a function of the graph adjacency matrix $\mathbf{A}$. Thus, let us also assume that $\mathbf{A}$ itself follows the Erdös–Rényi model with probability parameter $\alpha$. Then, for the graph autocorrelation of the GMA(1) signal (3), we would like to derive the expected value of lag $k=1$ when both $\mathbf{A}$ and $\mathbf{W}$ are considered random. Even though this expected value is an average over Erdös–Rényi graphs $\mathbf{A}$, we have verified by extensive simulations that the graph autocorrelation values are very similar for any specific and fixed realization of $\mathbf{A}$. The expected graph autocorrelation value can then be expressed in terms of the essential parameters only, such as the GMA coefficient $\theta$, the probability parameter $\alpha$ of the graph model, and, for example, the parameters $\epsilon_1$ and $\epsilon_2$ of the graph error model (M2). The expected value can be used, for example, to verify that a GMA signal is smoother with respect to its own adjacency matrix than with respect to a mismatched adjacency matrix. This kind of study is also conducted in terms of simulations in Section VI.

Example 2: Expected value of the graph autocorrelation for GMA(1) and graph error model (M2).

Let us start by rescaling the matrices $A$ and $W$ and the variance $\sigma_y^2$ so that

$$\frac{1}{N}\mathbb{E}\{\|Hz\|^2\} = 1, \qquad \frac{1}{N}\mathbb{E}\{\|HAz\|^2\} = 1, \qquad \frac{1}{N}\mathbb{E}\{\|HWz\|^2\} = 1. \qquad (10)$$

The nonzero values of $A$ and $W$ are denoted by $a$ and $w$, respectively. For given values of $\theta$, $\alpha$, and $N$, the only unknowns in (10) are $a$, $w$, and $\sigma_y^2$. Hence, we can find $a$, $w$, and $\sigma_y^2$ by simply solving the system of three equations (10). Notice that only the third equation includes $w$, which implies that $a$ and $\sigma_y^2$ can be found from the first two equations. However, the expectations in (10) need to be calculated first. For calculating the expected values in (10), the fact that $\mathbb{E}\{y_i y_j\} = \sigma_y^2$ if $i = j$ and $0$ otherwise can be used, together with the assumption that the elements of the matrices involved are independent within each matrix as well as between the matrices.

Let us compute here the expectation in the first equation in (10). The computations of the expectations in the other two equations proceed along the same steps. We start by substituting (3) (or equivalently (4)) into the expectation in the first equation in (10), and opening the square:

$$\begin{aligned} \mathbb{E}\{\|Hz\|^2\} &= \mathbb{E}\{\|H(\theta A y + y)\|^2\} \\ &= \mathbb{E}\Bigg\{\sum_{i=1}^N \Bigg(\theta\sum_{k=1}^N a_{ik} y_k + y_i - \frac{1}{N}\sum_{j=1}^N \Big(\theta\sum_{l=1}^N a_{jl} y_l + y_j\Big)\Bigg)^2\Bigg\} \\ &= \mathbb{E}\Bigg\{\sum_{i=1}^N \Bigg(\theta\sum_{k=1}^N a_{ik} y_k + y_i - \frac{1}{N}\sum_{j=1}^N \Big(\theta\sum_{l=1}^N a_{jl} y_l + y_j\Big)\Bigg) \times \Bigg(\theta\sum_{k'=1}^N a_{ik'} y_{k'} + y_i - \frac{1}{N}\sum_{j'=1}^N \Big(\theta\sum_{l'=1}^N a_{j'l'} y_{l'} + y_{j'}\Big)\Bigg)\Bigg\}. \qquad (11) \end{aligned}$$

Then, after some algebraic manipulations, we obtain that

$$\begin{aligned} \mathbb{E}\{\|Hz\|^2\} &= \mathbb{E}\Bigg\{\sum_{i=1}^N \Bigg(\theta^2\sum_{k=1}^N a_{ik}^2 y_k^2 - \frac{2}{N}\sum_{j=1}^N \Big(\theta^2\sum_{k=1}^N a_{ik} a_{jk} y_k^2 - \theta a_{ij} y_j^2\Big) \\ &\quad + y_i^2 - \frac{2}{N}\sum_{j=1}^N \big(\theta a_{ji} y_i^2 - y_i y_j\big) + \frac{\theta^2}{N^2}\sum_{j=1,\,j'=1,\,l=1}^N a_{jl} a_{j'l} y_l^2 \\ &\quad + \frac{2\theta}{N^2}\sum_{j=1,\,l=1}^N a_{jl} y_l^2 + \frac{1}{N^2}\sum_{j=1}^N y_j^2\Bigg)\Bigg\} \\ &= \sigma_y^2\big(N^2\theta^2\alpha a^2 - 2N\theta^2\alpha a^2 - 2N^2\theta^2\alpha^2 a^2 - 2N\theta\alpha a + N \\ &\quad - 2N\theta\alpha a - 2 + N\theta^2\alpha a^2 + N^2\theta^2\alpha^2 a^2 + 2N\theta\alpha a + 1\big). \qquad (12) \end{aligned}$$

Dropping the negligible terms in (12), i.e., the terms of zero order in $N$ and the first-order terms in $N$ that also include $a^2$, the first equation in (10) can be approximated as

$$\frac{1}{N}\mathbb{E}\{\|Hz\|^2\} \approx \sigma_y^2\big((\alpha - \alpha^2)\theta^2 a^2 N - 2\alpha\theta a + 1\big). \qquad (13)$$
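The approximation (13) is easy to check by Monte Carlo simulation. The sketch below is ours, assuming an unweighted symmetric Erdős–Rényi graph (so $a = 1$ before rescaling) and unit-variance white $y$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, alpha, theta, sigma2 = 100, 0.05, 0.5, 1.0

# right-hand side of (13) with a = 1
approx = sigma2 * ((alpha - alpha**2) * theta**2 * N - 2 * alpha * theta + 1)

acc = 0.0
reps = 500
for _ in range(reps):
    U = np.triu(rng.random((N, N)) < alpha, k=1)
    A = (U | U.T).astype(float)           # symmetric ER(alpha) adjacency
    y = np.sqrt(sigma2) * rng.standard_normal(N)
    z = y + theta * A @ y                 # GMA(1) signal, cf. (3)
    Hz = z - z.mean()                     # centering
    acc += Hz @ Hz / N                    # sample version of (1/N)||Hz||^2
mc = acc / reps
print(mc, approx)
```

With these parameters the dropped terms in (12) are of relative size below one percent, so the Monte Carlo average and the closed-form value agree closely.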

The derivations of the expected values in the other two equations in (10) are more tedious and a lot lengthier, but otherwise follow the same steps as the derivation of (13). Thus, we skip these derivations here for the sake of brevity, and write down the solution for the system of equations (10). The parameter $w$ can be found from only the last equation in (10) as

$$\begin{aligned} w = \Big(\sigma_y^2\big({-\alpha^3} N^2\theta^2 a^2(\epsilon_1 + \epsilon_2 - 1)^2 &+ \alpha^2 N^2\theta^2 a^2\big(1 - \epsilon_1 + \epsilon_2(2(\epsilon_1 + \epsilon_2) - 3)\big) \\ &+ \alpha N^2\theta^2 a^2(\epsilon_2 - \epsilon_2^2) \\ &+ \alpha^2 N\big(\theta^2 a^2(\epsilon_1 + \epsilon_2 - 1) - (\epsilon_1 + \epsilon_2 - 1)^2\big) \\ &+ \alpha N\big(1 - \epsilon_1 + \epsilon_2(2(\epsilon_1 + \epsilon_2) - \theta^2 a^2 - 3)\big) \\ &+ N(\epsilon_2 - \epsilon_2^2) + \alpha(\epsilon_1 + \epsilon_2 - 1) - \epsilon_2\big)\Big)^{-1/2} \end{aligned}$$

and the parameters $a$ and $\sigma_y^2$ can be found by solving the system of the remaining two equations

$$\begin{cases} \sigma_y^2\big((\alpha - \alpha^2)\theta^2 a^2 N - 2\alpha\theta a + 1\big) = 1 \\ \sigma_y^2\big((\alpha^2 - \alpha^3)\theta^2 a^4 N^2 - 2\alpha^2\theta a^3 N + (\alpha - \alpha^2) a^2 N\big) = 1. \end{cases}$$

Finally, for the lag $k = 1$, we find the closed-form approximate expression for the expected graph autocorrelation of the GMA(1) signal as a function of $\epsilon_1$ and $\epsilon_2$ as

$$\mathbb{E}_{A,W,z}\{r_{z,1}(W)\} \approx -w\epsilon_2 + \sigma_y^2\big(\theta a w N(\alpha - \alpha^2) - \alpha w\big)(1 - \epsilon_1 - \epsilon_2). \qquad (14)$$

## V Applications

### V-a GMA Filter

As in any other graph signal processing task, the adjacency matrix plays a key role in graph filtering. In fact, any linear and shift-invariant graph filter (i.e., a filter that commutes with the graph shift operator) can be written as a polynomial of the form

$$H(W) = h_0 I + h_1 W + \dots + h_L W^L \qquad (15)$$

where $L$ is smaller than or equal to the degree of the minimal polynomial of $W$. Hence, it is clear that the errors in $W$, which appears in (15) instead of the true adjacency matrix $A$, will affect the GMA filter. Let us see how significant these effects can be by examining the frequency response of the filter (15).
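Once the coefficients $h_0, \dots, h_L$ are known, the filter (15) can be applied without forming explicit matrix powers. A minimal sketch (ours, using Horner's rule; the function name is hypothetical):

```python
import numpy as np

def gma_filter(W, h, x):
    """Apply the polynomial (GMA) filter of (15),
    H(W) x = (h0 I + h1 W + ... + hL W^L) x,
    via Horner's rule so only L matrix-vector products are needed."""
    y = h[-1] * x
    for coef in reversed(h[:-1]):
        y = W @ y + coef * x
    return y
```

Only matrix-vector products with the shift matrix are used, which is what makes polynomial graph filters attractive for sparse adjacency matrices.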

Towards this end, let us first recall the graph Fourier transform, which is based on the following Jordan decomposition of the mismatched adjacency matrix:

$$W = VJV^{-1}.$$

The graph Fourier transform matrix is then $F = V^{-1}$, and the graph Fourier transform of the signal $x$ is

$$\hat{x} = Fx.$$

If $W$ has distinct complex eigenvalues $\lambda_0, \dots, \lambda_{N-1}$, called graph frequencies, they have a natural ordering [29] given by their distance to the maximum eigenvalue magnitude, i.e., $d_n = \big|\lambda_n - \max_m |\lambda_m|\big|$. The shorter the distance, the lower is the corresponding graph frequency. This ordering can be used when constructing graph filters by solving the following system of $N$ equations with $L+1$ unknowns in the least-squares (LS) sense:

$$\begin{aligned} h_0 + h_1\lambda_0 + \dots + h_L\lambda_0^L &= \alpha_0 \\ &\;\;\vdots \\ h_0 + h_1\lambda_{N-1} + \dots + h_L\lambda_{N-1}^L &= \alpha_{N-1}. \end{aligned} \qquad (16)$$

In the system of equations (16), $h_0, \dots, h_L$ are the unknowns, and the frequency responses $\alpha_0, \dots, \alpha_{N-1}$ are chosen so that the filter has the desired properties. For example, a high-pass filter is obtained by choosing

$$\alpha_n = \begin{cases} 1, & \text{if } d_n > \operatorname{median}\{d_0, \dots, d_{N-1}\} \\ 0, & \text{otherwise} \end{cases} \qquad (17)$$

which attenuates the low frequency components.
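The LS design of (16)–(17) can be sketched in a few lines. The code below is ours (the function name is hypothetical), assuming the shift matrix is diagonalizable:

```python
import numpy as np

def design_highpass(W, L):
    """LS design of the filter coefficients h0..hL: solve the system (16)
    with the high-pass frequency response (17)."""
    lam = np.linalg.eigvals(W)
    d = np.abs(lam - np.abs(lam).max())          # distance to max eigenvalue magnitude
    alpha = (d > np.median(d)).astype(float)     # desired high-pass response (17)
    V = np.vander(lam, L + 1, increasing=True)   # rows: [1, lam_n, ..., lam_n^L]
    h, *_ = np.linalg.lstsq(V, alpha.astype(V.dtype), rcond=None)
    return h
```

Note that `np.vander` with `increasing=True` produces exactly the coefficient matrix of (16), one row per graph frequency.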

Note that the coefficients in (15) are determined using the eigenvalues of $W$ by solving (16) in the LS sense. Thus, the mismatched $W$ can significantly affect the GMA filter. Indeed, different powers (from $0$ to $L$) of the eigenvalues of $W$ appear as coefficients in (16). Thus, the difference between the largest and smallest coefficients in each equation in (16) can be quite significant. Moreover, the difference between the coefficients in (16) for the case of the true $A$ compared to the case of the mismatched $W$ can also be significant. As a result, the filter coefficients found based on the true $A$ and the mismatched $W$ can be dramatically different [30]. For example, let us see how the eigenvalues of $W$ change as compared to the eigenvalues of $A$ when $N$ is large. For simplicity, let the true adjacency matrix $A$ be unweighted and follow the stochastic block model. Let us model the mismatched adjacency matrix $W$ according to the graph error model (M2). Then $W$ also follows the stochastic block model, and major changes may occur only in the largest eigenvalues. Since the matrix of coefficients in the system of linear equations (16) is a Vandermonde matrix, the LS solution of (16) is sensitive to changes in the eigenvalues $\lambda_n$. This means that the GMA filter design can be very sensitive to the errors in the adjacency matrix, and therefore robust GMA filter design methods are of high importance. In Section VI-B, we study the sensitivity to the choice of adjacency matrix when the high-pass filter defined by (17) is applied to detecting malfunctioning sensors.
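The Vandermonde sensitivity claim is easy to check numerically: even for modest filter orders the coefficient matrix of (16) is badly conditioned, so small perturbations of the eigenvalues move the LS solution considerably. A short sketch (ours, with stand-in eigenvalues drawn from $[-1, 1]$):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = np.sort(rng.uniform(-1.0, 1.0, 20))   # stand-in graph frequencies
V = np.vander(lam, 8, increasing=True)      # coefficient matrix of (16), L = 7
print(np.linalg.cond(V))                    # grows rapidly with the filter order L
```

The printed condition number quantifies how strongly eigenvalue errors in the mismatched adjacency matrix are amplified in the designed filter coefficients.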

### V-B GARMA Filter

The first-order graph autoregressive moving average (GARMA) filter [31] has coefficients $c$, $\psi$, and $\phi$, and uses some graph Laplacian matrix. We will here use the translated normalized Laplacian

$$L = -T^{-1/2} W T^{-1/2} \qquad (18)$$

where $T$ is the diagonal matrix of node degrees and $W$ is the symmetric mismatched adjacency matrix. Note that the eigenvalues of $L$ are between $-1$ and $1$, but, as in the GMA filter case, they can be very different for the cases of the true and mismatched adjacency matrices.

The filter output is given by $z = cx + \bar{s}$, where $x$ is the input and $\bar{s}$ is the converged state of the recursion $s_{t+1} = \phi L s_t + \psi x$. For example, the frequency response of the GARMA(1) filter is

$$H(\lambda) = c + \frac{\psi}{1 - \lambda\phi}$$

where $c$, $\psi$, and $\phi$ are the filter parameters.

Note that a GARMA filter of order $K$ can be constructed from $K$ parallel GARMA(1) filters, whose coefficients are designed using an algorithm based on Shanks' method [32]. Then the output of a GARMA($K$) filter is $z = cx + \sum_{k=1}^K \bar{s}^{(k)}$, with $\bar{s}^{(k)}$ denoting the converged state of the $k$th parallel branch, and the frequency response is

$$H(\lambda) = c + \sum_{k=1}^K \frac{\psi^{(k)}}{1 - \lambda\phi^{(k)}}.$$

Again, the errors in the adjacency matrix have a significant influence on the filter directly via $L$, but also via distorting the spectrum of $L$, which then results in errors in the estimates of the filter parameters $c$, $\psi^{(k)}$, and $\phi^{(k)}$. These errors can then reflect even on the GARMA filter stability. In Section VI-D, we illustrate the performance loss of the GARMA filter due to errors in the Laplacian matrix (18).

### V-C ICA of Graph Signals

After introducing the graph signal error models and the GMA signal model, our objective is to investigate the graph error effect on the performance of GSP tasks. In this section, we take ICA of a mixture of GMA signals using the GraDe method [7] as an example of a GSP task.

Let $X$ denote the multivariate graph signal generated as a mixture of independent components according to the model

$$X = \Omega Z + \mu\mathbf{1}_N^\top$$

where $\Omega$ is a full-rank mixing matrix, $Z$ is the matrix of mutually independent graph signals with zero means and unit variances, and $\mu$ is the location vector. The ICA goal is to estimate the unmixing matrix $\Gamma = \Omega^{-1}$ using only the data matrix $X$.

Let $X_w = \hat{C}^{-1/2}(X - \bar{x}\mathbf{1}_N^\top)$ be the whitened data, where $\hat{C}$ is the sample covariance matrix of $X$ and $\bar{x}$ is the vector of row means of $X$. In GraDe, the unmixing matrix estimate is obtained by diagonalizing/jointly diagonalizing one or more graph-autocorrelation matrices given as

$$\hat{S}_k(W) = \frac{1}{N-k}\, X_w W^k X_w^\top, \qquad k = 1, \dots, K,$$

i.e., by finding the orthogonal matrix $U$ which maximizes the objective function

$$\sum_{k=1}^K \big\|\operatorname{diag}\big(U\hat{S}_k(W)U^\top\big)\big\|^2.$$

The unmixing matrix estimate is then $\hat{\Gamma} = U\hat{C}^{-1/2}$. A fast algorithm for the joint diagonalization is available in [33] and is applicable when the shift matrix $W$ is chosen to be symmetric, or when the graph-autocorrelation matrices are symmetrized. The unmixing matrix estimate for an inaccurately learned adjacency matrix $W$ is denoted as $\hat{\Gamma}(W)$. Notice that GraDe reduces to the well-known second-order blind identification (SOBI) estimator [34] when $W$ is the adjacency matrix of the cycle graph.
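Computing the symmetrized graph-autocorrelation matrices is the core step of GraDe and can be sketched directly from the definition above. The code is ours (the function name is hypothetical), assuming whitened data of size $p \times N$:

```python
import numpy as np

def graph_autocorr_matrices(Xw, W, K=1):
    """Symmetrized graph-autocorrelation matrices S_k(W) of whitened data
    Xw (p x N), as used by GraDe. For K = 1 and symmetric W, the joint
    diagonalization reduces to an eigendecomposition of S_1."""
    N = Xw.shape[1]
    mats = []
    Wk = np.eye(N)
    for k in range(1, K + 1):
        Wk = Wk @ W                      # W^k by repeated multiplication
        S = Xw @ Wk @ Xw.T / (N - k)
        mats.append((S + S.T) / 2)       # symmetrize before diagonalizing
    return mats
```

Symmetrizing each $\hat{S}_k$ is what makes the fast joint diagonalization algorithm mentioned above applicable even when the learned $W$ is not exactly symmetric.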

We will use the following asymptotic result (see [35] for details), derived in the context of the SOBI estimator, for an unmixing matrix estimate $\hat{\Gamma}$ obtained using joint diagonalization of the matrices $\hat{S}_k$. Assuming, without loss of generality, that $\Omega = I$, we have

$$\sqrt{N}(\hat{\gamma}_{ii} - 1) = -\frac{1}{2}\sqrt{N}\big([\hat{S}_0]_{ii} - 1\big) + o_p(1)$$

and

$$\sqrt{N}\hat{\gamma}_{ij} = \frac{\sum_k (\lambda_{ki} - \lambda_{kj})\big(\sqrt{N}[\hat{S}_k]_{ij} - \lambda_{ki}\sqrt{N}[\hat{S}_0]_{ij}\big)}{\sum_k (\lambda_{ki} - \lambda_{kj})^2} + o_p(1)$$

where $\hat{\gamma}_{ij} = [\hat{\Gamma}]_{ij}$, $\lambda_{ki}$ is the lag-$k$ graph autocorrelation of the $i$th source, and $o_p(1)$ stands for negligible terms. The diagonal elements of $\hat{\Gamma}$ do not depend asymptotically on the matrices $\hat{S}_k$, $k \geq 1$, and thus, in the case of ICA of graph signals, do not depend on $W$. Therefore, the sum of the variances of the off-diagonal elements

$$\operatorname{SOV}(\hat{\Gamma}(W)) = N\sum_{j \neq i} \operatorname{var}\big([\hat{\Gamma}(W)]_{ij}\big)$$