# A Unifying Analysis of Shift Operators on a Graph

The maximum entropy principle is employed to introduce a general class of shift operators (GSOs) for random signals on a graph. By virtue of the assumed probabilistic framework, the proposed GSO is shown both to be bounded and to exhibit the desired property of asymptotic power preservation over graph shifts. For rigour, the sensitivity of the GSO to graph topology misspecification is also addressed. The advantages of the proposed operator are demonstrated in a real-world multi-sensor signal averaging setting.

09/12/2019


## I Introduction

Given the rapidly increasing availability of data recorded on irregular domains, it would be extremely advantageous to analyse such unstructured data as signals on graphs and thus benefit from the ability of graphs to incorporate domain-specific knowledge. This has motivated the developments in the rapidly expanding field of Graph Signal Processing [Sandryhaila2013, Shuman2013, Sandryhaila2015, Ortega2018, Stankovic2019_1, Stankovic2019_2], and has spurred the introduction of the graph counterparts of many classical signal processing algorithms.

One such direction is that of the linear system on a graph, which was recently considered in [Sandryhaila2013, Eldar2017, Stankovic2019_2]. In classical signal processing, a system is a linear operator that maps an input signal to another (output) signal. The signal shift operator (unit time delay) is the lynchpin in discrete-time linear systems, but its definition on graphs is not obvious due to the rich underlying connectivity structure. Topologically, the signal shift on a graph can be viewed as the movement of a signal sample from the considered vertex along all edges connected to this vertex. Therefore, to effectively introduce a system or a filter which operates on signals acquired on graphs, it is necessary to rigorously define and understand the graph shift operator (GSO), a subject of this work.

Our aim is therefore to explore a graph-theoretic framework for shift operators on a graph. While existing GSOs typically take the form of the graph adjacency or Laplacian matrices, for rigour we here introduce the shift operator from a probabilistic perspective. This is achieved based on the principle of maximum entropy, to make it possible to cater even for single realisations of random signals on a graph and to operate with a limited number of vertices.

Furthermore, the proposed GSO is shown to be bounded and asymptotically power preserving, that is, it preserves the signal power over shifts as the number of edges and vertices increases. To reinforce the importance of graph domain knowledge in this type of problem, we also prove that misspecified assumptions about the signal domain can prohibit the asymptotic norm preservation of the GSO. The practical utility of the proposed class of shift operators on a graph is demonstrated through a physically meaningful and intuitive real-world example of geographically distributed estimation of multi-sensor temperature measurements.

## II Preliminaries

The signal domains considered in this work are graphs, for which we follow the notation employed in [Stankovic2019_1, Stankovic2019_2], whereby a graph is defined as a set of vertices, $\mathcal{V}$, which are connected by a set of edges, $\mathcal{E}$. The existence of an edge between vertices $m$ and $n$ is designated by $(m,n)\in\mathcal{E}$.

The graph connectivity of an $N$-vertex graph can be formally represented by the adjacency matrix, $\mathbf{A}$, whereby the vertex connectivity is described by

$$A_{mn}=\begin{cases}1, & (m,n)\in\mathcal{E},\\ 0, & (m,n)\notin\mathcal{E}.\end{cases} \tag{1}$$

Regarding the directionality of vertex connections, a graph can be undirected or directed. A graph is undirected if each edge, $(m,n)$, has its counterpart, $(n,m)$, that is, $A_{mn}=A_{nm}$. For directed graphs, in general this property does not hold.

The neighbourhood of a vertex $n$, denoted by $\mathcal{V}_n$, is the set of vertices directly connected by an edge to vertex $n$.

In general, the edges can also convey information about the relative importance of their connections through a weighted graph. The weight matrix, $\mathbf{W}$, corresponds morphologically to the set of edges, $\mathcal{E}$. A non-zero element in the weight matrix, $W_{mn}$, designates both the existence of an edge $(m,n)$ and the value of the corresponding weight, whereby the value $W_{mn}=0$ indicates that there is no edge between vertices $m$ and $n$. In this sense, the adjacency matrix, $\mathbf{A}$, can be considered as a special case of the weight matrix, $\mathbf{W}$.

There are three classes of approaches to the definition of graph edges and their corresponding weights [Stankovic2019_2]:

1. Physically well defined edges and weights, through domain knowledge;

2. Definition of edges and weights based on the geometry of vertex positions;

3. Data similarity based methods for learning the underlying graph topology.

The degree matrix, $\mathbf{D}$, is a diagonal matrix whose elements $D_{nn}$ are equal to the sum of the weights of all edges connected to the vertex $n$, that is, $D_{nn}=\sum_{m} W_{nm}$. The degree matrix quantifies the centrality of each vertex in a graph. For instance, for undirected and unweighted graphs, the degree element $D_{nn}$ is equal to the number of edges connected to a vertex $n$.
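To make the above constructions concrete, the following minimal numpy sketch builds the degree matrix $\mathbf{D}$ from a weight matrix $\mathbf{W}$; the small undirected weighted graph used here is hypothetical.

```python
import numpy as np

# Hypothetical 4-vertex undirected weighted graph (symmetric weight matrix W).
W = np.array([
    [0.0, 0.5, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.3],
    [0.2, 0.0, 0.0, 0.4],
    [0.0, 0.3, 0.4, 0.0],
])

# Degree matrix: diagonal entries D_nn = sum of the weights of all edges at vertex n.
D = np.diag(W.sum(axis=1))

print(np.diag(D))  # degree of each vertex
```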

## III Random Processes on a Graph

In order to define a general random graph model, we associate with each vertex, $n$, a real i.i.d. random variable, $x(n)$. Our a priori knowledge of the graph is typically limited to the topological structure of the domain, which is reflected by the weight matrix, $\mathbf{W}$. We therefore consider the problem of establishing an appropriate stochastic model to describe a random graph signal, given the domain structure only.

In such situations of limited knowledge, it is natural to choose the model according to the maximum entropy principle [Jaynes1957], which asserts that the most suitable random process maximises entropy given the currently available knowledge. An implicit maximum entropy assumption is therefore that of statistical independence, that is, a random signal on a graph is conditionally independent of its predecessors.

###### Remark 1.

Similar to classical cases, it is important to notice that the properties of a random graph process are directly related to its shifted states. However, unlike the backward shift of a signal on a discrete-time domain, which maps the process $x(n)$ to its translation $x(n-1)$, the shift of a signal on a graph defines the translation of the graph process, $x(n)$, at a vertex $n$ to the vertices in its direct neighbourhood.

This condition can be mathematically expressed through

$$P\bigg(x(n)\,\bigg|\,\bigcap_{\tau>0}\mathcal{S}^{\tau}\big(x(n)\big)\bigg)=P\bigg(x(n)\,\bigg|\,\bigcap_{m\in\mathcal{V}_n}x(m)\bigg) \tag{2}$$

where the symbol $\mathcal{S}^{\tau}$ denotes the graph backward shift operator by $\tau$ steps. Condition (2) asserts that the random process at a vertex $n$ depends only on the current state of its neighbourhood, $\mathcal{V}_n$, the so-called graph Markov property. As a result, the stochastic process which attains the maximum entropy is the Markovian random walk on a graph.

Markovian random walks exhibit a finite set of states, given by the vertex space, $\mathcal{V}$. For each pair of states, that is, for each edge $(n,m)\in\mathcal{E}$, there exists a transition probability, $P(m|n)$, of going from vertex $n$ to vertex $m$, where for each vertex, $n$, the transition probabilities sum up to unity, $\sum_{m=1}^{N}P(m|n)=1$. The Markov matrix, $\mathbf{P}$, is then defined with its $(n,m)$-th element equal to $P(m|n)$. Notice that each row in $\mathbf{P}$ sums up to unity, i.e. $\mathbf{P}\mathbf{1}=\mathbf{1}$.
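A minimal numpy sketch of the row-stochastic Markov matrix, with transition probabilities inferred by row-normalising a (hypothetical) weight matrix:

```python
import numpy as np

# Hypothetical weight matrix; transition probabilities P(m|n) are obtained
# by normalising each row so that the outgoing probabilities sum to unity.
W = np.array([
    [0.0, 0.5, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.3],
    [0.2, 0.0, 0.0, 0.4],
    [0.0, 0.3, 0.4, 0.0],
])
D = np.diag(W.sum(axis=1))
P = np.linalg.inv(D) @ W   # Markov matrix, P_nm = W_nm / D_nn

print(P.sum(axis=1))       # every row sums to one
```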

In addition to the assumption of statistical independence, another maximum entropy assumption which is used here to define the graph shift operator is the Martingale property, which states that at a particular instant, the conditional expectation of the next value in a sequence, given all prior values, is equal to the present value, so-called persistent estimation. The Martingale property for the graph shift operator, $\mathcal{S}$, then becomes

$$E\bigg\{x(n)\,\bigg|\,\bigcap_{\tau>0}\mathcal{S}^{\tau}\big(x(n)\big)\bigg\}=\mathcal{S}\big(x(n)\big) \tag{3}$$
###### Remark 2.

A random process on a graph is naturally described by a class of random walks which satisfy the Markov and Martingale properties; these have been widely studied in statistics ever since the seminal papers [Einstein1905, Einstein1906] formulated the theory of Brownian motion and diffusion processes.

## IV Shift Operator on a Graph

To derive the shifted (expected) value of the random process at a vertex $n$, it is necessary to employ the expectation operator, that is, a probabilistic weighting scheme of the form

$$\mathcal{S}\big(x(n)\big) \equiv E\{x(n)\} = \sum_{m\in\mathcal{V}_n} x(m)\,P(m) \tag{4}$$
###### Remark 3.

Observe from (3)-(4) that, for the considered class of random signals on a graph, the graph shift operator exhibits a dual role as the graph expectation operator.

In conventional signal processing, this expectation is typically implemented using the time-average operator; however, in many real-world situations we only encounter one realisation of the random process. To overcome this issue on a graph, we employ the conditional expectation, along with the Markov property condition in (2), to introduce the following shift (expectation) operator

$$\mathcal{S}\big(x(n)\big) \equiv E\{x|n\} = \sum_{m=1}^{N} E\{x|m\}\,P(m|n) \tag{5}$$

where $E\{x|m\}$ is the expected value of the random variable at the vertex $m$. Since we only encounter one realisation of the random process, i.e. $E\{x|m\}=x(m)$, the desired graph shift (expectation) operator is of the form

$$\mathcal{S}\big(x(n)\big) \equiv E\{x(n)\} = \sum_{m=1}^{N} x(m)\,P(m|n) \tag{6}$$

which can also be written in a matrix notation as

$$\mathbf{S}\mathbf{x} \equiv E\{\mathbf{x}\} = \mathbf{P}\mathbf{x} \tag{7}$$

The use of the Markov matrix as the shift operator was recently proposed in [Eldar2017], and the above analysis further justifies this concept. In the sequel, we adopt the symbol $\mathbf{S}$ to denote a shift operator on a graph.
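Applying the shift in (7) amounts to one matrix-vector product: each shifted sample is a probability-weighted average of the signal over the neighbourhood of the corresponding vertex. A minimal numpy sketch, using a hypothetical 4-vertex weighted graph and an arbitrary signal:

```python
import numpy as np

# Hypothetical weight matrix and one realisation of a graph signal x.
W = np.array([
    [0.0, 0.5, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.3],
    [0.2, 0.0, 0.0, 0.4],
    [0.0, 0.3, 0.4, 0.0],
])
P = W / W.sum(axis=1, keepdims=True)  # Markov matrix, rows sum to unity
x = np.array([1.0, 2.0, 3.0, 4.0])

# Graph shift of Eq. (7): Sx = Px, the conditional expectation of x
# over the neighbourhood of each vertex.
Sx = P @ x
```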

In practice, the actual probabilities of vertex transition are often unknown. However, as shown next, we can infer these probabilities from the available information about the graph domain geometry, implied by the weight matrix, $\mathbf{W}$.

### Iv-a General random walk model

A general random walk (GRW) may be thought of as a discrete-time stochastic process which at each step transitions to neighbouring vertices according to a certain probability distribution. In the limit, Donsker's theorem states that the probability density of the GRW converges to that of the Wiener process [Donsker1951, Billingsley1999, Durrett1996, Revuz1999]. In the graph setting, for a walker at a vertex $n$, the central limit theorem [Billingsley1995] asserts that after a sufficiently large number of independent steps the walker's position is Gaussian distributed in the distance $r_{mn}$, where $r_{mn}$ is a measure of physical distance between vertices $m$ and $n$. Consequently, the GRW weight matrix, $\tilde{\mathbf{W}}$, also includes unit values on its diagonal, which indicate self-connections (cf. $W_{nn}=0$ in the standard case), to yield

$$\tilde{W}_{mn}=\begin{cases}e^{-r_{mn}^{2}}, & (m,n)\in\mathcal{E},\\ 1, & m=n,\\ 0, & (m,n)\notin\mathcal{E}.\end{cases} \tag{8}$$

Notice that in a probabilistic setting the vertices are implicitly self-connected; to ensure that the transition probabilities sum up to unity, we need to normalise the GRW weights to obtain

$$P(m|n)=\frac{\tilde{W}_{nm}}{\tilde{D}_{nn}} \tag{9}$$

In this way, the graph shift matrix, $\mathbf{S}$, takes the form of the so-called diffusion matrix [Coifman2006], and consequently the shift (expectation) operator for the GRW model becomes

$$\mathbf{S}=\mathbf{P}=\tilde{\mathbf{D}}^{-1}\tilde{\mathbf{W}} \tag{10}$$

###### Remark 4.

The standard weight matrix, $\mathbf{W}$, has zeros on the diagonal, so that $\tilde{W}_{mn}=W_{mn}$ for $m\neq n$ in (8), with $\tilde{\mathbf{W}}=\mathbf{I}+\mathbf{W}$ and $\tilde{\mathbf{D}}=\mathbf{I}+\mathbf{D}$. Thus, the GSO in (10) in the standard notation becomes

$$\mathbf{S}=(\mathbf{I}+\mathbf{D})^{-1}(\mathbf{I}+\mathbf{W}) \tag{11}$$
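A minimal numpy sketch of the GRW shift operator in (11), i.e. the diffusion matrix of (10) with implicit unit self-loops; the 4-vertex weighted graph below is hypothetical:

```python
import numpy as np

# Hypothetical weight matrix with zero diagonal (standard notation).
W = np.array([
    [0.0, 0.5, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.3],
    [0.2, 0.0, 0.0, 0.4],
    [0.0, 0.3, 0.4, 0.0],
])
D = np.diag(W.sum(axis=1))
I = np.eye(len(W))

# GRW shift operator, Eq. (11): self-loops enter through I + W and I + D.
S = np.linalg.inv(I + D) @ (I + W)

print(S.sum(axis=1))  # rows still sum to unity, so S remains a Markov matrix
```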

We next investigate the power boundedness of the shift operator, a prerequisite to justify its use in real-world applications. To this end, we embark upon the dual role of the graph shift as the graph expectation (see Remark 3), to examine the statistical consistency of the GSO based on the GRW model.

### IV-B Statistical consistency of GSO

Given the difficulty of evaluating the statistical consistency for an arbitrary graph random process, we consider the GRW under the central limit theorem, that is, a Wiener process with mean $\mu$ and variance $\sigma^2$ per vertex, which satisfies the desired Markov and Martingale properties.

#### IV-B1 Bias

The expectation at the $n$-th vertex is given by

$$E\{x(n)\}=\sum_{m=1}^{N}x(m)\,P(m|n) \tag{12}$$

The estimator is unbiased, since

$$E\{x(n)\}=\sum_{m=1}^{N}E\{x(m)\}\,P(m|n)=\mu\sum_{m=1}^{N}P(m|n)=\mu \tag{13}$$

where the probabilities sum to unity, $\sum_{m=1}^{N}P(m|n)=1$, and $E\{x(m)\}=\mu$ for every vertex $m$.

#### IV-B2 Asymptotic consistency

To evaluate the asymptotic consistency of the expectation operator on a graph, we begin by estimating the variance of the expected value in (12), to obtain

$$\operatorname{var}\{x(n)\}=\sum_{m=1}^{N}\sum_{k=1}^{N}\operatorname{cov}\{x(m),x(k)\}\,P(m|n)P(k|n) \tag{14}$$

Owing to the statistical independence assumption, the covariance between the random variables at different vertices vanishes, that is, $\operatorname{cov}\{x(m),x(k)\}=0$ for $m\neq k$. The estimation variance then reduces to

$$\operatorname{var}\{x(n)\}=\sum_{m=1}^{N}\operatorname{var}\{x(m)\}\,P^{2}(m|n)=\sigma^{2}\sum_{m=1}^{N}P^{2}(m|n) \tag{15}$$

For asymptotic consistency, the evolution of the variance as the number of vertices increases behaves as

$$\lim_{N\to\infty}\operatorname{var}\{x(n)\}=\lim_{N\to\infty}\sigma^{2}\sum_{m=1}^{N}P^{2}(m|n) \tag{16}$$

A lower bound on the term $\sum_{m=1}^{N}P^{2}(m|n)$ can be obtained from the Cauchy-Schwarz inequality, $\big(\sum_m a_m b_m\big)^2 \le \big(\sum_m a_m^2\big)\big(\sum_m b_m^2\big)$. For our setup, with $a_m=P(m|n)$ and $b_m=1$, the following bound follows

$$\bigg(\sum_{m=1}^{N}P(m|n)\bigg)^{2}\le\bigg(\sum_{m=1}^{N}P^{2}(m|n)\bigg)\bigg(\sum_{m=1}^{N}1^{2}\bigg) \tag{17}$$

or, equivalently,

$$\frac{1}{N}\le\sum_{m=1}^{N}P^{2}(m|n) \tag{18}$$

Therefore, with an increase in the number of vertices, $N$, the lower bound on the estimation variance vanishes in the limit, since from (15)

$$\lim_{N\to\infty}\operatorname{var}\{x(n)\}=\lim_{N\to\infty}\sigma^{2}\sum_{m=1}^{N}P^{2}(m|n)\ge\lim_{N\to\infty}\frac{\sigma^{2}}{N}=0 \tag{19}$$

This proves that the expectation operator based on the GRW model is asymptotically consistent.
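The variance expression in (15) and the Cauchy-Schwarz lower bound in (18) can be checked numerically; the row-stochastic matrix below is random test data standing in for a graph transition matrix:

```python
import numpy as np

# For i.i.d. x(m) with variance sigma^2, Eq. (15) gives
# var{x(n)} = sigma^2 * sum_m P(m|n)^2, bounded below by sigma^2 / N (Eq. (18)).
rng = np.random.default_rng(0)
N = 50
P = rng.random((N, N))
P /= P.sum(axis=1, keepdims=True)       # rows sum to unity
sigma2 = 2.0

var_n = sigma2 * (P ** 2).sum(axis=1)   # Eq. (15), one value per vertex
lower_bound = sigma2 / N                # Eq. (18) scaled by sigma^2

print(var_n.min() >= lower_bound)
```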

### IV-C Boundedness and power preservation of the proposed GSO

We next show that the statistical properties of the expectation operator translate to the power-boundedness properties of the dual shift operator. We begin by expressing the variance as

$$\operatorname{var}\{x(n)\}=E\{|x(n)|^{2}\}-|E\{x(n)\}|^{2} \tag{20}$$

In light of the shift-expectation duality, $\mathcal{S}(x(n))\equiv E\{x(n)\}$, the variance of a random graph signal can be rewritten as

$$\operatorname{var}\{x(n)\}=E\{|x(n)|^{2}\}-|\mathcal{S}(x(n))|^{2} \tag{21}$$

Since the variance of a random process is non-negative, $\operatorname{var}\{x(n)\}\ge 0$, we directly obtain the power boundedness of the shift operator in the form

$$|\mathcal{S}(x(n))|^{2}\le E\{|x(n)|^{2}\} \tag{22}$$

which is a direct consequence of Jensen’s inequality. In other words, as desired the energy of the shifted graph signal is lower than or equal to the energy of the original graph signal.
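The Jensen-type bound in (22) holds vertex-wise for any row-stochastic shift: the squared weighted average never exceeds the weighted average of the squares. A quick numerical check, with $\mathbf{P}$ and $\mathbf{x}$ as arbitrary test data:

```python
import numpy as np

# Check Eq. (22): |Sx(n)|^2 = |sum_m P(m|n) x(m)|^2 never exceeds the
# probability-weighted mean power sum_m P(m|n) |x(m)|^2 (Jensen's inequality).
rng = np.random.default_rng(1)
N = 20
P = rng.random((N, N))
P /= P.sum(axis=1, keepdims=True)   # row-stochastic shift matrix
x = rng.standard_normal(N)          # arbitrary graph signal

lhs = (P @ x) ** 2                  # squared shifted samples |Sx(n)|^2
rhs = P @ (x ** 2)                  # conditional mean power at each vertex

print(bool(np.all(lhs <= rhs + 1e-12)))
```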

It can also be proven that, with an increase in the number of vertices, $N$, the shift operator is asymptotically power preserving. Starting from (19) and (21), we can show that if the following asymptotic estimation variance

$$\lim_{N\to\infty}\operatorname{var}\{x(n)\}=\lim_{N\to\infty}\Big(E\{|x(n)|^{2}\}-|\mathcal{S}(x(n))|^{2}\Big) \tag{23}$$

vanishes, this yields the asymptotic behaviour

$$\lim_{N\to\infty}|\mathcal{S}(x(n))|^{2}=\lim_{N\to\infty}E\{|x(n)|^{2}\} \tag{24}$$

which proves the asymptotic graph signal power preservation of the proposed shift operator.

### IV-D System for random graph signals

A linear system of order $M$ for a random graph signal is defined as follows [Sandryhaila2013]

$$\mathbf{y}=\sum_{m=0}^{M}h_{m}\mathbf{S}^{m}\mathbf{x} \tag{25}$$

where $h_m$ are the system coefficients. Owing to the power boundedness and asymptotic power preservation properties of the proposed class of GSOs, $\mathbf{S}$, the class of systems based on this shift also exhibits the boundedness property, that is

$$E\{\|\mathbf{y}\|^{2}\}\le\sum_{m=0}^{M}|h_{m}|^{2}\,E\{\|\mathbf{x}\|^{2}\} \tag{26}$$
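The system in (25) can be sketched in numpy by accumulating successive graph shifts of the input; the graph, the coefficients $h_m$, and the input signal below are all hypothetical:

```python
import numpy as np

# Hypothetical 4-vertex unweighted graph and GRW shift, Eq. (11).
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
I = np.eye(4)
S = np.linalg.inv(I + D) @ (I + W)

h = [0.5, 0.3, 0.2]                 # arbitrary system coefficients h_0..h_2
x = np.array([1.0, -1.0, 2.0, 0.0]) # input graph signal

# Eq. (25): y = sum_m h_m S^m x, computed with repeated shifts.
y = np.zeros_like(x)
Sm_x = x.copy()                     # S^0 x
for hm in h:
    y += hm * Sm_x
    Sm_x = S @ Sm_x                 # advance to the next graph shift
```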

### IV-E Lazy random walk

We next show that if the chosen model misspecifies the topology of the underlying graph, the associated shift operator, even if it is unbiased, does not asymptotically preserve the shifted signal power. For example, consider the well-known lazy random walk (LRW) model which at each step:

• Transitions to a neighbouring vertex with probability $\frac{1}{2}$;

• Remains at the current vertex with probability $\frac{1}{2}$.

If the transition probabilities are unknown, these can be inferred based on the graph topological information, which is typically given by the weight matrix and takes the form

$$W_{mn}=\begin{cases}e^{-r_{mn}^{2}}, & (m,n)\in\mathcal{E},\\ 0, & \text{otherwise}.\end{cases} \tag{27}$$

Note that, unlike for the GRW in (8), the vertices here are not self-connected, i.e. $W_{nn}=0$. Upon normalising the weights so that the probability of moving to a neighbouring vertex sums up to $\frac{1}{2}$, the LRW transition probability becomes

$$P(m|n)=\begin{cases}\dfrac{1}{2}, & m=n,\\[4pt] \dfrac{1}{2}\dfrac{W_{nm}}{D_{nn}}, & m\neq n,\end{cases} \tag{28}$$

and the graph shift matrix for the LRW takes the form

$$\mathbf{S}=\mathbf{P}=\frac{1}{2}\big(\mathbf{I}+\mathbf{D}^{-1}\mathbf{W}\big) \tag{29}$$
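A minimal numpy sketch of the LRW shift in (29), again on a hypothetical weighted graph; by construction every vertex has self-transition probability $\frac{1}{2}$:

```python
import numpy as np

# Hypothetical weight matrix with zero diagonal, as in Eq. (27).
W = np.array([
    [0.0, 0.5, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.3],
    [0.2, 0.0, 0.0, 0.4],
    [0.0, 0.3, 0.4, 0.0],
])
D = np.diag(W.sum(axis=1))

# LRW shift operator, Eq. (29): stay with probability 1/2,
# otherwise move to a neighbour with probability W_nm / (2 D_nn).
S = 0.5 * (np.eye(4) + np.linalg.inv(D) @ W)

print(np.diag(S))  # self-transition probability is 1/2 at every vertex
```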

The asymptotic analysis of the LRW expectation operator follows the analysis in Sections IV-B1 and IV-B2. Upon reformulating the LRW local expectation operator in terms of the estimate at the $n$-th vertex, we have