I Introduction
Given the rapidly increasing availability of data recorded on irregular domains, it would be extremely advantageous to analyse such unstructured data as signals on graphs and thus benefit from the ability of graphs to incorporate domain-specific knowledge. This has motivated developments in the rapidly expanding field of Graph Signal Processing [Sandryhaila2013, Shuman2013, Sandryhaila2015, Ortega2018, Stankovic2019_1, Stankovic2019_2], and has spurred the introduction of graph counterparts of many classical signal processing algorithms.
One such direction is that of the linear system on a graph, which was recently considered in [Sandryhaila2013, Eldar2017, Stankovic2019_2]. In classical signal processing, a system is a linear operator that maps an input signal to another (output) signal. The signal shift operator (unit time delay) is the lynchpin in discrete-time linear systems, but its definition on graphs is not obvious due to the rich underlying connectivity structure. Topologically, the signal shift on a graph can be viewed as the movement of a signal sample from the considered vertex along all edges connected to this vertex. Therefore, to effectively introduce a system or a filter which operates on signals acquired on graphs, it is necessary to rigorously define and understand the graph shift operator (GSO), the subject of this work.
Our aim is therefore to explore a graph-theoretic framework for shift operators on a graph. While existing GSOs typically take the form of the graph adjacency or Laplacian matrices, for rigour we here introduce the shift operator from a probabilistic perspective. This is achieved based on the principle of maximum entropy, which makes it possible to cater for even single realisations of random signals on a graph and to operate with a limited number of vertices.
Furthermore, the proposed GSO is shown to be bounded and asymptotically power preserving, that is, the signal power is asymptotically preserved over shifts as the number of edges and vertices increases. To reinforce the importance of the graph domain knowledge in this type of problem, we also prove that misspecified assumptions about the signal domain can prevent the asymptotic norm preservation of the GSO. The practical utility of the proposed class of shift operators on a graph is demonstrated through a physically meaningful and intuitive real-world example of geographically distributed estimation of multisensor temperature measurements.
II Preliminaries
The signal domains considered in this work are graphs, for which we follow the notation employed in [Stankovic2019_1, Stankovic2019_2], whereby a graph is defined as a set of vertices, $\mathcal{V}$, which are connected by a set of edges, $\mathcal{B}$. The existence of an edge between vertices $m$ and $n$ is designated by $(m,n) \in \mathcal{B}$.
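The graph connectivity of an $N$-vertex graph can be formally represented by the adjacency matrix, $\mathbf{A} \in \mathbb{R}^{N \times N}$, whereby the vertex connectivity is described by
(1) $A_{mn} = \begin{cases} 1, & \text{if } (m,n) \in \mathcal{B}, \\ 0, & \text{otherwise.} \end{cases}$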
Regarding the directionality of vertex connections, a graph can be undirected or directed. A graph is undirected if each edge, $(m,n) \in \mathcal{B}$, has its counterpart, $(n,m) \in \mathcal{B}$, that is, $\mathbf{A} = \mathbf{A}^{T}$. For directed graphs, this property does not hold in general.
The neighbourhood of a vertex $m$, denoted by $\mathcal{V}_m$, is the set of vertices directly connected by an edge to vertex $m$.
In general, the edges can also convey information about the relative importance of their connection through a weighted graph. The weight matrix, $\mathbf{W}$, corresponds morphologically to the set of edges, $\mathcal{B}$. A nonzero element in the weight matrix, $W_{mn}$, designates both the existence of an edge and the value of the corresponding weight, whereby the value $W_{mn} = 0$ indicates that there is no edge $(m,n)$. In this sense, the adjacency matrix, $\mathbf{A}$, can be considered as a special case of the weight matrix, $\mathbf{W}$.
There are three classes of approaches to the definition of graph edges and their corresponding weights [Stankovic2019_2]:
a) Physically well defined edges and weights, through domain knowledge;
b) Definition of edges and weights based on the geometry of vertex positions;
c) Data similarity based methods for learning the underlying graph topology.
The degree matrix, $\mathbf{D}$, is a diagonal matrix with elements $D_{mm}$ which are equal to the sum of the weights of all edges connected to the vertex $m$, that is, $D_{mm} = \sum_{n} W_{mn}$. The degree matrix quantifies the centrality of each vertex in a graph. For instance, for undirected and unweighted graphs, the degree element $D_{mm}$ is equal to the number of edges connected to a vertex $m$.
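The definitions of this section can be sketched numerically as follows. This is a minimal illustration only: the 4-vertex weight matrix and the names `W`, `A`, `D`, `is_undirected` and `edge_counts` are hypothetical and do not come from the original text.

```python
import numpy as np

# Hypothetical undirected, weighted 4-vertex graph (illustrative values only).
W = np.array([
    [0.0, 0.5, 0.0, 0.2],
    [0.5, 0.0, 0.7, 0.0],
    [0.0, 0.7, 0.0, 0.3],
    [0.2, 0.0, 0.3, 0.0],
])

# Adjacency matrix: 1 where an edge exists (nonzero weight), 0 otherwise.
A = (W != 0).astype(int)

# Degree matrix: diagonal, holding the sum of the weights attached to each vertex.
D = np.diag(W.sum(axis=1))

# An undirected graph has a symmetric adjacency matrix.
is_undirected = np.array_equal(A, A.T)

# For the unweighted version of the graph, the degree is the edge count per vertex.
edge_counts = A.sum(axis=1)
```

Replacing `W` by `A` in the degree computation recovers the unweighted case, which reflects the remark that the adjacency matrix is a special case of the weight matrix.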
III Random Processes on a Graph
In order to define a general random graph model, we associate with each vertex, $n \in \mathcal{V}$, a real i.i.d. random variable, $x(n)$. Our a priori knowledge of the graph is typically limited to the topological structure of the domain, which is reflected by the weight matrix, $\mathbf{W}$. We therefore consider the problem of establishing an appropriate stochastic model to describe a random graph signal, given the domain structure only.
In such situations of limited knowledge, it is natural to choose the model according to the maximum entropy principle [Jaynes1957], which asserts that the most suitable random process maximises entropy given the currently available knowledge. An implicit maximum entropy assumption is therefore that of statistical independence, that is, a random signal on a graph is conditionally independent of its predecessors.
Remark 1.
Similar to classical cases, it is important to notice that the properties of a random graph process are directly related to its shifted states. However, unlike the backward shift of a signal on a discrete-time domain, which maps the process $x(n)$ to its translation $x(n-1)$, the shift of a signal on a graph defines the translation of the graph process at a vertex to the vertices in its direct neighbourhood.
This condition can be mathematically expressed through
(2) 
where symbol denotes the graph backward shift operator by steps. Condition (2) asserts that the random process at a vertex is dependent only on the current state of its neighbourhood, the so-called graph Markov property. As a result, the stochastic process which attains the maximum entropy is the Markovian random walk on a graph.
Markovian random walks exhibit a finite set of states, given by the vertex space, $\mathcal{V}$. For each pair of states, that is, for each edge $(m,n) \in \mathcal{B}$, there exists a transition probability, $p_{mn}$, of going from vertex $m$ to vertex $n$, where for each vertex, $m$, the transition probabilities sum up to unity, $\sum_{n} p_{mn} = 1$. The Markov matrix, $\mathbf{P}$, is then defined with its $(m,n)$th element equal to $p_{mn}$. Notice that each row in $\mathbf{P}$ sums up to unity, i.e. $\mathbf{P}\mathbf{1} = \mathbf{1}$.
In addition to the assumption of statistical independence, another maximum entropy assumption which is used here to define the graph shift operator is the Martingale property, which states that at a particular instant, the conditional expectation of the next value in a sequence, given all prior values, is equal to the present value, so-called persistent estimation. The Martingale property for the graph shift operator then becomes
(3) 
Remark 2.
A random process on a graph is naturally described by a class of random walks which satisfy the Markov and Martingale properties; these have been widely studied in statistics ever since the seminal papers [Einstein1905, Einstein1906] formulated the theory of Brownian motion and diffusion processes.
IV Shift Operator on a Graph
To derive the shifted (expected) value of the random process at a vertex $m$, it is necessary to employ the expectation operator, that is, a probabilistic weighting scheme of the form
(4) 
Remark 3.
In conventional signal processing, this expectation is typically implemented using the time-average operator; however, in many real-world situations we only encounter one realisation of the random process. To overcome this issue on a graph, we employ the conditional expectation, along with the Markov property condition in (2), to introduce the following shift (expectation) operator
(5) 
where is the expected value of the random variable at the vertex . Since we only encounter one realisation of the random process, i.e. , the desired graph shift (expectation) operator is in the form
(6) 
which can also be written in a matrix notation as
(7) 
The use of the Markov matrix as the shift operator was recently proposed in [Eldar2017], and the above analysis further justifies this concept. In the sequel, we will adopt the symbol to denote a shift operator on a graph.
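The use of a Markov matrix as a shift can be sketched in a few lines. The sketch assumes that the transition probabilities are obtained by row-normalising a nonnegative weight matrix, and that shifting by several steps corresponds to a matrix power; the function names `markov_matrix` and `graph_shift`, and the 3-vertex example, are hypothetical.

```python
import numpy as np

def markov_matrix(W):
    """Row-normalise a nonnegative weight matrix into a row-stochastic matrix."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    if np.any(row_sums <= 0):
        raise ValueError("every vertex needs at least one edge with positive weight")
    return W / row_sums

def graph_shift(P, x, steps=1):
    """Shift a single realisation x by `steps` steps, i.e. compute P^steps x."""
    return np.linalg.matrix_power(P, steps) @ np.asarray(x, dtype=float)

# Hypothetical star graph: vertex 0 is connected to vertices 1 and 2.
W = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
P = markov_matrix(W)
x = np.array([3.0, 1.0, 5.0])

# Each shifted sample is the probability-weighted average of the neighbours' samples.
x_shifted = graph_shift(P, x)
```

In this example vertex 0 receives the average of its two neighbours, while vertices 1 and 2 each receive the sample of vertex 0, their only neighbour.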
In practice, the actual probabilities of vertex transition are often unknown. However, as shown next, we can infer these probabilities using the available information on the graph domain geometry, implied by the weight matrix, $\mathbf{W}$.
IV-A General random walk model
A general random walk (GRW) may be thought of as a discrete-time stochastic process which at each step transitions to neighbouring vertices, according to a certain probability distribution. In the limit, Donsker’s theorem states that the GRW has a probability density which converges to that of the Wiener process [Donsker1951, Billingsley1999, Durrett1996, Revuz1999]. In the graph setting, for a walker at a vertex $m$, the central limit theorem [Billingsley1995] asserts that after a sufficiently large number of independent steps, the walker’s position is Gaussian distributed, , where is a measure of physical distance between vertices $m$ and $n$. Consequently, the GRW weight matrix, $\mathbf{W}$, also includes unit values on its diagonal, which indicate self connections (cf. $W_{mm} = 0$ in the standard case), to yield
(8) 
Notice that in a probabilistic setting the vertices are implicitly self-connected; to ensure that the transition probabilities sum up to unity, we need to normalise the GRW weights to obtain
(9) $p_{mn} = \frac{W_{mn}}{\sum_{k} W_{mk}}$
In this way, the graph shift matrix takes the form of the so-called diffusion matrix, $\mathbf{D}^{-1}\mathbf{W}$ [Coifman2006], and consequently the shift (expectation) operator for the GRW model becomes
(10) 
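A possible construction of the GRW shift matrix from vertex positions is sketched below. It assumes a Gaussian kernel of the form exp(-r^2 / (2 sigma^2)) on the Euclidean distance r between vertices, which gives unit weights on the diagonal, followed by row normalisation; the bandwidth `sigma`, the function name `grw_shift_matrix` and the example coordinates are hypothetical choices for illustration.

```python
import numpy as np

def grw_shift_matrix(positions, sigma=1.0):
    """Diffusion-type shift matrix from vertex coordinates (assumed Gaussian kernel)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    positions = np.asarray(positions, dtype=float)
    # Squared Euclidean distance between every pair of vertices.
    diff = positions[:, None, :] - positions[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Gaussian weights; the zero self-distance yields unit weights on the diagonal.
    W = np.exp(-dist2 / (2.0 * sigma ** 2))
    # Row normalisation so that the transition probabilities sum to unity.
    return W / W.sum(axis=1, keepdims=True)

# Hypothetical sensor locations in the plane.
positions = np.array([[0.0, 0.0],
                      [1.0, 0.0],
                      [0.0, 2.0]])
P_grw = grw_shift_matrix(positions, sigma=1.0)
```

Because the self-weight equals one and all other weights are smaller for distinct positions, each vertex retains its own sample with the largest probability in its row, and closer vertices contribute more to the shifted value than distant ones.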
Remark 4.
We next investigate the power boundedness of the shift operator, a prerequisite to justify its use in real-world applications. To this end, we exploit the dual role of the graph shift as the graph expectation (see Remark 3), to examine the statistical consistency of the GSO based on the GRW model.
IV-B Statistical consistency of GSO
Given the difficulty of evaluation of the statistical consistency for an arbitrary graph random process, we consider the GRW under the central limit theorem, that is, a Wiener process described by , which satisfies the desired Markov and Martingale properties.
IV-B1 Bias
The expectation at the $m$th vertex is given by
(12) 
The estimator is unbiased, since
(13) 
where the probabilities sum to unity, $\sum_{n} p_{mn} = 1$, $\forall m$.
IV-B2 Asymptotic consistency
To evaluate the asymptotic consistency of the expectation operator on a graph, we begin by estimating the variance of the expected value in (12), to obtain
(14) 
Owing to the statistical independence assumption, the covariance between the random variables at different vertices vanishes, that is, for . The estimation variance then reduces to
(15) 
For asymptotic consistency, the evolution of the variance as the number of vertices increases behaves as
(16) 
A lower bound to the term can be obtained from the Cauchy-Schwarz inequality, given by . For our setup, with and , the following bound follows
(17) 
or, equivalently,
(18) 
Therefore, with an increase in the number of vertices, $N$, in the limit the lower bound on the estimation variance vanishes, since from (15)
(19) 
This proves that the expectation operator based on the GRW model is asymptotically consistent.
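The role of the Cauchy-Schwarz bound can be checked numerically. The sketch below relies on the standard fact that, for independent variables of equal variance, the variance of a weighted sum with weights $p_{mn}$ equals that variance multiplied by $\sum_{n} p_{mn}^2$, and that this factor is at least $1/N$ when the $N$ weights sum to unity. The randomly generated weight matrices and the names `variance_factors`, `smallest_factor` and `uniform_factor` are hypothetical.

```python
import numpy as np

def variance_factors(P):
    """Per-vertex factor sum_n p_mn^2 which multiplies the signal variance."""
    return np.sum(np.asarray(P, dtype=float) ** 2, axis=1)

rng = np.random.default_rng(0)
smallest_factor = {}
for N in (5, 50, 500):
    # Arbitrary positive weights with self-connections (illustrative only).
    W = rng.random((N, N)) + np.eye(N)
    P = W / W.sum(axis=1, keepdims=True)
    smallest_factor[N] = variance_factors(P).min()

# Uniform transition probabilities attain the lower bound 1/N exactly.
uniform_factor = variance_factors(np.full((4, 4), 0.25))
```

The factor never falls below $1/N$, and the bound itself shrinks as the number of vertices grows.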
IV-C Boundedness and power preservation of the proposed GSO
We next show that the statistical properties of the expectation operator translate to the boundedness properties of the dual shift operator. We begin by expressing the variance as
(20) 
In light of the shift-expectation duality, , the variance of a random graph signal can be rewritten as
(21) 
Since the variance of a random process is nonnegative, , we directly obtain the power boundedness of the shift operator in the form
(22) 
which is a direct consequence of Jensen’s inequality. In other words, as desired, the energy of the shifted graph signal is lower than or equal to the energy of the original graph signal.
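The Jensen-type bound can be illustrated for a single vertex at a time. Since each row of a row-stochastic shift matrix forms a convex combination, the square of a shifted sample cannot exceed the same convex combination of the squared samples. The sketch below checks this per-vertex form and the resulting bound on the largest magnitude; it does not test the summed sample energy of one realisation. The random weights and the names `jensen_gap` and `shifted` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6

# Arbitrary positive weights with self-connections, normalised row-wise.
W = rng.random((N, N)) + np.eye(N)
P = W / W.sum(axis=1, keepdims=True)

# One realisation of a random graph signal.
x = rng.standard_normal(N)
shifted = P @ x

# Jensen's inequality per vertex: (sum_n p_mn x_n)^2 <= sum_n p_mn x_n^2.
jensen_gap = P @ (x ** 2) - shifted ** 2

# Consequence: a shift cannot increase the largest sample magnitude.
max_before = np.max(np.abs(x))
max_after = np.max(np.abs(shifted))
```

The gap is nonnegative at every vertex for any signal, which is the per-vertex counterpart of the nonnegative variance used in the text.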
It can also be proven that with an increase in the number of vertices, $N$, the shift operator is asymptotically power preserving. Starting from (19) and (21), we can show that if the following asymptotic estimation variance
(23) 
vanishes, this yields the asymptotic behaviour
(24) 
which proves the asymptotic graph signal power preservation of the proposed shift operator.
IV-D System for random graph signals
A linear system of order of a random graph signal is defined as follows [Sandryhaila2013]
(25) 
where are the system coefficients. Owing to the power boundedness and asymptotic power preservation properties of the proposed class of GSOs, , the class of systems based on this shift also exhibits the boundedness properties, that is
(26) 
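A system of this kind can be sketched as a polynomial in the shift matrix applied to the signal. The sketch assumes that the system output is a weighted sum of successively shifted signals, with the coefficient of index zero applied to the unshifted signal; the function name `graph_system`, the coefficient vector `h` and the example graph are hypothetical.

```python
import numpy as np

def graph_system(P, x, h):
    """Output y = sum_k h[k] * P^k x, computed by repeated shifting."""
    shifted = np.asarray(x, dtype=float)
    y = np.zeros_like(shifted)
    for hk in h:
        y = y + hk * shifted   # add the contribution of the current shift order
        shifted = P @ shifted  # shift once more for the next coefficient
    return y

# Hypothetical star graph with a row-stochastic shift matrix.
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
x = np.array([3.0, 1.0, 5.0])
h = [0.5, 0.3, 0.2]  # illustrative system coefficients

y = graph_system(P, x, h)
```

Repeated shifting avoids forming matrix powers explicitly, so the cost grows linearly with the system order.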
IV-E Lazy random walk
We next show that if the chosen model misspecifies the topology of the underlying graph, the associated shift operator, even if it is unbiased, does not asymptotically preserve the shifted signal power. For example, consider the well-known lazy random walk (LRW) model which at each step:

Transitions to a neighbouring vertex with probability ;

Remains at the current vertex with probability .
If the transition probabilities are unknown, these can be inferred based on the graph topological information, which is typically given by the weight matrix and takes the form
(27) 
Note that, unlike for the GRW in (8), the vertices here are not self-connected, i.e. $W_{mm} = 0$. Upon normalising the weights so that the probability of moving to a neighbouring vertex sums up to , the LRW transition probability becomes
(28) 
and the graph shift matrix for the LRW takes the form
(29) 
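The structure of an LRW shift matrix can be sketched as a mixture of staying put and moving to a neighbour. Since the probabilities are not reproduced in the text above, the sketch introduces a hypothetical parameter `alpha` for the probability of moving, with the probability of staying equal to one minus `alpha`; the default of 0.5 is the common textbook choice for a lazy walk and is an assumption here. The function name `lrw_shift_matrix` and the example graph are also hypothetical.

```python
import numpy as np

def lrw_shift_matrix(W, alpha=0.5):
    """Lazy random walk matrix: stay with probability 1 - alpha, move with alpha."""
    W = np.asarray(W, dtype=float)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if np.any(np.diag(W) != 0):
        raise ValueError("LRW weights are assumed to have no self-connections")
    row_sums = W.sum(axis=1, keepdims=True)
    if np.any(row_sums <= 0):
        raise ValueError("every vertex needs at least one neighbour")
    # Probabilities of moving, normalised over the neighbours of each vertex.
    P_move = W / row_sums
    return (1.0 - alpha) * np.eye(W.shape[0]) + alpha * P_move

# Hypothetical star graph without self-connections.
W = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
P_lrw = lrw_shift_matrix(W, alpha=0.5)
```

Unlike the GRW construction, the weight kept at the current vertex is fixed by `alpha` and does not depend on the graph geometry.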
The asymptotic analysis of the LRW expectation operator follows the analysis in Sections IV-B1–IV-B2. Upon reformulating the LRW local expectation operator in terms of the estimate at the $m$th vertex, we have