    # Smoothing graph signals via random spanning forests

Another facet of the elegant link between random processes on graphs and Laplacian-based numerical linear algebra is uncovered: based on random spanning forests, novel Monte-Carlo estimators for graph signal smoothing are proposed. These random forests are sampled efficiently via a variant of Wilson's algorithm –in time linear in the number of edges. The theoretical variance of the proposed estimators are analyzed, and their application to several problems are considered, such as Tikhonov denoising of graph signals or semi-supervised learning for node classification on graphs.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Tikhonov denoising of graph signals [1, 2], semi-supervised learning for node classification in graphs , post-sampling graph signal reconstruction  are a few examples falling in the following class of optimization problems. Given a graph signal where is the value measured at node of a graph, one considers the problem

 ^x=argminz∈Rnq||y−z||2+ztLz, (1)

where is the Laplacian of the graph and a parameter tuning the trade-off between a data-fidelity term –encouraging the solution to be close to – and a smoothness term –encouraging the solution to vary slowly along any path of the graph. In Tikhonov denoising, is a noisy measurement of an underlying signal that one wants to recover. In semi-supervised learning (SSL),

are known labels that one wants to propagate on the graph to classify all the nodes in different classes (see Section

4 for more details).

This optimization problem admits an explicit solution:

 ^x=KywithK=q(qI+L)−1∈Rn×n, (2)

that requires the inversion of a regularized Laplacian , where

is the identity matrix. Computing

costs elementary operations and becomes prohibitive as increases.

State-of-the-art. For large (say ), the state-of-the-art includes iterative methods such as conjugate gradient with preconditioning  where

is only accessed via matrix-vector multiplications of the form

, and polynomial approximation methods  that approximate by a low-order polynomial in . Both classes of methods enable to compute in time linear in , the number of edges of the graph.

Contribution. We introduce two Monte-Carlo estimators of based on random spanning forests, that:

• show another facet of the elegant link between random processes on graphs and Laplacian-based numerical linear algebra, such as in [7, 8, 9]

• scale linearly in and thus useful on very large graphs

• can be implemented in a fully distributed fashion: as long as each node is able to communicate with its neighbours, the result can be obtained without centralized knowledge of the graph’s structure (see the implementation paragraph at the end of Section 3).

The Julia code implementing these estimators and reproducing this papers’ results is available on the authors’ website.

Structure of the paper. We provide the necessary notations and background in Section 2. In Section 3, we detail the two novel estimators before generalizing them to cases where is of the form with a positive diagonal matrix. The experimental Section 4 gives an illustration on image denoising before showing how to use our estimators in the context of SSL. We conclude in Section 5.

## 2 Background

Notations and preliminary definitions. We consider undirected graphs where is the set of nodes, the set of edges, and the symmetric weighted adjacency matrix. We denote by the degree of node , the degree vector, and the diagonal degree matrix. In this paper, we consider the Laplacian matrix defined as . is positive semi-definite (PSD) 

and its eigenvalues can be ordered as

. We also consider graph signals defined over the nodes : is the signal’s value at node . We denote by the constant vector equal to 1, and by the Dirac centered on : if , and otherwise.

A tree of is understood as a connected subgraph of that does not contain cycles. A rooted tree of , i.e., a tree of whose edges are all oriented towards one node called root, is generically denoted by . A rooted forest of , i.e., a set of disjoint rooted trees, is generically denoted by . In the following, we only consider rooted trees and rooted forests, and thus simply refer to them as trees and forests. Also, stands for the set of roots of the trees in . As such, is the total number of trees in . We define the function such that is the root of the tree containing node in the forest . Node is said to be rooted in .

Random spanning forests (RSFs). Let us recall that a spanning tree (resp. forest) of is a tree (resp. forest) of that spans all nodes in . Now, for a given graph , there exists in general many spanning trees (and even more spanning forests). Probabilists and computer scientists have been studying several distributions over spanning trees and forests in the past. A classical distribution over spanning trees is:

 P(T=τ)∝∏(ij)∈τWij. (3)

Sampling from this distribution yields a so-called uniform222

this terminology comes from the fact that in unweighted graphs, this distribution reduces to the uniform distribution over all spanning trees

spanning tree (UST) , and can be efficiently performed by Wilson’s algorithm  via loop-erased random walks.

We focus here on the following parametrized distribution over spanning forests:

 P(Φq=ϕ)∝q|ρ(ϕ)|∏τ∈ϕ∏(ij)∈τWij,q∈R+∗. (4)

Sampling from this distribution yields a RSF and is efficiently performed (in time ) via a variant of Wilson’s algorithm . Many properties of are known [11, 12]

. For instance, the probability that node

is rooted in is (see lemma 3.3 in )

 P(rΦq(i)=j)=Kij. (5)

## 3 RSF-based estimators

Given a signal , our goal is to compute without computing explicitly , that is, without inverting . We define two Monte-Carlo estimators of , both based on RSFs.

A first estimator of . Let us consider a realisation of RSF and define the estimator as

 ∀i~x(i)=y[rΦq(i)]. (6)
###### Proposition 1.

is unbiased: . Moreover:

 E(||~x−^x||2)=∑i\emph{var}(~x(i))=yt(I−K2)y.
###### Proof.

Let . One has, using Eq. (5):

 E[~x(i)] =E[y[rΦq(i)]]=∑jP(rΦq(i)=j)yj (7) =∑jKijyj=δtiKy=^x(i). (8)

is thus unbiased. A similar calculation yields where , such that the variance verifies:

 var(~x(i))=E[~x(i)2]−E[~x(i)]2=δtiKy(2)−(δtiKy)2.

Summing over all nodes yields:

 ∑ivar(~x(i))=1tKy(2)−ytK2y.

Noticing that

is an eigenvector of

with eigenvalue (as is an eigenvector of with eigenvalue ) ends the proof. ∎

An improved estimator of . A random forest contains trees that we enumerate from to . Consider the subset of nodes that are in the -th tree. As is a spanning forest, is a partition of (i.e., a set of disjoint subsets that cover ). Let be the tree membership function that associates to each node the tree number to which it belongs. For instance, is the size of the tree containing node . We define a second estimator as

 ∀i¯x(i)=1|Vt(i)|∑j∈Vt(i)yj. (9)
###### Proposition 2.

is unbiased: . Moreover:

 E(||¯x−^x||2)=∑i%\emphvar(¯x(i))=yt(K−K2)y.
###### Proof.

Let us denote by the function that takes as entry a forest and outputs its associated partition. Let us also define if , and otherwise. We need the Proposition 2.3 of . Fixing a partition of , one has:

 P(rΦq(i)=j|π(Φq)=P)=sij|Vt(i)|.

In words, this means that given a partition, the distribution of the root within each part is uniform over . Also, note that can be written:

 Kij =∑PP(rΦq(i)=j|π(Φq)=P)P(π(Φq)=P) =∑Psij|Vt(i)|P(π(Φq)=P) =∑Psij|Vt(i)|∑ϕ s.t. π(ϕ)=PP(Φq=ϕ) =∑ϕP(Φq=ϕ)sij|Vt(i)|=E(sij|Vt(i)|). (10)

Given a RSF , define the matrix . Eq. (10) translates as: . Thus, :

 E[¯x(i)] (11) =E[δtiSy]=δtiE[S]y=δtiKy=^xi. (12)

The estimator is thus unbiased. Note that one also has:

 (13)

such that

 ∑ivar(¯x(i)) =∑iytE[StδiδtiS]y−(δtiKy)2 (14) =yt[E[StS]−K2]y. (15)

Note that , i.e., , finishing the proof. ∎

Rao-Blackwellisation. Note that is a Rao-Blackwellisation of where the sufficient statistic is the partition induced by the RSF. As such, is necessarily an improvement over . This improvement can also be observed in the variance equations of both propositions: as is PSD with eigenvalues between and , one has: .

Tikhonov denoising. Let   be a noisy measurement of a signal that one wishes to recover. Assuming that the measurement noise is Gaussian with covariance , one can write (for instance for the second estimator):

 Eϵ[E(||¯x−^x||2)] =xt(K−K2)x+γ2Tr(K−K2) =||¯Fx||22+γ2Tr(¯F2),

where is a graph bandpass filter [1, 13] with frequency response . The second term depends on the noise level . The first term, however, depends on the original signal filtered by : the Fourier components of associated with graph frequencies around (maximizing ) are thus harder to denoise than the ones close to or .

In practice. For a given , independent RSFs are sampled –in time . Each RSF provides an independent estimate of , and all estimates are finally averaged.

Generalisation. Instead of estimating results of the form , one may need to estimate results of the form where is a diagonal matrix, with . In order to tackle this case, one considers the following distribution over forests:

 P(ΦQ=ϕ)∝∏r∈ρ(ϕ)qr∏τ∈ϕ∏(ij)∈τWij, (16)

that generalizes the distribution of Eq. (4). One can efficiently sample from this distribution –also via a variant of Wilson’s algorithm (see the next paragraph). The introduced estimators generalize naturally to this case. In fact, given a RSF , their formulation is exactly the same, the sole difference stemming from the distribution from which is drawn from. Propositions 1 and 2 are still correct (proofs are omitted due to lack of space) for instead of .

Implementation. In a nutshell, an algorithm sampling from the distribution of Eq. (16) i/ adds a node to the graph and connects it to each node with weight ; ii/ runs Wilson’s RandomTreeWithRoot algorithm (based on loop-erased random walks –see Figure 1 of ) on this augmented graph to sample a UST rooted in ; iii/ cuts the edges connected to in yielding a RSF . Then, the estimator associates to each node the value of at its root , whereas the estimator associates to each node the average of over all the nodes belonging to its tree. All these operations can be done in a distributed fashion: no centralized knowledge of the graph is needed. Also, once the RSF is sampled, the computations involved for the estimators are not only distributed but can also be made in parallel (within each tree of the forest). To give an order of magnitude of computation times, for a random vector and , our naive Julia implementation of on a graph with (resp. ) nodes and (resp ) edges runs in average in 35 (resp. 550) ms on a laptop. These times are to compare to the optimized built-in sparse matrix multiplication running in 6 (resp. 115) ms, which is the building block of both conjugate gradient and polynomial approximation methods stated in the introduction. Our methods are thus comparable to the state-of-the-art in computation time. Figure 1: Illustration on an image. A grayscale image x is considered as a graph signal on an unweighted 2D grid graph where each pixel is a node connected to its four immediate neighbours. y=x+ϵ is a noisy measurement of x (ϵ is Gaussian with covariance matrix γ2I). ^x=q(qI+L)−1y is the exact Tikhonov denoised signal (here with q=1) that we try to estimate. Bottom line: the two left images show estimates of ^x obtained with the RSF-based estimators ~x and ¯x detailed in Section 3. Averaging over N=20forest realisations, one obtains the two images on the right. Finally, the top-right figure is the Peak Signal-to-Noise Ratio (PSNR) of the denoised images (averaged over 100 realisations of ϵ). As usual in these scenarios, there exists an optimal regularization parameter (a value of q maximizing the PSNR) that is here observed to be between 1 and 2. Figure 2: Illustration on SSL. Performance of the community recovery in a SBM with two equal-size classes, vs. the number of pre-labeled nodes m in each class. Results are averaged over 10 realisations of SBM. Left: setting with strong communities. Right: setting with fuzzy communities.

## 4 Experiments

Illustration on an image. Fig. 1 shows an image denoising example on a grayscale image. Constant irregular patches are observed on the realisation of : they are the trees of the associated RSF realisation. Also, as expected, converges faster than (as increases) for all values of .

Illustration on SSL. The goal of SSL is to infer the label (or class) of all the nodes of a graph given a few pre-labelled nodes. Consider a partial labeling of the nodes, where is the number of classes and is equal to if node is a priori known to belong to class and otherwise. The objective is to find classification functions such that each is on the one hand close to the labeling function and on the other hand smooth on the graph, with a trade-off given by a parameter . Depending on the choice of Laplacian used to define this graph smoothness (in the following, corresponds to the combinatorial Laplacian, to the normalized Laplacian, to the random walk Laplacian), the explicit formulation of can be written as (Prop. 2.2 of ): Note that this can be re-written, with and , as:

 ∀l=1,…,kfl=D1−σKDσ−1yl.

Finally, once all classification functions are computed, each node is classified in the class .

One may use our estimators to solve the SSL classification task: , i/ use the proposed estimators on the vector to estimate , ii/ left-multiply the result by and obtain an estimate of , iii/ once all functions have been estimated, classify each node to . In the following, we choose and set .

We illustrate this on the Stochastic Block Model (SBM): a random graph model with communities. Consider a SBM with nodes and two communities of equal size. We generate two scenarios: a well-structured (resp. fuzzy) setting with probability of intra-block connection (resp. ) and inter-block connection , which corresponds to a sparse graph with average degree (resp. ). The following experiment is performed: i/ choose nodes randomly in each community to serve as a priori knowledge on the labels and use them to define the two label functions ; ii/ compute the two classification functions either via direct computation (method referred to as ) or via our proposed estimators; iii/ each node is classified in community . The performance of the community recovery is measured by the Adjusted Rand Index (ARI), a number between and : the larger it is, the more accurate the recovery of the ground truth. Results are shown in Fig. 2. The estimator matches the performance of after forest realizations. Also, the smaller the amount of prior knowledge and the fuzzier the block structure, the harder it is to match . A closer look at the sampled forests shows that some trees do not contain any labeled nodes, thus failing to propagate the label information. This proof-of-concept could be improved in various ways to avoid this difficulty –going beyond the scope of this paper.

## 5 Conclusion

We provide an original and scalable method enabling to estimate graph smoothing operations in large graphs. In future work, we will further explore this deep link between RSFs and Laplacian-based numerical linear algebra, to apply these ideas to more advanced graph signal processing operations.