 # On the Diffusion Geometry of Graph Laplacians and Applications

We study directed, weighted graphs $G=(V,E)$ and consider the (not necessarily symmetric) averaging operator $(Lu)(i) = -\sum_{j \sim i} p_{ij}\,(u(j) - u(i))$, where $p_{ij}$ are normalized edge weights. Given a vertex $i \in V$, we define the diffusion distance to a set $B \subset V$ as the smallest number of steps $d_B(i) \in \mathbb{N}$ required for half of all random walks started in $i$ and moving randomly with respect to the weights $p_{ij}$ to visit $B$ within $d_B(i)$ steps. Our main result is that the eigenfunctions interact nicely with this notion of distance. In particular, if $u$ satisfies $Lu = \lambda u$ on $V$ and $B = \{ i \in V: -\varepsilon \leq u(i) \leq \varepsilon\} \neq \emptyset$, then, for all $i \in V$, $$d_B(i)\,\log\left(\frac{1}{|1-\lambda|}\right) \geq \log\left(\frac{|u(i)|}{\|u\|_{L^\infty}}\right) - \log\left(\frac12 + \varepsilon\right).$$ Moreover, $d_B(i)$ is a remarkably good approximation of $|u|$ in the sense of having very high correlation. The result implies that the classical one-dimensional spectral embedding preserves particular aspects of geometry in the presence of clustered data. We also give a continuous variant of the result which has a connection to the hot spots conjecture.


## 1. Introduction and main result

### 1.1. Introduction.

This paper is motivated by the following continuous problem: let $\Omega$ be a compact set with smooth boundary and consider an eigenfunction of the Laplacian

$$-\Delta u = \lambda u$$

with Dirichlet boundary conditions on $\partial\Omega$. A natural question (also for more general elliptic equations) is whether it is possible to specify in advance, using only the geometry of $\Omega$, where the maximum is located. A recent result of the second and third author [18] shows the sharp result that the location of the maximum is at least $c\,\lambda^{-1/2}$ away from the boundary when the domain is simply connected. Here $c$ is a universal constant, see [3, 19] for a detailed discussion. If we consider Neumann conditions on the boundary $\partial\Omega$, then the hot spots conjecture of Rauch [2] predicts that both maximum and minimum of the eigenfunction should be assumed on the boundary. This conjecture is known [8] to fail in general but is widely believed to be true at least for convex domains. The purpose of our paper is to prove an optimal inequality for a related problem on graphs. We derive a certain type of guarantee that spectral clustering is well-behaved in the presence of clusters. Furthermore, we prove a continuous version of our result wherein we show that the location of both maximum and minimum of the first nontrivial Neumann eigenfunction for Laplace’s equation is not too close to the nodal line.

### 1.2. Setup

Let $G=(V,E)$ be a connected, directed, weighted graph on $n$ vertices with normalized edge weights $p_{ij}$,

$$\sum_{j=1}^{n} p_{ij} = 1.$$

We introduce a Laplacian-type operator $L$ acting on functions $u: V \to \mathbb{R}$ as

$$(Lu)(i) = -\sum_{j=1}^{n} p_{ij}\,\left(u(j) - u(i)\right).$$

Note that, at this level of generality, the operator $L$ need not be self-adjoint. This formulation is motivated by the mean-value property of the Laplacian in the continuous setting; the negative sign ensures that the operator is positive definite. We additionally assume, to avoid nontrivial counterexamples, that a random walk (with transition probabilities $p_{ij}$) can ultimately travel from every vertex to every other vertex. There are naturally associated eigenvalues $\lambda$ and eigenfunctions $u$, which satisfy

$$Lu = \lambda u \quad \text{on } V.$$

It is easy to see (using either the Perron-Frobenius theorem or the Gershgorin circle theorem) that $|1-\lambda| \leq 1$. Our main statement is that if $|u(i)|$ is large, then $i$ is ‘far away’ from the set of vertices where $|u|$ is small, where the notion of distance is defined below.
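As a concrete illustration of the setup, the following minimal Python sketch applies the operator to a function on a small directed graph. The 3-vertex transition matrix and the helper names `apply_laplacian` and `is_row_stochastic` are hypothetical choices made for illustration only; they are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def is_row_stochastic(P: Matrix, tol: float = 1e-12) -> bool:
    """Check that the edge weights are normalized: p_ij >= 0 and sum_j p_ij = 1."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in P)


def apply_laplacian(P: Matrix, u: List[float]) -> List[float]:
    """Return Lu, where (Lu)(i) = -sum_j p_ij * (u(j) - u(i))."""
    n = len(P)
    return [-sum(P[i][j] * (u[j] - u[i]) for j in range(n)) for i in range(n)]


# Hypothetical directed graph with non-symmetric weights (illustration only).
P = [
    [0.0, 0.5, 0.5],
    [0.25, 0.0, 0.75],
    [1.0, 0.0, 0.0],
]
u = [1.0, -2.0, 0.5]

Lu = apply_laplacian(P, u)

# Because the rows sum to 1, Lu coincides with u - Pu (the averaging form).
Pu = [sum(P[i][j] * u[j] for j in range(3)) for i in range(3)]
residual = [Lu[i] - (u[i] - Pu[i]) for i in range(3)]
```

Constant functions are mapped to zero, which reflects the mean-value interpretation mentioned above.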

### 1.3. Diffusion distance to the boundary.

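Let $x(t)$, $t \in \mathbb{N}$, denote a random walk associated with a Markov chain on the graph with transition probabilities given by

$$\mathbb{P}\left(x(t+1) = j \mid x(t) = i\right) = p_{ij}.$$

For given $B \subset V$, the diffusion distance $d_B(i)$ is defined as the smallest integer $k$ such that the likelihood of a random walk started in $i$ visiting $B$ within $k$ time steps is at least $1/2$:

$$d_B(i) = \inf\left\{k \in \mathbb{N} : \mathbb{P}\left(\exists\, k' \leq k : x(k') \in B \mid x(0) = i\right) \geq \frac12\right\}.$$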

Our setup implies that the diffusion distance is always finite. For example, consider the standard random walk on and let (Figure 1). We see that the diffusion distance has ‘quadratic growth away from the boundary’ due to the fact that a random walk on only travels up to distance in steps.
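The definition can be evaluated exactly by propagating the part of the walk that has not yet visited $B$. The sketch below is one possible implementation under stated assumptions: the graph is a path $\{0,\dots,m\}$ with a simple random walk and $B$ equal to the two endpoints (an illustrative choice), a walk starting inside $B$ is assigned distance 0 by convention, and the names `diffusion_distance` and `path_walk` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Set

Walk = List[Dict[int, Fraction]]  # Walk[i][j] = p_ij, zero entries omitted


def diffusion_distance(P: Walk, B: Set[int], start: int, max_steps: int = 10_000) -> int:
    """Smallest k such that a walk from `start` visits B within k steps w.p. >= 1/2."""
    if start in B:
        return 0  # assumed convention: the walk has already visited B at time 0
    # alive[j] = probability of sitting at j now without ever having visited B
    alive: Dict[int, Fraction] = {start: Fraction(1)}
    for k in range(1, max_steps + 1):
        nxt: Dict[int, Fraction] = {}
        for i, mass in alive.items():
            for j, p in P[i].items():
                if j not in B:  # mass entering B is removed (it has visited B)
                    nxt[j] = nxt.get(j, Fraction(0)) + mass * p
        alive = nxt
        if 1 - sum(alive.values()) >= Fraction(1, 2):
            return k
    raise RuntimeError("B was not reached with probability 1/2 within max_steps")


def path_walk(m: int) -> Walk:
    """Simple random walk on {0, ..., m}; the endpoints keep their mass."""
    P: Walk = [dict() for _ in range(m + 1)]
    P[0] = {0: Fraction(1)}
    P[m] = {m: Fraction(1)}
    for i in range(1, m):
        P[i] = {i - 1: Fraction(1, 2), i + 1: Fraction(1, 2)}
    return P


m = 20
P = path_walk(m)
B = {0, m}
d = [diffusion_distance(P, B, i) for i in range(m + 1)]
```

Exact rational arithmetic is used so that the comparison with the threshold $1/2$ is not affected by rounding. In this example the list `d` is symmetric and grows towards the middle of the path, in line with the growth away from the boundary described above.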

The notion is uniquely suited for graphs: if a graph is a fine discretization of a convex domain $\Omega$, then the diffusion distance scales like a rescaling of the squared distance to the boundary, and this scaling is independent of the dimension (except for constants). However, a general graph may not have constant ‘dimensionality’, and the diffusion distance is naturally adaptive. In the continuous case, it is also closely related to the notions of capacity and harmonic measure.

### 1.4. Spectral embedding

Let us assume that the graph essentially decomposes into two clusters of roughly equal size connected via a bottleneck (see Fig. 2). Classical intuition suggests that the first (nontrivial) eigenfunction (associated to the largest real eigenvalue) of $L$ will be negative on one cluster, positive on the other cluster and close to 0 in the bottleneck; this intuition has been made precise in a variety of different ways (most famously in Cheeger’s inequality). In particular, the map

 u:V→R

can be understood as a classifier: for any element $i \in V$, the sign of $u(i)$ allows one to determine the cluster which contains $i$. Or, put differently, $u$ is effective in isolating this basic geometric feature. However, one would naturally like to go further and argue that, while the sign of $u(i)$ determines the cluster, the magnitude $|u(i)|$ should be able to serve as a quantitative measure of certainty of that estimate. In particular, the vertex $i \in V$ for which $u$ assumes its minimum should be the most typical representative of its cluster, the one most easily distinguished from elements in the other cluster (and similarly for the vertex in which $u$ assumes the maximum).

Example. Before stating the main result, we illustrate this notion by giving an example: we take all handwritten digits 0 and 1 from the MNIST data set. Figure 3 shows a spectral embedding into two dimensions: we have selected 8 specific points in the embedding and plot them in the corresponding order below. As can be observed, both digits are highly clustered and there is a very thin bottleneck (comprised of samples of ‘0’). The samples with the smallest and largest coordinate in the embedding are both far away from the bottleneck. These samples are the forms of these digits that are least likely to be misclassified. We observe that if one writes a ‘0’ in a very narrow way, there is a chance of it looking a lot like a ‘1’. The left-most digit ‘0’ with a little twirl on top is guaranteed to be a ‘0’ because the twirl could not be explained by someone writing a ‘1’. Likewise, the farthest digit ‘1’, written at a 45 degree angle, is likely not a narrow ‘0’.

Figure 3. A spectral embedding into two dimensions clearly reveals two clusters (above). 8 of these points are marked (below) and we observe that both the extremal cases are far away from the bottleneck (and represent the most ‘typical’ examples of 0’s and 1’s).

### 1.5. Main result.

We now state the main result of the paper: vertices at which an eigenvector of the Laplacian assumes large values cannot be close to the set where the eigenfunction is small.

###### Theorem 1.

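Suppose $Lu = \lambda u$ and that $\varepsilon$ is so large that

$$B = \{i \in V : -\varepsilon \leq u(i) \leq \varepsilon\} \neq \emptyset.$$

Then, for all $i \in V$,

$$d_B(i)\,\log\left(\frac{1}{|1-\lambda|}\right) \geq \log\left(\frac{|u(i)|}{\|u\|_{L^\infty}}\right) - \log\left(\frac12+\varepsilon\right).$$

Put differently, vertices $i$ for which $|u(i)|$ is not too small are far away from the bottleneck $B$ for which the spectral embedding vector is ambiguous. Some remarks are in order.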

Remarks.

1. The result is sharp for $\varepsilon = 0$ (see below for an example).

2. There are obvious connections to the notion of eigenvector centrality, which proposes that the importance of a point $i$ in a network can be measured by the size $|u(i)|$ of the corresponding eigenvector entry. Moreover, according to this heuristic, the point at which the maximum is assumed is a good candidate for the most ‘central’ point in the network. Our result implies that more ‘central’ points with respect to this notion are located deep inside their respective cluster.

3. We observe that $d_B(i)$ seems to be a remarkably good approximation of $|u(i)|$: indeed, we believe that understanding the precise relationship between the two objects could be of significant interest. This naturally relates to the continuous case, where the mean first exit time gives rise to the Filoche-Mayboroda landscape function [1, 10, 21]. Our notion of diffusion time may be understood as median first exit time.

4. The constant $1/2$ is not special: one could generalize the diffusion distance by requiring that a fraction $0 < p < 1$ of all random walks have visited $B$; this gives rise to a different inequality with $1/2$ replaced by $1-p$.

5. The result is not restricted to the first eigenvector. Note that while the set $B$ depends on the eigenvalue, its applicability is not restricted to graphs having exactly two clusters.
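The inequality of Theorem 1 can be checked numerically on a graph whose eigenfunctions are known in closed form. The sketch below assumes a simple random walk on a cycle with 40 vertices (an illustrative choice, not an example from the paper), for which $u(j) = \sin(2\pi j/40)$ satisfies $Lu = (1-\cos(2\pi/40))\,u$. The helper `diffusion_distance`, the small value of `eps` and the name `theorem1_slack` are hypothetical.

```python
import math
from fractions import Fraction
from typing import Dict, List, Set


def diffusion_distance(P: List[Dict[int, Fraction]], B: Set[int], start: int,
                       max_steps: int = 10_000) -> int:
    """Smallest k such that a walk from `start` visits B within k steps w.p. >= 1/2."""
    if start in B:
        return 0  # assumed convention for walks starting inside B
    alive = {start: Fraction(1)}  # mass that has not visited B so far
    for k in range(1, max_steps + 1):
        nxt: Dict[int, Fraction] = {}
        for i, mass in alive.items():
            for j, p in P[i].items():
                if j not in B:
                    nxt[j] = nxt.get(j, Fraction(0)) + mass * p
        alive = nxt
        if 1 - sum(alive.values()) >= Fraction(1, 2):
            return k
    raise RuntimeError("B was not reached with probability 1/2 within max_steps")


n = 40
theta = 2 * math.pi / n
half = Fraction(1, 2)
P = [{(i - 1) % n: half, (i + 1) % n: half} for i in range(n)]

u = [math.sin(theta * j) for j in range(n)]  # eigenfunction of L on the cycle
lam = 1.0 - math.cos(theta)                  # its eigenvalue
eps = 1e-9                                   # tiny threshold: B is the zero set of u
B = {i for i in range(n) if abs(u[i]) <= eps}
sup_norm = max(abs(x) for x in u)


def theorem1_slack(i: int) -> float:
    """Left-hand side minus right-hand side of the inequality at vertex i."""
    lhs = diffusion_distance(P, B, i) * math.log(1.0 / abs(1.0 - lam))
    rhs = math.log(abs(u[i]) / sup_norm) - math.log(0.5 + eps)
    return lhs - rhs


# Vertices inside B are skipped: there the right-hand side is hugely negative.
slacks = {i: theorem1_slack(i) for i in range(n) if i not in B}
```

A nonnegative slack at every vertex is what the theorem predicts; the size of the slack at the extrema of `u` gives a rough sense of how far this particular graph is from the extremal case.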

### 1.6. Absorbing states.

The purpose of this section is to give a related result in the special case of the random walk having absorbing states: assume the edge weights of $G$ are such that there is an absorbing set of vertices $\partial V \subset V$ with the property that every random walk eventually gets absorbed with likelihood 1. Then the first eigenfunction of $L$ is intimately linked to absorption time (and, more generally, the diffusion distance and the first eigenfunction seem to have very strong correlation, see §5).

###### Theorem 2.

Suppose and , then

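$$d_{\partial V}(i)\,\log\left(\frac{1}{|1-\lambda_1|}\right) \geq \log\left(\frac{2\,|u(i)|}{\|u\|_{L^\infty}}\right).$$

We observe that $|u(i)| > \|u\|_{L^\infty}/2$ is required for the result to be nontrivial. The inequality is asymptotically sharp: let us consider a complete graph $K_n$ with all weights being identically $1/n$ and one vertex chosen to be the absorbing state.

An easy computation shows that the vector $u$ that is constant on the non-absorbing vertices satisfies $Lu = \frac{1}{n}u$ there, and $d_{\partial V}(i)$ is the smallest integer with

$$\left(1-\frac1n\right)^{d_{\partial V}(i)} \leq \frac12.$$

Since

$$\lim_{n\to\infty}\left(1-\frac1n\right)^{n} = \frac1e, \quad \text{we get that} \quad \lim_{n\to\infty}\frac{d_{\partial V}(i)}{n} = \log 2.$$

Thus, for $n$ large,

$$d_{\partial V}(i)\,\log\left(\frac{1}{1-\lambda}\right) \sim \log\left(1+\frac{1}{n-1}\right) n \log 2 \sim \log 2.$$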
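The asymptotics of the complete-graph example are easy to tabulate. The following sketch assumes exactly the setting described above (per-step absorption probability $1/n$, eigenvalue $\lambda = 1/n$); the function names are hypothetical and floating-point arithmetic is used, which is adequate here since the survival probability only equals $1/2$ exactly for $n = 2$.

```python
import math


def absorption_time_complete_graph(n: int) -> int:
    """Smallest integer d with (1 - 1/n)**d <= 1/2 (requires n >= 2)."""
    d, survive = 0, 1.0
    while survive > 0.5:
        survive *= 1.0 - 1.0 / n
        d += 1
    return d


def sharpness_product(n: int) -> float:
    """d * log(1 / (1 - lambda)) with lambda = 1/n, to be compared with log 2."""
    lam = 1.0 / n
    return absorption_time_complete_graph(n) * math.log(1.0 / (1.0 - lam))


for n in (2, 10, 100, 1000, 10000):
    print(n, absorption_time_complete_graph(n), sharpness_product(n), math.log(2))
```

Since the absorption time is a ceiling, the product always lies between $\log 2$ and $\log 2 + \log\frac{n}{n-1}$, so the gap to $\log 2$ closes at rate roughly $1/n$.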

The very same example can be used to show that the main result is sharp in the case where the eigenfunction actually has a root. This can be achieved by taking two separate copies of complete graphs $K_n$, adding a free vertex and connecting the free vertex to all other vertices; a simple computation shows that this reduces the main result to Theorem 2, which is sharp.

### 1.7. Related results.

This paper is inspired by similar results in the continuous analogue due to the second and third author [18] (earlier results in a similar spirit were given by Georgiev & Mukherjee [12] and the third author [19]). Bovier, Eckhoff, Gayrard & Klein [5] study closely related questions regarding metastable states in Markov chains with bottlenecks and their relation to the spectrum and distribution of exit times (see also [6, 7]). For other approaches towards understanding the success of spectral clustering we refer to Meila & Shi [14] and Ng, Jordan & Weiss [16]. In this context, we especially emphasize the results of Gavish & Nadler [11], who study the relation between the exit times of diffusion and the normalized cut.

## 2. Proofs

### 2.1. L and random walks.

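First, we note that the spectrum of $L$ satisfies

$$\sigma(L) \subset \{z \in \mathbb{C} : |1-z| \leq 1\}.$$

This follows from the Gershgorin circle theorem since

$$L_{ii} = 1 - p_{ii} \quad \text{and} \quad \sum_{j=1,\ j \neq i}^{n} |L_{ij}| = 1 - p_{ii}.$$

We quickly describe the underlying connection between $L$ and random walks on $G$; this connection will be a crucial tool for all subsequent proofs. Fix a vertex $i \in V$, let $x_0 = i$ and, as before, let $x_n$, for $n \in \mathbb{N}$, denote the random walk associated with a Markov chain on the graph. By definition of the graph Laplacian,

$$\begin{aligned}
\mathbb{E}\left(u(x_{n+1}) \mid x_n\right) &= \sum_{j \in V} \mathbb{P}\left(x_{n+1} = j \mid x_n\right) u(j) = \sum_{j \in V} p_{x_n j}\, u(j) \\
&= u(x_n) + \sum_{j \in V} p_{x_n j}\left(u(j) - u(x_n)\right) \\
&= u(x_n) - (Lu)(x_n).
\end{aligned}$$

If $Lu = \lambda u$, then we get

$$\mathbb{E}\left(u(x_{n+1}) \mid x_n\right) = (1-\lambda)\,u(x_n)$$

and, by induction,

$$\mathbb{E}\left(u(x_n)\right) = (1-\lambda)^n\, u(x_0).$$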
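The identity can be checked directly by pushing the distribution of the walk forward in time. The sketch below assumes a simple random walk on a cycle with 12 vertices, where $u(j) = \sin(2\pi j/12)$ is an eigenfunction with $\lambda = 1 - \cos(2\pi/12)$; the graph, the starting vertex and the helper names are illustrative assumptions.

```python
import math
from typing import Dict, List


def push_forward(dist: List[float], P: List[Dict[int, float]]) -> List[float]:
    """One step of the walk: map the distribution of x_n to that of x_{n+1}."""
    out = [0.0] * len(P)
    for i, mass in enumerate(dist):
        for j, p in P[i].items():
            out[j] += mass * p
    return out


n_vertices = 12
theta = 2 * math.pi / n_vertices
P = [{(i - 1) % n_vertices: 0.5, (i + 1) % n_vertices: 0.5} for i in range(n_vertices)]
u = [math.sin(theta * j) for j in range(n_vertices)]
lam = 1.0 - math.cos(theta)


def expected_value_after(steps: int, start: int) -> float:
    """E(u(x_steps)) for the walk started at x_0 = start."""
    dist = [0.0] * n_vertices
    dist[start] = 1.0
    for _ in range(steps):
        dist = push_forward(dist, P)
    return sum(q * val for q, val in zip(dist, u))


start, steps = 2, 7
empirical = expected_value_after(steps, start)
predicted = (1.0 - lam) ** steps * u[start]
```

The two numbers `empirical` and `predicted` agree up to floating-point rounding, which is the content of the induction step above.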

### 2.2. Proof of Theorem 1

###### Proof.

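Let us assume that $u$ solves $Lu = \lambda u$, normalized to $\|u\|_{L^\infty} = 1$, let $i \in V$ be arbitrary and assume w.l.o.g. (after possibly replacing $u$ by $-u$) that $u(i) > 0$. As before, we start random walks in $x_0 = i$ whose transition probabilities are given by $p_{ij}$. We see that

$$\mathbb{E}\left(u(x_n)\right) = (1-\lambda)^n\, u(x_0).$$

We fix the value $n = d_B(i)$ and make a distinction between those random walks that never enter the set $B$ up to time $n$ (which we call event $A$) and those random walks that are contained in $B$ at some point (event $\neg A$). By the triangle inequality, we have

$$\left|\mathbb{E}\left(u(x_n)\right)\right| \leq \left|\mathbb{E}\left(u(x_n) \mid A\right)\right| \cdot \mathbb{P}(A) + \left|\mathbb{E}\left(u(x_n) \mid \neg A\right)\right| \cdot \mathbb{P}(\neg A).$$

Trivially,

$$\left|\mathbb{E}\left(u(x_n) \mid A\right)\right| \mathbb{P}(A) \leq \mathbb{P}(A).$$

In the second case, the random walk entering the set $B$ at some time $k \leq n$, we can employ the Markov property and conclude that

$$\left|\mathbb{E}\left(u(x_n) \mid \neg A\right)\right| \mathbb{P}(\neg A) \leq \varepsilon\, |1-\lambda|^{n-k}\, \mathbb{P}(\neg A) \leq \varepsilon\, \mathbb{P}(\neg A).$$

Altogether, this implies

$$u(i)\,|1-\lambda|^n = \left|(1-\lambda)^n u(x_0)\right| = \left|\mathbb{E}\left(u(x_n)\right)\right| \leq \mathbb{P}(A) + \varepsilon\, \mathbb{P}(\neg A).$$

It follows from the definition of diffusion time that

$$\mathbb{P}(\neg A) \geq \frac12.$$

Thus

$$u(i)\,|1-\lambda|^{d_B(i)} \leq \frac12 + \varepsilon,$$

from which the result follows. ∎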
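The final step of the proof can be written out explicitly. The following is only a restatement under the proof's normalization $\|u\|_{L^\infty} = 1$ and $u(i) > 0$; it additionally assumes $|1-\lambda| > 0$ so that the logarithm is defined.

```latex
\begin{align*}
u(i)\,|1-\lambda|^{d_B(i)} \le \tfrac12 + \varepsilon
&\iff \log u(i) + d_B(i)\,\log|1-\lambda| \le \log\left(\tfrac12 + \varepsilon\right) \\
&\iff d_B(i)\,\log\frac{1}{|1-\lambda|} \ge \log u(i) - \log\left(\tfrac12 + \varepsilon\right).
\end{align*}
```

Undoing the normalization replaces $u(i)$ by $|u(i)|/\|u\|_{L^\infty}$, which gives the form stated in Theorem 1.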

### 2.3. Proof of Theorem 2.

###### Proof.

Assume that the eigenfunction is normalized as $\|u\|_{L^\infty} = 1$, let $i \in V$ be such that $u(i) > 0$ and let $n = d_{\partial V}(i)$. Then, we can analyze the expectation in

$$(1-\lambda)^n\, u(i) = \mathbb{E}\left(u(x_n)\right)$$

by concluding that in at least half of all cases we get 0 (this follows from the definition of the diffusion distance). In the other cases, we do not know what to expect, but the contribution can certainly not be larger than $1$ since this is the maximal value; therefore

$$|1-\lambda|^n\, u(i) \leq \left|(1-\lambda)^n u(i)\right| = \left|\mathbb{E}\left(u(x_n)\right)\right| \leq \frac12$$

and this implies the statement. ∎

## 3. The continuous case: hot spots

These results have a continuous analogue that may be of independent interest and has some applications to the hot spots problem. Let $\Omega$ be a convex set with smooth boundary and assume

$$-\Delta u = \lambda u \ \text{ in } \Omega, \qquad \frac{\partial u}{\partial n} = 0 \ \text{ on } \partial\Omega,$$

where $\lambda$ is assumed to be a nontrivial eigenvalue. Since

$$\int_\Omega u(x)\,dx = -\frac{1}{\lambda}\int_\Omega \Delta u(x)\,dx = -\frac{1}{\lambda}\int_{\partial\Omega} \frac{\partial u}{\partial \nu}\,d\sigma = 0,$$

the nodal set is not empty:

$$N = \{x \in \Omega : u(x) = 0\} \neq \emptyset.$$

We will now show that maximum and minimum are both a nontrivial distance away from the nodal set. The technique is a variant of the argument used in [18].

###### Theorem 3.

Let $\Omega$ be bounded with smooth boundary. Suppose $u$ assumes its global maximum or minimum at $x \in \Omega$. Let $d_N(x)$ denote the expected time for half of all Brownian motions started in $x$ and reflected off the boundary to hit $N$. Then

$$d_N(x) \cdot \lambda \geq \log 2.$$

It is easy to see that this is the correct scaling: consider the eigenfunction on . The eigenvalue is , the extrema are a Euclidean distance away from the set . The appropriate scaling for a significant fraction of Brownian motion started in the extrema to hit is . Under additional assumptions on domain and eigenfunction, it is possible to recover information about the Euclidean distance; more precisely, for convex and being the first eigenvalue, it is possible to obtain a result along the lines of

$$d_N(x) \sim \operatorname{dist}(x, \partial\Omega)^2$$

with implicit constants depending only on the dimension (see the proofs for details). Melas [15], proving a conjecture of Payne [17], has shown that the first nontrivial Laplacian eigenfunction with Neumann boundary conditions on a convex domain $\Omega \subset \mathbb{R}^2$ with smooth boundary has a nodal line that intersects the boundary in exactly two points (splitting the domain). While we cannot specify in advance where maximum or minimum is going to occur, the inequality allows us to specify a subregion in which they must lie (see Fig. 5).

Figure 5. The maximum of the first eigenfunction $v_1$ is indicated by the green cross, and those of the diffusion time by red circles, in both left and right sides. On both sides, the two maxima coincide. The set $\{i \in V : -d_B(i)\log(|1-\lambda_1|) \geq \log 2\}$ is drawn in yellow.

Example. Consider the discrete pseudo-planar example shown in Figure 5. Here, nodes are randomly sampled within a domain and a graph is built by connecting 10 nearest neighbors of each node and symmetrizing the adjacency matrix. The first (nontrivial) eigenvector of the graph Laplacian achieves maximum and minimum on the two ends of the domain. On the right side, we obtain while our lower bound predicts that the diffusion time in the minimum is at least . On the left-hand side, we obtain and are guaranteed at least from our lower bound. It is remarkable that the correlation between the absolute value of the first eigenfunction and is as high as 0.9866: the diffusion time to the boundary (here: bottleneck) is a very good approximation of the first eigenfunction.

### 3.1. Proof of Theorem 3

###### Proof.

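We will essentially repeat the argument from the proof of Theorem 2 in spirit. However, while it is certainly possible to approximate the continuous object by a graph and conclude Theorem 3 directly from Theorem 2, we wish to avoid certain technicalities involved with that and will give a fully continuous proof. Let $\Omega$ be a domain with smooth boundary and assume $u$ satisfies

$$-\Delta u = \lambda u \ \text{ in } \Omega, \qquad \frac{\partial u}{\partial n} = 0 \ \text{ on } \partial\Omega,$$

where $\lambda$ is assumed to be a nontrivial eigenvalue. The mean value of the function is 0, which implies that the nodal set

$$N = \{x \in \Omega : u(x) = 0\}$$

is not empty. Since $\partial\Omega$ is assumed to be smooth, we can use the Feynman-Kac formula for the Neumann problem [13, 19] and write, for every $t > 0$,

$$u(x) = e^{\lambda t}\, \mathbb{E}_x\left(u(\omega(t))\right),$$

where the expectation $\mathbb{E}_x$ is taken with respect to all Brownian motions $\omega$ started in $x$, running until time $t$ and reflected off the boundary $\partial\Omega$. Let us now assume that $x$ is the location of the maximum (a similar argument holds for the minimum). It is not difficult to see that the solution of the equation restricted to the connected component of $\Omega \setminus N$ containing $x$ (which we denote by $\Omega_1$) also satisfies the equation

$$-\Delta u = \lambda u \ \text{ in } \Omega_1, \qquad u = 0 \ \text{ on } N \cap \partial\Omega_1, \qquad \frac{\partial u}{\partial n} = 0 \ \text{ on } \partial\Omega_1 \setminus N.$$

We can now use the probabilistic interpretation with respect to this new problem: $\omega(t)$ will denote a Brownian motion started in $x$ and running for time $t$ that is absorbed on $N$ and reflected on $\partial\Omega_1 \setminus N$. We denote the probability of such a Brownian motion impacting on $N$ within time $t$ as $p_t(x)$. Since absorbed paths contribute $0$ and $u \leq u(x)$ along the remaining paths, it is now easy to see that

$$u(x) = e^{\lambda t}\, \mathbb{E}_x\left(u(\omega(t))\right) \leq e^{\lambda t}\, u(x)\,\left(1 - p_t(x)\right).$$

If we set $t = d_N(x)$, then, by definition, $p_t(x) = 1/2$ and thus

$$2 \leq e^{\lambda \cdot d_N(x)},$$

which is the desired statement. ∎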

The reason why this argument does not immediately translate into a statement for the Euclidean distance is that Brownian motion is reflected on $\partial\Omega$. For complicated labyrinth-type domains, it is possible for the maximum to be assumed in a point that is very close to $N$ with respect to Euclidean distance but far away in terms of diffusion distance.

If $\Omega$ is convex and we consider, say, the first (nontrivial) eigenfunction $u$, then this scenario can be excluded: by considering , it becomes clear that we need to ensure that cannot be too small. A result of the third author [20] (refining a result of Dyer & Frieze [9]) states that for $\Omega \subset \mathbb{R}^n$ open and convex and all open subsets $A \subset \Omega$

$$\mathcal{H}^{n-1}(\partial A \cap \Omega) \geq \frac{4}{\operatorname{diam}(\Omega)}\, \frac{|A|\,|\Omega \setminus A|}{|\Omega|},$$

where $\mathcal{H}^{n-1}$ denotes the $(n-1)$-dimensional Hausdorff measure. An easy compactness argument shows that for the first eigenfunction for some implicit constant depending only on the dimension from which the equivalence of and (up to constants) follows.

## 4. Variations, remarks and comments

Since the argument itself is rather elementary, it is not surprising that there should be a series of natural variations. The purpose of this section is to outline some of them and remark on some additional interesting features.

### 4.1. Sharpness of the inequality.

We observe that the result is close to sharp in a variety of different settings (this includes both relatively sparse and relatively dense graphs).

Small world graphs. Consider a small world graph with 128 nodes on a ring. The absorbing boundary $\partial V$ consists of 8 randomly selected vertices. Furthermore, additional edges are generated between any two vertices in an i.i.d. way, such that the expected number of additional edges is 64. A typical realization is shown in Figure 7. We have while the inequality predicts at least . The first eigenfunction is a very good approximation of $d_{\partial V}$; their correlation is 0.9862.

Figure 7. The diffusion time to boundary $d_{\partial V}$ (top) and the first eigenvector $v_1$ (bottom) on a small world graph. The scatter plot (bottom right) shows $v_1$ and $d_{\partial V}$ rescaled to have maximum 1, respectively.

Dense graphs. The second example, shown in Fig. 8, has the same setup but a larger number of connections. We find that while the inequality predicts at least . The correlation between the first eigenfunction and $d_{\partial V}$ is 0.9897.

Figure 8. The first eigenvector and diffusion time to boundary $d_{\partial V}$ on another graph.

These examples motivate the following question.

Problem. To what extent does $d_{\partial V}$ approximate the first eigenfunction?

The examples considered above show a very large correlation. It is not difficult to see that correlation by itself is not the right notion to use, since one can construct examples in which the first eigenfunction localizes to a much stronger extent than $d_{\partial V}$, which is large in a variety of different places. However, in that case we would still expect $d_{\partial V}$ to approximate the first eigenfunction very well on the domain where it is large (and, in particular, a large correlation provided we only compute the correlation on that domain). We observe that similar connections exist in the continuous setting (see [10, 18, 21]) but are not fully understood there either.

### 4.2. Families of extremizers.

As was discussed after stating Theorem 2, the complete graph $K_n$ with one vertex selected as boundary point shows that the constant cannot be improved (by letting $n \to \infty$). The purpose of this section is to show that there is a much larger family of graphs for which this is the case.

###### Proposition.

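Let $G_n = (V_n, E_n)$ be a sequence of graphs and let $\partial V_n$ be a single vertex which is an absorbing state. Furthermore, assume that

$$\forall\, i \in V_n \setminus \partial V_n: \quad \sum_{j \sim i,\ j \in \partial V_n} p_{ij} \quad \text{is independent of } i \text{ and tends to } 0 \text{ as } n \to \infty.$$

Then there exists an eigenfunction $u_n$ with eigenvalue $\lambda_n$ with maximum at $i_n$ such that

$$d_{\partial V}(i_n)\,\log\left(\frac{1}{1-\lambda_n}\right) \to \log 2.$$

###### Proof.

It is easy to see that under these assumptions the constant vector is an eigenvector. With the sign convention of §1.2 (and in agreement with $\lambda = 1/n$ for the complete graph), the eigenvalue is given as

$$\lambda = \sum_{j \sim i,\ j \in \partial V_n} p_{ij},$$

where the value of $i$ is not important as the sum does, by assumption, not depend on that parameter. The diffusion time to the boundary is given as

$$\min\left\{k \in \mathbb{N} : \left(1 - \sum_{j \sim i,\ j \in \partial V_n} p_{ij}\right)^{k} \leq \frac12\right\} = \left\lceil \frac{-\log 2}{\log\left(1 - \sum_{j \sim i,\ j \in \partial V_n} p_{ij}\right)} \right\rceil,$$

from which the statement follows as long as the quantity tends to infinity, which requires

$$\sum_{j \sim i,\ j \in \partial V_n} p_{ij} \to 0.$$

This makes it quite easy to construct infinite families of graphs for which equality is asymptotically attained. One possible example is given by the circle graph if we add an additional vertex, which we define to be the boundary, and impose transition probabilities to be

$$\mathbb{P}(\text{interior point} \to \text{boundary point}) = \varepsilon$$

for some $\varepsilon$ small; the transition probabilities within $V \setminus \partial V$ can be chosen in an arbitrary fashion (as long as they add up to $1-\varepsilon$).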
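The circle-graph family can be explored numerically. The sketch below assumes a cycle with 8 interior vertices, symmetric nearest-neighbour moves inside the cycle (one arbitrary admissible choice) and one extra absorbing vertex reached with probability `eps` from every interior vertex; the function names are hypothetical, and the eigenvalue is taken to be `eps` as for the constant vector on the interior.

```python
import math
from typing import Dict, List, Tuple


def circle_with_boundary(m: int, eps: float) -> Tuple[List[Dict[int, float]], int]:
    """Cycle on m >= 3 vertices plus one absorbing boundary vertex with index m."""
    boundary = m
    move = (1.0 - eps) / 2.0
    P = [{(i - 1) % m: move, (i + 1) % m: move, boundary: eps} for i in range(m)]
    P.append({boundary: 1.0})  # absorbing state
    return P, boundary


def median_absorption_time(P: List[Dict[int, float]], boundary: int, start: int,
                           max_steps: int = 1_000_000) -> int:
    """Smallest k such that the walk from `start` is absorbed within k steps w.p. >= 1/2."""
    alive = {start: 1.0}  # mass not yet absorbed
    for k in range(1, max_steps + 1):
        nxt: Dict[int, float] = {}
        for i, mass in alive.items():
            for j, p in P[i].items():
                if j != boundary:
                    nxt[j] = nxt.get(j, 0.0) + mass * p
        alive = nxt
        if 1.0 - sum(alive.values()) >= 0.5:
            return k
    raise RuntimeError("not absorbed with probability 1/2 within max_steps")


def extremizer_product(m: int, eps: float) -> Tuple[int, float]:
    """Return (d, d * log(1 / (1 - eps))); the second entry is compared with log 2."""
    P, boundary = circle_with_boundary(m, eps)
    d = median_absorption_time(P, boundary, start=0)
    return d, d * math.log(1.0 / (1.0 - eps))


for eps in (0.1, 0.01, 0.001):
    print(eps, extremizer_product(8, eps), math.log(2))
```

The simulated absorption time matches the ceiling expression from the proof, and the product approaches $\log 2$ from above as `eps` decreases.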

### 4.3. Schrödinger-type equations.

Akin to arguments in [18], we can extend our main result to more general equations of the type

$$Lu = Wu,$$

where $W$ is assumed to be a diagonal matrix; the role of the ‘eigenvalue’ is then played by the supremum norm of the diagonal

$$\|W\|_{L^\infty} = \max_i |w_i|.$$

The setup coincides with the eigenvalue case if $W$ is a multiple of the identity. We observe that, under the assumption that the graph is connected and the absorbing set is nonempty, $\partial V \neq \emptyset$, the diffusion process induced by $L$ ultimately transports mass to the boundary and therefore, for every vector $v$,

$$\lim_{k\to\infty} L^k v = 0.$$

Thus, the equation implies that .

###### Corollary.

Suppose with and . Then

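$$d_{\partial V}(i)\,\log\left(\frac{1}{1-\|W\|_{L^\infty}}\right) \geq \log\left(\frac{2\,|u(i)|}{\|u\|_{L^\infty}}\right).$$

###### Proof.

The proof is essentially identical to the proof of Theorem 2 once we use

$$\mathbb{E}\left(u(x_{n+1}) \mid x_n\right) = u(x_n) - Lu(x_n) \leq u(x_n) - \|W\|_{L^\infty}\, u(x_n).$$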

There is also a natural variant of Theorem 1 that can be obtained via the same argument.

Acknowledgement. The authors are grateful to Raphy Coifman, Jianfeng Lu and Boaz Nadler for valuable discussions.

## References

• [1] D. Arnold, G. David, D. Jerison, S. Mayboroda, and M. Filoche. Effective confining potential of quantum states in disordered media. Physical Review Letters, 116 (5), 2016.
• [2] R. Banuelos and K. Burdzy, On the ‘hot spots’ conjecture of J. Rauch. J. Funct. Anal. 164 (1999), 1–33.
• [3] R. Banuelos and T. Carroll, Brownian motion and the fundamental frequency of a drum. Duke Math. J. 75 (1994), no. 3, 575–602.
• [4] R. Banuelos and T. Carroll, Addendum to: Brownian motion and the fundamental frequency of a drum, Duke Math. J. 82 (1996), 227.
• [5] A. Bovier, M. Eckhoff, V. Gayrard and M. Klein, Metastability and low lying spectra in reversible Markov chains. Comm. Math. Phys. 228 (2002), no. 2, 219–255.
• [6] A. Bovier, M. Eckhoff, V. Gayrard, and M. Klein, Metastability in reversible diffusion processes. I. Sharp asymptotics for capacities and exit times. J. Eur. Math. Soc. (JEMS) 6 (2004), no. 4, 399–424.
• [7] A. Bovier, V. Gayrard and M. Klein, Metastability in reversible diffusion processes. II. Precise asymptotics for small eigenvalues. J. Eur. Math. Soc. (JEMS) 7 (2005), no. 1, 69–99.
• [8] K. Burdzy and W. Werner, A counterexample to the “hot spots” conjecture. Ann. of Math. (2) 149 (1999), no. 1, 309–317.
• [9] M. Dyer, A. Frieze, Computing the volume of convex bodies: a case where randomness provably helps, Proc. Sympos. Appl. Math., 44, Amer. Math. Soc., Providence, RI, 1991.
• [10] M. Filoche and S. Mayboroda, Universal mechanism for Anderson and weak localization. Proc. Natl. Acad. Sci. USA 109 (2012), no. 37, 14761–14766.
• [11] M. Gavish and B. Nadler, Normalized cuts are approximately inverse exit times. SIAM J. Matrix Anal. Appl. 34 (2013), no. 2, 757–772.
• [12] B. Georgiev and M. Mukherjee, Nodal Geometry, Heat Diffusion and Brownian Motion, arXiv:1602.07110
• [13] P. Hsu, Probabilistic approach to the Neumann problem. Comm. Pure Appl. Math. 38 (1985), no. 4, 445–472.
• [14] M. Meila, J. Shi, A random walks view of spectral segmentation, AISTATS 2001
• [15] A. Melas, On the nodal line of the second eigenfunction of the Laplacian in $\mathbb{R}^2$. J. Differential Geom. 35 (1992), no. 1, 255–263.
• [16] A. Ng, M. I. Jordan and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in Neural Information Processing Systems (NIPS) 14, 2002.
• [17] L. Payne, Isoperimetric inequalities and their applications. SIAM Rev. 9 (1967), 453–488.
• [18] M. Rachh and S. Steinerberger, On the location of Maxima of Solutions of Schrödinger’s equation, arXiv:1608.06604
• [19] S. Steinerberger, Lower bounds on nodal sets of eigenfunctions via the heat flow. Comm. Partial Differential Equations 39 (2014), 2240–2261.
• [20] S. Steinerberger, Sharp $L^1$-Poincaré inequalities correspond to optimal hypersurface cuts. Arch. Math. 105 (2015), no. 2, 179–188.
• [21] S. Steinerberger, Localization of Quantum States and Landscape Functions, Proc. Amer. Math. Soc., to appear