Community detection and percolation of information in a geometric setting

06/28/2020 · by Ronen Eldan et al.

We make the first steps towards generalizing the theory of stochastic block models, in the sparse regime, towards a model where the discrete community structure is replaced by an underlying geometry. We consider a geometric random graph over a homogeneous metric space where the probability of two vertices to be connected is an arbitrary function of the distance. We give sufficient conditions under which the locations can be recovered (up to an isomorphism of the space) in the sparse regime. Moreover, we define a geometric counterpart of the model of flow of information on trees, due to Mossel and Peres, in which one considers a branching random walk on a sphere and the goal is to recover the location of the root based on the locations of leaves. We give some sufficient conditions for percolation and for non-percolation of information in this model.


1 Introduction

Community detection in large networks is a central task in data science. It is often the case that one gets to observe a large network, the links of which depend on some unknown, underlying community structure. A natural task in this case is to detect and recover this community structure to the best possible accuracy.

Perhaps the most well-studied model in this topic is the stochastic block model, where a random graph whose vertex set is composed of several communities is generated in such a way that every pair of nodes belonging to communities $i$ and $j$ is connected with probability $p_{i,j}$, hence with a probability that depends only on the respective communities, and otherwise independently. The task is to recover the communities based on the graph (and assuming that the function $(i,j)\mapsto p_{i,j}$ is known). The (unknown) association of nodes with communities is usually assumed to be random and independent between different nodes. See [1] for an extensive review of this model.

A natural extension of the stochastic block model is the geometric random graph, where the discrete set of communities is replaced by a metric space. More formally, given a metric space $(\mathcal{X},\mathrm{dist})$, a function $x:V\to\mathcal{X}$ from a vertex set $V$ to the metric space and a function $f$, a graph is formed by connecting each pair of vertices $u,v\in V$ independently, with probability

$f\big(\mathrm{dist}(x_u,x_v)\big).$

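To fix ideas, here is a minimal simulation sketch of this model on the sphere, with the kernel parametrized by the inner product as in the next subsection. The particular kernel, the $1/n$ scaling of the connection probabilities (which produces the sparse regime discussed below), and all numerical values are illustrative choices of ours rather than part of the definition above.

```python
import numpy as np

def sample_geometric_graph(n, d, f, rng=None):
    """Sample a sparse geometric random graph on the sphere S^{d-1}.

    Vertices are i.i.d. uniform points on the sphere; the pair (i, j) is
    connected independently with probability min(f(<x_i, x_j>) / n, 1),
    which keeps expected degrees of constant order (the sparse regime).
    """
    rng = np.random.default_rng(rng)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)      # uniform points on S^{d-1}
    probs = np.clip(f(x @ x.T) / n, 0.0, 1.0)          # pairwise connection probabilities
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # independent coin flips, upper triangle
    A = (upper | upper.T).astype(int)                  # symmetric adjacency, no self-loops
    return x, A

# Illustrative kernel (our choice, not from the paper): f(t) = 8 + 4 * t.
x, A = sample_geometric_graph(n=1000, d=3, f=lambda t: 8.0 + 4.0 * t)
```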
This model can sometimes mimic the behavior of real-world networks more accurately than the stochastic block model. For example, a user in a social network may be represented as a point in some linear space in a way that the coordinates correspond to attributes of her personality and her geographic location. The likelihood of two persons being associated with each other in the network will then depend on the proximity of several of these attributes. A flat community structure may therefore be too simplistic to reflect these underlying attributes.

Therefore, a natural extension of the theory of stochastic block models would be to understand under what conditions the geometric representation can be recovered by looking at the graph. Our focus is on the case where the metric is defined over a symmetric space, such as the Euclidean sphere $\mathbb{S}^{d-1}\subset\mathbb{R}^d$. By symmetry, we mean that the probability that two vertices are connected, given their locations, is invariant under a natural group acting on the space. We are interested in the sparse regime, where the expected degrees of the vertices do not converge to infinity with the size of the graph. This is (arguably) the natural and most challenging regime for the stochastic block model.

1.1 Inference in geometric random graphs

For the sake of simplicity, in what follows we will assume that the metric space is the Euclidean sphere $\mathbb{S}^{d-1}$, and our main theorems will be formulated in this setting; it will be straightforward to generalize our results to any symmetric space (see [5] for further discussion of this point).

In order to construct our model, we need some notation. Let $\sigma$ be the uniform probability measure on $\mathbb{S}^{d-1}$ and let $\kappa:\mathbb{S}^{d-1}\times\mathbb{S}^{d-1}\to[0,\infty)$ be of the form $\kappa(x,y)=f(\langle x,y\rangle)$ for some function $f$. Define the integral operator $T_\kappa$ on $L^2(\sigma)$ by

$(T_\kappa g)(x)=\int_{\mathbb{S}^{d-1}}\kappa(x,y)\,g(y)\,d\sigma(y).$

It is standard to show that $T_\kappa$ is a self-adjoint compact operator (see [7], for example) and so has a discrete spectrum, except possibly at $0$. By definition, $\kappa$ is invariant to rotations, and so $T_\kappa$ commutes with the Laplacian. It follows that the eigenfunctions of $T_\kappa$ are precisely the spherical harmonics, which we denote by $\{\psi_i\}_{i\ge0}$. Thus, if $\lambda_i$ denotes the eigenvalue of $T_\kappa$ corresponding to $\psi_i$, we have the following identity,

$\kappa(x,y)=\sum_{i\ge0}\lambda_i\,\psi_i(x)\,\psi_i(y).$ (1)

In particular, $\psi_0\equiv1$ and, for $1\le i\le d$, the $\psi_i$ are linear functionals such that, for $x\in\mathbb{S}^{d-1}$,

$\psi_i(x)=\sqrt{d}\,\langle x,e_i\rangle.$ (2)

Note that in our notation the eigenvalues are indexed by the spherical harmonics, and are therefore not necessarily in decreasing order. By rotational invariance it must hold that

$\lambda_1=\lambda_2=\dots=\lambda_d,$ (3)

where $\lambda_1,\dots,\lambda_d$ are the eigenvalues corresponding to the degree-one harmonics $\psi_1,\dots,\psi_d$. Define $\lambda:=\lambda_1=\dots=\lambda_d$. We make the following, arguably natural, assumptions on the function $f$:

  1. There exist such that .

  2. Reordering the eigenvalues in decreasing order, there exists such that for every .

Let $x_1,\dots,x_n$ be a sequence of independently sampled vectors, uniformly distributed on $\mathbb{S}^{d-1}$. Let the inhomogeneous Erdős–Rényi model be the random graph in which each edge $\{i,j\}$ is formed independently with probability $\kappa(x_i,x_j)/n$, and let $A$ be the adjacency matrix of a random graph drawn from this model.

Definition 1.

We say that the model is $\delta$-reconstructible if, for all $n$ large enough, there is an algorithm which, given the graph, returns an $n\times n$ matrix $\hat{X}$ such that

$\frac{1}{n^{2}}\sum_{i,j}\big(\hat{X}_{i,j}-\langle x_i,x_j\rangle\big)^{2}\le\delta$

with probability tending to one as $n\to\infty$.

Remark that, due to the symmetry of the model, it is clear that the locations can only be reconstructed up to an orthogonal transformation, which is equivalent to reconstruction of the Gram matrix.

Theorem 2.

For every $\delta>0$ there exists a constant $C>0$ such that the model is $\delta$-reconstructible whenever

(4)
Remark 3.

Observe that the left-hand side of condition (4) is homogeneous in $f$ of a higher degree than its right-hand side. Hence, as long as the left-hand side is nonzero, the condition can be made to hold true by multiplying the function $f$ by a large enough constant.

Example 1.

Consider the linear kernel $f(t)=a+b\,t$, with $0<b\le a$ (so that $f\ge0$). A calculation shows that $\lambda=b/d$.

Applying our theorem, we show that the model is reconstructible whenever

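As a sanity check on the value of $\lambda$ quoted in the example (a short computation under the normalization of $T_\kappa$ chosen above, not reproduced from the paper): for a degree-one harmonic $\psi_i(x)=\sqrt{d}\,\langle x,e_i\rangle$,

$(T_\kappa\psi_i)(x)=\int_{\mathbb{S}^{d-1}}\big(a+b\,\langle x,y\rangle\big)\,\sqrt{d}\,\langle y,e_i\rangle\,d\sigma(y)=b\,\sqrt{d}\,\sum_{j=1}^{d}x_j\int_{\mathbb{S}^{d-1}}y_j\,y_i\,d\sigma(y)=\frac{b}{d}\,\psi_i(x),$

using $\int y_i\,d\sigma=0$ and $\int y_iy_j\,d\sigma=\delta_{ij}/d$, so indeed $\lambda=b/d$.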
Methods and related works.

Our reconstruction theorem is based on a spectral method, via the following steps:

  1. We observe that, by the symmetry of our kernel, linear functions are among its eigenfunctions. We show that the kernel matrix (that is, the matrix obtained by evaluating the kernel at pairs of the sampled points $x_1,\dots,x_n$) has eigenvalues and eigenvectors which approximate those of the continuous operator.

  2. Observing that the kernel matrix is the expectation of the adjacency matrix, we rely on a matrix concentration inequality due to Le-Levina-Vershynin [8] to show that the eigenvalues of the former are close to the ones of the latter.

  3. We use the Davis-Kahan theorem to show that the corresponding eigenvectors are also close to each other.

The idea in Steps 2 and 3 is not new and is rather standard (see [8] and references therein). Thus, the main technical contribution in proving our upper bound is in Step 1, where we prove a bound for the convergence of eigenvectors of kernel matrices. So far, similar results have only been obtained in the special case where the kernel is positive-definite; see for instance [4].
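To illustrate Steps 1-3 concretely, here is a schematic sketch of such a spectral estimator. The rule for selecting eigenvectors, the rescaling by $n/d$, and the assumption that the eigenvalue $\lambda$ is known in advance are simplifications of ours, not the exact procedure used in the proof.

```python
import numpy as np

def spectral_gram_estimate(A, d, lam):
    """Schematic spectral estimate of the Gram matrix (<x_i, x_j>)_{i,j}.

    A is the observed (possibly regularized) adjacency matrix, d the dimension
    of the sphere, and lam the kernel eigenvalue on linear harmonics, assumed
    known. Heuristic: the d eigenvectors of A with eigenvalues closest to lam
    should approximately span the evaluations of the linear harmonics at the
    hidden points, so a rescaled projection onto them estimates the Gram matrix.
    """
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(eigvals - lam))[:d]   # d eigenvalues nearest to lam
    V = eigvecs[:, idx]                           # n x d, orthonormal columns
    return (n / d) * (V @ V.T)                    # rescaled projection as Gram estimate
```

In practice the regularization of high-degree vertices discussed in Section 2 would be applied to A before this step.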

The paper [14] considers kernels satisfying Sobolev-type hypotheses similar to our assumptions on $f$ (but gives results on the spectrum rather than on the eigenvectors). Reconstruction of the eigenspace has been considered in [12] for positive definite kernels in the dense regime, in [11] for random dot product graphs, and in [2] in the dense and relatively sparse regimes, again for kernels satisfying Sobolev-type hypotheses.

1.2 Percolation of geometric information in trees

The above theorem gives an upper bound for the threshold for reconstruction. The question of finding respective lower bounds, in the stochastic block model, is usually reduced to a related but somewhat simpler model of percolation of information on trees. The idea is that in the sparse regime the neighborhood of each node in the graph is typically a tree, and it can be shown that recovering the community of a specific node based on observation of the entire graph is more difficult than the recovery of its location based on knowledge of the community association of the leaves of a tree rooted at this node. For a formal derivation of this reduction (in the case of the stochastic block model), we refer to [1].

This gives rise to the following model, first described by Mossel and Peres [10] (see also [9]): Consider a $q$-ary tree $T$ of depth $\ell$, rooted at $\rho$. Suppose that each node in $T$ is associated with a label in the following way: the root is assigned some label and then, iteratively, each node is assigned its direct ancestor's label with probability $1-\varepsilon$ and a uniformly picked label with probability $\varepsilon$ (independently between the nodes at each level). The goal is then to detect the assignment of the root based on observation of the leaves.

Let us now suggest an extension of this model to the geometric setting. We fix a Markov kernel $P$ on $\mathbb{S}^{d-1}$ which is rotation-equivariant, in the sense that $P(Ux,UB)=P(x,B)$ for all $x$, all measurable $B$ and every rotation $U$. We define the labels $(X_v)_{v\in T}$ in the following way. For the root $\rho$, $X_\rho$ is picked according to the uniform measure. Iteratively, given that $X_v$ is already set for all nodes $v$ at the $k$-th level, we pick the values for the nodes at the $(k+1)$-th level independently, so that if $u$ is a direct descendant of $v$, the label $X_u$ is distributed according to the law $P(X_v,\cdot)$.
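For concreteness, here is a minimal simulation of this broadcast process on the circle, using the Gaussian (wrapped normal) kernel that appears in Theorem 5 below. The arity, depth and noise level are arbitrary illustrative choices, and the circular-mean guess at the end is a naive version of the linear observable discussed in the next paragraphs.

```python
import numpy as np

def broadcast_on_circle(arity, depth, tau, rng=None):
    """Simulate the geometric broadcast process on the circle [0, 2*pi).

    The root label is uniform; each child's label equals its parent's label
    plus independent N(0, tau^2) noise, wrapped mod 2*pi. Returns the root
    label and the labels of the leaves at the given depth.
    """
    rng = np.random.default_rng(rng)
    root = rng.uniform(0.0, 2 * np.pi)
    level = np.array([root])
    for _ in range(depth):
        parents = np.repeat(level, arity)                        # each node has `arity` children
        level = (parents + rng.normal(0.0, tau, parents.shape)) % (2 * np.pi)
    return root, level

root, leaves = broadcast_on_circle(arity=2, depth=10, tau=0.3)
# Naive root estimate: the circular mean of the leaf angles.
guess = np.angle(np.exp(1j * leaves).mean()) % (2 * np.pi)
```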

Denote by $L_\ell$ the set of nodes at depth $\ell$, and define $\mu_\ell$ to be the conditional distribution of $X_\rho$ given $(X_v)_{v\in L_\ell}$. We say that the model has positive information flow if

$\lim_{\ell\to\infty}\mathbb{E}\big[d_{\mathrm{TV}}(\mu_\ell,\sigma)\big]>0.$

Remark that, by symmetry, we have

$\mathbb{E}\big[d_{\mathrm{TV}}(\mu_\ell,\sigma)\big]=\mathbb{E}\big[d_{\mathrm{TV}}(\mu_\ell,\sigma)\,\big|\,X_\rho=\mathbf{n}\big],$

where $\rho$ is the root and $\mathbf{n}$ is the north pole.

Our second objective in this work is to make the first steps towards understanding under which conditions the model has positive information flow; in particular, our focus is on providing nontrivial sufficient conditions on $P$ for the above limit to be equal to zero.

Let us first outline a natural sufficient condition for the information flow to be positive which, as we later show, turns out to be sharp in the case of Gaussian kernels. Consider the following simple observable,

$S_\ell:=\sum_{v\in L_\ell}X_v.$

By Bayes' rule, we clearly have that the model has positive information flow if (but not only if)

$\liminf_{\ell\to\infty}\frac{\big(\mathbb{E}\big[\langle S_\ell,\mathbf{n}\rangle\,\big|\,X_\rho=\mathbf{n}\big]\big)^{2}}{\mathbb{E}\big[\|S_\ell\|^{2}\big]}>0.$ (5)

This gives rise to the parameter

$\lambda:=\mathbb{E}_{Y\sim P(\mathbf{n},\cdot)}\big[\langle Y,\mathbf{n}\rangle\big],$

which is the eigenvalue corresponding to linear harmonics. By linearity of expectation, we have

$\mathbb{E}\big[\langle S_\ell,\mathbf{n}\rangle\,\big|\,X_\rho=\mathbf{n}\big]=(q\lambda)^{\ell}.$

For two nodes $u,v$ define $u\wedge v$ to be their deepest common ancestor and $|u\wedge v|$ its level. A calculation gives

$\mathbb{E}\big[\|S_\ell\|^{2}\big]=\sum_{u,v\in L_\ell}\mathbb{E}\big[\langle X_u,X_v\rangle\big]=\sum_{u,v\in L_\ell}\lambda^{2(\ell-|u\wedge v|)}.$

This gives a sufficient condition for (5) to hold true, concluding:

Claim 4.

The condition $q\lambda^{2}>1$ is sufficient for the model to have positive percolation of information.

We will refer to this as the Kesten-Stigum (KS) bound.
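To spell out the last step (a standard second-moment computation, written here in the notation reconstructed above rather than copied verbatim from the paper): grouping pairs of leaves according to the depth of their deepest common ancestor, a $q$-ary tree has at most $q^{j}\cdot q^{2(\ell-j)}$ pairs $u,v\in L_\ell$ with $|u\wedge v|=j$, so

$\mathbb{E}\big[\|S_\ell\|^{2}\big]=\sum_{j=0}^{\ell}\#\{(u,v):|u\wedge v|=j\}\,\lambda^{2(\ell-j)}\le q^{2\ell}\lambda^{2\ell}\sum_{m=0}^{\ell}(q\lambda^{2})^{-m}.$

When $q\lambda^{2}>1$ the geometric sum is bounded by a constant $C$, hence

$\frac{\big(\mathbb{E}\big[\langle S_\ell,\mathbf{n}\rangle\,\big|\,X_\rho=\mathbf{n}\big]\big)^{2}}{\mathbb{E}\big[\|S_\ell\|^{2}\big]}\ge\frac{(q\lambda)^{2\ell}}{C\,q^{2\ell}\lambda^{2\ell}}=\frac{1}{C}>0,$

which is exactly condition (5).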

We now turn to describe our lower bounds. For the Gaussian kernel, we give a lower bound which falls short of matching the KS bound by a multiplicative factor. To describe the Gaussian kernel, fix $\tau>0$, let $Z$ be a normal random variable with law $N(0,\tau^{2})$, and suppose that $P$ is such that, whenever $u$ is a direct descendant of $v$,

$X_u=X_v+Z \pmod{2\pi},$ (6)

with independent copies of $Z$ across edges, where we identify $\mathbb{S}^{1}$ with the interval $[0,2\pi)$. We have the following result.

Theorem 5.

For the Gaussian kernel defined above, there is zero information flow whenever .

In the general case we were unable to give a corresponding bound; nevertheless, we are able to give some nontrivial sufficient conditions for zero flow of information, formulated in terms of the eigenvalues of the kernel. In order to formulate our result, we need some definitions.

We begin with a slightly generalized notion of a $q$-ary tree.

Definition 6.

Let $q\ge1$; we say that a tree $T$ is of growth at most $q$ if, for every $\ell$,

Now, recall that . Our bound is proven under the following assumptions on the kernel.

  • is monotone.

  • is continuous.

  • and for every , .

We obtain the following result.

Theorem 7.

Let the kernel satisfy the assumptions above and let $T$ be a tree of growth at most $q$. There exists a universal constant such that if

then the model has zero percolation of information.

2 The upper bound: Proof of Theorem 2

Recall that

$\kappa(x,y)=\sum_{i\ge0}\lambda_i\,\psi_i(x)\,\psi_i(y),$

with the eigenvalues indexed by the spherical harmonics. Define the random matrices by

Note that is an matrix, while , has infinitely many columns. Furthermore, denote by the diagonal matrix . Then

For we also denote

the finite rank approximation of , , and the sub-matrix of composed of its first columns. Finally, denote

As before, let be an adjacency matrix drawn from so that . Our goal is to recover from the observed . The first step is to recover from . We begin by showing that the columns of are, up to a small additive error, eigenvectors of . To this end, denote

, and .

Lemma 1.

Let be the ’th column of and let . Then

Moreover, whenever , we have with probability larger than ,

where only depends on and on the dimension.

Proof.

Let be the ’th standard unit vector so that . So,

We then have

To bound the error, we estimate

as

It remains to bound . Let stand for the ’th row of . Then, , is a sum of independent, centered random matrices. We have

Furthermore, the norm of the matrices can be bounded by

Note that the right-hand sides of the last two displays are of the form where depends only on and (but not on ). Applying matrix Bernstein ([13, Theorem 6.1]) then gives

where depends only on and . Choose now . As long as , , and the above bound may be refined to

With the above conditions, it may now be verified that , and the proof is complete.      

We now show that, as increases, the eigenvectors of converge to those of . Order the eigenvalues in decreasing order and let . Note that it follows from Assumption 2 that . We will denote by the respective eigenvalues of and , ordered in a decreasing way, and by their corresponding unit eigenvectors. Suppose that is such that

(7)

Moreover, define

The next lemma shows that is close to whenever both and are large enough.

Lemma 2.

For all , let be the orthogonal projection onto . Then, for all there exist constants such that for all and , we have with probability at least that, for all ,

where and are the constants from Assumption 1 and Assumption 2.

Proof.

We have

Applying Markov’s inequality gives that with probability

(8)

Theorem 1 in [14] shows that there exists large enough such that with probability larger than , one has

with being the constant from Assumption 1. It follows that

(9)

while by (8) and Weyl’s Perturbation Theorem (e.g., [3, Corollary III.2.6]), for large enough with probability ,

(10)

Combining (8), (9) and (10), it follows from the classical Davis-Kahan theorem (see e.g. [3, Section VII.3]) that, with probability at least , for every ,

     

Denote

A combination of the last two lemmas produces the following:

Theorem 8.

One has

in probability, as $n\to\infty$.

Proof.

Denote

Then

We will show that the two terms on the right hand side converge to zero. Let be a function converging to infinity slowly enough so that , for the constant defined in Lemma 1. Taking to converge to zero slowly enough and applying Lemma 1, gives for all ,

with the ’th column of and where as . Now, if we write

the last inequality becomes

Using Equation (10), we have

(11)

and thus

Define a -matrix by . Then we can rewrite the above as

Now, since for two matrices we have , it follows that

(12)

Observe that

implying that

where . Consequently we have

which implies that

Combining with (12) finally yields

in probability, as $n\to\infty$.
If is the orthogonal projection onto , and is the orthogonal projection onto , then Lemma 2 shows that for all , with probability at least , as (and ), we have for every unit vector

(13)

with some . By symmetry, we also have for every unit vector that

(this uses the fact that both and are projections onto subspaces of the same dimension). The last two inequalities easily yield that . Since this is true for every , it follows that

in probability, as $n\to\infty$.

Now, after establishing that is close to , the second step is to recover (and therefore ) from the observed . For the proof we will need the following instance of the Davis-Kahan theorem.

Theorem 9 ([15, Theorem 2]).

Let be symmetric matrices with eigenvalues resp. with corresponding orthonormal eigenvectors resp. . Let and . Then there exists an orthogonal matrix such that

Our main tool to pass from the expectation of the adjacency matrix to the matrix itself is the following result regarding concentration of random matrices, which follows from [8, Theorem 5.1].

Theorem 10.

Let be the adjacency matrix of a random graph drawn from . Consider any subset of at most vertices, and reduce the weights of the edges incident to those vertices in an arbitrary way but so that all degrees of the new (weighted) network become bounded by . Then with probability at least the adjacency matrix of the new weighted graph satisfies

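For intuition, the re-weighting step referred to in Theorem 10 can be carried out, for instance, by proportionally down-scaling the rows and columns of high-degree vertices. The sketch below is one such choice, made by us for illustration, and is not the specific re-weighting used in the paper.

```python
import numpy as np

def regularize_degrees(A, max_degree):
    """Re-weight edges at high-degree vertices so every weighted degree is <= max_degree.

    Theorem 10 permits an arbitrary re-weighting of edges incident to a small
    set of vertices; proportional down-scaling is one simple way to do it.
    """
    A = A.astype(float).copy()
    degrees = A.sum(axis=1)
    heavy = np.where(degrees > max_degree)[0]
    for i in heavy:
        A[i, :] *= max_degree / degrees[i]   # scale down the i-th row ...
        A[:, i] = A[i, :]                    # ... and keep the matrix symmetric
    return A
```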
We can now prove the main reconstruction theorem.

Proof of Theorem 2.

Let be the adjacency matrix of a random graph drawn from the model . We first claim that, with probability tending to one, there exists a re-weighted adjacency matrix as defined in Theorem 10. Indeed, by the Chernoff inequality, we have for all ,

and therefore, by Markov’s inequality, the expectation of the number of vertices whose degree exceeds goes to zero with .

Denote by its eigenvalues and by the corresponding orthonormal eigenvectors of . Let . By Theorem 9 there exists an such that

Hence by Theorem 10 we have

with probability . It follows that