# On role extraction for digraphs via neighbourhood pattern similarity

We analyse the recovery of different roles in a network modelled by a directed graph, based on the so-called Neighbourhood Pattern Similarity approach. Our analysis uses results from random matrix theory to show that when assuming the graph is generated as a particular Stochastic Block Model with Bernoulli probability distributions for the different blocks, then the recovery is asymptotically correct when the graph has a sufficiently large dimension. Under these assumptions there is a sufficient gap between the dominant and dominated eigenvalues of the similarity matrix, which guarantees the asymptotic correct identification of the number of different roles. We also comment on the connections with the literature on Stochastic Block Models, including the case of probabilities of order log(n)/n where n is the graph size. We provide numerical experiments to assess the effectiveness of the method when applied to practical networks of finite size.

## Authors

• 6 publications
• 8 publications
• 10 publications
09/25/2015

### Information Limits for Recovering a Hidden Community

We study the problem of recovering a hidden community of cardinality K f...
05/22/2019

### Blind identification of stochastic block models from dynamical observations

We consider a blind identification problem in which we aim to recover a ...
11/06/2014

### A Generic Sample Splitting Approach for Refined Community Recovery in Stochastic Block Models

We propose and analyze a generic method for community recovery in stocha...
02/19/2019

### Asymptotic Theory of Eigenvectors for Large Random Matrices

Characterizing the exact asymptotic distributions of high-dimensional ei...
04/09/2018

### Cauchy noise loss for stochastic optimization of random matrix models via free deterministic equivalents

Based on free probability theory and stochastic optimization, we introdu...
08/14/2017

### Situation Recognition with Graph Neural Networks

We address the problem of recognizing situations in images. Given an ima...
04/01/2020

### Pattern graphs: a graphical approach to nonmonotone missing data

We introduce the concept of pattern graphs–directed acyclic graphs repre...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The analysis of large graphs typically assumes that there is an underlying structure in the graph that allows us to represent it in a simpler manner. A typical example of this is the detection of communities, which are groups of nodes that have most of their connections with other nodes of the same group, and few connections with nodes of other groups. Various measures and algorithms have been developed to identify these community structures and many applications have also been found for these model structures [11, 12, 20, 23]. But many graph structures cannot be modelled using communities: for example, arrowhead and tree graph structures, which appear in overlapping communities, human protein-protein interaction networks, and food and web networks [12, 18, 21, 22]. These more general types of network structures can be modelled as role structures, and the process of finding them is called the role extraction problem, or block modelling [4, 8, 16, 24, 25].

The role extraction problem is a generalization of the community detection problem and it determines a representation of a network by a smaller structured graph, where nodes are grouped together based upon their interactions with nodes in either the same group or in different groups called roles. If no a priori information is available, one needs to verify all possible group and role assignments in order to determine the best role structure for the data, which leads to an NP-hard problem [5, 8]. But if one assumes that the adjacency matrix of the graph is a sample of a random matrix that follows certain rules, then the problem of recovering the underlying block structure may become tractable. The stochastic block model is precisely such a model : the interactions between all nodes of a particular group with all nodes of another group follow the same distribution [9, 14]. The successful extraction of the correct roles in stochastic block models for undirected graphs has been shown in the literature [1, 16]. In the present paper, we show that this is also the case for directed graphs. For this, we make extensive use of the neighbourhood pattern similarity

matrix, which is real symmetric also for directed graphs, and therefore has real eigenvalues and eigenvectors. We show that for sufficiently large graphs, the gap between dominant and dominated eigenvalues allows a convergent recovery of which nodes are associated with the different roles in the model.

In Section 2 we go over several preliminaries related to graphs, random matrices, stochastic block models and role modelling. Section 3 then yields the spectral bounds for the pattern similarity matrix associated with the graph being analysed and in Section 4 we make the connection with the Stochastic Block Modelling problem. In Section 5 we give a few numerical experiments illustrating our theoretical analysis, and we terminate with a few concluding remarks in Section 6.

## 2 Preliminaries

### 2.1 Graph theory and role extraction

An unweighted directed graph, or digraph,

is an ordered pair of sets where the elements of

are called vertices, or nodes, while the elements of are called edges, or links. A walk of length on the digraph from to is a sequence of vertices of the form such that for all , . is said to be strongly connected if for all there is a path on from to .

The adjacency matrix of an unweighted digraph is defined as

 A∈Rn×n,Aij={1if (i,j)∈E;0otherwise.

In particular, is a non-negative matrix and . It is well known that is irreducible if and only if is strongly connected; in that case, the Perron-Frobenius spectral theory for irreducible non-negative matrices applies. Manifestly, there is a bijection between adjacency matrices and digraphs. Moreover, two digraphs are isomorphic (i.e. they coincide up to a relabelling of the vertices) if and only if their adjacency matrices are permutation similar.

Given a graph with adjacency matrix , the problem of role extraction consists on finding a positive integer and an assignment function such that can be well approximated by an ideal adjacency matrix such that only depends on and . Equivalently, if is any permutation that reorders the nodes such that nodes of the same group are adjacent to each other, and is the corresponding permutation matrix, then is approximately constant in the blocks induced by the assignment. One can then associate with it a so-called ideal adjacency matrix as illustrated in Figure 1 : for the blocks of where 1 dominates, put all elements of the corresponding block in equal to 1 and for the blocks where 0 dominates, put them all equal to 0 in . In doing so, the nodes in each block of are regularly equivalent [10], i.e. they have the same parents and the same children [19]. The above approximation problem for can thus be viewed as finding a nearby regularly equivalent graph to a given graph.

In the rest of the paper, we will assume that the permutation is the identity, in order to simplify the notation. This does not affect the generality of our results.

### 2.2 Random graphs and random matrices

Let

be a probability triple and consider the space of random variables

. A random digraph , , is a graph whose adjacency matrix is one such random variable. We denote the expectation of a random matrix by . We construct a random unweighted digraph as follows:

1. The nodes are partitioned in clusters of nodes, , of size , , respectively.

2. There is an edge between a node in cluster to a node in cluster with probability depending only on and .

The adjacency matrix of such a random graph is an random matrix, . Suppose : then, is distributed as a Bernoulli variable centered on with . Hence, denoting by

the vector of all ones and by

, then

 Mn:=E[An]=M1⊗1n1Tn

has rank (for all values of ). Note that is a deterministic matrix with precisely

nonzeros singular values: since the singular values of

are equal to for all possible choices of we have in particular that

 σi(Mn)=nσi(M1)i=1,…,s,andσi(Mn)=0∀ i>s.

An explicit characterization of is also possible: setting

 Z=q⨁i=11mi∈Rm×q,C∈Rq×q,Cab=pab ∀ 1≤a,b≤q,

then it is clear that

Furthermore, where is a random matrix with mean

and the variance of

is , where . The literature is rich in results concerning these kinds of matrices. Here we report one we will need in our arguments, that is in line with the asymptotic Marchenko-Pastur law for i.i.d. matrices.

###### Theorem 2.1

[2, Remark 5.19] Let be random matrices with independent, mean zero, and uniformly bounded entries. Suppose that

bounds the second moments of all entries, independently on

. In this case,

 limsupn→∞∥∥∥1√nYn∥∥∥≤2σ

almost surely.

To have a non-asymptotic bound, one can turn to [26, Corollary 2.3.5], where under the same hypotheses, it has been proved that there exist such that

 P[∥Yn∥2>t√n]≤c1exp(−c2tn)

for all .

We remark that the adjacency matrix of the so-called ideal graph corresponding to the random graph of this model was defined in [19] as

 (Aid)ij={1 if (E[A])ij≥12;0 if (E[A])ij<12.

### 2.3 Role extraction via the similarity matrix S

In [6, 19], it was proposed to solve the problem of role extraction for a digraph with adjacency matrix by means of a similarity matrix , which is defined as the limit of the sequence with

 S0=0,Sk+1=ΓA[I+β2Sk]=S1+β2ΓA(Sk),(k≥0) (1)

where the operator (depending on the matrix parameter ) is defined as

 ΓW[X]=WXWT+WTXW. (2)

It was shown in [19] that the sequence is convergent if and only if , and that the limit satisfies or, equivalently,

 vecS=(I−β2A⊗AT−β2AT⊗A)−1vec(AAT+ATA).

It was also shown there that element of the matrix is the weighted sum of the walks of length up to between nodes and , and that this can exploited to find nodes that should be associated with the same role.

Throughout this document, the parameter in (1) is always chosen such that is strictly smaller than 1, which is sufficient for the sequence to converge. Here and thereafter, we measure the norm of the operator induced by the spectral norm of its matrix argument. More concretely,

 ∥ΓW∥=supX≠0∥ΓW[X]∥∥X∥. (3)

Below we prove that , so that it is easy to compute a good enough with very low computational effort. In fact, we can always choose, for example, and obtain that necessarily .

###### Lemma 2.2

Let be nonnegative. Then, the norm (3) of the linear matrix mapping satisfies

 ∥ΓW∥=∥[WWT]∥2≤2∥W∥2.

Proof. For the upper bound we have that for all

 ∥ΓW[X]∥=∥∥∥[WWT][X00X][WTW]∥∥∥≤∥[WWT]∥2∥X∥

and

We then point out that the lower bound is satisfied for since

 ∥ΓW[I]∥/∥I∥=∥WWT+WTW∥=∥[WWT]∥2

and hence the lower bound is certainly satisfied for the maximizing as well.

## 3 Spectral bounds for the similarity matrix S

In this section we are analysing the spectral properties of the similarity matrix and its recurrence (1). We will first study the properties of the singular values for the matrix , and then move on to and

, estimating for each the gap between their dominant singular values and the noise.

### 3.1 Bounds for An

We have the following corollary of Theorem 2.1.

###### Proposition 3.1

Let be defined as in Section 2.2, and let . Then there is a constant such that almost surely for big enough,

 σi(An){≥nσi(M1)−2σ√n,1≤i≤s,≤2σ√n,i≥s+1.

In particular, the gap

 σs(An)σs+1(An)≥√nσi(M1)2σ−1=Ω(√n)

holds almost surely for big enough.

Proof. Since every entry of is an independent centered Bernoulli random variable, is uniformly bounded and has zero mean, and every entry has second moment bounded by . Hence, Theorem 2.1 applies and almost surely for big enough. Moreover, by Weyl’s theorem it holds

 σi(Mn)−∥Yn∥2≤σi(An)≤σi(Mn)+∥Yn∥2

and since , we have if and if .

Observe that a result similar to Proposition 3.1 holds for the singular values of or, equivalently, for the eigenvalues of . In fact, since , we can write

 σi(Rn){≥nσi([M1MT1])−2√2σ√n,1≤i≤r,≤√2∥Yn∥2≤2√2σ√n,i≥r+1, (4)

where , and

 σr(Rn)σr+1(Rn)≥√nσr([M1MT1])2√2σ−1=Ω(√n).

The constant in (4) is not sharp. In fact, both Theorem 4.1 in [13] and the experiments we will present suggest that the same holds with the constant

, following the classical bound on the Marchenko-Pastur distribution. Since there is no such result in literature, we formulate it here as a conjecture.

###### Conjecture 3.2

Let be random matrices with independent, mean zero, and uniformly bounded entries. Suppose that bounds the second moments of all entries, independently on . If we call , then

 limsupn→∞∥∥∥1√2nXn∥∥∥≤(1+√12)σ

almost surely.

### 3.2 Upper bound on the Error

In the previous section, we have computed some bounds for the singular values of and . Now we want to do the same for the matrices and coming from the recurrence relation (1) applied to . From now on, we drop for the sake of notational simplicity the suffices emphasizing the dependence on the size , so for example we simply write for .

###### Theorem 3.3

Let be defined as in Section 2.2 and . Call moreover and where , and

 S0=0,Sk+1=ΓA[I+β2Sk]=S1+β2ΓA(Sk),(k≥0), (5)

where the operator is defined as

 ΓW[X]=WXWT+WTXW, (6)

and is a chosen so that is a constant. In this case, the limit exists and

1. is asymptotically in bounded almost surely by .

2. is asymptotically in bounded almost surely by .

Proof.

1. By the recurrence (5) and Lemma 2.2, we know that

 ∥Sk+1∥≤∥R∥2(1+β2∥Sk∥)

so the sequence of converge whenever , and in this case

 1+β2∥Sk∥≤1+γ(1+γ+γ2+⋯+γk−2+γk−1)=1−γk+11−γ
 ⟹Sk=R(I2⊗(I+β2Sk−1))RT⪯1−γk1−γRRT.

From (4), we thus conclude that

 λr+1(Sk)≤1−γk1−γσr+1(R)2≤1−γk1−γ8nσ2.
2. From the last result, it is enough to let to obtain

 λr+1(S)≤11−γ8nσ2.

### 3.3 Lower Bound on the signal

If we write with as in the notation of Subsection 2.2 and 2.3, then (4) guarantees that for big enough,

 σr+1(R)≤δ:=∥[YYT]∥≤2σ√2n≪nσr([M1MT1])−2σ√2n≤σr(R). (7)

In other words, the matrix has dominant singular values (which we refer to as the “signal”) and small singular values (which we refer to as the “noise”). The gaps and between signal and noise, are expected to be large enough to allow for a correct assignment of the different nodes in each “role”. This separation becomes more pronounced when the dimensions of the matrix and its subgroups increase.

###### Remark 3.4

The singular values of the matrix can be obtained as follows. Define the diagonal matrix , then the decomposition of is given by where and the nonzero singular values of are those of (See Subsection 2.2 for the definitions of , and the integers ). Generally, and therefore the singular values of are times those of .

We now consider the recurrence relation using the expected value rather than since this yields a good approximation for the matrices. We denote these matrices as and their recurrence is thus given by

 T1=ΓM(In),Tk+1=ΓM[In+β2Tk],k=1,… (8)

Note that in (8) the matrix parameter in the operator is set to , in contrast with (1) where it was set to . Again, the parameter is chosen such that , which is required for the sequence to converge to . It was pointed out in [19] that the matrices and are all positive semi-definite, and that both sequences are ordered in the Loewner ordering :

 0=S0⪯S1⪯S2⪯…⪯S,0=T0⪯T1⪯T2⪯…⪯T.

Moreover, the matrices all have the same rank, image and null space. Let be the rank of the matrix and let be its nonzero singular values. We assume that is sufficiently large so that is small with respect to and the dominant singular values of are well separated from the other ones. Sufficient conditions for this were described in (4).

We now analyse the proximity between and more accurately.

###### Lemma 3.5

In the norm defined as in (3) it holds

 ∥ΓA−ΓM∥≤2δ√∥T1∥2+δ2.

Proof. For any matrix , if we rewrite

 +[YYT][X00X][YTY]

Then the desired bound follows from and .

###### Theorem 3.6

For it holds

 ∥Sk+1−Tk+1∥2 ≤ (2δ√∥T1∥2+δ2)(k∑i=0∥β2ΓA∥i)(k∑i=0∥β2ΓM∥i) ≤ 2δ√∥T1∥2+δ2(1−β2∥ΓA∥)(1−β2∥ΓM∥).

Proof. Denoting the increments and we obtain

 Sk+1−Tk+1=k+1∑i=1(ΔSi−ΔTi),S0=T0=0.

Observing that

 ΔSk+1=β2ΓA[ΔSk],ΔS1=ΓA[In],ΔTk+1=β2ΓM[ΔTk],ΔT1=ΓM[In],

and

 ΔSk+1−ΔTk+1=β2ΓA[ΔSk−ΔTk]+β2ΓA[ΔTk]−β2ΓM[ΔTk]

we can estimate

 [∥ΔSk+1−ΔTk+1∥2∥ΔTk+1∥2]≤N[∥ΔSk−ΔTk∥2∥ΔTk∥2],N:=β2[∥ΓA∥∥ΓA−ΓM∥0∥ΓM∥]

with initial conditions and . Hence, by induction for , it is not difficult to obtain the upper bound

 [∥ΔSk+1−ΔTk+1∥2∥ΔTk+1∥2]≤Nk[∥ΓA−ΓM∥∥ΓM∥]≤β2k[∥ΓA−ΓM∥∑ki=0∥ΓA∥i∥ΓM∥k−i∥ΓM∥k+1],

which, using also Lemma 3.5, finally yields the bound

 ∥Sk+1−Tk+1∥2≤(2δ√∥T1∥2+δ2)(k∑i=0∥β2ΓA∥i2)(k∑i=0∥β2ΓM∥i2).

Since and are both smaller than 1, we can let go to , to obtain the less conservative bound that also holds for every

 ∥S−T∥2≤2δ√∥T1∥2+δ2(1−β2∥ΓA∥2)(1−β2∥ΓM∥2).

It follows that in order to estimate the eigenvalues of and , it is enough to study and . For this reason, we first have to change the basis into a more convenient one. Since the rank of is , there exists an orthogonal similarity transformation such that

 UTAU=UT(M+Y)U=[M11000]+UTYU, (9)

where . (To see this, consider an SVD . Then, the bottom rows of both and are zero. By transposing the latter matrix, one can see that also the rightmost columns of are zero.) The orthonormal change of basis prescribed by yields the coordinate system where the matrices are compressed to their leading principal submatrix. In order to simplify the notation, we will from now on assume that the linear maps associated with are represented in that coordinate system, i.e., we replace by . Note that the eigenvalues are unchanged by this operation as it is a similarity transformation.

###### Lemma 3.7

It holds where . Moreover,

Proof. Under the orthogonal change of basis described above, the matrices can be written as

 Tk:=[^Tk000],

where is and has the nonzero eigenvalues of . Therefore each matrix is lower bounded in the Loewner ordering by . It then follows from that and

 ^Tk+1⪰λr(^T1)Ir+β2ΓM11(λr(^Tk)Ir)⪰(λr(^T1)+β2λr(^T1)2λr(^Tk))Ir,

which implies that

 λr(^Tk+1)≥λr(^T1)+β2λr(^T1)2λr(^Tk).

Since , the result follows by induction.

### 3.4 Bounds on the gaps

Now that we have all the bounds needed, we are ready to estimate the gaps for and . In fact, the matrices have a clear gap between their large and small eigenvalues.

• For the absolute gap, using Theorem 3.3,

 λr(Sk+1)−λr+1(Sk+1)≥λr(Tk+1)−∥Sk+1−Tk+1∥2−1−(β2∥R∥2)k1−β2∥R∥2δ2

and applying also Theorem 3.6, Lemma 3.7, and recalling from (7) that ,

 λr(Sk+1)−λr+1(Sk+1) ≥λr(T1)−2δ√∥T1∥2+δ2(1−β2∥ΓA∥)(1−β2∥ΓM∥)−1−(β2∥R∥2)k1−β2∥R∥2δ2 ≥σr([M1MT1])2n2−4√2σ∥[M1MT1]∥(1−β2∥ΓA∥)(1−β2∥ΓM∥)n√n−O(n).

As a consequence, we have a clear absolute gap of order whenever

 ∥[M1MT1]∥σr([M1MT1])24√2σ(1−β2∥ΓA∥)(1−β2∥ΓM∥)≪√n.

The gap is greater than all the following absolute gaps with , since they are bounded by a same parameter .

• For the relative gap, using Theorem 3.3,

 λr(Sk+1)λr+1(Sk+1)≥λr(Tk+1)−∥Sk+1−Tk+1∥21−(β2∥R∥2)k1−β2∥R∥2δ2

and applying again Theorem 3.6 and Lemma 3.7,

 λr(Sk+1)λr+1(Sk+1)≥λr(T1)−2δ√∥T1∥2+δ2(1−β2∥ΓA∥)(1−β2∥ΓM∥)1−(β2∥R∥2)k1−β2∥R∥2δ2 ≥1−β2∥R∥21−(β2∥R∥2)k(σr([M1MT1])28σ2n−∥[M1MT1]∥√2σ(1−β2∥ΓA∥)(1−β2∥ΓM∥)√n)−O(1).

As a consequence, we obtain a clear relative gap of order whenever

 4√2σ2∥[M1MT1]∥σ(1−β2∥ΓA∥)(1−β2∥ΓM∥)σr([M1MT1])2≪√n.

The gap is greater than all the preceding relative gaps with , since they are bounded by a same constant

 λi(Sk+1)λi+1(Sk+1)≤∥Sk+1∥λr(Sk+1)≲11−β2∥R∥2∥[M1MT1]∥2σr([M1MT1])2.

Notice that the same conclusions hold for the gaps of the matrix . We can notice that all the estimates get worse as gets close to zero. In fact, for example, in the case where all the probabilities are close to each other, it is harder to distinguish between different groups and clusters.

### 3.5 On the invariant subspaces

In this subsection, we study the dominant subspace of a real symmetric matrix (i.e. the invariant subspace associated with the largest eigenvalues) and argue why, for sufficiently large , it allows role extraction. Classically, distances between subspaces are measured via the concept of principal angles [17]: a multidimensional generalization of the acute angle between the unit vectors , i.e., . More generally, if are subspaces whose orthonormal bases are given, respectively, as the columns of the matrices , then the largest singular values of are the sines of the principal angles between (orthogonal complement of ) and . Just as in the one-dimensional case, the principal angles between two subspaces are all zero if and only if the subspaces coincide, and more generally the smaller the principal angles the closer the subspaces.

To set up notation, let us once again represent linear maps in the orthonormal basis where

Here, and are orthogonal bases for the dominant subspace (associated with signal) and the dominated subspace (associated with noise) of . By classical results in geometry and linear algebra [7, 15], the largest singular values of the matrix

 sinΘ:=[0In−r]Qs

are the sines of the principal angles between the dominant subspaces of and that of . Hence, the spectral norm of measures how well the dominant subspace of the similarity matrix approximates the one of the ideal graph.

We rely on Davis-Kahan’s sine theta theorem [7], in the form given by [15, Theorem 5.3]. Since the -th eigenvalue of is larger than the -th eigenvalue of (which is ), the assumptions of [15, Theorem 5.3] apply and thus

 ∥sinΘ∥2≤∥T−S∥2minλ∈Λsλ=∥T−S∥2λr(S).

Using the results of the previous section, in turn this yields for sufficiently large

 ∥sinΘ∥2≤2cδ√∥T1∥2λr(T)−2cδ√∥T1∥2≤c1n−1/2

for some constants . Therefore, we can state

###### Corollary 3.8

Asymptotically as , the principal angles between the dominant subspaces of