1 Introduction
The analysis of large graphs typically assumes that there is an underlying structure in the graph that allows us to represent it in a simpler manner. A typical example of this is the detection of communities, which are groups of nodes that have most of their connections with other nodes of the same group, and few connections with nodes of other groups. Various measures and algorithms have been developed to identify these community structures, and many applications have also been found for these model structures [11, 12, 20, 23]. But many graph structures cannot be modelled using communities: for example, arrowhead and tree graph structures, which appear in overlapping communities, human protein-protein interaction networks, and food webs [12, 18, 21, 22]. These more general types of network structures can be modelled as role structures, and the process of finding them is called the role extraction problem, or block modelling [4, 8, 16, 24, 25].
The role extraction problem is a generalization of the community detection problem: it determines a representation of a network by a smaller structured graph, where nodes are grouped together based upon their interactions with nodes in either the same group or in different groups, called roles. If no a priori information is available, one needs to verify all possible group and role assignments in order to determine the best role structure for the data, which leads to an NP-hard problem [5, 8]. But if one assumes that the adjacency matrix of the graph is a sample of a random matrix that follows certain rules, then the problem of recovering the underlying block structure may become tractable. The stochastic block model is precisely such a model: the interactions between all nodes of a particular group and all nodes of another group follow the same distribution [9, 14]. The successful extraction of the correct roles in stochastic block models for undirected graphs has been shown in the literature [1, 16]. In the present paper, we show that this is also the case for directed graphs. For this, we make extensive use of the neighbourhood pattern similarity matrix, which is real symmetric also for directed graphs, and therefore has real eigenvalues and eigenvectors. We show that for sufficiently large graphs, the gap between the dominant and dominated eigenvalues allows a convergent recovery of which nodes are associated with the different roles in the model.
In Section 2 we go over several preliminaries related to graphs, random matrices, stochastic block models and role modelling. Section 3 then establishes the spectral bounds for the pattern similarity matrix associated with the graph being analysed, and in Section 4 we make the connection with the stochastic block modelling problem. In Section 5 we give a few numerical experiments illustrating our theoretical analysis, and we conclude with a few remarks in Section 6.
2 Preliminaries
2.1 Graph theory and role extraction
An unweighted directed graph, or digraph, $G = (V, E)$ is an ordered pair of sets where the elements of $V$ are called vertices, or nodes, while the elements of $E \subseteq V \times V$ are called edges, or links. A walk of length $k$ on the digraph $G$ from $u$ to $v$ is a sequence of vertices of the form $(v_0, v_1, \dots, v_k)$, with $v_0 = u$ and $v_k = v$, such that $(v_{i-1}, v_i) \in E$ for all $i = 1, \dots, k$. $G$ is said to be strongly connected if for all $u, v \in V$ there is a walk on $G$ from $u$ to $v$. The adjacency matrix $A$ of an unweighted digraph $G$ with $n$ nodes is defined as
$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E,\\ 0 & \text{otherwise.} \end{cases}$$
In particular, $A$ is a nonnegative matrix. It is well known that $A$ is irreducible if and only if $G$ is strongly connected; in that case, the Perron–Frobenius spectral theory for irreducible nonnegative matrices applies. Manifestly, there is a bijection between adjacency matrices and digraphs. Moreover, two digraphs are isomorphic (i.e. they coincide up to a relabelling of the vertices) if and only if their adjacency matrices are permutation similar.
Given a graph $G$ with adjacency matrix $A$, the problem of role extraction consists in finding a positive integer $r$ and an assignment function $\sigma : V \to \{1, \dots, r\}$ such that $A$ can be well approximated by an ideal adjacency matrix $A_{\mathrm{id}}$ in which $(A_{\mathrm{id}})_{ij}$ only depends on $\sigma(i)$ and $\sigma(j)$. Equivalently, if $\pi$ is any permutation that reorders the nodes such that nodes of the same group are adjacent to each other, and $\Pi$ is the corresponding permutation matrix, then $\Pi A \Pi^T$ is approximately constant in the blocks induced by the assignment. One can then associate with it a so-called ideal adjacency matrix as illustrated in Figure 1: for the blocks of $\Pi A \Pi^T$ where 1 dominates, put all elements of the corresponding block in $A_{\mathrm{id}}$ equal to 1, and for the blocks where 0 dominates, put them all equal to 0 in $A_{\mathrm{id}}$. In doing so, the nodes in each block of $A_{\mathrm{id}}$ are regularly equivalent [10], i.e. they have the same parents and the same children [19]. The above approximation problem for $A$ can thus be viewed as finding a regularly equivalent graph near the given graph.
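The block-rounding step described above admits a direct implementation. The following sketch (the function name `ideal_adjacency` is ours, not from the paper) builds the ideal adjacency matrix from a given assignment by a majority vote inside each block:

```python
import numpy as np

def ideal_adjacency(A, sigma, r):
    """Round each block of A induced by the assignment sigma to its
    dominant value: all ones if 1 dominates in the block, all zeros if 0 does."""
    A_id = np.zeros_like(A)
    groups = [np.flatnonzero(sigma == k) for k in range(r)]
    for gi in groups:
        for gj in groups:
            block = A[np.ix_(gi, gj)]
            A_id[np.ix_(gi, gj)] = int(block.mean() > 0.5)  # majority vote
    return A_id

# A noisy 2-role digraph: the lower-left block contains one spurious edge.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 0, 1, 1]])
sigma = np.array([0, 0, 1, 1])
A_id = ideal_adjacency(A, sigma, r=2)
```

The spurious edge is rounded away, so the result is the block-constant matrix of a regularly equivalent graph.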
In the rest of the paper, we will assume that the permutation is the identity, in order to simplify the notation. This does not affect the generality of our results.
2.2 Random graphs and random matrices
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability triple and consider the space of random variables defined on it. A random digraph is a graph whose adjacency matrix is one such random variable. We denote the expectation of a random matrix $X$ by $\mathbb{E}[X]$. We construct a random unweighted digraph as follows:
The nodes are partitioned in $r$ clusters $V_1, \dots, V_r$, of sizes $n_1, \dots, n_r$, respectively.

There is an edge from a node in cluster $V_k$ to a node in cluster $V_l$ with probability $p_{kl}$, depending only on $k$ and $l$.
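This construction is straightforward to simulate. The sketch below (names are ours, not from the paper) samples the adjacency matrix of such a random digraph from the cluster sizes and the matrix of probabilities:

```python
import numpy as np

def sample_sbm_digraph(sizes, P, seed=None):
    """Sample the adjacency matrix of a directed stochastic block model:
    an edge from a node in cluster k to a node in cluster l is present
    independently with probability P[k, l]."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)  # cluster of each node
    M = np.asarray(P)[np.ix_(labels, labels)]         # expected adjacency E[A]
    A = (rng.random(M.shape) < M).astype(int)         # independent Bernoulli draws
    return A, M, labels

A, M, labels = sample_sbm_digraph([50, 50], [[0.9, 0.1], [0.2, 0.8]], seed=0)
```

Each block of the sampled matrix has empirical edge density close to the corresponding entry of the probability matrix.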
The adjacency matrix of such a random graph is an $n \times n$ random matrix $A$, with $n = n_1 + \dots + n_r$. Suppose $i \in V_k$ and $j \in V_l$: then $A_{ij}$ is distributed as a Bernoulli variable centered on $p_{kl}$. Hence, denoting by $\mathbf{1}_m$ the vector of all ones in $\mathbb{R}^m$ and by $P = (p_{kl}) \in [0, 1]^{r \times r}$ the matrix of probabilities, the expected adjacency matrix
$$M := \mathbb{E}[A], \qquad M_{ij} = p_{kl} \quad \text{for } i \in V_k,\ j \in V_l,$$
has rank $r$ (for all values of the cluster sizes, provided that $P$ is nonsingular). Note that $M$ is a deterministic matrix with precisely $r$ nonzero singular values: since the nonzero singular values of $M$ are those of $DPD$, with $D := \mathrm{diag}(\sqrt{n_1}, \dots, \sqrt{n_r})$, we have in particular that they grow linearly in $n$ when the cluster sizes $n_k$ grow proportionally to $n$. An explicit characterization of the singular value decomposition of $M$ is also possible: setting
$$Z := \mathrm{blkdiag}\!\left(\tfrac{\mathbf{1}_{n_1}}{\sqrt{n_1}}, \dots, \tfrac{\mathbf{1}_{n_r}}{\sqrt{n_r}}\right) \in \mathbb{R}^{n \times r},$$
then it is clear that $M = Z (DPD) Z^T$ with $Z^T Z = I_r$.
Furthermore, $A = M + N$, where $N$ is a random matrix with mean $\mathbb{E}[N] = 0$, and the variance of $N_{ij}$ is $p_{kl}(1 - p_{kl}) \le 1/4$, where $i \in V_k$ and $j \in V_l$. The literature is rich in results concerning these kinds of matrices. Here we report one that we will need in our arguments, which is in line with the asymptotic Marchenko–Pastur law for i.i.d. matrices.

Theorem 2.1
Let $\{N_n\}_n$ be a sequence of $n \times n$ random matrices with independent, mean zero, and uniformly bounded entries. Then there exists a constant $C > 0$ such that, almost surely, $\|N_n\|_2 \le C \sqrt{n}$ for all $n$ big enough.
To have a nonasymptotic bound, one can turn to [26, Corollary 2.3.5], where, under the same hypotheses, it has been proved that there exist constants $C, c > 0$ such that
$$\mathbb{P}\!\left(\|N_n\|_2 > t \sqrt{n}\right) \le C e^{-ctn}$$
for all $t \ge C$.
We remark that the adjacency matrix of the so-called ideal graph corresponding to the random graph of this model was defined in [19] as the entrywise rounding of $M$: $(A_{\mathrm{id}})_{ij} = 1$ if $p_{kl} > 1/2$ (for $i \in V_k$, $j \in V_l$) and $(A_{\mathrm{id}})_{ij} = 0$ otherwise.
2.3 Role extraction via the similarity matrix
In [6, 19], it was proposed to solve the problem of role extraction for a digraph with adjacency matrix $A$ by means of a similarity matrix $S$, which is defined as the limit of the sequence $\{S_k\}_k$ with
$$S_{k+1} = \beta\, \Gamma_A(S_k) + \Gamma_A(I), \qquad S_0 = 0, \qquad (1)$$
where the operator $\Gamma_A$ (depending on the matrix parameter $A$) is defined as
$$\Gamma_A(S) := A S A^T + A^T S A. \qquad (2)$$
It was shown in [19] that the sequence is convergent if and only if $\beta \|\Gamma_A\| < 1$, and that the limit $S$ satisfies $S = \beta\, \Gamma_A(S) + \Gamma_A(I)$ or, equivalently,
$$S = \sum_{k=0}^{\infty} \beta^k\, \Gamma_A^{k+1}(I).$$
It was also shown there that element $(i, j)$ of the matrix $S_k$ is a weighted sum of the walks of length up to $k$ between nodes $i$ and $j$, and that this can be exploited to find nodes that should be associated with the same role.
Throughout this document, the parameter $\beta$ in (1) is always chosen such that $\beta \|\Gamma_A\|$ is strictly smaller than 1, which is sufficient for the sequence to converge. Here and hereafter, we measure the norm of the operator $\Gamma_A$ as the one induced by the spectral norm of its matrix argument. More concretely,
$$\|\Gamma_A\| := \max_{\|S\|_2 = 1} \|\Gamma_A(S)\|_2. \qquad (3)$$
Below we prove that $\|\Gamma_A\| \le 2 \|A\|_2^2$, so that it is easy to compute a good enough $\beta$ with very low computational effort. In fact, we can always choose, for example, $\beta = (2 \|A\|_2^2 + 1)^{-1}$ and obtain that necessarily $\beta \|\Gamma_A\| < 1$.
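As an illustration, the following sketch (not the authors' code) iterates the recurrence assuming the form $S_{k+1} = \beta\, \Gamma_A(S_k) + \Gamma_A(I)$ with $\Gamma_A(S) = A S A^T + A^T S A$, and takes $\beta = (2\|A\|_2^2 + 1)^{-1}$ so that convergence is guaranteed:

```python
import numpy as np

def similarity_matrix(A, tol=1e-12, max_iter=10_000):
    """Iterate S_{k+1} = beta * Gamma_A(S_k) + Gamma_A(I), with
    Gamma_A(S) = A S A^T + A^T S A and beta = 1 / (2 ||A||_2^2 + 1),
    so that beta * ||Gamma_A|| < 1 and the iteration converges."""
    n = A.shape[0]
    gamma = lambda S: A @ S @ A.T + A.T @ S @ A
    beta = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2 + 1.0)
    G_I = gamma(np.eye(n))
    S = np.zeros((n, n))
    for _ in range(max_iter):
        S_next = beta * gamma(S) + G_I
        if np.linalg.norm(S_next - S, 2) < tol:
            break
        S = S_next
    return S_next, beta

# Directed 3-cycle: all nodes play the same role, so S has a constant diagonal.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
S, beta = similarity_matrix(A)
```

Note that the iterates stay symmetric (since $\Gamma_A$ maps symmetric matrices to symmetric matrices), and the limit satisfies the fixed-point equation up to the chosen tolerance.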
Lemma 2.2
Let $A$ be nonnegative. Then, the norm (3) of the linear matrix mapping $\Gamma_A$ satisfies
$$\|A\|_2^2 \le \|\Gamma_A\| \le 2 \|A\|_2^2.$$
Proof. For the upper bound, we have that for all $S$ with $\|S\|_2 = 1$,
$$\|\Gamma_A(S)\|_2 = \|A S A^T + A^T S A\|_2 \le \|A S A^T\|_2 + \|A^T S A\|_2 \le 2 \|A\|_2^2.$$
We then point out that the lower bound is satisfied for $S = I$, since $\Gamma_A(I) = A A^T + A^T A$ is the sum of two symmetric positive semidefinite matrices, so that
$$\|\Gamma_A(I)\|_2 = \lambda_{\max}(A A^T + A^T A) \ge \lambda_{\max}(A A^T) = \|A\|_2^2,$$
and hence the lower bound is certainly satisfied for the maximizing $S$ as well.
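The two bounds $\|A\|_2^2 \le \|\Gamma_A\| \le 2\|A\|_2^2$ are easy to probe numerically; the snippet below (an illustration only, not part of the proof) checks the lower bound at the test matrix $S = I$ and the upper bound on random test matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(0, 2, size=(30, 30)).astype(float)   # a nonnegative matrix
gamma = lambda S: A @ S @ A.T + A.T @ S @ A
a2 = np.linalg.norm(A, 2) ** 2

# Lower bound: already attained (up to the max over S) at S = I.
lower_ok = np.linalg.norm(gamma(np.eye(30)), 2) >= a2 - 1e-9

# Upper bound: ||Gamma_A(S)||_2 <= 2 ||A||_2^2 ||S||_2 for random S.
upper_ok = all(
    np.linalg.norm(gamma(S), 2) <= 2 * a2 * np.linalg.norm(S, 2) + 1e-9
    for S in (rng.standard_normal((30, 30)) for _ in range(20))
)
```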
3 Spectral bounds for the similarity matrix
In this section we analyse the spectral properties of the similarity matrix and its recurrence (1). We first study the properties of the singular values of the matrix $A$, and then move on to $S_k$ and $S$, estimating for each the gap between their dominant singular values and the noise.
3.1 Bounds for $A$
We have the following corollary of Theorem 2.1.
Proposition 3.1
Let $A$ be defined as in Section 2.2, and let $N := A - \mathbb{E}[A]$. Then there is a constant $C$ such that, almost surely for $n$ big enough, $\|N\|_2 \le C \sqrt{n}$. In particular, the gap
$$\sigma_i(A) \ge \sigma_i(M) - C \sqrt{n} \ \text{ for } i \le r, \qquad \sigma_i(A) \le C \sqrt{n} \ \text{ for } i > r,$$
holds almost surely for $n$ big enough.
Proof. Since every entry of $N$ is an independent centered Bernoulli random variable, $N$ is uniformly bounded and has zero mean, and every entry has second moment bounded by $1/4$. Hence, Theorem 2.1 applies and $\|N\|_2 \le C \sqrt{n}$ almost surely for $n$ big enough. Moreover, by Weyl's theorem it holds that $|\sigma_i(A) - \sigma_i(M)| \le \|N\|_2$ for all $i$, and since $\sigma_i(M) = 0$ for $i > r$, we have $\sigma_i(A) \ge \sigma_i(M) - C \sqrt{n}$ if $i \le r$ and $\sigma_i(A) \le C \sqrt{n}$ if $i > r$.
Observe that a result similar to Proposition 3.1 holds for the singular values of $A^T$ or, equivalently, for the eigenvalues of $A^T A$. In fact, since $A = M + N$, we can write
$$|\sigma_i(A) - \sigma_i(M)| \le \|N\|_2 \le C \sqrt{n} \quad \text{almost surely for } n \text{ big enough}, \qquad (4)$$
where $C$ is the constant from Theorem 2.1. The constant $C$ in (4) is not sharp. In fact, both Theorem 4.1 in [13] and the experiments we will present suggest that the same holds with the constant $2\sigma$, following the classical bound on the Marchenko–Pastur distribution. Since there is no such result in the literature, we formulate it here as a conjecture.
Conjecture 3.2
Let $\{N_n\}_n$ be random matrices with independent, mean zero, and uniformly bounded entries. Suppose that $\sigma^2$ bounds the second moments of all the entries, independently of $n$. If we call $s_n := \|N_n\|_2$, then
$$\limsup_{n \to \infty} \frac{s_n}{\sqrt{n}} \le 2\sigma$$
almost surely.
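The conjectured constant can be probed empirically. The sketch below (assuming centered Bernoulli entries, as in the model of Section 2.2) computes the ratio $\|N_n\|_2 / (\sigma \sqrt{n})$, which according to the conjecture should approach at most 2 as $n$ grows:

```python
import numpy as np

def norm_over_sqrt_n(n, p=0.5, seed=0):
    """||N_n||_2 / (sigma * sqrt(n)) for a centered Bernoulli(p) matrix;
    the conjecture predicts a limit superior of at most 2."""
    rng = np.random.default_rng(seed)
    N = (rng.random((n, n)) < p) - p          # independent, mean zero, bounded
    sigma = np.sqrt(p * (1 - p))              # entrywise standard deviation
    return np.linalg.norm(N, 2) / (sigma * np.sqrt(n))

ratios = [norm_over_sqrt_n(n) for n in (100, 200, 400)]
```

For moderate sizes the ratio already concentrates close to 2, consistently with the Marchenko–Pastur prediction.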
3.2 Upper bound on the error
In the previous section, we computed some bounds for the singular values of $A$ and $N$. Now we want to do the same for the matrices $S_k$ and $S$ coming from the recurrence relation (1) applied to $A$. From now on, for the sake of notational simplicity, we drop the subscripts emphasizing the dependence on the size $n$; so, for example, we simply write $N$ for $N_n$.
Theorem 3.3
Let be defined as in Section 2.2 and . Call moreover and where , and
(5) 
where the operator is defined as
(6) 
and $\beta$ is chosen so that is a constant. In this case, the limit exists and

is asymptotically in $n$ bounded almost surely by .

is asymptotically in $n$ bounded almost surely by .
3.3 Lower bound on the signal
If we write $A = M + N$, with $N$ as in the notation of Subsections 2.2 and 2.3, then (4) guarantees that for $n$ big enough,
$$\sigma_r(A) \ge \sigma_r(M) - C \sqrt{n} > C \sqrt{n} \ge \sigma_{r+1}(A). \qquad (7)$$
In other words, the matrix $A$ has $r$ dominant singular values (which we refer to as the “signal”) and $n - r$ small singular values (which we refer to as the “noise”). The gaps between signal and noise are expected to be large enough to allow for a correct assignment of the different nodes to each “role”. This separation becomes more pronounced when the dimensions of the matrix and its subgroups increase.
Remark 3.4
The singular values of the matrix $M$ can be obtained as follows. Define the diagonal matrix $D := \mathrm{diag}(\sqrt{n_1}, \dots, \sqrt{n_r})$; then the decomposition of $M$ is given by $M = Z (DPD) Z^T$, where $Z^T Z = I_r$, and the nonzero singular values of $M$ are those of $DPD$ (see Subsection 2.2 for the definitions of $Z$, $P$ and the integers $n_k$). If, say, $n_k = n/r$ for all $k$, then $DPD = (n/r) P$ and therefore the singular values of $M$ are $n/r$ times those of $P$.
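This identity can be verified directly; the snippet below (an illustration with arbitrarily chosen sizes and probabilities) compares the nonzero singular values of the expected adjacency matrix with those of $DPD$, where $D = \mathrm{diag}(\sqrt{n_1}, \dots, \sqrt{n_r})$:

```python
import numpy as np

sizes = np.array([20, 30, 50])
P = np.array([[0.9, 0.2, 0.1],
              [0.3, 0.8, 0.2],
              [0.1, 0.2, 0.7]])
labels = np.repeat(np.arange(3), sizes)
M = P[np.ix_(labels, labels)]                          # expected adjacency E[A]

D = np.diag(np.sqrt(sizes))
sv_small = np.linalg.svd(D @ P @ D, compute_uv=False)  # r singular values
sv_large = np.linalg.svd(M, compute_uv=False)          # n singular values
```

The first $r$ singular values of the large matrix coincide with those of the small $r \times r$ matrix, and the remaining ones vanish.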
We now consider the recurrence relation using the expected value $M$ rather than $A$, since this yields a good approximation for the matrices. We denote these matrices by $\hat S_k$, and their recurrence is thus given by
$$\hat S_{k+1} = \beta\, \Gamma_M(\hat S_k) + \Gamma_M(I), \qquad \hat S_0 = 0. \qquad (8)$$
Note that in (8) the matrix parameter in the operator is set to $M$, in contrast with (1), where it was set to $A$. Again, the parameter $\beta$ is chosen such that $\beta \|\Gamma_M\| < 1$, which is required for the sequence to converge to $\hat S$. It was pointed out in [19] that the matrices $S_k$ and $\hat S_k$ are all positive semidefinite, and that both sequences are ordered in the Loewner ordering:
Moreover, the matrices all have the same rank, image and null space. Let be the rank of the matrix and let be its nonzero singular values. We assume that is sufficiently large so that is small with respect to and the dominant singular values of are well separated from the other ones. Sufficient conditions for this were described in (4).
We now analyse the proximity between and more accurately.
Lemma 3.5
In the norm defined as in (3), it holds that
$$\|\Gamma_A - \Gamma_M\| \le 2\, \|N\|_2 \left( \|A\|_2 + \|M\|_2 \right).$$
Proof. For any matrix $S$, if we rewrite
$$\Gamma_A(S) - \Gamma_M(S) = N S A^T + M S N^T + N^T S A + M^T S N,$$
we readily see that
$$\|\Gamma_A(S) - \Gamma_M(S)\|_2 \le 2\, \|N\|_2 \left( \|A\|_2 + \|M\|_2 \right) \|S\|_2.$$
Then the desired bound follows from the definition (3) of the operator norm.
Theorem 3.6
For it holds
Proof. Denoting the increments and we obtain
Observing that
and
we can estimate
with the initial conditions given above. Hence, by induction on $k$, it is not difficult to obtain the upper bound
which, using also Lemma 3.5, finally yields the bound
Since both factors are smaller than 1, we can let $k$ go to $\infty$, to obtain a less conservative bound that also holds for every $k$
It follows that, in order to estimate the eigenvalues of $S_k$ and $S$, it is enough to study the corresponding matrices built from the expected adjacency matrix $M$. For this reason, we first have to change the basis into a more convenient one. Since the rank of $M$ is $r$, there exists an orthogonal similarity transformation $Q$ such that
$$Q^T M Q = \begin{bmatrix} M_1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad (9)$$
where $M_1 \in \mathbb{R}^{r \times r}$. (To see this, consider an SVD of $M$. Then, the bottom rows of both and are zero. By transposing the latter matrix, one can see that also the rightmost columns of are zero.) The orthonormal change of basis prescribed by $Q$ yields the coordinate system where the matrices are compressed to their leading $r \times r$ principal submatrix. In order to simplify the notation, we will from now on assume that the linear maps associated with $\Gamma_M$ are represented in that coordinate system, i.e., we replace $M$ by $Q^T M Q$. Note that the eigenvalues are unchanged by this operation, as it is a similarity transformation.
Lemma 3.7
It holds where . Moreover,
Proof. Under the orthogonal change of basis described above, the matrices can be written as
where is and has the nonzero eigenvalues of . Therefore each matrix is lower bounded in the Loewner ordering by . It then follows from that and
which implies that
Since , the result follows by induction.
3.4 Bounds on the gaps
Now that we have all the bounds needed, we are ready to estimate the gaps for $S_k$ and $S$. In fact, these matrices have a clear gap between their large and small eigenvalues.
Notice that the same conclusions hold for the gaps of the matrix $\hat S$. Observe also that all the estimates get worse as the gap gets close to zero: for example, in the case where all the probabilities $p_{kl}$ are close to each other, it is harder to distinguish between the different groups and clusters.
3.5 On the invariant subspaces
In this subsection, we study the dominant subspace of a real symmetric matrix (i.e. the invariant subspace associated with its largest eigenvalues) and argue why, for sufficiently large $n$, it allows role extraction. Classically, distances between subspaces are measured via the concept of principal angles [17]: a multidimensional generalization of the acute angle between two unit vectors $u$ and $v$, i.e., $\theta = \arccos |u^T v|$. More generally, if $\mathcal{X}, \mathcal{Y}$ are subspaces whose orthonormal bases are given, respectively, as the columns of the matrices $X, Y$, then the largest singular values of $X^T Y$ are the sines of the principal angles between $\mathcal{X}^\perp$ (the orthogonal complement of $\mathcal{X}$) and $\mathcal{Y}$. Just as in the one-dimensional case, the principal angles between two subspaces are all zero if and only if the subspaces coincide, and more generally the smaller the principal angles, the closer the subspaces.
To set up notation, let us once again represent linear maps in the orthonormal basis where
Here, $X_1$ and $X_2$ are orthogonal bases for the dominant subspace (associated with the signal) and the dominated subspace (associated with the noise) of the similarity matrix of the ideal graph. By classical results in geometry and linear algebra [7, 15], the largest singular values of the matrix $X_2^T Y_1$, where the columns of $Y_1$ form an orthogonal basis of the dominant subspace of $S$,
are the sines of the principal angles between the dominant subspace of the ideal similarity matrix and that of $S$. Hence, the spectral norm of $X_2^T Y_1$ measures how well the dominant subspace of the similarity matrix approximates the one of the ideal graph.
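This characterization translates into a few lines of code. The sketch below (function name ours) computes the sines of the principal angles between two subspaces with orthonormal bases $X$ and $Y$, using that the singular values of $X^T Y$ are the cosines of those angles:

```python
import numpy as np

def sin_principal_angles(X, Y):
    """Sines of the principal angles between the column spans of X and Y,
    both assumed to have orthonormal columns: the cosines are the singular
    values of X^T Y."""
    c = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.sqrt(np.clip(1.0 - c**2, 0.0, None))

I = np.eye(4)
same = sin_principal_angles(I[:, :2], I[:, :2])   # identical subspaces
perp = sin_principal_angles(I[:, :2], I[:, 2:])   # orthogonal subspaces
```

Identical subspaces give all-zero sines, while orthogonal subspaces give sines equal to one, in accordance with the discussion above.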
We rely on the Davis–Kahan sine theta theorem [7], in the form given by [15, Theorem 5.3]. Since the $r$th eigenvalue of $S$ is larger than the $(r+1)$st eigenvalue of the ideal similarity matrix (which is $0$), the assumptions of [15, Theorem 5.3] apply and thus
Using the results of the previous section, this in turn yields for sufficiently large $n$
for some constants. Therefore, we can state:
Corollary 3.8
Asymptotically as $n \to \infty$, the principal angles between the dominant subspaces of