# Randomized spectral co-clustering for large-scale directed networks

Directed networks are generally used to represent asymmetric relationships among units. Co-clustering aims to cluster the senders and receivers of a directed network simultaneously. In particular, the well-known spectral clustering algorithm can be modified into a spectral co-clustering algorithm for directed networks. However, large-scale networks pose a computational challenge to it. In this paper, we leverage randomized sketching techniques to accelerate spectral co-clustering algorithms in order to co-cluster large-scale directed networks more efficiently. Specifically, we derive two series of randomized spectral co-clustering algorithms, one random-projection-based and the other random-sampling-based. Theoretically, we analyze the resulting algorithms under two generative models: the stochastic co-block model and the degree-corrected stochastic co-block model. The approximation error rates and misclustering error rates are established, and they indicate better bounds than the state-of-the-art results in the co-clustering literature. Numerically, we conduct simulations to support our theoretical results and test the efficiency of the algorithms on real networks with up to tens of millions of nodes.


## I Introduction

Recent advances in computing and measurement technologies have led to an explosion of large-scale network data. Networks can describe symmetric (undirected) or asymmetric (directed) relationships among interacting units in various fields, ranging from biology and informatics to social science and finance [38][18]. To extract knowledge from complex network structures, many clustering techniques, also known as community detection algorithms, are widely used to group together nodes with similar patterns [15]. In particular, as asymmetric relationships are essential to the organization of networks, clustering directed networks is receiving more and more attention [9][3][47][11]. For large-scale directed network data, an appealing clustering algorithm should have not only statistical guarantees but also computational advantages.

To accommodate and explore the asymmetry in directed networks, the notion of co-clustering was introduced in [47][11], and the idea can be traced back to [22]. Let A be the n × n network adjacency matrix, where Aij = 1 if and only if there is an edge from node i to node j, and Aij = 0 otherwise. Then the ith row and the ith column of A represent the outgoing and incoming edges of node i, respectively. Co-clustering refers to simultaneously clustering both the rows and the columns of A, so that the nodes in a row cluster share similar sending patterns, and the nodes in a column cluster share similar receiving patterns. Compared to standard clustering, where only one set of clusters is obtained, co-clustering a directed network yields two possibly different sets of clusters, which provide more insights and improve our understanding of the organization of directed networks.

Spectral clustering [50] is a natural and interpretable algorithm for grouping undirected network data. It first performs the eigendecomposition of a matrix representing the network, for example, the adjacency matrix A, and then runs k-means or a similar algorithm on the resulting leading eigenvectors. Considering the asymmetry in directed networks, the standard spectral clustering algorithm has been modified into the spectral co-clustering, in which the eigendecomposition is replaced by the singular value decomposition (SVD), and k-means is implemented on the leading left and right singular vectors, respectively. As the leading left and right singular vectors approximate the row and column spaces of A, it is expected that the resulting two sets of clusters contain nodes with similar sending and receiving patterns, respectively. A concrete version of this algorithm is introduced in [47].

The spectral co-clustering is easy to implement and has been shown to have many nice properties [50][47]. However, large-scale directed networks, namely networks with a large number of nodes or dense edges, pose great challenges to the computation of the SVD. Therefore, improving the efficiency of spectral co-clustering while maintaining controllable accuracy becomes an urgent need. In this paper, we consider the problem of co-clustering large-scale directed networks based on randomization techniques, a popular approach to reducing the size of data with limited information loss [34][55][13]. Randomization techniques have been widely used in machine learning to speed up fundamental problems such as least squares regression and low-rank matrix approximation (see [12][36][37][40][10][21][35], among many others). The basic idea is to reduce the size of a data matrix (or tensor) by sampling a small subset of the matrix entries, or by forming linear combinations of its rows or columns. The entries or linear combinations are carefully chosen to preserve the major information contained in the matrix. Hence, randomization techniques provide a beneficial way to aid the spectral co-clustering of large-scale directed network data.

For a network with community structure, its adjacency matrix is low-rank in nature, so randomization for low-rank matrix approximation can be readily used to accelerate the SVD of A [21][35][54]. We investigate two specific strategies, namely the random-projection-based and the random-sampling-based SVD. The random projection strategy uses randomness to compress the original matrix into a smaller matrix whose rows (columns) are linear combinations of the rows (columns) of A. In this way, the dimension of A is largely reduced, and the corresponding SVD is thus sped up. As for the random sampling strategy, the starting point is that there exist fast iterative algorithms to compute the partial SVD of a sparse matrix, such as orthogonal iteration and Lanczos iteration [1][4], whose time complexity is generally proportional to the number of non-zero elements of the matrix. Therefore, a good way to accelerate the SVD of A is to first sample the elements of A to obtain a sparser matrix, and then use fast iterative algorithms to compute its SVD. As a whole, the spectral co-clustering with the classical SVD replaced by a randomized SVD is called the randomized spectral co-clustering.

Given the fast randomization techniques, it is also critical to study the statistical accuracy of the resulting algorithms under certain generative models. To this end, we assume the directed network is generated from the stochastic co-block model or the degree-corrected stochastic co-block model [47]. These two models assume the nodes are partitioned into two sets of non-overlapping blocks, one corresponding to the row clusters and the other to the column clusters. Generally, nodes in the same row (column) block are stochastically equivalent senders (receivers). That is to say, two nodes in the same row (column) cluster send out (receive) an edge to (from) a third node with the same probability. The two models differ in that the degree-corrected model [25][47] accounts for the degree heterogeneity arising in real networks. The statistical error of the randomized spectral co-clustering is then studied under these two settings.

The main contributions of the paper are summarized as follows.

• We develop two new fast spectral co-clustering algorithms based on randomization techniques, namely, the random-projection-based and the random-sampling-based spectral co-clustering, to analyze large-scale directed networks. In particular, the proposed algorithms are applicable to networks both with and without degree heterogeneity.

• We analyze the true singular vector structure of population adjacency matrices generated by the stochastic co-block model and the degree-corrected stochastic co-block model. The results explain why the spectral co-clustering algorithms work well for directed networks and provide insights on designing the co-clustering algorithms for networks with and without degree heterogeneity.

• We theoretically study the approximation performance and the clustering performance of the two randomization schemes under the two block models. It turns out that under mild conditions, the approximation error rates are consistent with those without randomization, and the misclustering error rates are better than those in [47] and [46], although the latter two were not established in the randomization scheme. This is because the technical tools that we use to bound the misclustering error differ from those in [47] and [46]. As the number of nodes goes to infinity, the misclustering rate goes to zero. In addition, different from undirected networks, the clustering difficulties of directed networks with respect to the rows and columns are distinct, which is partially caused by the asymmetry. Generally, clustering the side with a smaller target rank is easier than clustering the other side.

The remainder of the paper is organized as follows. Section II introduces the randomized spectral co-clustering algorithms for co-clustering large-scale directed networks. Section III includes the theoretical analysis of the proposed algorithms under two network models. Section IV reviews and discusses related work. Sections V and VI present the experimental results on simulations and real-world data, respectively. Section VII concludes the paper. Technical proofs are included in the Appendix.

## II Randomized spectral co-clustering

In this section, we first review two spectral co-clustering algorithms for directed networks without and with degree heterogeneity, respectively. Then we use randomization techniques to derive their corresponding randomized algorithms.

As mentioned earlier, co-clustering aims to find two possibly different sets of clusters, namely co-clusters, to describe and understand the sending and receiving patterns of nodes, respectively. Recalling the definition of the adjacency matrix A, the ith row and the ith column reveal the sending and receiving patterns of node i, respectively. Hence the co-clusters are called row clusters and column clusters, respectively.

Suppose there are Ky row clusters and Kz column clusters, and without loss of generality, assume Ky ≤ Kz. Write the partial SVD of A as A ≈ UΣV⊺, where the columns of U and V are the leading left and right singular vectors. The left and right singular vectors U and V provide low-dimensional representations of the rows and columns of A, respectively. On the other hand, one can see that U and V consist of eigenvectors of the two symmetric matrices AA⊺ and A⊺A, whose (i, j)th entries correspond to the number of common children and the number of common parents of nodes i and j, respectively. Therefore, U and V contain the sending and receiving information of each node. Clustering the rows of U and V respectively would thus yield clusters of nodes sharing similar sending and receiving patterns.

Based on the above explanations, one can see that the well-known spectral clustering is a good paradigm for co-clustering directed networks. We here consider two algorithms. The first is based on the standard spectral clustering: it computes the SVD of A, and then uses k-means to cluster the rows of the left and right singular vectors of A, respectively. See Algorithm 1. This algorithm is well-suited to networks whose nodes have approximately equal degrees. For networks whose nodes have heterogeneous degrees, we consider the following strategy. We first compute the SVD of A, and then normalize the non-zero rows of the left and right singular vectors so that the resulting rows have Euclidean norm 1. k-median clustering is then performed on the normalized rows of the left singular vectors and the right singular vectors, respectively. After that, the zero rows of the singular vectors are randomly assigned to an existing cluster. See Algorithm 2. The normalization step balances the importance of each node to facilitate the subsequent clustering procedures, and as we will see in the next section, it is essential for co-clustering networks with degree heterogeneity. The k-median clustering is used partially due to its robustness to outliers, as it deals with a sum of norms instead of a sum of squared norms.
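As a concrete illustration of this paradigm, the following minimal sketch computes a partial SVD and clusters the singular vectors; the function name and the use of SciPy/scikit-learn are our own illustrative choices, not the paper's implementation:

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def spectral_cocluster(A, k_row, k_col, seed=0):
    """Co-cluster a directed network: run k-means separately on the
    leading left and right singular vectors of the adjacency matrix."""
    k = max(k_row, k_col)
    U, _, Vt = svds(np.asarray(A, dtype=float), k=k)  # partial SVD
    row_labels = KMeans(n_clusters=k_row, n_init=10, random_state=seed).fit_predict(U)
    col_labels = KMeans(n_clusters=k_col, n_init=10, random_state=seed).fit_predict(Vt.T)
    return row_labels, col_labels
```

For networks with heterogeneous degrees, Algorithm 2 would additionally normalize the non-zero rows of the singular vectors before clustering and use k-median instead of k-means.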

Now we discuss the time complexity of Algorithms 1 and 2. It is well known that the classical full SVD generally takes O(n³) time, which is time-consuming when n is large. Indeed, only the partial SVD of A is needed, which can be computed by fast iterative methods; see [1][4] for example. They generally take time proportional to the number of non-zero elements of A times the number of iterations, where the iteration number corresponds to a certain error tolerance and can be large when n is large. As for the k-means and k-median, it is well known that finding their optimal solutions is NP-hard. Hence, efficient heuristic algorithms are commonly employed. In this paper, we use Lloyd's algorithm to solve the k-means, whose time complexity per iteration is proportional to the number of points times the number of clusters, and use the fast averaged stochastic gradient algorithm to solve the k-median [5]. Although these two algorithms are not guaranteed to converge to the global solutions, we assume in the theoretical analysis that they find the optimal solutions for simplicity. Alternatively, one can use a more delicate approximate k-means [27] to bridge this gap, where a good approximate solution can be found within a constant fraction of the optimal value. Based on the above discussions, the time complexity of Algorithms 1 and 2 is dominated by the SVD, which encourages us to use randomization techniques to speed up the computation of the SVD and thereby further improve the spectral co-clustering.

### II-A Random-projection-based spectral co-clustering

In this subsection, we first introduce how to leverage randomized sketching techniques to accelerate the SVD, and then, based on that, we establish the random-projection-based spectral co-clustering algorithm.

The basic idea of the random-projection-based SVD is to compress the adjacency matrix A into a smaller matrix and then apply a standard SVD to the compressed one, thus saving computational cost. The approximate SVD of the original A can be recovered by postprocessing the SVD of the smaller matrix [21][35][54].

For an asymmetric matrix A with a target rank K, the objective is to find orthonormal bases Q and T such that

 A ≈ QQ⊺ATT⊺ := Arp.

It is not hard to see that QQ⊺ projects the column vectors of A to the column space of Q, and TT⊺ projects the row vectors of A to the row space of A (or, equivalently, the column space of T). Therefore, Q and T approximate the column and row spaces of A, respectively. In randomization methods, Q and T can be built via random projection [21]. Take Q as an example: one first constructs the sketch AΩ, where Ω is a random test matrix, so that the columns of AΩ are random linear combinations of the columns of A, and then orthonormalizes these columns using the QR decomposition to obtain the orthonormal matrix Q. Once Q and T are constructed, we can perform the standard SVD on the small matrix Q⊺AT, and the approximate SVD of A is achieved by left-multiplying by Q and right-multiplying by T⊺. The whole procedure of the random-projection-based SVD can be summarized as the following steps:

• Step 1: Construct two n × K random test matrices Ω and Ψ with independent standard normal entries.

• Step 2: Obtain Q and T via the QR decompositions AΩ = QRΩ and A⊺Ψ = TRΨ.

• Step 3: Compute the SVD of the small matrix Q⊺AT = UsΣs(Vs)⊺.

• Step 4: Output the approximate SVD of A as Arp = UrpΣrp(Vrp)⊺, where Urp = QUs, Σrp = Σs, and Vrp = TVs.
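The four steps can be sketched as follows, with the oversampling and power-iteration refinements discussed below built in; parameter names such as `oversample` and `n_power` are our own illustrative choices:

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_power=2, seed=0):
    """Random-projection-based SVD sketch: compress A with Gaussian
    test matrices, then take an exact SVD of the small core matrix."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    Omega = rng.standard_normal((m, k + oversample))  # column-space sketch
    Psi = rng.standard_normal((n, k + oversample))    # row-space sketch
    Y, Z = A @ Omega, A.T @ Psi
    for _ in range(n_power):            # power iterations sharpen the sketch
        Y, Z = A @ (A.T @ Y), A.T @ (A @ Z)
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for the column space
    T, _ = np.linalg.qr(Z)              # orthonormal basis for the row space
    Us, s, Vst = np.linalg.svd(Q.T @ A @ T)  # SVD of the small core matrix
    return Q @ Us[:, :k], s[:k], T @ Vst.T[:, :k]
```

With `oversample=0` and `n_power=0`, the sketch reduces to the plain four-step scheme.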

The random-projection-based spectral co-clustering refers to Algorithm 1 or 2 with the SVD therein replaced with the above random-projection-based SVD.

The oversampling and the power iteration schemes are two strategies to improve the performance of the randomized SVD [21][35]. Oversampling uses extra random projections (K + r test columns in total, for an oversampling parameter r) to form the sketch matrices in Step 2. This strategy reduces the information loss when the rank of A is not exactly K. The power iteration scheme employs (AA⊺)qAΩ and (A⊺A)qA⊺Ψ, for a small integer q, instead of AΩ and A⊺Ψ in Step 2. This treatment improves the quality of the sketch matrices when the singular values of A are not rapidly decreasing.

The time complexity of the random-projection-based SVD is dominated by the matrix multiplication operations in Step 2, which generally take O(n²K) time. Note that the classical SVD in Step 3 is cheap, as the matrix dimension is as low as K. Alternatively, one can implement a partial SVD to find only the leading singular vectors of A. In addition, the time of Step 2 can be further improved if one uses structured random test matrices or performs the matrix multiplications in parallel. The random-projection-based SVD is numerically stable, and it comes with good theoretical guarantees [21][35][54].

### II-B Random-sampling-based spectral co-clustering

In this subsection, we first introduce the accelerated SVD based on the random sampling technique and then define the corresponding spectral co-clustering algorithm.

Note that real networks are often sparse [53][7]; the number of non-zero elements in the adjacency matrix is generally far smaller than n². We also know that the time complexity of fast iterative SVD algorithms is proportional to the number of non-zero elements of the matrix [1][4]. Thus it is efficient to find the leading singular vectors of A using iterative methods when A is really sparse. The random-sampling-based SVD is designed to make this procedure even more efficient. The general idea is to first sample the elements of A randomly to obtain a randomly sparsified matrix of A, and then use a fast iterative algorithm to compute the leading singular vectors of the sparsified matrix. The SVD of A can then be approximated by the partial SVD of the sparsified matrix.

We use a simple strategy to construct the sparsified matrix Ars. That is, each element of A is sampled independently with equal probability p, the sampled elements are rescaled by 1/p, and the elements that are not sampled are set to zero. Formally, for each pair (i, j),

 Arsij = { Aij/p, if (i, j) is selected; 0, if (i, j) is not selected.

If the sampling probability p is not too small, then Ars is close to A. With the sparsified matrix at hand, one can use fast iterative algorithms such as [1][4] to compute its leading singular vectors. We summarize the whole procedure of the random-sampling-based SVD as the following steps:

• Step 1: Form the sparsified matrix Ars via (II-B).

• Step 2: Compute the partial SVD of Ars using the fast iterative algorithm in [1] or [4], yielding Ars ≈ UrsΣrs(Vrs)⊺.
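A minimal sketch of the two steps, assuming the Bernoulli(p) entry-wise sampler with 1/p rescaling described above (the helper name `sampled_svd` is ours, and `svds` stands in for the iterative solvers of [1][4]):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def sampled_svd(A, k, p=0.5, seed=0):
    """Random-sampling-based SVD sketch: keep each entry of A with
    probability p (rescaled by 1/p), then run a sparse partial SVD."""
    rng = np.random.default_rng(seed)
    mask = rng.random(A.shape) < p             # Bernoulli(p) entry selection
    Ars = csr_matrix(np.where(mask, A / p, 0.0))
    U, s, Vt = svds(Ars, k=k)                  # Lanczos-type partial SVD
    return U, s, Vt.T
```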

The random-sampling-based spectral co-clustering refers to Algorithm 1 or 2 with the SVD therein replaced with the above random-sampling-based SVD.

The time complexities of Step 1 and Step 2 are approximately proportional to n² and to the number of non-zero elements of Ars times the number of iterations, respectively. Generally, the number of edges in a network is far below n². Hence the random-sampling-based SVD is rather efficient.

## III Theoretical analysis

In this section, we analyze the theoretical properties of the randomized spectral co-clustering algorithms under two generative models: the stochastic co-block model (ScBM) and the degree-corrected stochastic co-block model (DC-ScBM) [47].

These two models are built upon the notion of co-clustering in directed networks. Nodes are partitioned into two underlying sets of clusters, one corresponding to the row clusters and the other to the column clusters. The row and column clusters are possibly different and may even have different numbers of clusters. Generally, nodes in a common row cluster are stochastically equivalent senders in the sense that they send out an edge to a third node with equal probability. Similarly, nodes in a common column cluster are stochastically equivalent receivers in the sense that they receive an edge from a third node with equal probability.

Before giving the formal definitions of these two models, we provide some notes and notation. Suppose there exist Ky row clusters and Kz column clusters in the directed network. Without loss of generality, we assume throughout this paper that Ky ≤ Kz. For each node i, yi and zi denote the row-cluster and column-cluster assignments of node i, respectively. We can also represent the cluster assignments using membership matrices. Let Mn,K be the set of all n × K matrices that have exactly one 1 and K − 1 0's in each row. Y ∈ Mn,Ky is called a row membership matrix if node i belongs to row cluster k if and only if Yik = 1. Similarly, Z ∈ Mn,Kz is called a column membership matrix if node i belongs to column cluster k if and only if Zik = 1. For 1 ≤ k ≤ Ky, let Gyk be the set of nodes belonging to row cluster k, and denote its size by nyk. Similarly, for 1 ≤ k ≤ Kz, let Gzk be the set of nodes belonging to column cluster k, and denote its size by nzk. For any matrix W and proper index sets I and J, WI∗ and W∗J denote the sub-matrices of W consisting of the rows in I and the columns in J, respectively. ∥W∥F, ∥W∥2, and ∥W∥∞ denote the Frobenius norm, the spectral norm, and the element-wise maximum absolute value of W, respectively. diag(W) denotes a diagonal matrix whose diagonal entries are the same as those of W.

### III-A The stochastic co-block model

The stochastic co-block model is defined as follows.

###### Definition 1 (Stochastic co-block model [47])

Let Y ∈ Mn,Ky and Z ∈ Mn,Kz be the row and column membership matrices, respectively. Let B ∈ [0, 1]Ky×Kz be the connectivity matrix whose (k, l)th element Bkl is the probability of a directed edge from any node in row cluster k to any node in column cluster l. Given (Y, B, Z), each element of the network adjacency matrix A is generated independently as Aij ~ Bernoulli(Bkl) if i ∈ Gyk and j ∈ Gzl.
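For concreteness, an adjacency matrix can be drawn directly from Definition 1; the helper below is an illustrative sketch (names are ours), where the population matrix is obtained by indexing the connectivity matrix B with the row and column labels:

```python
import numpy as np

def sample_scbm(y, z, B, seed=0):
    """Draw a directed adjacency matrix from a ScBM: the edge i -> j
    is Bernoulli with probability B[y[i], z[j]]."""
    rng = np.random.default_rng(seed)
    P = B[np.ix_(y, z)]                         # population edge probabilities
    return (rng.random(P.shape) < P).astype(int), P
```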

Define P := YBZ⊺ and denote its maximum and minimum non-zero singular values by σn and γn, respectively. We assume throughout this paper that rank(B) = Ky, so that P has exactly Ky non-zero singular values. It is easy to see that P is the population version of A in the sense that E(A) = P. The next lemma reveals the structure of the singular vectors of P.

###### Lemma 1

Consider a ScBM parameterized by Y ∈ Mn,Ky, Z ∈ Mn,Kz, and B. P = YBZ⊺ is the population matrix with its SVD being P = UΣV⊺. Then for 1 ≤ i, j ≤ n,

(1) Ui∗ = Uj∗ if and only if yi = yj, and for any yi = k ≠ l = yj, ∥Ui∗−Uj∗∥2 = √((nyk)−1+(nyl)−1).

(2) zi = zj implies Vi∗ = Vj∗. Moreover, if the columns of B are distinct, then the opposite direction also holds and, as a result, Vi∗ = Vj∗ if and only if zi = zj. Further, for any zi = k ≠ l = zj, ∥Vi∗−Vj∗∥2 ≥ ∥B∗k−B∗l∥2 ⋅ mini=1,…,Ky(nyi)1/2/σn, where recall that σn is the maximum singular value of P.

Lemma 1 provides the following important facts. The left singular vectors of P reveal the true row clusters in the sense that two rows of U are identical if and only if the corresponding nodes are in the same row cluster, and the distance between rows of U corresponding to distinct row clusters is determined by the sizes of the underlying row clusters. The story for the column clusters is slightly different: nodes in different column clusters correspond to different rows of V, but the converse does not always hold unless the columns of B are distinct. In addition, a larger distance between two columns of B leads to a larger distance between the corresponding rows of V. Based on these facts, one can expect that the spectral co-clustering Algorithms 1 and 2 would estimate the true underlying clusters well if the singular vectors of A are close enough to those of P, which by the Davis-Kahan-Wedin theorem [39] holds if A and P are close in some sense.

Next, we proceed to evaluate the clustering performance of the randomized spectral co-clustering algorithms. To that end, we first evaluate the deviations of the randomized matrices Arp and Ars from the population matrix P. Then, we examine the clustering performance of the randomized algorithms. We discuss the random projection and random sampling schemes in turn.

#### III-A1 Random-projection-based spectral co-clustering in ScBMs

We refer to Algorithm 1 with the SVD therein replaced by the random-projection-based SVD. The next theorem quantifies the spectral deviation of Arp from P.

###### Theorem 1

Let Arp be the randomized approximation of A in the random projection scheme, where the target rank is Ky, the oversampling parameters are suitably chosen, and the test matrices have i.i.d. standard Gaussian entries. If

 maxkl Bkl ≤ αn for some αn ≥ c0 logn/n, (C1)

then for any ν > 0, there exists a constant c1 such that

 ∥Arp−P∥2 ≤ c1√(nαn), (4.1)

with probability at least 1 − n−ν.

Theorem 1 implies that the randomized adjacency matrix Arp concentrates around P at the rate of √(nαn), where nαn can be regarded as an upper bound of the expected node degree in the network. Condition (C1) is a weak condition on the sparsity of the network. The bound (4.1) is the same as the best concentration bound for the non-randomized adjacency matrix [29][16], to the best of our knowledge. Hence, in this sense, the random projection pays little price under the framework of ScBMs.

The next theorem provides an upper bound for the proportion of misclustered nodes.

###### Theorem 2

Let Yrp and Zrp be the estimated membership matrices of the random-projection-based spectral co-clustering algorithm. The other parameters are the same as those in Theorem 1. Suppose (C1) holds, and recall that the maximum and minimum non-zero singular values of P are σn and γn. The following two results hold for Yrp and Zrp, respectively.

(1) Define

 τ = minl≠k √((nyk)−1+(nyl)−1).

There exists an absolute constant c2 such that, if

 Kyαnn/(nykτ2γ2n) ≤ c2, (C2)

for any 1 ≤ k ≤ Ky, then with probability larger than 1 − n−ν for any ν > 0, there exists a subset My of nodes such that

 |My|/n ≤ c−12Kyαn/(τ2γ2n). (4.2)

And for Ty := {1, …, n}∖My, there exists a permutation matrix Jy such that

 YrpTy∗Jy = YTy∗. (4.3)

(2) Define

 δ = minl≠k ∥B∗k−B∗l∥2 ⋅ mini=1,…,Ky (nyi)1/2/σn.

There exists an absolute constant c3 such that, if

 Kyαnn/(nzkδ2γ2n) ≤ c3, (C3)

for any 1 ≤ k ≤ Kz, then with probability larger than 1 − n−ν for any ν > 0, there exists a subset Mz of nodes such that

 |Mz|/n ≤ c−13Kyαn/(δ2γ2n). (4.4)

And for Tz := {1, …, n}∖Mz, there exists a permutation matrix Jz such that

 ZrpTz∗Jz = ZTz∗. (4.5)

Theorem 2 provides upper bounds for the misclustering rates with respect to the row clusters and column clusters, as indicated in (4.2) and (4.4). Recalling Lemma 1, we can see that the clustering performance depends on the minimum row distances τ and δ of the population singular vectors U and V. As expected, larger distances imply more accurate clusters. (4.3) and (4.5) imply that the nodes outside My and Mz are correctly clustered into the underlying row clusters and column clusters up to permutations, respectively. (C2) and (C3) are technical conditions that ensure the validity of the results. They ensure that the numbers of misclustered nodes with respect to the row clusters and column clusters are smaller than the minimum sizes of the true clusters, mink nyk and mink nzk. That is to say, each true cluster has nodes that are correctly clustered. Generally, they are easy to satisfy as long as the misclustering rates indicated in (4.2) and (4.4) are of smaller order than mink nyk/n and mink nzk/n, respectively. In addition, when Ky = Kz, it can be inferred from Lemma 1 that the column clusters behave the same as the row clusters, so the RHS of (4.4) can be improved to that of (4.2). In the next section, we will examine the bounds in (4.2), (4.4), and those that follow explicitly and compare them with those in [46][47].

#### III-A2 Random-sampling-based spectral co-clustering in ScBMs

We refer to Algorithm 1 with the SVD therein replaced by the random-sampling-based SVD. The next theorem provides the deviation of Ars from P in terms of the spectral norm.

###### Theorem 3

Let Ars be the randomized approximation of A in the random sampling scheme, where the sampling probability is p. Suppose assumption (C1) holds. Then for any ν > 0, there exist constants c4 and c5 such that

 ∥Ars−P∥2 ≤ c4 max{√(nαn/p), √(logn/p), Δ(n,αn,p)}, (4.7)

where

 Δ(n,αn,p) := √(nα2n/p) (1 + p1/4 ⋅ max(1, √(1/p−1))),

with probability larger than 1 − c5n−ν.

Theorem 3 says that Ars concentrates around P at the rate indicated in (4.7). As expected, the rate improves as the sampling probability p increases. Note that under (C1) the bound (4.7) simplifies to the order of √(nαn/p), which can be further reduced to √(nαn) if p is a fixed constant.

In what follows, we use Φ(n,p,αn) to denote the bound in (4.7), namely,

 Φ(n,p,αn) := max{√(nαn/p), √(logn/p), Δ(n,αn,p)}.

The next theorem provides upper bounds for the misclustering rates of the random-sampling-based spectral co-clustering.

###### Theorem 4

Let Yrs and Zrs be the estimated membership matrices of the random-sampling-based spectral co-clustering algorithm. The other parameters are the same as those in Theorem 3. Suppose (C1) holds, and recall that the minimum non-zero singular value of P is γn. τ and δ are defined as in Theorem 2. The following two results hold for Yrs and Zrs, respectively.

(1) There exists an absolute constant c6 such that, if

 KyΦ2(n,p,αn)/(nykτ2γ2n) ≤ c6, (C4)

for any 1 ≤ k ≤ Ky, then with probability larger than 1 − c5n−ν for any ν > 0, there exists a subset My of nodes such that

 |My|/n ≤ c−16KyΦ2(n,p,αn)/(nτ2γ2n). (4.8)

And for Ty := {1, …, n}∖My, there exists a permutation matrix Jy such that

 YrsTy∗Jy = YTy∗. (4.9)

(2) There exists an absolute constant c7 such that, if

 KyΦ2(n,p,αn)/(nzkδ2γ2n) ≤ c7, (C5)

for any 1 ≤ k ≤ Kz, then with probability larger than 1 − c5n−ν for any ν > 0, there exists a subset Mz of nodes such that

 |Mz|/n ≤ c−17KyΦ2(n,p,αn)/(nδ2γ2n). (4.10)

And for Tz := {1, …, n}∖Mz, there exists a permutation matrix Jz such that

 ZrsTz∗Jz = ZTz∗. (4.11)

The proof of Theorem 4 is similar to that of Theorem 2, hence we omit it. (4.8) and (4.10) provide upper bounds for the proportions of misclustered nodes in the estimated row clusters and column clusters, respectively. As in the random projection scheme, the minimum non-zero row distances of the true singular vectors U and V, i.e., τ and δ, play an important role in the clustering performance: the clustering difficulty at the population level is reflected at the sample level. The nodes outside My and Mz are correctly clustered up to permutations (see (4.9) and (4.11)). (C4) and (C5) are technical conditions that ensure the validity of the results. They have the same effect as (C2) and (C3), and they are easy to satisfy. In addition, the RHS of (4.10) can be further improved to that of (4.8) as long as Ky = Kz.

### III-B The degree-corrected stochastic co-block model

In the ScBM, the nodes within each row cluster and each column cluster are stochastically equivalent. In real networks, however, there exist hubs whose edges are far more numerous than those of the non-hub nodes. To model such degree heterogeneity, the degree-corrected stochastic co-block model introduces extra parameters θ and η, which represent the propensity of each node to send and receive edges, respectively. The DC-ScBM is formally defined as follows.

###### Definition 2 (Degree-corrected stochastic co-block model [47])

Let Y ∈ Mn,Ky and Z ∈ Mn,Kz be the row and column membership matrices, respectively. Let B be the connectivity matrix whose (k, l)th element Bkl now serves as a baseline probability of a directed edge from any node in row cluster k to any node in column cluster l. Let θ = (θ1, …, θn)⊺ and η = (η1, …, ηn)⊺ be the node propensity parameters. Given (Y, B, Z, θ, η), each element of the network adjacency matrix A is generated independently as Aij ~ Bernoulli(θiBklηj) if i ∈ Gyk and j ∈ Gzl.
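Analogously to the ScBM, Definition 2 can be sampled by rescaling the block probabilities with the propensity parameters; the sketch below is illustrative, with hypothetical names (`theta` and `eta` denote the sending and receiving propensities):

```python
import numpy as np

def sample_dc_scbm(y, z, B, theta, eta, seed=0):
    """Draw a directed adjacency matrix from a DC-ScBM: the edge
    probability theta_i * B[y[i], z[j]] * eta_j adds sender/receiver
    degree heterogeneity on top of the block structure."""
    rng = np.random.default_rng(seed)
    P = np.outer(theta, eta) * B[np.ix_(y, z)]  # heterogeneous population matrix
    return (rng.random(P.shape) < P).astype(int), P
```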

Therefore, the probability of an edge from node i to node j depends not only on the row cluster and the column cluster they respectively lie in, but also on their propensities to send and receive edges. Note that θ and η would cause an identifiability problem unless additional assumptions are enforced. In this paper, we normalize θ within each row cluster and η within each column cluster, respectively. Define the population matrix P, which satisfies E(A) = P, and denote the maximum and minimum of its non-zero singular values by σn and γn, respectively. Before analyzing the singular structure of P, we introduce some notation: for each row cluster (column cluster), let the restriction of θ (η) to that cluster be the vector that agrees with θ (η) on the cluster and is zero otherwise, and define the corresponding "normalized" connectivity matrix by rescaling the entries of B with the within-cluster norms of θ and η. The next lemma shows the singular structure of P.

###### Lemma 2

Consider a DC-ScBM parameterized by $(Y, Z, B, \theta^y, \theta^z)$, and let $P$ be the corresponding population matrix with SVD $P = U \Sigma V^\top$. Then the following hold.

(1) $U_{i*} = \tilde\theta_i^y X_{g_i *}$ for each node $i$, where $X$ is an orthonormal matrix. So for any two nodes $i$ and $j$ in the same row cluster, $U_{i*}$ and $U_{j*}$ point in the same direction, while rows corresponding to different row clusters are orthogonal.

(2) $V_{j*} = \tilde\theta_j^z W_{h_j *}$ for each node $j$, where $W$ is a matrix with orthonormal columns. And for any two nodes $j$ and $l$, $\cos(V_{j*}, V_{l*}) = \cos\big((\tilde B_{* h_j})^\top \bar\Sigma^{-1}, (\tilde B_{* h_l})^\top \bar\Sigma^{-1}\big)$, where for any two vectors $a$ and $b$, $\cos(a, b)$ is defined to be $a^\top b / (\|a\|_2 \|b\|_2)$.

The directions of two rows in $U$ or $V$ are the same if and only if the corresponding nodes lie in the same row cluster or column cluster. For example, if node $i$ and node $j$ are in the same row cluster, then $U_{i*}$ and $U_{j*}$ have the same direction. The angles between the directions, however, tell a different story for the row clusters and the column clusters. On the row side, two rows of $U$ are perpendicular if the corresponding nodes lie in different row clusters. On the column side, the angle between two rows of $V$ that correspond to different column clusters generally depends on the directions of the corresponding columns of the "normalized" connectivity matrix $\tilde B$ defined earlier. Beyond these facts, Lemma 2 essentially explains why a normalization step is needed in Algorithm 2 before the $k$-median step. It is well known that $k$-median or $k$-means clusters nodes together when they are close in Euclidean distance. The normalization step forces any two rows of $U$ or $V$ to coincide if the corresponding nodes are in the same row cluster or column cluster. In this way, $k$-median or $k$-means can succeed when applied to the sample-version singular vectors.
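The effect of the normalization step can be checked numerically on a small population matrix: after row-normalizing the leading left singular vectors, rows belonging to the same row cluster coincide, while rows from different row clusters do not. This toy check with made-up parameters is a sketch of the idea, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)
n, Ky, Kz = 40, 2, 2
g = np.arange(n) % Ky                      # row cluster labels
h = np.arange(n) % Kz                      # column cluster labels
Y = np.eye(Ky)[g]
Z = np.eye(Kz)[h]
B = np.array([[0.6, 0.1],
              [0.2, 0.5]])
theta_y = rng.uniform(0.2, 1.0, n)         # heterogeneous propensities
theta_z = rng.uniform(0.2, 1.0, n)

# Population matrix of a DC-ScBM
P = theta_y[:, None] * (Y @ B @ Z.T) * theta_z[None, :]

# Leading left singular vectors, then the row-normalization of Algorithm 2
U = np.linalg.svd(P)[0][:, :min(Ky, Kz)]
norms = np.linalg.norm(U, axis=1, keepdims=True)
U_norm = U / np.where(norms > 0, norms, 1.0)
```

Nodes 0 and 2 share a row cluster, so their normalized rows coincide; nodes 0 and 1 do not, and their rows are orthogonal.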

In Theorems 1 and 3, we proved that the randomized adjacency matrices concentrate around the population matrix under the ScBM. The proofs did not use the explicit block structure of the population matrix, only that it is the population version of $A$ and has low rank; hence the same concentration results hold for the DC-ScBM. Next, we combine these results with Lemma 2 to analyze the misclustering performance of the randomized spectral co-clustering algorithms, treating the random projection and random sampling schemes in turn.

#### III-B1 Random projection

We now consider Algorithm 2 with the exact SVD replaced by the random-projection-based SVD. The next theorem provides the misclustering rates of the resulting algorithm.
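For concreteness, a minimal NumPy version of a random-projection-based SVD in the Halko–Martinsson–Tropp style is given below. It is a sketch of the general idea, not the exact routine analyzed in the paper; the oversampling and power-iteration parameters are illustrative:

```python
import numpy as np

def rp_svd(A, k, oversample=10, n_iter=2, seed=None):
    """Random-projection-based rank-k SVD (toy sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))
    Q = np.linalg.qr(A @ Omega)[0]          # orthonormal basis for an approximate range of A
    for _ in range(n_iter):                 # power iterations sharpen the basis
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    # Small exact SVD of the projected matrix, lifted back up
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Toy check on an exactly rank-3 matrix
rng = np.random.default_rng(2)
A = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 150))
U, s, Vt = rp_svd(A, k=3, seed=0)
```

On an exactly low-rank input, the random projection captures the range and the sketch reproduces the exact truncated SVD to machine precision.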

###### Theorem 5

Let $\hat Y^{rp}$ and $\hat Z^{rp}$ be the estimated membership matrices produced by the random-projection-based spectral co-clustering algorithm. The other parameters are the same as those in Theorem 1. Suppose (C1) holds, and recall that $\gamma_n$ denotes the minimum nonzero singular value of the population matrix. The following two results hold for the row clusters and the column clusters, respectively.

(1) Define

$$\kappa_k^y := (n_k^y)^{-2} \sum_{i \in G_k^y} (\tilde\theta_i^y)^{-2}.$$

There exists an absolute constant $c_8$ such that, if

$$\frac{\sqrt{\sum_{k=1}^{K_y} (n_k^y)^2 \kappa_k^y}\,\sqrt{K_y \alpha_n n}}{\gamma_n\, n_k^y} \le c_8 \qquad \text{(C6)}$$

for each $k$, then with high probability there exists a subset $M_y \subseteq \{1,\dots,n\}$ of misclustered nodes such that

$$\frac{|M_y|}{n} \le c_8^{-1}\, \frac{\sqrt{\sum_{k=1}^{K_y} (n_k^y)^2 \kappa_k^y}\,\sqrt{K_y \alpha_n}}{\gamma_n \sqrt{n}}. \qquad (4.13)$$

And for the nodes in $T_y = \{1,\dots,n\} \setminus M_y$, there exists a permutation matrix $J_y$ such that

$$\hat Y^{rp}_{T_y *}\, J_y = Y_{T_y *}. \qquad (4.14)$$

(2) Define

$$\kappa_k^z := (n_k^z)^{-2} \sum_{i \in G_k^z} (\tilde\theta_i^z)^{-2}\, \big\|(\tilde B_{*k})^\top \bar\Sigma^{-1}\big\|_2^{-2},$$

and

$$\eta(P) = \max_{g_i \neq g_j} \cos\!\big((\tilde B_{*g_i})^\top \bar\Sigma^{-1},\, (\tilde B_{*g_j})^\top \bar\Sigma^{-1}\big).$$

There exists an absolute constant $c_9$ such that, if

$$\frac{\sqrt{\sum_{k=1}^{K_z} (n_k^z)^2 \kappa_k^z}\,\sqrt{K_y \alpha_n n}}{\sqrt{1-\eta(P)}\, \gamma_n\, n_k^z} \le c_9 \qquad \text{(C7)}$$

for each $k$, then with high probability there exists a subset $M_z \subseteq \{1,\dots,n\}$ of misclustered nodes such that

$$\frac{|M_z|}{n} \le c_9^{-1}\, \frac{\sqrt{\sum_{k=1}^{K_z} (n_k^z)^2 \kappa_k^z}\,\sqrt{K_y \alpha_n}}{\sqrt{1-\eta(P)}\, \gamma_n \sqrt{n}}. \qquad (4.15)$$

And for the nodes in $T_z = \{1,\dots,n\} \setminus M_z$, there exists a permutation matrix $J_z$ such that

$$\hat Z^{rp}_{T_z *}\, J_z = Z_{T_z *}. \qquad (4.16)$$

The quantities $\kappa_k^y$ and $\kappa_k^z$ can be thought of as the node heterogeneities with respect to sending and receiving edges in each cluster $k$, respectively. It can be shown that $\kappa_k^y \ge (n_k^y)^{-1}$, with equality if the propensity of each node in row cluster $k$ to send edges is homogeneous. Similarly, $\kappa_k^z$ attains its lower bound when the propensity of each node in column cluster $k$ to receive edges is homogeneous. The quantity $\eta(P)$ captures the minimum non-zero angle among the rows of the population singular vectors (see result (2) of Lemma 2), in the sense of cosine. (4.13) and (4.15) provide upper bounds for the proportion of misclustered nodes in the estimated row clusters and column clusters, respectively. A larger sum of node degree heterogeneities (normalized by the cluster sizes) leads to poorer misclustering bounds. Different from the row clusters, the performance of the estimated column clusters also depends on $\eta(P)$: as expected, the larger the angle between the rows of the population singular vectors, the easier the clustering. (4.14) and (4.16) indicate that nodes outside $M_y$ and $M_z$ are correctly clustered into the underlying row clusters and column clusters up to permutations. (C6) and (C7) are conditions ensuring that each true cluster contains correctly clustered nodes; they are easy to meet. In addition, $k$-median is used in the DC-ScBM analysis because it facilitates controlling the zero rows in the sample-version singular vectors.
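The behavior of $\kappa_k^y$ can be verified numerically for a single cluster. Assuming, for this toy check only, that the propensities are normalized to average one within the cluster, Jensen's inequality gives $\kappa_k^y \ge 1/n_k^y$ with equality in the homogeneous case:

```python
import numpy as np

def kappa(theta):
    # kappa_k^y = (n_k^y)^{-2} * sum_i (theta_i)^{-2} over one cluster
    return np.sum(theta ** -2.0) / len(theta) ** 2

n_k = 50
theta_hom = np.ones(n_k)                   # homogeneous propensities (mean one)
rng = np.random.default_rng(3)
theta_het = rng.uniform(0.2, 1.8, n_k)     # heterogeneous propensities
theta_het /= theta_het.mean()              # same mean-one normalization (our assumption)
```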

#### III-B2 Random sampling

We now consider Algorithm 2 with the exact SVD replaced by the random-sampling-based SVD. The next theorem provides the misclustering rates of the resulting algorithm.
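A minimal sketch of the random-sampling idea: each entry of the adjacency matrix is kept independently with probability $p$ and rescaled by $1/p$, so the sampled matrix is an unbiased, sparser surrogate whose leading singular structure stays close to that of the original. The code below is our illustration, not the paper's implementation; on a genuinely large network one would feed the sampled matrix to a sparse truncated SVD solver rather than a dense one:

```python
import numpy as np

def rs_svd(A, k, p, seed=None):
    """Random-sampling-based rank-k SVD (toy sketch)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(A.shape) < p
    A_s = np.where(mask, A / p, 0.0)     # E[A_s] = A entrywise
    # Dense SVD keeps this sketch dependency-free; in practice use a
    # sparse truncated SVD on the (sparse) sampled matrix.
    U, s, Vt = np.linalg.svd(A_s, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]

# Toy check: sampling roughly preserves the leading singular value
A = np.ones((100, 100))                  # top singular value is exactly 100
U, s, Vt = rs_svd(A, k=2, p=0.8, seed=0)
```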

Let and