    # Spectral embedding of regularized block models

Spectral embedding is a popular technique for the representation of graph data. Several regularization techniques have been proposed to improve the quality of the embedding with respect to downstream tasks like clustering. In this paper, we explain on a simple block model the impact of the complete graph regularization, whereby a constant is added to all entries of the adjacency matrix. Specifically, we show that the regularization forces the spectral embedding to focus on the largest blocks, making the representation less sensitive to noise or outliers. We illustrate these results on both synthetic and real data, showing how regularization improves standard clustering scores.

## 1 Introduction

Spectral embedding is a standard technique for the representation of graph data (Ng et al., 2002; Belkin & Niyogi, 2002). Given the adjacency matrix A of the graph, it is obtained by solving either the eigenvalue problem:

 LX = XΛ, with X^T X = I, (1)

or the generalized eigenvalue problem:

 LX = DXΛ, with X^T DX = I, (2)

where D = diag(A 1_n) is the degree matrix, with 1_n the all-ones vector of dimension n, L = D − A is the Laplacian matrix of the graph, Λ is the diagonal matrix of the smallest (generalized) eigenvalues and X is the corresponding matrix of (generalized) eigenvectors. In this paper, we only consider the generalized eigenvalue problem, whose solution is given by the spectral decomposition of the normalized Laplacian matrix D^{-1/2} L D^{-1/2} (Luxburg, 2007).
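
As a concrete illustration, (2) can be solved through the symmetric eigendecomposition of the normalized Laplacian, since Lx = λDx is equivalent to D^{-1/2} L D^{-1/2} u = λu with x = D^{-1/2} u. A minimal NumPy sketch on a hypothetical toy graph:

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

d = A.sum(axis=1)                        # degrees
D = np.diag(d)
L = D - A                                # Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

# L x = lambda D x is equivalent to L_norm u = lambda u with x = D^{-1/2} u,
# where L_norm = D^{-1/2} L D^{-1/2} is the normalized Laplacian.
L_norm = D_inv_sqrt @ L @ D_inv_sqrt
eigvals, U = np.linalg.eigh(L_norm)      # ascending eigenvalues
X = D_inv_sqrt @ U                       # generalized eigenvectors

# Both conditions of (2) hold: L X = D X Lambda and X^T D X = I.
assert np.allclose(L @ X, D @ X @ np.diag(eigvals))
assert np.allclose(X.T @ D @ X, np.eye(6))
```

Since `eigh` returns orthonormal eigenvectors, the normalization X^T DX = I follows automatically.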

The spectral embedding can be interpreted as the equilibrium state of some physical system (Snell & Doyle, 2000; Spielman, 2007; Bonald et al., 2018), a desirable property in modern machine learning. However, it tends to produce poor results on real datasets if applied directly to the graph (Amini et al., 2013). One reason is that real graphs are most often disconnected due to noise or outliers in the dataset.

In order to improve the quality of the embedding, two main types of regularization have been proposed. The first artificially increases the degree of each node by a constant factor (Chaudhuri et al., 2012; Qin & Rohe, 2013), while the second adds a constant to all entries of the original adjacency matrix (Amini et al., 2013; Joseph et al., 2016; Zhang & Rohe, 2018). In the practically interesting case where the original adjacency matrix is sparse, the regularized adjacency matrix is dense but has a so-called sparse low rank structure, enabling the computation of the spectral embedding on very large graphs (Lara, 2019).
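
To illustrate the sparse low rank structure, here is a minimal SciPy sketch (the graph and parameters below are hypothetical) in which the dense matrix obtained by adding a constant to all entries is never formed: matrix-vector products use Jv = (1^T v)1, so each product stays linear in the number of edges.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import LinearOperator, eigsh

# Hypothetical sparse graph and regularization parameter.
n, alpha = 200, 0.1
A = sparse_random(n, n, density=0.05, random_state=0)
A = A + A.T                              # symmetric sparse adjacency

# Matrix-vector product with A_alpha = A + alpha * J, without ever
# materializing the dense all-ones matrix J: J v = (1^T v) * 1.
def matvec(v):
    v = v.ravel()
    return A @ v + alpha * v.sum() * np.ones(n)

A_alpha = LinearOperator((n, n), matvec=matvec, dtype=float)

# Leading eigenpairs of the regularized adjacency matrix.
vals, vecs = eigsh(A_alpha, k=5, which="LA")
```

The same trick applies to the regularized Laplacian, which is what enables spectral embedding of very large graphs (Lara, 2019).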

While (Zhang & Rohe, 2018) explains the effects of regularization through graph conductance and (Joseph et al., 2016) through eigenvector perturbation on the Stochastic Block Model, there is no simple interpretation of the benefits of graph regularization. In this paper, we show on a simple block model that the complete graph regularization forces the spectral embedding to separate the blocks in decreasing order of size, making the embedding less sensitive to noise or outliers in the data.

Indeed, (Zhang & Rohe, 2018) identified that, without regularization, the cuts corresponding to the first dimensions of the spectral embedding tend to separate small sets of nodes, so-called dangling sets, loosely connected to the rest of the graph. Our work shows more explicitly that regularization forces the spectral embedding to focus on the largest clusters. Moreover, our analysis involves some explicit characterization of the eigenvalues, allowing us to quantify the impact of the regularization parameter.

The rest of this paper is organized as follows. Section 2 presents block models and an important preliminary result about their aggregation. Section 3 presents the main result of the paper, about the regularization of block models, while Section 4 extends this result to bipartite graphs. Section 5 presents the experiments and Section 6 concludes the paper.

## 2 Aggregation of Block Models

Let A be the adjacency matrix of an undirected, weighted graph, that is, a symmetric matrix such that A_ij > 0 if and only if there is an edge between nodes i and j, with weight A_ij. Assume that the n nodes of the graph can be partitioned into K blocks of respective sizes n_1, …, n_K so that any two nodes of the same block have the same neighborhood, i.e., the corresponding rows (or columns) of A are the same. Without any loss of generality, we assume that the matrix A has rank K. We refer to such a graph as a block model.

Let Z be the associated membership matrix, with Z_ij = 1 if node i belongs to block j and Z_ij = 0 otherwise. We denote by W = Z^T Z the diagonal matrix of block sizes.

Now define ¯A = Z^T A Z. This is the adjacency matrix of the aggregate graph, where each block of the initial graph is replaced by a single node; two nodes in this graph are connected by an edge of weight equal to the total weight of edges between the corresponding blocks in the original graph. We denote by ¯D = diag(¯A 1_K) the degree matrix and by ¯L = ¯D − ¯A the Laplacian matrix of the aggregate graph.
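
As a small illustration (the block assignment and weights below are hypothetical), the aggregate adjacency matrix is simply ¯A = Z^T A Z:

```python
import numpy as np

# Hypothetical block model: 5 nodes in K = 2 blocks of sizes 2 and 3.
blocks = [0, 0, 1, 1, 1]
n, K = len(blocks), 2
Z = np.zeros((n, K))                     # membership matrix
for i, k in enumerate(blocks):
    Z[i, k] = 1.0

# Nodes of the same block share the same neighborhood: A = Z B Z^T
# for some symmetric block-level weight matrix B (hypothetical values).
B = np.array([[0.0, 1.0],
              [1.0, 2.0]])
A = Z @ B @ Z.T

# Aggregate graph: one node per block, edge weights summed over blocks.
A_agg = Z.T @ A @ Z
W = Z.T @ Z                              # diagonal matrix of block sizes
assert np.allclose(A_agg, W @ B @ W)     # here Z^T A Z = W B W
```

Each entry of the aggregate matrix is the total weight of edges between the two corresponding blocks, as stated above.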

The following result shows that the solution to the generalized eigenvalue problem (2) follows from that of the aggregate graph:

###### Proposition 1.

Let x be a solution to the generalized eigenvalue problem:

 Lx = λDx. (3)

Then either λ = 1 and Z^T x = 0, or x = Zy where y is a solution to the generalized eigenvalue problem:

 ¯Ly = λ¯Dy. (4)
###### Proof.

Consider the following reformulation of the generalized eigenvalue problem (3):

 Ax = Dx(1−λ). (5)

Since the rank of A is equal to K, there are n − K eigenvectors x associated with the eigenvalue λ = 1, each satisfying Z^T x = 0. By orthogonality, the other eigenvectors satisfy x = Zy for some vector y. We get:

 AZy = DZy(1−λ),

so that, after multiplication by Z^T,

 ¯Ay = ¯Dy(1−λ).

Thus y is a solution to the generalized eigenvalue problem (4). ∎

## 3 Regularization of Block Models

Let A be the adjacency matrix of some undirected graph. We consider a regularized version of the graph where an edge of weight α is added between all pairs of nodes, for some constant α > 0. The corresponding adjacency matrix is given by:

 A_α = A + αJ,

where J is the all-ones matrix of the same dimension as A. We denote by D_α the corresponding degree matrix and by L_α = D_α − A_α the Laplacian matrix.

We first consider a simple block model where the graph consists of K disjoint cliques of respective sizes n_1 > n_2 > ⋯ > n_K. In this case, we have A = ZZ^T, where Z is the membership matrix.

The objective of this section is to demonstrate that, in this setting, the k-th dimension of the spectral embedding isolates the k−1 largest cliques from the rest of the graph, for any k ∈ {2, …, K}.

###### Lemma 1.

Let λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_n be the eigenvalues associated with the generalized eigenvalue problem:

 L_α x = λ D_α x. (6)

We have 0 = λ_1 < λ_2 ≤ ⋯ ≤ λ_K < λ_{K+1} = ⋯ = λ_n = 1.

###### Proof.

Since the Laplacian matrix L_α is positive semi-definite, all eigenvalues are non-negative (Chung, 1997). The eigenvalue 0 has multiplicity 1 because the regularized graph is connected. Now for any vector x,

 x^T A_α x = x^T A x + α x^T J x = ||Z^T x||² + α(1_n^T x)² ≥ 0,

so that the matrix A_α is positive semi-definite. In view of (5), this shows that λ ≤ 1 for any eigenvalue λ. The proof then follows from Proposition 1, on observing that the eigenvalue 1 has multiplicity n − K. ∎

###### Lemma 2.

Let x be a solution to the generalized eigenvalue problem (6) with λ ∈ (0, 1). There exists some s ∈ {+1, −1} such that for each node i in block j,

 sign(x_i) = s ⟺ n_j ≥ (α(1−λ)/λ) n.
###### Proof.

In view of Proposition 1, we have x = Zy where y is a solution to the generalized eigenvalue problem of the aggregate graph, with adjacency matrix:

 ¯A_α = Z^T A_α Z = Z^T(A + αJ)Z.

Since A = ZZ^T and W = Z^T Z, we have Z^T A Z = W². Using the fact that Z^T J Z = W J_K W, we get:

 ¯A_α = W(I_K + αJ_K)W,

where I_K is the identity matrix of dimension K and J_K the all-ones matrix of dimension K. We deduce the degree matrix:

 ¯D_α = W(W + αnI_K),

and the Laplacian matrix:

 ¯L_α = ¯D_α − ¯A_α = αW(nI_K − J_K W).

The generalized eigenvalue problem associated with the aggregate graph is:

 ¯L_α y = λ ¯D_α y.

After multiplication by W^{−1}, we get:

 α(nI_K − J_K W)y = λ(W + αnI_K)y.

Observing that J_K W y ∝ 1_K, we conclude that:

 (αn(1−λ) − λW)y ∝ 1_K, (7)

and since λ ∈ (0, 1),

 ∀j = 1, …, K, y_j ∝ 1 / (λn_j − α(1−λ)n). (8)

The result then follows from the fact that x = Zy. ∎

###### Lemma 3.

The K smallest eigenvalues satisfy:

 0 = λ_1 < μ_1 < λ_2 < μ_2 < ⋯ < λ_K < μ_K,

where for all j = 1, …, K,

 μ_j = αn / (αn + n_j).
###### Proof.

We know from Lemma 1 that the K smallest eigenvalues are in [0, 1). Let x be a solution to the generalized eigenvalue problem (6) with λ ∈ (0, 1). We know that x = Zy, where y is an eigenvector associated with the same eigenvalue λ for the aggregate graph. Since 1_n is the eigenvector associated with the eigenvalue 0, we have x^T D_α 1_n = 0. Using the fact that x = Zy, we get:

 Σ_{j=1}^K n_j(n_j + αn) y_j = 0.

We then deduce from (7) and (8) that y_j ∝ 1/(λ/μ_j − 1) and

 Σ_{j=1}^K n_j(n_j + αn) · 1/(λ/μ_j − 1) = 0.

This condition cannot be satisfied if λ < μ_1 or λ > μ_K, as the terms of the sum would be either all negative or all positive.

Now let x′ = Zy′ be another eigenvector for the eigenvalue λ′ ∈ (0, 1), with x^T D_α x′ = 0. By the same argument, we get:

 Σ_{j=1}^K n_j(n_j + αn) y_j y′_j = 0,

and

 Σ_{j=1}^K n_j(n_j + αn) · 1/(λ/μ_j − 1) · 1/(λ′/μ_j − 1) = 0,

with λ ≠ λ′. This condition cannot be satisfied if λ and λ′ are in the same interval (μ_j, μ_{j+1}) for some j, as the terms in the sum would be all positive. There are K − 1 eigenvalues in (0, 1) for the K − 1 such intervals, that is, one eigenvalue per interval. ∎
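
Lemma 3 is easy to check numerically. A sketch with hypothetical clique sizes, using SciPy's generalized symmetric eigensolver:

```python
import numpy as np
from scipy.linalg import eigh

sizes, alpha = [5, 3, 2], 0.1            # hypothetical clique sizes
n = sum(sizes)

# Disjoint cliques with unit self-loops (A = Z Z^T), then regularization.
A = np.zeros((n, n))
start = 0
for s in sizes:
    A[start:start + s, start:start + s] = 1.0
    start += s
A_alpha = A + alpha * np.ones((n, n))
D_alpha = np.diag(A_alpha.sum(axis=1))
L_alpha = D_alpha - A_alpha

# Generalized eigenvalues of L_alpha x = lambda D_alpha x, ascending.
lam = eigh(L_alpha, D_alpha, eigvals_only=True)
mu = [alpha * n / (alpha * n + s) for s in sizes]

# Interlacing: 0 = lam_1 < mu_1 < lam_2 < mu_2 < lam_3 < mu_3.
assert abs(lam[0]) < 1e-8
assert mu[0] < lam[1] < mu[1] < lam[2] < mu[2]
```

The thresholds μ_j thus quantify exactly how the regularization parameter α shifts the eigenvalues.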

The main result of the paper is the following, showing that the k largest cliques of the original graph can be recovered from the spectral embedding of the regularized graph in dimension k.

###### Theorem 1.

Let X be the spectral embedding of dimension k, as defined by (2), for some k in the set {1, …, K−1}. Then sign(X) gives the k largest blocks of the graph.

###### Proof.

Let x be the j-th column of the matrix X, for some j ∈ {2, …, k+1}. In view of Lemma 3, this is the eigenvector associated with the eigenvalue λ_j, so that:

 α((1−λ_j)/λ_j) n ∈ (n_j, n_{j−1}).

In view of Lemma 2, all entries of x corresponding to blocks of size n_1, …, n_{j−1} have the same sign, the others having the opposite sign. ∎

Theorem 1 can be extended in several ways. First, the assumption of distinct block sizes can easily be relaxed. If there are K′ < K distinct values of block sizes, say K_1, …, K_{K′} blocks of sizes n_1 > ⋯ > n_{K′}, there are K′ distinct values for the thresholds μ_j and thus K′ distinct values for the eigenvalues in [0, 1), the multiplicity of the j-th smallest eigenvalue being equal to K_j. The spectral embedding in dimension k still gives cliques of the k largest sizes.

Second, the graph may have edges between blocks. Taking A = ZZ^T + εJ for instance, for some parameter ε ≥ 0, the results are exactly the same, with α replaced by α + ε. A key observation is that regularization really matters when ε = 0, in which case the initial graph is disconnected and, in the absence of regularization, the spectral embedding may isolate small connected components of the graph. In particular, the regularization makes the spectral embedding much less sensitive to noise, as will be demonstrated in the experiments.

Finally, degree correction can be added by varying the node degrees within blocks. Taking A = ΘZZ^TΘ, for some arbitrary diagonal matrix Θ with positive entries, similar results can be obtained under the regularization A_α = A + αΘJΘ. Interestingly, the spectral embedding in dimension k then recovers the k largest blocks in terms of normalized weight, the ratio of the total weight of the block to the number of nodes in the block.

## 4 Regularization of Bipartite Graphs

Let B be the biadjacency matrix of some bipartite graph with respectively n and m nodes in each part, i.e., B_ij > 0 if and only if there is an edge between node i in the first part of the graph and node j in the second part of the graph, with weight B_ij. This is an undirected graph of n + m nodes, with adjacency matrix:

 A = [0 B; B^T 0].

The spectral embedding of the graph (2) can be written in terms of the biadjacency matrix as follows:

 B X_2 = D_1 X_1 (I − Λ),
 B^T X_1 = D_2 X_2 (I − Λ), (9)

where X_1, X_2 are the embeddings of each part of the graph and D_1 = diag(B 1_m), D_2 = diag(B^T 1_n) are the corresponding degree matrices. In particular, the spectral embedding of the graph follows from the generalized SVD of the biadjacency matrix B.
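
Concretely, (9) can be solved through a standard SVD of the normalized biadjacency matrix D_1^{-1/2} B D_2^{-1/2}, whose singular values are σ = 1 − λ. A minimal NumPy sketch with a hypothetical biadjacency matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((4, 3)) + 0.1             # hypothetical positive biadjacency
d1, d2 = B.sum(axis=1), B.sum(axis=0)    # degrees of each part

# SVD of the normalized biadjacency matrix D1^{-1/2} B D2^{-1/2}.
B_norm = B / np.sqrt(np.outer(d1, d2))
U, sigma, Vt = np.linalg.svd(B_norm, full_matrices=False)

X1 = U / np.sqrt(d1)[:, None]            # embedding of the first part
X2 = Vt.T / np.sqrt(d2)[:, None]         # embedding of the second part

# The generalized SVD relations (9), with sigma = 1 - lambda.
assert np.allclose(B @ X2, np.diag(d1) @ X1 @ np.diag(sigma))
assert np.allclose(B.T @ X1, np.diag(d2) @ X2 @ np.diag(sigma))
```

The leading singular value equals 1, corresponding to the trivial eigenvalue λ = 0 of the graph Laplacian.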

The complete graph regularization adds edges between all pairs of nodes, breaking the bipartite structure of the graph. Another approach consists in applying the regularization to the biadjacency matrix, i.e., in considering the regularized bipartite graph with biadjacency matrix:

 B_α = B + αJ,

where J is here the all-ones matrix of the same dimension as B. The spectral embedding of the regularized graph is that associated with the adjacency matrix:

 A_α = [0 B_α; B_α^T 0]. (10)

As in Section 3, we consider a block model, so that the biadjacency matrix B is block-diagonal, with all-ones block matrices on the diagonal. Each part of the graph consists of K groups of nodes, of respective sizes n_1 > ⋯ > n_K and m_1 > ⋯ > m_K, with nodes of block j in the first part connected only to nodes of block j in the second part, for j = 1, …, K.

We consider the generalized eigenvalue problem (6) associated with the above matrix A_α. In view of (9), this is equivalent to the generalized SVD of the regularized biadjacency matrix B_α. We have the following results, whose proofs are deferred to the appendix:

###### Lemma 4.

Let λ_1 ≤ λ_2 ≤ ⋯ be the eigenvalues associated with the generalized eigenvalue problem (6). We have 0 = λ_1 < λ_2 ≤ ⋯ ≤ λ_K < λ_{K+1} = ⋯ = λ_{n+m−K} = 1.

###### Lemma 5.

Let x be a solution to the generalized eigenvalue problem (6) with λ ∈ (0, 1). There exist s_1, s_2 ∈ {+1, −1} such that for each node i in block j of part p ∈ {1, 2},

 sign(x_i) = s_p ⟺ n_j m_j / ((n_j + αn)(m_j + αm)) ≥ (1−λ)².
###### Lemma 6.

The K smallest eigenvalues satisfy:

 0 = λ_1 < μ_1 < λ_2 < μ_2 < ⋯ < λ_K < μ_K,

where for all j = 1, …, K,

 μ_j = 1 − √( n_j m_j / ((n_j + αn)(m_j + αm)) ).
###### Theorem 2.

Let X be the spectral embedding of dimension k, as defined by (2), for some k in the set {1, …, K−1}. Then sign(X) gives the k largest blocks of each part of the graph.

Like Theorem 1, the assumption of decreasing block sizes can easily be relaxed. Assume that block pairs are indexed in decreasing order of n_j m_j / ((n_j + αn)(m_j + αm)). Then the spectral embedding of dimension k gives the first k block pairs for that order. It is interesting to notice that the order now depends on α: when α → 0, the block pairs of highest value (n/n_j + m/m_j)^{−1} (equivalently, highest harmonic mean of the proportions of nodes in each part of the graph) are isolated first; when α → +∞, the block pairs of highest value n_j m_j (equivalently, highest geometric mean of the proportions of nodes in each part of the graph) are isolated first.

The results also extend to non-block diagonal biadjacency matrices and degree-corrected models, as for Theorem 1.

## 5 Experiments

We now illustrate the impact of regularization on the quality of spectral embedding. We focus on a clustering task, using both synthetic and real datasets where the ground-truth clusters are known. In all experiments, we skip the first dimension of the spectral embedding as it is not informative (the corresponding eigenvector is the all-ones vector, up to some multiplicative constant). The code to reproduce these experiments is available online.

### 5.1 Toy graph

We first illustrate the theoretical results of the paper with a toy graph consisting of 3 cliques of distinct sizes. We compute the spectral embedding in dimension 1, using the second smallest eigenvalue. Denoting by Z the membership matrix, the sign pattern of Z^T x shows that the embedding isolates the largest cluster; this is not the case in the absence of regularization.
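
This toy experiment can be reproduced along the following lines (the clique sizes and regularization parameter below are hypothetical, chosen only to illustrate the sign pattern of Theorem 1):

```python
import numpy as np
from scipy.linalg import eigh

sizes, alpha = [5, 3, 2], 0.1            # hypothetical clique sizes
n = sum(sizes)
Z = np.zeros((n, len(sizes)))            # membership matrix
start = 0
for k, s in enumerate(sizes):
    Z[start:start + s, k] = 1.0
    start += s

# Regularized union of disjoint cliques: A_alpha = Z Z^T + alpha J.
A_alpha = Z @ Z.T + alpha * np.ones((n, n))
D_alpha = np.diag(A_alpha.sum(axis=1))
lam, X = eigh(D_alpha - A_alpha, D_alpha)

x = X[:, 1]                              # second smallest eigenvalue
# The sign of x separates the largest clique from the two others.
assert len(set(np.sign(x[:5]))) == 1     # constant sign on the largest clique
assert len(set(np.sign(x[5:]))) == 1     # opposite sign on the rest
assert np.sign(x[0]) != np.sign(x[5])
```

Without regularization the graph is disconnected, so the zero eigenvalue has multiplicity 3 and the second eigenvector is an arbitrary combination of the clique indicators.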

### 5.2 Datasets

This section describes the datasets used in our experiments. All graphs are considered as undirected. Table 1 presents the main features of the graphs.

#### Stochastic Block-Model (SBM)

We generate 100 instances of the same stochastic block model (Holland et al., 1983). There are 100 blocks of size 20, with one intra-block edge probability for the first 50 blocks and a different one for the other 50 blocks. The inter-block edge probability is the same for all pairs of blocks. Other sets of parameters can be tested using the code available online. The ground-truth cluster of each node corresponds to its block.

#### 20newsgroup (NG)

This dataset consists of newsgroup posts on 20 topics. It defines a weighted bipartite graph between documents and words. The label of each document corresponds to its topic.

#### Wikipedia for Schools (WS)

(Haruechaiyasak & Damrongrat, 2008). This is the graph of hyperlinks between a subset of Wikipedia pages. The label of each page is its category (e.g., countries, mammals, physics).

### 5.3 Metrics

We consider a large set of metrics from the clustering literature. All metrics are upper-bounded by 1 and the higher the score the better.

#### Homogeneity (H), Completeness (C) and V-measure score (V)

(Rosenberg & Hirschberg, 2007). Supervised metrics. A cluster is homogeneous if all its data points are members of a single class in the ground truth. A clustering is complete if all the members of a class in the ground truth belong to the same cluster in the prediction. The V-measure is the harmonic mean of homogeneity and completeness.

#### Adjusted Rand Index (ARI)

(Hubert & Arabie, 1985). Supervised metric. This is the corrected-for-chance version of the Rand Index, which is itself an accuracy on pairs of samples.

#### Adjusted Mutual Information (AMI)

(Vinh et al., 2010). Supervised metric. This is the adjusted-for-chance version of the mutual information.

#### Fowlkes-Mallows Index (FMI)

(Fowlkes & Mallows, 1983)

. Supervised metric. Geometric mean between precision and recall on the edge classification task, as described for the ARI.

#### Modularity (Q)

(Newman, 2006). Unsupervised metric. Fraction of edges within clusters, compared to the expected fraction in a null model where edges are shuffled at random.

#### Normalized Standard Deviation (NSD)

Unsupervised metric. One minus the normalized standard deviation of the cluster sizes.

### 5.4 Experimental setup

All graphs are embedded in dimension 20, with different regularization parameters. To compare the impact of this parameter across different datasets, we use a relative regularization parameter, normalized by the total weight of the graph w.

We use the K-Means algorithm to cluster the nodes in the embedding space. The number of clusters K is set to the ground-truth number of clusters (other experiments with different values of K are reported in the Appendix). We use the Scikit-learn (Pedregosa et al., 2011) implementation of K-Means and of the metrics, when available. The spectral embedding and the modularity are computed with the Scikit-network package; see the documentation for more details.

### 5.5 Results

We report the results in Table 2 for different values of the relative regularization parameter. We see that the regularization generally improves performance, the optimal value of the parameter depending on both the dataset and the score function. As suggested by Lemma 3, the optimal value of the regularization parameter should depend on the distribution of cluster sizes, on which we do not have any prior knowledge.

To test the impact of noise on the spectral embedding, we add isolated nodes with a self-loop to the graph and compare the clustering performance with and without regularization. The number of isolated nodes is given as a fraction of the initial number of nodes in the graph. Scores are computed only on the initial nodes. The results are reported in Table 3 for the Wikipedia for Schools dataset. We observe that, in the absence of regularization, the scores drop even with a small fraction of noise: the computed clustering is then a trivial partition with all initial nodes in the same cluster. This means that the 20 first dimensions of the spectral embedding focus on the isolated nodes. On the other hand, the scores remain approximately constant in the regularized case, which suggests that regularization makes the embedding robust to this type of noise.
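
The mechanism behind this robustness can be reproduced in miniature (sizes and parameters below are hypothetical): after adding isolated nodes with self-loops, the first informative dimension of the regularized embedding still separates the largest clique.

```python
import numpy as np
from scipy.linalg import eigh

# Two cliques (hypothetical sizes 6 and 4) plus isolated noise nodes
# with self-loops.
sizes, n_noise, alpha = [6, 4], 5, 0.1
n = sum(sizes) + n_noise
A = np.zeros((n, n))
A[:6, :6] = 1.0
A[6:10, 6:10] = 1.0
for i in range(10, n):
    A[i, i] = 1.0                        # isolated nodes with self-loops

# Second dimension of the regularized spectral embedding.
A_alpha = A + alpha * np.ones((n, n))
D_alpha = np.diag(A_alpha.sum(axis=1))
lam, X = eigh(D_alpha - A_alpha, D_alpha)
x = X[:, 1]

# Despite the noise nodes, the first informative dimension still
# separates the largest clique from everything else.
assert len(set(np.sign(x[:6]))) == 1
assert len(set(np.sign(x[6:]))) == 1
assert np.sign(x[0]) != np.sign(x[6])
```

Without regularization, the zero eigenvalue has multiplicity equal to the number of connected components, so the first dimensions of the embedding are arbitrary combinations of component indicators, consistent with the trivial clustering observed above.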

## 6 Conclusion and Perspectives

In this paper, we have provided a simple explanation for the well-known benefits of regularization on spectral embedding. Specifically, regularization forces the embedding to focus on the largest clusters, making the embedding more robust to noise. This result was obtained through the explicit characterization of the embedding for a simple block model, and extended to bipartite graphs.

An interesting perspective of our work is the extension to stochastic block models, using for instance the concentration results proved in (Lei et al., 2015; Le et al., 2017). Another problem of interest is the impact of regularization on other downstream tasks, like link prediction. Finally, we would like to further explore the impact of the regularization parameter, exploiting the theoretical results presented in this paper.

## References

• Amini et al. (2013) Arash A Amini, Aiyou Chen, Peter J Bickel, Elizaveta Levina, et al. Pseudo-likelihood methods for community detection in large sparse networks. The Annals of Statistics, 41(4):2097–2122, 2013.
• Belkin & Niyogi (2002) Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in neural information processing systems, pp. 585–591, 2002.
• Bonald et al. (2018) Thomas Bonald, Alexandre Hollocou, and Marc Lelarge. Weighted spectral embedding of graphs. In 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 494–501. IEEE, 2018.
• Chaudhuri et al. (2012) Kamalika Chaudhuri, Fan Chung, and Alexander Tsiatas. Spectral clustering of graphs with general degrees in the extended planted partition model. In Conference on Learning Theory, pp. 35–1, 2012.
• Chung (1997) Fan RK Chung. Spectral graph theory. American Mathematical Soc., 1997.
• Fowlkes & Mallows (1983) Edward B Fowlkes and Colin L Mallows.

A method for comparing two hierarchical clusterings.

Journal of the American statistical association, 78(383):553–569, 1983.
• Haruechaiyasak & Damrongrat (2008) Choochart Haruechaiyasak and Chaianun Damrongrat. Article recommendation based on a topic model for wikipedia selection for schools. In International Conference on Asian Digital Libraries, pp. 339–342. Springer, 2008.
• Holland et al. (1983) Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109–137, 1983.
• Hubert & Arabie (1985) Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of classification, 2(1):193–218, 1985.
• Joseph et al. (2016) Antony Joseph, Bin Yu, et al. Impact of regularization on spectral clustering. The Annals of Statistics, 44(4):1765–1791, 2016.
• Lara (2019) Nathan De Lara. The sparse + low rank trick for matrix factorization-based graph algorithms. In Proceedings of the 15th International Workshop on Mining and Learning with Graphs (MLG), 2019.
• Le et al. (2017) Can M Le, Elizaveta Levina, and Roman Vershynin. Concentration and regularization of random graphs. Random Structures & Algorithms, 51(3):538–561, 2017.
• Lei et al. (2015) Jing Lei, Alessandro Rinaldo, et al. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015.
• Luxburg (2007) Ulrike Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, December 2007. ISSN 0960-3174.
• Newman (2006) Mark EJ Newman. Modularity and community structure in networks. Proceedings of the national academy of sciences, 103(23):8577–8582, 2006.
• Ng et al. (2002) Andrew Y Ng, Michael I Jordan, and Yair Weiss.

On spectral clustering: Analysis and an algorithm.

In Advances in neural information processing systems, pp. 849–856, 2002.
• Pedregosa et al. (2011) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
• Qin & Rohe (2013) Tai Qin and Karl Rohe. Regularized spectral clustering under the degree-corrected stochastic blockmodel. In Advances in Neural Information Processing Systems, pp. 3120–3128, 2013.
• Rosenberg & Hirschberg (2007) Andrew Rosenberg and Julia Hirschberg. V-measure: A conditional entropy-based external cluster evaluation measure. In

Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL)

, pp. 410–420, 2007.
• Snell & Doyle (2000) P Snell and Peter Doyle. Random walks and electric networks. Free Software Foundation, 2000.
• Spielman (2007) Daniel A Spielman. Spectral graph theory and its applications. In Foundations of Computer Science, 2007. FOCS’07. 48th Annual IEEE Symposium on, pp. 29–38. IEEE, 2007.
• Vinh et al. (2010) Nguyen Xuan Vinh, Julien Epps, and James Bailey. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct):2837–2854, 2010.
• Zhang & Rohe (2018) Yilin Zhang and Karl Rohe. Understanding regularized spectral clustering via graph conductance. In Advances in Neural Information Processing Systems, pp. 10631–10640, 2018.

## Appendix

We provide a proof of Theorem 2 as well as a complete set of experimental results.

## Appendix A Regularization of Bipartite Graphs

The proof of Theorem 2 follows the same workflow as that of Theorem 1. Let Z_1 and Z_2 be the left and right membership matrices for the block matrix B. The aggregated matrix is ¯B = Z_1^T B Z_2. The diagonal matrices of block sizes are W_1 = Z_1^T Z_1 and W_2 = Z_2^T Z_2. We have the equivalent of Proposition 1:

###### Proposition 2.

Let (x_1, x_2) be a solution to the generalized singular value problem:

 B x_2 = σ D_1 x_1,
 B^T x_1 = σ D_2 x_2.

Then either σ = 0, with Z_1^T x_1 = 0 and Z_2^T x_2 = 0, or x_1 = Z_1 y_1 and x_2 = Z_2 y_2, where (y_1, y_2) is a solution to the generalized singular value problem:

 ¯B y_2 = σ ¯D_1 y_1,
 ¯B^T y_1 = σ ¯D_2 y_2.
###### Proof.

Since the rank of B is equal to K, there are n + m − 2K singular vectors associated with the singular value σ = 0, each satisfying Z_1^T x_1 = 0 and Z_2^T x_2 = 0. By orthogonality, the other pairs of singular vectors satisfy x_1 = Z_1 y_1 and x_2 = Z_2 y_2 for some vectors y_1, y_2. By replacing these in the original generalized singular value problem, we get that (y_1, y_2) is a solution to the generalized singular value problem for the aggregate graph. ∎

In the following, we focus on the block model described in Section 4, where B = Z_1 Z_2^T.

Proof of Lemma 4. The generalized eigenvalue problem (6) associated with the regularized matrix A_α is equivalent to the generalized SVD of the regularized biadjacency matrix B_α:

 B_α x_2 = σ D_{α,1} x_1,
 B_α^T x_1 = σ D_{α,2} x_2,

with σ = 1 − λ.

In view of Proposition 2, the singular value σ = 0 has multiplicity n + m − 2K, meaning that the eigenvalue λ = 1 has multiplicity n + m − 2K. Since the graph is connected, the eigenvalue 0 has multiplicity 1. The proof then follows from the observation that if (x_1, x_2) is a pair of singular vectors for the singular value σ, then the vectors (x_1, ±x_2) are eigenvectors for the eigenvalues 1 ∓ σ.

Proof of Lemma 5. By Proposition 2, we can focus on the generalized singular value problem for the aggregate graph:

 ¯B_α y_2 = σ ¯D_{α,1} y_1,
 ¯B_α^T y_1 = σ ¯D_{α,2} y_2.

Since

 ¯B_α = W_1(I_K + αJ_K)W_2,

and

 ¯D_{α,1} = W_1(W_2 + αmI_K), ¯D_{α,2} = W_2(W_1 + αnI_K),

we have, after multiplication by W_1^{−1} and W_2^{−1} respectively:

 (I_K + αJ_K)W_2 y_2 = σ(W_2 + αmI_K)y_1,
 (I_K + αJ_K)W_1 y_1 = σ(W_1 + αnI_K)y_2.

Observing that J_K W_2 y_2 ∝ 1_K and J_K W_1 y_1 ∝ 1_K, we get:

 σ(W_2 + αmI_K)y_1 − W_2 y_2 ∝ 1_K,
 σ(W_1 + αnI_K)y_2 − W_1 y_1 ∝ 1_K.

As two diagonal matrices commute, we obtain:

 σ²(W_1 + αnI_K)(W_2 + αmI_K)y_1 − W_1W_2y_1 = (η_1σ(W_1 + αnI_K) + η_2W_2)1_K,
 σ²(W_1 + αnI_K)(W_2 + αmI_K)y_2 − W_1W_2y_2 = (η_1W_1 + η_2σ(W_2 + αmI_K))1_K,

for some constants η_1, η_2, and

 y_{1,j} = (η_1σ(n_j + αn) + η_2m_j) / (σ²(n_j + αn)(m_j + αm) − n_jm_j),
 y_{2,j} = (η_1n_j + η_2σ(m_j + αm)) / (σ²(n_j + αn)(m_j + αm) − n_jm_j).

For appropriate signs s_1, s_2 ∈ {+1, −1}, we get:

 sign(y_{1,j}) = s_1 ⟺ sign(y_{2,j}) = s_2 ⟺ n_jm_j / ((n_j + αn)(m_j + αm)) ≥ σ² = (1−λ)²,

and the result follows from the fact that x_1 = Z_1y_1 and x_2 = Z_2y_2.

Proof of Lemma 6. The proof is the same as that of Lemma 3, with the threshold values following from Lemma 5:

 μ_j = 1 − √( n_jm_j / ((n_j + αn)(m_j + αm)) ).

Proof of Theorem 2. Let x be the j-th column of the matrix X, for some j ∈ {2, …, k+1}. In view of Lemma 6, this is the eigenvector associated with the eigenvalue λ_j. In view of Lemma 5, all entries of x corresponding to blocks of size n_1, …, n_{j−1} and m_1, …, m_{j−1} have the same sign, the others having the opposite sign.

## Appendix B Experimental Results

In this section, we present more extensive experimental results.

Tables 4 and 5 present results for the same experiment as in Table 2, but for different values of K, namely K = 2 (bisection of the graph) and K equal to half of the ground-truth number of clusters. As for the ground-truth value of K, regularization generally improves clustering performance. However, the optimal value of the regularization parameter remains both dataset dependent and metric dependent. Note that, for the NG and WS datasets, the clustering remains trivial in the case K = 2, with one cluster containing all the nodes, until a certain amount of regularization.

Table 6 presents the different scores for both types of regularization on the NG dataset. As we can see, preserving the bipartite structure of the graph leads to slightly better performance.

Finally, Table 7 shows the impact of regularization in the presence of noise for the NG dataset. The conclusions are similar as for the WS dataset: regularization makes the spectral embedding much more robust to noise.