The problem of clustering arises in a variety of applications such as social network analysis, computer vision, and computational biology. Among the many clustering algorithms, spectral clustering is one of the most prominent. It was proposed by  in the context of image segmentation, where an image is viewed as a graph of pixel nodes connected by weighted edges that represent visual similarities between adjacent pixels. The approach has since become popular owing to its wide applicability, and it has been extensively analyzed under various models [3, 4, 5].
While standard spectral clustering relies on interactions between pairs of nodes, in many applications interactions occur among more than two nodes. One example is online social communities called folksonomies, in which users attach tags to resources; there, a three-way interaction occurs among users, resources, and annotations . Another example is molecular biology, in which multi-way interactions between distinct systems capture molecular interactions . See  and the list of applications therein. A natural research direction is therefore to extend the celebrated framework of graph spectral clustering to a hypergraph setting, in which edges reflect multi-way interactions.
To this end, we consider a random weighted uniform hypergraph model, which we call the weighted stochastic block model; it is a special case of the model considered in . An edge of size is homogeneous if it consists of nodes from the same group, and heterogeneous otherwise. [Footnote 1: While edges of a graph are pairs of nodes, edges of a hypergraph (or hyperedges) are arbitrary sets of nodes. The size of an edge is the number of nodes it contains.] Given a hidden partition of nodes into groups, a weight is independently assigned to each edge of size such that homogeneous edges tend to have higher weights than heterogeneous edges. More precisely, for some constants , the expected weight of a homogeneous edge is and that of a heterogeneous edge is . [Footnote 2: For illustrative purposes, we focus on a symmetric setting. In Sec. V, we extend our results (described later) to a more general setting.] Here, captures the sparsity level of the weights and may decay in . The task is to recover the hidden partition from the weighted hypergraph. In particular, we aim to develop computationally efficient algorithms that provably find the hidden partition.
Our contributions: By generalizing spectral clustering algorithms developed for graph clustering, we propose two polynomial-time algorithms, which we name Hypergraph Spectral Clustering (HSC) and Hypergraph Spectral Clustering with Local Refinement (HSCLR). We then analyze their performance, assuming that the size of hyperedges is , the number of clusters is constant, and the size of each group is linear in . Our main results can be summarized as follows. For some constants and , which depend only on , , and , the following statements hold with high probability:
Detection: If , the output of HSC is more consistent with the hidden partition than a random guess;
Weak consistency: If , HSC outputs a partition which coincides with the hidden partition except number of nodes; and
Strong consistency: If , HSCLR exactly recovers the hidden partition.
We remark that our main results are the first order-wise optimal results for the binary edge weight case (see Proposition 1).
Table I. Summary of related work. Columns: Model assumption | Order of required for each recovery type.
I-A Related work
I-A1 Graph Clustering
The problem of standard graph clustering, i.e., , has been studied in great generality. Here we summarize the major developments and refer readers to a recent survey by Abbe  for details. The detection problem, whose goal is to find a partition that is more consistent with the hidden partition than a random guess, has received wide attention. A notable work by Decelle et al.  first observed a phase transition and conjectured the transition threshold. They further conjectured that a computational gap exists for the case of . For the case of , the phase transition threshold was fully settled jointly by  and [16, 17]: the impossibility of detection below the conjectured threshold is established in , and [16, 17] prove that the conjectured threshold can be achieved by efficient algorithms. The limits for the case have been studied in [18, 19, 20, 21] and are settled in .
The weak (strong) consistency problem aims at finding a partition that is correct except for a vanishing (zero) fraction of nodes. Necessary and sufficient conditions for weak consistency have been studied in [23, 24, 25, 26, 27], and those for strong consistency in [28, 29, 25, 27]. In particular, for strong consistency, both the fundamental limits and computationally efficient algorithms were investigated initially for [28, 29, 25], and more recently for general . While most works assume that graph parameters such as , , , and the cluster sizes are fixed, one can also study a minimax scenario in which the graph parameters are chosen adversarially against the clustering algorithm. In , the authors characterize the minimax-optimal rate, and  shows that this rate can be achieved by an efficient algorithm.
I-A2 Hypergraph Clustering
Compared to graph clustering, the study of hypergraph clustering is still in its infancy. In this section, we briefly summarize recent developments. For detection, analogously to the work of Decelle et al. , Angelini et al.  first conjectured phase transition thresholds. Unlike in the graph case, these conjectures have not yet been settled. In , the authors study a specific spectral clustering algorithm, which can be shown to detect the hidden clusters if , while the conjectured threshold for detection is for some constant . This gap stems from a technical challenge specific to the hypergraph clustering problem; see Remark 7 for details. In , the authors study the bipartite stochastic block model and, as a byproduct of their results, show that detection is possible under a specific model if . While this guarantee is order-wise optimal, it holds only when edge weights are binary-valued and the two clusters have equal size. Our detection guarantee, obtained by carefully resolving the technical challenges specific to hypergraphs, is also order-wise optimal but does not require such assumptions.
While several consistency results under various models are shown in [9, 10, 11, 8, 12], to the best of our knowledge, our consistency guarantees are the first order-wise optimal ones. We briefly overview the existing results below. In [9, 10], the authors derive consistency results for the case in which and weights are binary-valued. In , the authors investigate consistency results of a certain spectral clustering algorithm under a fairly general random hypergraph model, called the planted partition model in hypergraphs. Indeed, our hypergraph model is a special case of the planted partition model, and hence the algorithm proposed in  can be applied to our model as well. One can show that their algorithm is weakly consistent if under our model. The case of non-uniform hypergraphs, in which the size of edges may vary, is studied in . See Table I for a summary.
While most existing works focus on analyzing the performance of specific clustering algorithms, some study the fundamental limits. In [32, 1], the information-theoretic limits are characterized for specific hypergraph models. In , the minimax-optimal rates of the error fraction are derived for the binary-weighted edge case. However, it has remained unclear whether a computationally efficient algorithm can achieve such limits. In this work, we show that HSCLR achieves the fundamental limit for the model considered in .
I-A3 Main innovation relative to 
The new algorithms proposed in this work can be viewed as strict improvements over the algorithm proposed in our previous work . First, the algorithm of  cannot handle the sparse-weight regime, i.e., . To address this, we add a preprocessing step before the spectral clustering step, and this step turns out to handle the sparse regime; see Lemma 2 for details.
Another limitation of the original algorithm concerns its refinement step (detailed later). The original refinement step is tailored to a specific model, which assumes binary-valued weights and two clusters (see Definition 7). In contrast, our new refinement step applies to the general case with weighted edges and clusters. Further, the original refinement step involves iterative updates, solely because our earlier proof required such iterations. In our experiments, however, a single refinement step was always sufficient. By integrating a well-known sample splitting technique into our algorithm, we are able to prove that a single refinement step indeed suffices.
Apart from the improvements above, we also propose a sketching algorithm for subspace clustering based on our new algorithm, and we show that it outperforms existing schemes in terms of sample complexity as well as computational complexity.
I-A4 Computer vision applications
The weighted stochastic block model considered herein fits naturally into computer vision applications such as geometric grouping and subspace clustering [34, 35, 36]. The goal of such problems is to cluster a union of groups of data points, where points in the same group lie on a common low-dimensional affine space. In these applications, the similarity among a fixed number of data points reflects how well the points can be approximated by a low-dimensional flat. By viewing these similarities as the weights of edges in a hypergraph, one can relate these problems to our model. Note that edges connecting data points from the same low-dimensional affine space have larger weights than other edges; see Section VI for a detailed discussion.
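To make the edge-weight construction concrete, the following is a minimal sketch of one possible similarity for a hyperedge. It fits the best affine subspace of a given dimension to the points, using an SVD of the centered points, and maps the residual through a decaying exponential. The kernel `exp(-residual / sigma)` and the names `flat_residual` and `hyperedge_weight` are our own hypothetical choices for illustration; they are not the construction used in Section VI.

```python
import numpy as np

def flat_residual(points, dim):
    """Squared residual of the best-fit affine subspace of dimension `dim`.

    points: (m, D) array holding the m data points of one hyperedge.
    The best affine fit passes through the centroid, so the residual is the
    sum of the squared singular values of the centered points beyond `dim`.
    """
    centered = points - points.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return float(np.sum(s[dim:] ** 2))

def hyperedge_weight(points, dim, sigma=1.0):
    """Hypothetical similarity in (0, 1]; equals 1 when the points lie exactly on a flat."""
    return float(np.exp(-flat_residual(points, dim) / sigma))

collinear = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(hyperedge_weight(collinear, dim=1), hyperedge_weight(triangle, dim=1))
```

Three collinear points receive weight (numerically) 1. The corners of a triangle do not fit any line and receive a smaller weight. This mirrors the requirement that edges drawn from a common flat be heavier than the others.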
I-A5 Connection with low-rank tensor completion
Our model bears a strong resemblance to low-rank tensor completion. To see this, consider the following model: for each, edge weight of is generated as (where ) if are from the same cluster; otherwise. This model generates a weighted hypergraph whose weights are either , or . Now, view each weight as an observation of an entry of a hidden tensor , whose entries if are from the same cluster; otherwise. Here, weight indicates that the entry is “unobserved”. Then, knowledge of the hidden partition directly leads to “completion” of the unobserved entries. In this way, one can draw a parallel between hypergraph clustering and low-rank tensor completion. [Footnote 3: Here, is of rank at most since it admits a CP-decomposition  .]
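The rank bound in footnote 3 can be checked directly on a small example. The sketch below assumes, for illustration only, that the hidden tensor has entry 1 when all indices belong to the same cluster and 0 otherwise; the exact values are lost in this copy. Under that assumption, the tensor equals the sum over clusters of the outer product of d copies of that cluster's indicator vector, where d is the tensor order (our notation). This is a CP decomposition with at most k rank-one terms. The function names are hypothetical.

```python
from itertools import product

def same_cluster_tensor(labels, d):
    """Assumed hidden tensor: 1 if all d indices share a cluster, else 0 (stored as a dict)."""
    n = len(labels)
    return {idx: int(len({labels[i] for i in idx}) == 1)
            for idx in product(range(n), repeat=d)}

def cp_sum(labels, d, k):
    """Sum over clusters c of the outer product of d copies of the indicator vector z_c."""
    n = len(labels)
    indicators = [[1 if labels[i] == c else 0 for i in range(n)] for c in range(k)]
    tensor = {}
    for idx in product(range(n), repeat=d):
        total = 0
        for z in indicators:          # one rank-one term per cluster
            term = 1
            for i in idx:
                term *= z[i]
            total += term
        tensor[idx] = total
    return tensor

labels = [0, 0, 1, 2, 1]
print(same_cluster_tensor(labels, 3) == cp_sum(labels, 3, k=3))   # True
```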
This connection allows us to compare our results with guarantees in the tensor completion literature. For instance, the sufficient condition for vanishing estimation error, i.e., weak consistency, derived in reads , while ours reads . This comparison favors our approach. A more interesting implication arises in computational aspects. Notice that a naïve lower bound for tensor completion is [Footnote 4: The number of free parameters defining a rank , -th order, -dimensional tensor is , which scales like when and are fixed.], and the tensor completion guarantee comes with an additional factor relative to this lower bound. This gap has not been closed in the literature, raising the question of whether this information-computation gap is fundamental. Interestingly, the gap does not appear in our result; hence, hypergraph clustering may shed new light on the computational aspects of tensor completion. A similar observation has recently been made independently in  for spike-tensor-related models (see Sec. 4.3 therein).
I-B Paper organization
Sec. II introduces the model; Sec. III presents our main results along with some implications; Sec. IV provides the proofs of the main theorems; Sec. V discusses how our results can be extended and adapted to other models; Sec. VI is devoted to practical applications relevant to our model and presents the empirical performance of the proposed algorithms; and Sec. VII concludes the paper with some future research directions.
Let () be the th row (the th column) of matrix . For a positive integer , . For a set and an integer , . Let denote the natural logarithm. Let denote the indicator function. For a function and , .
II The weighted stochastic block model
We first remark that our definition of the weighted SBM generalizes the original model for graphs [40, 41] to the hypergraph setting. For simplicity, we focus on the following symmetric assortative model in this paper. In Sec. V, we generalize our results to a broader class of models.
Let be the indices of nodes, and be the set of all possible edges of size for a fixed integer . Let be the hidden partition function that maps nodes into groups for a fixed integer . Equivalently, the membership function can be represented in a matrix form , which we call the membership matrix, whose th entry takes if and otherwise. We denote by the size of the th group for , i.e., . Let and . An edge is homogeneous if and heterogeneous otherwise. We now formally define the weighted SBM.
Definition 1 (The weighted SBM).
A random weight is assigned to each edge independently [Footnote 5: Our results hold as long as the weights are upper bounded by any fixed positive constant, since one can always normalize the edge weights so that they lie within . The global upper bound on the edge weights is required for deriving our large deviation results (Lemmas 3 and 5) in the proof.]: for homogeneous edges, ; and for heterogeneous edges, .
Note that the weighted SBM does not assume a specific edge weight distribution but only specifies the expected values. For instance, it captures the case in which the two weight distributions belong to a single location family with different parameters, as well as the case with two completely different weight distributions.
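Below is a minimal sketch of sampling from the model. It assumes nodes labeled 0 to n-1 and a user-supplied sampler for each edge type, and it draws one independent weight per edge of size d. The function name, the dict-of-tuples representation, and the example distributions are hypothetical. For the weighted SBM, only the means of the two samplers matter.

```python
import random
from itertools import combinations

def sample_weighted_sbm(labels, d, hom_dist, het_dist, seed=0):
    """Draw one independent weight per size-d subset of nodes.

    labels   : list mapping node -> cluster index (the hidden partition)
    hom_dist : callable(rng) -> weight for a homogeneous edge
    het_dist : callable(rng) -> weight for a heterogeneous edge
    Returns a dict {sorted node tuple: weight}.
    """
    rng = random.Random(seed)
    weights = {}
    for edge in combinations(range(len(labels)), d):
        homogeneous = len({labels[v] for v in edge}) == 1
        weights[edge] = hom_dist(rng) if homogeneous else het_dist(rng)
    return weights

labels = [0] * 6 + [1] * 6
# Binary weights (in the spirit of Example 1): Bernoulli with different means.
binary = sample_weighted_sbm(labels, 3, lambda r: float(r.random() < 0.9),
                             lambda r: float(r.random() < 0.1), seed=1)
# Real-valued weights: uniform distributions on two hypothetical intervals.
uniform = sample_weighted_sbm(labels, 3, lambda r: r.uniform(0.5, 1.0),
                              lambda r: r.uniform(0.0, 0.5))
```

Because the samplers are arbitrary callables, the same function covers both a single location family with different parameters and two unrelated weight distributions.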
Example 1 (The unweighted hypergraph case).
Example 2 (The weighted hypergraph case).
For homogeneous edges, ; and for heterogeneous edges, , a uniform distribution on . This model can be seen as an instance of the weighted SBM.
II-2 Performance metric
Given and the number of clusters , we aim to recover the hidden partition up to a permutation of labels. Formally, for any estimator , we define the error fraction as , where is the collection of all permutations of . We study three types of consistency guarantees [43, 13].
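The error fraction can be computed by brute force over the k! relabelings. This is cheap for the constant k assumed throughout. Below is a minimal sketch; the function name is hypothetical.

```python
from itertools import permutations

def error_fraction(true_labels, est_labels, k):
    """min over label permutations pi of (1/n) * #{i : true_i != pi(est_i)}."""
    n = len(true_labels)
    best = n
    for perm in permutations(range(k)):
        mistakes = sum(t != perm[e] for t, e in zip(true_labels, est_labels))
        best = min(best, mistakes)
    return best / n

print(error_fraction([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1], k=3))   # 0.0
print(error_fraction([0, 0, 1, 1], [1, 1, 0, 1], k=2))               # 0.25
```

For larger k, the minimization over permutations is a linear assignment problem, which the Hungarian algorithm solves in polynomial time.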
Definition 2 (Recovery types).
An estimator is
strongly consistent if ;
weakly consistent if in prob.; and
said to solve detection if it outputs a partition that is more consistent with the hidden partition than a random guess. [Footnote 6: Here we provide an informal definition for simplicity. See Definition 7 in  for the formal definition.]
III Main results
III-A Hypergraph Spectral Clustering
Hypergraph Spectral Clustering (HSC) builds on the spectral relaxation technique  and on spectral algorithms [44, 45, 46, 47, 24, 26, 5]. The algorithm first computes a processed similarity matrix whose entries represent similarities between pairs of nodes. To this end, we first compute the similarity matrix , where if ; if . This is inspired by the spectral relaxation technique in . Next, we zero out every row and column whose sum exceeds a certain threshold, obtaining the output , which we call the processed similarity matrix. We then apply spectral clustering to the processed similarity matrix: we find the leading eigenvectors of (those associated with its largest eigenvalues), and cluster the rows of using the approximate geometric -clustering . Note that HSC is non-parametric, i.e., it does not require knowledge of the model parameters. See Alg. 1 for the detailed procedure.
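The steps of HSC can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the paper's Alg. 1. The zeroing-out threshold is taken to be `c` times the average row sum, where `c` is a hypothetical parameter. Plain Lloyd k-means with farthest-point initialization stands in for the approximate geometric k-clustering step. Edges are passed as a dict from node tuples to weights.

```python
import numpy as np
from itertools import combinations

def similarity_matrix(n, weights):
    """A[i, j] = total weight of the hyperedges containing both i and j; zero diagonal."""
    A = np.zeros((n, n))
    for edge, w in weights.items():
        for i, j in combinations(edge, 2):
            A[i, j] += w
            A[j, i] += w
    return A

def zero_out_heavy_rows(A, c=2.0):
    """Zero every row and column whose sum exceeds c times the average row sum."""
    row_sums = A.sum(axis=1)
    heavy = row_sums > c * row_sums.mean()
    A = A.copy()
    A[heavy, :] = 0.0
    A[:, heavy] = 0.0
    return A

def lloyd_kmeans(X, k, iters=50, seed=0):
    """Lloyd iterations from a farthest-point initialization (stand-in clustering step)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next center: the row farthest from all centers chosen so far
        dist2 = np.min([((X - m) ** 2).sum(axis=1) for m in centers], axis=0)
        centers.append(X[np.argmax(dist2)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def hsc(n, weights, k, c=2.0):
    """Similarity matrix -> zero-out -> k leading eigenvectors -> k-means on the rows."""
    A = zero_out_heavy_rows(similarity_matrix(n, weights), c)
    _, vecs = np.linalg.eigh(A)       # eigenvalues in ascending order
    U = vecs[:, -k:]                  # eigenvectors of the k largest eigenvalues
    return lloyd_kmeans(U, k)
```

Like the algorithm described above, this sketch needs no knowledge of the model's mean weights; only k and the hypothetical threshold constant are inputs.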
The zeroing-out procedure, proposed in  (see Sec. 3 therein), is used to remove outlier rows whose sums are much larger than the average. This step is necessary because such outliers bias the eigenvector estimates, and spectral clustering then fails. The technique is widely adopted in graph clustering algorithms [45, 49, 26].
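The effect that motivates zeroing out can be seen on a toy matrix. In the sketch below, which uses hypothetical numbers, two planted groups are perturbed by a single very heavy pair of nodes. The leading eigenvector then concentrates almost entirely on that pair and carries no cluster information. After the heavy rows and columns are zeroed out, the leading eigenvector is supported only on the remaining nodes. The threshold rule, `c` times the mean row sum, is again our assumption.

```python
import numpy as np

def zero_out_heavy_rows(A, c=2.0):
    """Zero rows/columns whose sum exceeds c times the mean row sum; also return the mask."""
    row_sums = A.sum(axis=1)
    heavy = row_sums > c * row_sums.mean()
    B = A.copy()
    B[heavy, :] = 0.0
    B[:, heavy] = 0.0
    return B, heavy

def leading_eigenvector(A):
    _, vecs = np.linalg.eigh(A)       # eigenvalues in ascending order
    return vecs[:, -1]

# Two planted groups {0,1,2,3} and {4,...,7}: weight 1 within, 0.2 across.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
A = np.where(labels[:, None] == labels[None, :], 1.0, 0.2)
np.fill_diagonal(A, 0.0)
A[0, 1] = A[1, 0] = 101.0             # one abnormally heavy pair: outlier rows 0 and 1

v_before = leading_eigenvector(A)
A_clean, heavy = zero_out_heavy_rows(A)
v_after = leading_eigenvector(A_clean)
print(v_before[0] ** 2 + v_before[1] ** 2)   # mass of the leading eigenvector on the outliers
print(heavy)
```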
III-B Hypergraph Spectral Clustering with Local Refinement
Our second algorithm consists of two stages: HSC and local refinement. HSCLR is inspired by a similar refinement procedure proposed for the graph case [28, 27]. The algorithm begins by randomly splitting the edges into two sets and . For small , we assign each edge to independently with probability ; the other set is its complement. Then, we run HSC on . Next, we perform local refinement with . For and , define to be the set of edges ( ) that connect node with nodes from , i.e., . Then, for each , we update with
That is, the refinement step first measures the fitness of each node with respect to different clusters, and updates the cluster assignment of each node accordingly. Note that HSCLR is also non-parametric. See Alg. 2 for the detailed procedure.
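Below is a minimal sketch of the two HSCLR-specific ingredients. It assumes edges are stored as a dict from node tuples to weights. The first ingredient is the random edge split. The second is one pass of the averaging-based refinement described above, in which every node is scored against every cluster using the first-stage labels. Names such as `split_edges`, `local_refinement`, and `beta` are hypothetical. Clusters with no qualifying edge receive a score of minus infinity, which is our own convention.

```python
import random

def split_edges(weights, beta, seed=0):
    """Put each edge in the first set independently with probability beta; the rest form the second."""
    rng = random.Random(seed)
    first, second = {}, {}
    for edge, w in weights.items():
        (first if rng.random() < beta else second)[edge] = w
    return first, second

def local_refinement(n, k, weights, labels):
    """One refinement pass using the first-stage labels for every node.

    An edge counts toward cluster j for node v if it contains v and all of its
    other nodes are labelled j. Node v moves to the cluster with the largest
    average weight over such edges; clusters without such edges score -inf.
    """
    totals = [[0.0] * k for _ in range(n)]
    counts = [[0] * k for _ in range(n)]
    for edge, w in weights.items():
        for v in edge:
            others = {labels[u] for u in edge if u != v}
            if len(others) == 1:            # all other nodes in one estimated cluster
                j = others.pop()
                totals[v][j] += w
                counts[v][j] += 1
    refined = []
    for v in range(n):
        scores = [totals[v][j] / counts[v][j] if counts[v][j] else float("-inf")
                  for j in range(k)]
        refined.append(max(range(k), key=scores.__getitem__))
    return refined
```

In an HSCLR-style pipeline, `labels` would come from running HSC on the first set, and `weights` would be the second set. Because the two sets are independent, the refinement scores do not depend on the randomness used by the first stage. This independence is what the sample-splitting argument exploits.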
The time complexity of HSCLR is . For each node , the local refinement requires flops, which is bounded by , where is the number of edges containing node . As , the local refinement step can be done within time.
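The complexity argument presumably rests on a standard counting identity. Summing the per-node edge counts counts every edge once for each of its member nodes, so the counts add up to d times the number of edges. Below is a brute-force check on the complete d-uniform hypergraph, where each node lies in C(n-1, d-1) edges. The function name is hypothetical.

```python
from itertools import combinations
from math import comb

def edges_per_node(n, d):
    """|E(v)| for every node v of the complete d-uniform hypergraph on n nodes."""
    counts = [0] * n
    for edge in combinations(range(n), d):
        for v in edge:
            counts[v] += 1
    return counts

n, d = 7, 3
counts = edges_per_node(n, d)
print(counts[0], comb(n - 1, d - 1), sum(counts), d * comb(n, d))   # 15 15 105 105
```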
HSCLR follows a recent paradigm for solving non-convex problems: first compute an approximate solution, then refine it locally. This two-stage approach has been applied in a variety of contexts, including matrix completion [51, 52], phase retrieval [53, 54], robust PCA , community recovery [28, 56], the EM algorithm , and rank aggregation .
III-C Theoretical guarantees
Theorem 1. Let be the output of . Suppose that . Then, there exist constants (where depends on and ) such that if , then,
w.p. , provided that .
See Sec. IV-A. ∎
We remark on a technical challenge, absent in the graph case, that arises in proving Thm. 1. The key step in the proof is to derive a sharp concentration bound on the spectral norm of a certain matrix (detailed later). However, the bounding technique employed in the graph case does not carry over to the hypergraph case, because this matrix has strong dependencies across its entries. We address this challenge with a delicate analysis that carefully handles such dependencies. See Remark 7 in Sec. IV for details.
Corollary 1 (Detection).
Suppose that . There exists a constant depending on and such that HSC solves detection if .
In Thm. 1, when satisfies for sufficiently large . ∎
We now compare our algorithm with the one proposed in . First, note that in the graph case, the detection threshold  is achieved by new methods based on the non-backtracking operator [59, 17, 22]. In , spectral analysis based on the plain adjacency matrix is shown to fail, while analysis based on the non-backtracking operator succeeds. It has recently been shown that the non-backtracking approach can be extended to the hypergraph case, and this approach is empirically observed to outperform a spectral method that is similar to HSC except for the preprocessing step .
Corollary 2 (Weak consistency).
Suppose that . HSC is weakly consistent if .
By (2), . ∎
When specialized to the weighted stochastic block model, the weak consistency guarantee of  becomes , which is weaker than ours by a poly-logarithmic factor.
The following theorem provides the theoretical guarantee of HSCLR. See Sec. IV-B for the proof.
Theorem 2 (Strong consistency).
Suppose that . Then, HSCLR with sampling rate [Footnote 7: We note that can be chosen arbitrarily as long as and . See Sec. IV-B for details.] is strongly consistent provided that for any ,
We remark that Thm. 2 characterizes the performance of our non-parametric algorithm for any hypergraph with (bounded) real-valued weights. A tighter threshold, together with a parametric algorithm, may therefore be obtained by focusing on a more specific hypergraph model. For instance, in , Chien et al. derive a tighter bound for the binary weight case. As a concrete example, when and with two equal-sized clusters, the sufficient condition of Thm. 4.1 in  reads while that of Thm. 2 reads .
Proposition 1. For the binary-valued edge case, there is no estimator that
solves detection when
is weakly consistent when ; and
is strongly consistent when .
If , the fraction of isolated nodes approaches , hence detection is infeasible. In , the authors show that if , there is no connected component of size , implying that weak consistency is infeasible. Lastly,  shows that for some constant is required for connectivity, a necessary condition for strong consistency. ∎
IV-A Proof of Theorem 1
We first outline the proof. Proposition 2 asserts that spectral clustering finds the exact clustering if is available instead of . We then make use of Lemma 1 to bound the error fraction in terms of . Finally, we derive a new concentration bound for the above spectral norm, and combine it with Lemma 1 to prove the theorem.
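Consider two off-diagonal entries and such that and . One can see from the definition that is statistically identical to , so . Hence, by defining a matrix such that , where , for some , one can verify that coincides with except for the diagonal entries. Our model implies that all off-diagonal entries of are equal, since every edge containing two nodes from different groups is heterogeneous, while its diagonal entries are strictly larger. Hence, is the sum of a positive semidefinite matrix with constant entries and a diagonal matrix with positive entries. It is therefore positive definite and, in particular, of full rank.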
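The block structure of the expected similarity matrix can be checked numerically on a small instance. In the sketch below, `p_hom` and `p_het` are our names for the homogeneous and heterogeneous mean weights. The original symbols, including the sparsity factor, are lost in this copy, and dropping the common factor only rescales the matrix. The code enumerates all size-d edges and builds the expected similarity matrix. It then reads the k-by-k block matrix off representative nodes and confirms three properties: the off-diagonal entries are equal, the diagonal entries are larger, and the matrix has full rank.

```python
import numpy as np
from itertools import combinations

def expected_similarity(labels, d, p_hom, p_het):
    """E[A]: for i != j, the summed mean weight of all size-d edges containing i and j."""
    n = len(labels)
    EA = np.zeros((n, n))
    for edge in combinations(range(n), d):
        mean = p_hom if len({labels[v] for v in edge}) == 1 else p_het
        for i, j in combinations(edge, 2):
            EA[i, j] += mean
            EA[j, i] += mean
    return EA

def block_matrix(labels, EA, k):
    """Read B[a, b] off E[A] using representative nodes (each group needs >= 2 nodes)."""
    members = [[i for i, lab in enumerate(labels) if lab == a] for a in range(k)]
    B = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            i = members[a][0]
            j = members[b][1] if a == b else members[b][0]   # distinct nodes on the diagonal
            B[a, b] = EA[i, j]
    return B

labels = [0] * 3 + [1] * 4 + [2] * 5
EA = expected_similarity(labels, d=3, p_hom=0.7, p_het=0.2)
B = block_matrix(labels, EA, k=3)
Z = np.eye(3)[labels]                              # membership matrix
off_diag = ~np.eye(len(labels), dtype=bool)
print(np.allclose((Z @ B @ Z.T)[off_diag], EA[off_diag]))   # True
print(np.linalg.matrix_rank(B))                               # 3
```

Here every off-diagonal entry of B equals 2.0, and the diagonal entries are 2.5, 3.0, and 3.5. The diagonal grows with group size because larger groups contain more homogeneous edges through a given pair.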
Proposition 2 (Lemma 2.1 in ). Consider of full rank and the membership matrix . Let . Then the matrix whose columns are the first eigenvectors of satisfies: whenever ; and are orthogonal whenever . In particular, a clustering algorithm applied to the rows of will exactly output the hidden partition.
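Proposition 2 can be illustrated numerically. For a full-rank (here positive definite) block matrix, the k leading eigenvectors of the population matrix have identical rows within a group and orthogonal rows across groups. Below is a minimal sketch with a hypothetical 3-by-3 block matrix and group sizes 3, 4, and 5. It also shows that each row has norm one over the square root of its group size.

```python
import numpy as np

def leading_eigvecs(labels, B):
    """Columns: the k leading eigenvectors of P = Z B Z^T, with Z the membership matrix."""
    k = B.shape[0]
    Z = np.eye(k)[labels]
    P = Z @ B @ Z.T
    _, vecs = np.linalg.eigh(P)        # eigenvalues in ascending order
    return vecs[:, -k:]

labels = [0] * 3 + [1] * 4 + [2] * 5
B = np.array([[2.5, 2.0, 2.0], [2.0, 3.0, 2.0], [2.0, 2.0, 3.5]])   # positive definite
U = leading_eigvecs(labels, B)
reps = [0, 3, 7]                       # first node of each group
print(np.round([[U[i] @ U[j] for j in reps] for i in reps], 6))      # diagonal 1/3, 1/4, 1/5
```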
Proposition 2 suggests that spectral clustering successfully finds if is available. We now turn to the case where is available instead of . A general scheme for proving error bounds for spectral clustering, under the assumption that the -clustering step outputs a “good” solution, is developed in . To clarify the meaning of “goodness”, we formally describe the -means clustering problem.
Definition 3 (-means clustering problem).
The goal is to cluster the rows of an matrix . Define the cost function of a partition as , where . We say is -approximate if .
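For intuition, the cost function and the approximation notion can be evaluated by exhaustive search on tiny inputs. The sketch assumes that the garbled condition is the standard one: a partition is (1 + ε)-approximate when its cost is at most (1 + ε) times the optimal cost. The function names are hypothetical, and exhaustive search is feasible only for a handful of points.

```python
from itertools import product

def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to the centroid of its group."""
    cost = 0.0
    for j in range(k):
        group = [p for p, lab in zip(points, labels) if lab == j]
        if not group:
            continue
        centroid = [sum(coord) / len(group) for coord in zip(*group)]
        cost += sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in group)
    return cost

def optimal_cost(points, k):
    """Exhaustive minimum over all labelings (only feasible for tiny inputs)."""
    return min(kmeans_cost(points, labels, k)
               for labels in product(range(k), repeat=len(points)))

def is_approximate(points, labels, k, eps):
    """True if the partition's cost is within a factor (1 + eps) of the optimum."""
    return kmeans_cost(points, labels, k) <= (1 + eps) * optimal_cost(points, k)

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(kmeans_cost(points, [0, 0, 1, 1], 2), optimal_cost(points, 2))   # 1.0 1.0
```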
We now introduce the general scheme to prove error bounds, formally stated in the following lemma.
We refer to  for the proof. ∎
Thm. 1.2 in  implies that a -approximate solution can be found using the approximate geometric -clustering. [Footnote 8: Note that this result holds only for a fixed .] Hence, the above lemma implies that one needs to bound in order to analyze the error fraction of spectral clustering. Our technical contribution lies mainly in deriving such a concentration bound, formally stated below.
Lemma 2. There exist constants (depending only on ) such that the processed similarity matrix with constant (see Alg. 1) satisfies with probability exceeding , provided that .
See Appendix A. ∎
Note that this lemma holds for a fixed . We now conclude the proof with these lemmas. Let . We first estimate .
for some constant .
By definition, , where . Since the columns of are orthonormal, . One can show that . Hence, . It therefore remains to calculate . By the definition of ,
Thus, , where . As , each converges to a positive constant, implying that . ∎
By Lemma 2 and the above claim, holds w.p. for . Choosing , completes the proof.
Remark 7 (Technical novelty relative to the graph case): Proving a sharp concentration bound on a spectral norm has been a key challenge in spectral analysis [63, 44]. While most existing bounds hinge on independence between entries [Footnote 9: For instance, the most studied model, the Wigner matrix, assumes independence among entries. See  for more details.], the matrix in HSC has strong dependencies across entries due to its construction. For instance, the entries and both contain a term for every edge of the form , and hence share many terms.
One approach to handling this dependency is to apply the matrix Bernstein inequality  to the decomposition , where . See [11, 8]. However, this approach yields a bound with an extra factor relative to the bound in Lemma 2, resulting in a suboptimal consistency guarantee, as described in Sec. I-A.
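The decomposition can be made concrete under one natural reading. We state this reading as an assumption, since the exact expression is lost: each hyperedge contributes its weight times the matrix that has ones at all off-diagonal positions indexed by pairs of its nodes. Subtracting expectations term by term then writes the centered similarity matrix as a sum of independent zero-mean random matrices, one per edge, which is the form to which the matrix Bernstein inequality applies. The sketch checks that the per-edge sum reproduces the direct pairwise construction. The function names are hypothetical.

```python
import random
from itertools import combinations

import numpy as np

def similarity_direct(n, weights):
    """Pairwise construction: add w_e to A[i, j] for every pair {i, j} inside edge e."""
    A = np.zeros((n, n))
    for edge, w in weights.items():
        for i, j in combinations(edge, 2):
            A[i, j] += w
            A[j, i] += w
    return A

def similarity_as_edge_sum(n, weights):
    """Sum over edges e of w_e * (1_e 1_e^T - diag(1_e)): one random matrix per edge."""
    A = np.zeros((n, n))
    for edge, w in weights.items():
        ind = np.zeros(n)
        ind[list(edge)] = 1.0
        A += w * (np.outer(ind, ind) - np.diag(ind))
    return A

rng = random.Random(3)
weights = {e: rng.random() for e in combinations(range(7), 3)}
print(np.allclose(similarity_direct(7, weights), similarity_as_edge_sum(7, weights)))   # True
```

The summands are independent because the edge weights are, but each summand is a dense block over its edge's nodes. This is why entries of the similarity matrix that share an edge are correlated.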
Another approach is a combinatorial method , which counts the number of edges between subsets. The rationale behind this method is as follows. From the definition of the spectral norm, one needs to bound the quantity
for any vector. It turns out that this quantity has a close connection to the number of (hyper)edges between two subsets in a random (hyper)graph. For instance, is precisely the number of edges between and .
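Below is a small check of this identity for a graph, using indicator vectors of two disjoint node sets. The example graph and the function names are hypothetical.

```python
def bilinear(A, x, y):
    """x^T A y for a dense matrix given as a list of lists."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def edges_between(edges, S, T):
    """Number of edges with one endpoint in S and the other in T (S and T disjoint)."""
    return sum(1 for u, v in edges if (u in S and v in T) or (u in T and v in S))

n = 6
edges = [(0, 1), (0, 3), (1, 4), (2, 5), (3, 4), (1, 3)]
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
S, T = {0, 1}, {3, 4}
ind_S = [1 if i in S else 0 for i in range(n)]
ind_T = [1 if i in T else 0 for i in range(n)]
print(bilinear(A, ind_S, ind_T), edges_between(edges, S, T))   # 3 3
```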
A technique for estimating the number of hyperedges between two arbitrary subsets is developed in . Using this method, however, one can obtain only a suboptimal guarantee, namely . In contrast, our analysis shows that the order-optimal guarantee can be obtained by improving the standard combinatorial method. See Appendix A.
IV-B Proof of Theorem 2
We first outline the proof. Using the union bound, we show that it suffices to prove for all and . To bound this error probability, we consider two events. The first is that the average weight of the edges between node and its true community is less than a certain threshold. The second is that the average weight of the edges between node and a wrong community is greater than the same threshold. We first show that if the misclassification event occurs, at least one of these two events must occur. We then bound the error probability by bounding the probabilities of these two events using Lemma 3 and Lemma 4, respectively.
We consider the boundary case . As , Corollary 2 guarantees that is weakly consistent. Without loss of generality, assume that the identity permutation is equal to . Then, , i.e., at least fraction of the nodes that are classified as in community are correctly classified. The second stage of HSCLR refines the output of the first stage , resulting in . By the union bound, we have . Since the total number of summands is , if for all and , then .
By the refinement rule (1), . For any real numbers , holds. By taking complements of both sides, we have . Therefore, by the union bound, holds for any . Applying this bound, we have
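The elementary fact behind this step can be written out as follows, in our own notation. Let X be the average weight of the refinement edges connecting the node to its true community, let Y be the corresponding average for a wrong community, and let τ be any threshold. If X > τ and Y < τ, then Y < X. Misclassification therefore requires at least one of the two bad events:

```latex
\{Y \ge X\} \;\subseteq\; \{X \le \tau\} \,\cup\, \{Y \ge \tau\}
\quad\Longrightarrow\quad
\Pr[Y \ge X] \;\le\; \Pr[X \le \tau] + \Pr[Y \ge \tau].
```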
We first interpret and . For illustration, assume that coincides with . Under this assumption, observe that is equal to the average edge weight of the homogeneous edges within community . Since the expected value of this term is , one can show that the term vanishes. Similarly, is the average weight of the edges connecting and the other nodes in community . Since these edges are heterogeneous, also vanishes.
However, since is not exactly zero but an arbitrarily small constant, the above interpretation is not exact. In what follows, we show that and vanish as well for the case.
We begin by bounding . Denote by the set of all homogeneous edges. Recall that all edges in , except a fraction, are homogeneous, so . By restricting the range of summation, . Note that the ’s are not restricted to Bernoulli random variables. By adapting the standard proof of large deviation bounds for Bernoulli variables, we obtain the following:
Lemma 3. Let be the sum of mutually independent random variables taking values in . For any , we have and .
See Appendix D-A. ∎
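Lemma 3 is in the spirit of the multiplicative Chernoff bounds. These bounds remain valid for independent variables in [0, 1]: by convexity, e^{tx} ≤ 1 + x(e^t − 1) on that interval, so each moment generating function is dominated by that of a Bernoulli variable with the same mean. Because the exact constants of Lemma 3 are lost in this copy, the sketch below assumes the textbook forms of the bounds. It compares them with a Monte Carlo estimate for sums of uniform variables. The function names are hypothetical.

```python
import math
import random

def chernoff_upper(mu, delta):
    """P[S >= (1 + delta) * mu] <= exp(-delta^2 * mu / (2 + delta)), for delta > 0."""
    return math.exp(-delta ** 2 * mu / (2 + delta))

def chernoff_lower(mu, delta):
    """P[S <= (1 - delta) * mu] <= exp(-delta^2 * mu / 2), for 0 < delta < 1."""
    return math.exp(-delta ** 2 * mu / 2)

def empirical_tails(m, delta, trials=2000, seed=0):
    """Monte Carlo tail frequencies of S = sum of m i.i.d. Uniform[0, 1) variables."""
    rng = random.Random(seed)
    mu = m / 2                      # E[S] for uniform summands
    upper = lower = 0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(m))
        upper += s >= (1 + delta) * mu
        lower += s <= (1 - delta) * mu
    return upper / trials, lower / trials

up, low = empirical_tails(m=50, delta=0.2)
print(up, chernoff_upper(25, 0.2))
print(low, chernoff_lower(25, 0.2))
```

The bounds are loose for this small example, but they decay exponentially in the mean. That decay is what the refinement analysis needs when the mean grows with n.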
so Lemma 3 with gives
Next, we consider . Again, all edges in , except a fraction, are heterogeneous, so . The following lemma shows that the contribution of the remaining fraction of edges is marginal:
Lemma 4. For sufficiently small ,
See Appendix D-B. ∎
Hence, we focus on heterogeneous edges only. By an argument similar to the one above, the bound in Lemma 3 becomes
where () follows since .
We have shown that our algorithms achieve the order-optimal sample complexity for all three recovery guarantees under a symmetric block model. In this section, we show that our main results hold for a broader class of block models. We also show that HSCLR achieves the sharp recovery threshold for a certain SBM.
For the graph case , a fairly general model that subsumes the asymmetric SBM as a special case has been investigated. Here we extend our model to an analogous model in the hypergraph setting. Specifically, we consider the following asymmetric weighted SBM.
Definition 4 (The asymmetric weighted SBM).
Let be constants such that holds for any homogeneous edge and heterogeneous edge . A random weight is assigned to each edge independently as follows: For each edge , . Notice that this reduces to the condition of in the symmetric setting.
Our main results in Thm. 1 and 2 readily carry over to the above asymmetric setting. The key reason is that our spectral clustering guarantee hinges only on the full-rank condition on (see Sec. IV-A for the definition). One can easily verify that the condition above implies the full-rank condition, and hence our results hold in the asymmetric setting as well. The only difference is that the constants appearing in the theorems now depend on the ’s. Similarly, our technique covers the disassortative SBM, in which heterogeneous edges have larger weights than homogeneous edges.
Definition 5 (The symmetric disassortative weighted SBM).
In Definition 1, we assume instead that .
Another prominent instance is the planted clique model.
Definition 6 (The planted clique model).
Fix a -subset of nodes (). Consider a random hypergraph in which every -regular edge appears with probability if , and with probability otherwise.
In this model, one wishes to detect the hidden subset , which is called the clique. Following a similar analysis with a different notion of error fraction, one can show that the clique can be detected if for some constant , which is consistent with the well-known result for .
Sharp thresholds on the fundamental limits have recently been characterized in the graph case [27, 22, 28, 24, 30]. In contrast, such tight results have remained largely open in the hypergraph case. A notable exception is our companion paper , which studies a special case of the weighted SBM considered herein in which the weights are binary-valued.
Definition 7 (Generalized Censored Block Model with Homogeneity Measurements ).
Let be a fixed constant. Assume that and denote erasure by . If the edge is homogeneous, w.p. , w.p.