I Introduction
Community detection (clustering) has received great attention recently across many applications, including social science, biology, computer science, and machine learning, while it is usually an ill-posed problem due to the lack of ground truth. A prevalent way to circumvent the difficulty is to formulate it as an inverse problem on a graph
, where each node is assigned a community (label) that serves as the ground truth. The ground-truth community assignment is hidden while the graph is revealed. Each edge in the graph models a certain kind of pairwise interaction between the two nodes. The goal of community detection is to determine from , by leveraging the fact that different combinations of community relations lead to different likelihoods of edge connectivity. When the graph is passively observed, community detection can be viewed as a statistical estimation problem, where the community assignment is to be estimated from a statistical experiment governed by a generative model of random graphs. A canonical generative model is the stochastic block model () [3] (also known as the planted partition model [4]), which generates randomly connected edges from a set of labeled nodes. The presence of the edges is governed by independent Bernoulli random variables, and the parameter of each depends on the community assignments of the two nodes in the corresponding edge.
Through the lens of statistical decision theory, the fundamental statistical limits of community detection provide a way to benchmark various community detection algorithms. Under , the fundamental statistical limits have been characterized recently. One line of work takes a Bayesian perspective, where the unknown labeling of nodes in is assumed to be distributed according to a certain prior, one of the most common assumptions being i.i.d. over nodes. Along this line, the fundamental limit for exact recovery is characterized [5] in full generality, while partial recovery remains open in general. See the survey [6] for more details and the references therein. A second line of work takes a minimax perspective, where the goal is to characterize the minimax risk, typically the mismatch ratio between the true community assignment and the recovered one. In [7], a tight asymptotic characterization of the minimax mismatch ratio for community detection in is found. Along with these theoretical results, several algorithms have been proposed to achieve these limits, including degree-profiling comparison [8] for exact recovery, spectral MLE [9] for almost-exact recovery, and a two-step mechanism [10] under the minimax framework.
However, graphs can only capture pairwise relational information, and such a dyadic measure may be inadequate in many applications, such as the task of 3D subspace clustering [11]
and the higher-order graph matching problem in computer vision
[12]. Moreover, in a co-authorship network such as the DBLP bibliography database, where collaboration between scholars usually comes in a group-wise fashion, it seems more appropriate to represent the co-writing relationship in a single collective way rather than recording each pairwise interaction [13]. Therefore, it is natural to model such beyond-pairwise interactions by hyperedges in a hypergraph and study the clustering problem in a hypergraph setting [14]. Hypergraph partitioning has been investigated in computer science, and several algorithms have been proposed, including spectral methods based on clique expansion [15], the hypergraph Laplacian [16], game-theoretic approaches [17], and tensor methods
[18, 19], to name a few. Existing approaches, however, mainly focus on optimizing a certain score function based entirely on the connectivity of the observed hypergraph and do not view it as a statistical estimation problem. In this paper, we investigate the community detection problem in hypergraphs through the lens of statistical decision theory. Our goal is to characterize the fundamental statistical limit and develop computationally feasible algorithms to achieve it. As for the generative model for hypergraphs, one natural extension of the model to a hypergraph setting is the hypergraph stochastic block model (), where the presence of an order hyperedge (i.e., the maximum edge cardinality) is governed by a Bernoulli random variable with parameter , and the presence of different hyperedges are mutually independent. Despite the success of the aforementioned algorithms on many practical datasets, it remains open how they perform in , since the fundamental limits have not been characterized and the probabilistic nature of has not been fully utilized.
As a first step towards characterizing the fundamental limit of community detection in hypergraphs, in this work we focus on the “wise hypergraph stochastic block model” (), in which all the hyperedges generated in the hypergraph stochastic block model are of order . Our main contributions are summarized as follows.

First, we characterize the asymptotic minimax mismatch ratio in  for any order .

Second, we propose a polynomial-time algorithm which provably achieves the minimax mismatch ratio in the asymptotic regime, under mild regularity conditions.
To the best of our knowledge, this is the first result which characterizes the fundamental limit on the minimax risk of community detection in random hypergraphs, together with a companion efficient recovery algorithm. The proposed algorithm consists of two steps. The first step is a global estimator that roughly recovers the hidden community assignment to a certain precision level, and the second step refines the estimated assignment based on the underlying probabilistic model.
It is shown that the minimax mismatch ratio in  converges to zero exponentially fast as , the size of the hypergraph, tends to infinity. The rate function, which is the exponent normalized by
, turns out to be a linear combination of Rényi divergences of order 1/2. Each divergence term in the sum corresponds to a pair of community relations that would be confused with one another when there is only one misclassification, and the weighting coefficient associated with it indicates the total number of such confusing patterns. Probabilistically, there may well be two or more misclassifications, with each confusing relation pair contributing a Rényi divergence when analyzing the error probability. However, we show that these situations are all dominated by the error event with a single misclassified node, which leaves only the “neighboring” divergence terms in the asymptotic expression. The main technical challenge resolved in this work stems from the fact that the community relations become much more complicated as the order
increases, meaning that more error events may arise compared to the dichotomous situation (i.e., same-community versus different-community) in the graph case. In the proof of achievability, we show that the second, refinement step is able to achieve the fundamental limit provided that the first, initialization step satisfies a certain weak consistency condition. The core of the second-step algorithm lies in a local version of maximum likelihood estimation, where concentration inequalities are utilized to upper bound the probability of error. Here, an additional regularity condition is required to ensure that the probability parameters, which correspond to the appearance of various types of hyperedges, do not deviate too much from each other. We note that this constraint can be relaxed as long as the number of communities considered does not scale with . For the first step, we use tools from perturbation theory, such as the Davis-Kahan theorem, to establish the performance of the proposed spectral clustering algorithm. Since entries in the derived hypergraph Laplacian matrix are no longer independent, a union bound is applied to make the analysis tractable with the concentration inequalities. The converse part of the minimax mismatch ratio follows a standard approach in statistics: finding a smaller parameter space in which the risk can be analyzed. We first lower bound the minimax risk by the Bayesian risk with a uniform prior. Then, the Bayesian risk is reduced to a local one by exploiting the closed-under-permutation property of the targeted parameter space. Finally, we identify the local Bayesian risk with the risk function of a hypothesis testing problem and apply the Rozovsky lower bound from large deviation theory to obtain the desired converse result.
Related Works
The hypergraph stochastic block model was first introduced in [20] as the planted partition model in random uniform hypergraphs, where each hyperedge has the same cardinality. The uniformity assumption is later relaxed in a follow-up work [21], where a more general model with mixed edge orders is considered. In [22], the authors consider the sparse regime and propose a spectral method based on a generalization of the non-backtracking operator. In addition, a weak consistency condition is derived in [21] for by using the hypergraph Laplacian. Departing from , an extension of the censored block model to the hypergraph setting is considered in [23], where an information-theoretic limit on the sample complexity for exact recovery is characterized. As for the proposed two-step algorithm, the refine-after-initialize concept has also been used in graph clustering [8, 9, 10] and ranking [24].
This paper generalizes our previous work in the two conference papers [1, 2] in three ways. First, [1] only explores the extension from the graph to the  case where the observed hyperedges are uniform, as compared to the more general  model for any order analyzed in [2] and here. In addition, the number of communities is allowed to scale with the number of vertices in this work, rather than being a constant as assumed in [2]. This slight relaxation leads to another regularization condition imposed on the connecting probabilities, which is a nontrivial technical extension. Finally, we also demonstrate that our proposed algorithms, the hypergraph spectral clustering algorithm and the local refinement scheme, are able to achieve the partial recovery and the exact recovery criteria, respectively.
The rest of the paper is organized as follows. We first introduce the random hypergraph model  and formulate the community detection problem in Section II. Previous efforts on the minimax result in graph and  are reviewed in Section III, which motivates the key quantities that characterize the fundamental limit in . The main contribution of this work, the characterization of the optimal minimax mismatch ratio under  for any general , is presented in Section IV. We propose two algorithms in Section V, along with an analysis of the time complexity. Theoretical guarantees for the proposed algorithms as well as the technical proofs are given in Section VI, while the converse part of the main theorem is established in Section VII. In Section VIII, we implement the proposed algorithms on synthetic data and present experimental results. The paper is concluded with a few discussions on the extendability of the two-step algorithm and the minimax result to a weighted  setting in Section IX.
Notations

Let denote the cardinality of the set and for .

is the symmetric group of degree , which contains all the permutations from to itself.

The function
represents the community label vector associated with a labeling function
for a node vector . 
For any community assignment and permutation , denotes the permuted assignment vector.

The asymptotic equality between two functions and , denoted as (as ), holds if .

Also, means that and are of the same order if for some constant independent of . , defined by , means that is asymptotically smaller than . is equivalent to . These notations are equivalent to the standard Big-O notations , , and , which we also use interchangeably in this paper.

, is the and norm for a vector , respectively.

is the Hamming distance between two vectors and .

For a matrix , we denote its operator norm by and its Frobenius norm by .

For a dimensional tensor , we denote its th element by , where , and we write .

Finally, let for be the set of all orthogonal matrices.
II Problem Formulation
II-A Community Relations
Before introducing the random hypergraph model , we first describe the community relations among nodes, which serve as the basic building block of our model. Let be the set of all possible community relations under  and denote the total number of them. In contrast to the dichotomous situation (same community or not) governing the appearance of an edge between two nodes in the usual symmetric , the number of community relations in  grows as the order increases. To organize them systematically, we use the idea of majorization [25] and represent each element of as a histogram. Specifically, the histogram operator is used to transform a vector into its histogram vector . For convenience, we sort the histogram vector in descending order and append zeros if necessary to make it a length vector. The notion of majorization is introduced as follows. For any , we say that majorizes , written as , if for and , where the ’s are the elements of sorted in descending order. Observe that each community relation in can be uniquely represented, when sorted in descending order, by a dimensional histogram vector . We arrange the elements of in the majorization (pre)order such that if and only if . For example, is the all-same relation with the most concentrated histogram, and is the only-1-different relation with . Likewise, is the only-2-same relation with , and the last one in , the all-different relation , has the all-one vector as its histogram.
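As a concrete illustration, the histogram operator and the majorization check described above can be sketched as follows (a minimal Python sketch; the function names are ours, and `d` stands for the hyperedge order):

```python
from collections import Counter

def histogram(labels):
    """Histogram vector of a tuple of community labels: multiplicities
    sorted in descending order, zero-padded to length d (= len(labels))."""
    d = len(labels)
    counts = sorted(Counter(labels).values(), reverse=True)
    return counts + [0] * (d - len(counts))

def majorizes(t, s):
    """Check whether histogram vector t majorizes s: every prefix sum of t
    dominates the corresponding prefix sum of s, with equal totals."""
    assert len(t) == len(s) and sum(t) == sum(s)
    running_t = running_s = 0
    for a, b in zip(t, s):
        running_t += a
        running_s += b
        if running_t < running_s:
            return False
    return True
```

For instance, with d = 3 the all-same relation has histogram `[3, 0, 0]`, which majorizes `[2, 1, 0]` (only-1-different), which in turn majorizes the all-one vector `[1, 1, 1]` (all-different).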
Example 2.1 ( in ):
with histogram vectors being
Relation  Histogram  Connecting Probability  

all-same  
only-1-different  
only-2-same  
all-different 
II-B Random Hypergraph Model: 
In a uniform hypergraph, the adjacency relation among the nodes in can be equivalently represented by a dimensional random tensor (the size of each dimension being ), where is the access index of an element in the tensor. The following two natural conditions on this adjacency tensor come from the basic properties of an undirected hypergraph:
For each , is a Bernoulli random variable with success probability . The parameter tensor depends only on the community assignments of the nodes associated with the hyperedge and forms a block structure. The block structure is characterized by a symmetric dimensional community connection tensor , where .
To set up the parameter space considered in our statistical study, we first introduce some further notation. Let be the size of the th community for . Besides, let , where denotes the success probability of the Bernoulli random variable corresponding to the appearance of a hyperedge with relation . We make the natural assumption that : the more concentrated a group is, the higher the chance that its members are connected by a hyperedge.
Remark 2.1:
We would like to note that there is nothing special about the assumption that the ’s are in decreasing order, and the condition can be relaxed. All that is required is that the connecting probabilities ’s are well separated and that the differences between them are of the same order. See Section VI for a more formal statement of our main result.
The parameter space considered here is a homogeneous and approximately equal-sized one, where each . Formally speaking (let ),
(1) 
where has the property that if and only if . In other words, only the histogram of the community labels within a group matters when it comes to connectivity. is a parameter that controls how much can vary. We assume the more interesting case where the community sizes are not restricted to be exactly equal. Interchangeably, we write to indicate the community relation among nodes under the assignment . Throughout the paper, we assume that the order of the observed hypergraph is a constant, while the other parameters, including the total number of communities and the hyperedge connection probability , can be coupled with . Specifically, can either be a constant or scale with . Moreover, as pointed out in [1], the regime where the hypergraph is weakly recoverable can be of a lower order than the one considered in for graphs [8]. To guarantee the solvability of weak recovery in , we require the probability parameter to be at least of order . Therefore, we write , where for all . We note that the probability regime considered here was first motivated in [1]. Under , the authors in [1] consider for the probability parameter, which is of a lower order than the one () required for partial recovery in [8] and for the minimax risk in [7] in graph . The motivation is that, since the total number of random variables in a random uniform hypergraph is roughly times larger than that in a traditional random graph, the underlying hypergraph is allowed to be times sparser while retaining a risk of the same order. In light of this, we relax the probability parameter from to in .
II-C Performance Measure
To evaluate the quality of an estimator, we use the mismatch ratio
as the performance measure for the community detection problem. The unpermuted loss function is defined as
where is the Hamming distance. It directly counts the proportion of misclassified nodes between an estimator and the ground-truth assignment. To account for possible relabeling, the mismatch ratio is defined as the minimum of this loss over label permutations, which maximizes the agreement between an estimator and the ground truth after an alignment by relabeling.
(2) 
As is conventional, we use to denote the corresponding risk function. Finally, the minimax risk for the parameter space under  is denoted as
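As a sketch, the permutation-minimized mismatch ratio defined above can be computed by brute force over the symmetric group (our own minimal Python illustration; feasible only for a small number of communities `k`):

```python
from itertools import permutations

def mismatch_ratio(est, truth, k):
    """Minimum fraction of disagreeing labels over all relabelings.

    est, truth: sequences of labels in {0, ..., k-1}. Enumerates the
    symmetric group S_k, so it is only meant for small k; practical
    implementations would instead solve a linear assignment problem.
    """
    n = len(truth)
    best = n
    for perm in permutations(range(k)):
        mismatches = sum(perm[e] != t for e, t in zip(est, truth))
        best = min(best, mismatches)
    return best / n
```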
Remark 2.2:
Notice that in a symmetric (homogeneous) [6], the connectivity tensor is uniquely determined by the labeling function . Therefore, we drop the subscript in and write when referring to the randomness of the hypergraph model with underlying assignment . Similarly, we write instead of for ease of notation.
III Prior Works
For the case , the asymptotic minimax risk is characterized in [7], which decays to zero exponentially fast as . In addition, the (negative) exponent of is determined by the Rényi divergence of order between two Bernoulli distributions and
(3) 
where is the success probability of a same-community edge while stands for that of a different-community one. Extending from the traditional graph to a hypergraph setting, the authors in [1] generalize the minimax result obtained in [7] to the  model as follows
where the probability parameters correspond to the community relations with histograms , and , respectively. Observe that the exponent of the minimax risk in  does not depend on the divergence term explicitly. That is, consists only of those neighboring divergence terms whose histogram vectors are at distance . Moreover, associated with each divergence term is a weighting coefficient, i.e., for and for . These coefficients appear in the hypothesis testing problem when deriving the lower bound of the minimax result. Essentially, they represent the total number of random variables that appear either as a relation hyperedge or as a relation hyperedge when the community label of the targeted node is being tested.
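For reference, the order-1/2 Rényi divergence between two Bernoulli distributions Bern(p) and Bern(q), which appears in every divergence term above, has the standard closed form D_{1/2}(p‖q) = −2 log(√(pq) + √((1−p)(1−q))). A minimal sketch:

```python
import math

def renyi_half_bernoulli(p, q):
    """Renyi divergence of order 1/2 between Bern(p) and Bern(q):
    -2 * log( sqrt(p*q) + sqrt((1-p)*(1-q)) ). Symmetric in (p, q),
    zero iff p == q, and positive otherwise."""
    return -2.0 * math.log(math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q)))
```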
It turns out that the optimal minimax risk in  also decays to zero exponentially fast, given that the outcome of the initialization algorithm satisfies a certain condition. The exponent, as stated formally later, is a weighted combination of divergence terms. To specify the weight in this weighted average, we introduce further notations below. We use
to denote the collection of ordered pairs of relations in
that are neighbors to each other. Second, there is a combinatorial number associated with every pairwise divergence term. Precisely, let us consider a least favorable sub-parameter space of .
(4) 
In , each community takes on only three possible sizes. In addition, there are exactly members in the community to which the first node belongs. We pick a in and construct a new assignment based on :
In other words, assignments and only disagree on the label of the first node. For each pair in , we define the weighted coefficient
as the number of relation hyperedges that we mistake for relation hyperedges. Note that the above definition is independent of the choice of due to the community size constraints.
Example 3.1 ( in ):
with elements
Relation Pair  Weighted Coefficient 

Note that is the smallest while is the largest.
IV Main Contribution
The optimal minimax risk for the homogeneous and approximately equal-sized parameter space under the probabilistic model  is characterized as follows.
Main Theorem:
Remark 4.1:
In this work, we assume that the order of the hypergraph is a constant. More generally, one may wonder how the characterized minimax risk changes when this order is also allowed to scale with . Certainly, the expression above for the optimal minimax risk depends on the hypergraph order , yet only implicitly. To obtain an explicit form of in terms of , we need further estimates of the weighting coefficients ’s as well as the corresponding Rényi divergence terms ’s. The latter can be estimated by when , as assumed in the main theorem. On the other hand, as noted in Example 3.1, it is not hard to see that achieves its minimum between with and with , while it attains its maximum between with and with . Therefore, when the differences are constant, the last term with dominates the other terms in the summation, seeing that the parameter is coupled with . In particular, the error exponent for the optimal minimax risk in equation (7) is of order . Surprisingly, the minimax risk decays more slowly as the order increases, due to the factorial term in the denominator. However, we note that this observation is valid only under the assumption that the considered hypergraphs are in the sparse regime, where roughly hyperedges are generated no matter how large the order is.
The minimax risk is provably achieved, through Theorem 6.1 in Section VI, by the proposed two-step algorithm. Roughly speaking, we first demonstrate that the second-step refinement is capable of obtaining an accurate parameter estimate as long as the first-step initialization satisfies a weak consistency condition. Then, the local MLE step is proved to achieve a mismatch ratio matching the desired minimax risk, with which the local majority voting can recover the true community label for each node with the guaranteed risk. Finally, we show that our proposed spectral clustering algorithm with the hypergraph Laplacian matrix qualifies as a first-step initialization algorithm. We compare our theoretical findings to those for the graph case [10] below, as well as to those for the hypergraph setting [21] later in Section VI. On the other hand, the converse part is established through Theorem 7.1 in Section VII.
IV-A Implications for Exact Recovery
Since we consider a minimax framework, the theoretical guarantees of our two-step algorithm are also sufficient to ensure partial recovery and exact recovery as considered under the Bayesian perspective [8]. Before presenting the theorems regarding community recovery in the Bayesian case, let us first review these two recovery notions. The definitions of the different recovery criteria discussed here can be found in the comprehensive survey [6]. We paraphrase them below for completeness; please refer to the survey for more details and the references therein. In terms of the mismatch ratio (2),
Definition 4.1 (Revised Definition 4 in [6]):
Consider a and a corresponding random hypergraph . The following recovery requirements are solved if there exists an algorithm which takes as input and estimates such that

Partial Recovery:

Exact Recovery:
where the probability is taken over the random realizations of and the asymptotic notation is with respect to the growth of .
Our proposed twostep algorithm in Section V can provably satisfy the exact recovery criterion.
Theorem 4.1:
If
(8) 
then Algorithm LABEL:alg:refine combined with Algorithm LABEL:alg:spec_init is able to solve the exact recovery problem.
Proof.
With (8), for any there exists a small constant such that
By the Markov inequality, we have
Note that the event that the mismatch ratio is smaller than is equivalent to the event that it is identical to . ∎
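With the elided symbols restored in our own notation (σ̂ for the estimate, σ for the truth, ℓ for the mismatch ratio, and ε for the small constant above), the chain of inequalities in this proof can be sketched as:

```latex
\Pr\left(\hat{\sigma} \neq \sigma\right)
  = \Pr\left(\ell(\hat{\sigma}, \sigma) \ge \tfrac{1}{n}\right)
  \le n \,\mathbb{E}\left[\ell(\hat{\sigma}, \sigma)\right]
  \le n \cdot e^{-(1+\epsilon)\log n}
  = n^{-\epsilon} \longrightarrow 0,
```

where the first equality uses the fact that a mismatch ratio below 1/n forces zero mismatches, and the middle step is Markov's inequality.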
We would like to note that partial recovery follows immediately from exact recovery. Indeed, the required condition can be relaxed from (8), depending on the extent of distortion .
IV-B Comparison with [10]
We can recover the minimax result obtained in [10] by specializing in the main theorem above. In [10], the authors consider the traditional model under a homogeneous parameter space with connecting probabilities . We note that the parameter space considered in [10] is more general in the sense that it need not be nearly equal-sized. To be more specific, the size of each community is allowed to vary within (where ). However, the parameter controlling this variation is itself restricted to the range due to a technical issue, which makes the attempted relaxation on the community size less interesting. In light of this, we compare only with the minimax result in [10] for , which is in our notation. The overall result for , when combining the spectral initialization step with the local refinement step proposed therein (denoted as ), can be summarized as follows: Suppose and . If
(9) 
as . Then, there exists a sequence such that
(10) 
Indeed, condition (9) is exactly the same as (5), by using the approximation . Note that there is only one community relation pair in and the weighting coefficient is . In fact, the situation is very simple in , since there are only two possible community relations, i.e., intra-community (the all-same relation) and inter-community (the all-different relation). By contrast, the relational information becomes more and more complicated as increases. This inevitable “curse of dimension” is reflected in the second assumption (6) in the main theorem. First, recall that we set all of the probability parameters to be of the same order, as required in [10]. Apart from that, we also need to ensure that the differences associated with the pairs remain of the same order, in order to upper bound the error probability in the proof of achievability. Under the traditional , it is not hard to see that assumption (6) is weaker than assumption (5). Therefore, the overall requirement is equivalent to (9) made in [10] without any further assumption.
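The approximation invoked here is the standard small-probability expansion of the order-1/2 Rényi divergence (our reconstruction of the elided formula): for p, q → 0,

```latex
-2\log\left(\sqrt{pq} + \sqrt{(1-p)(1-q)}\right)
  = \left(\sqrt{p} - \sqrt{q}\right)^{2} \left(1 + o(1)\right),
```

which follows from \(\sqrt{(1-p)(1-q)} = 1 - \tfrac{p+q}{2} + o(p+q)\) and \(-\log(1-t) = t + o(t)\), since \(\sqrt{pq} + 1 - \tfrac{p+q}{2} = 1 - \tfrac{1}{2}\left(\sqrt{p}-\sqrt{q}\right)^{2}\).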
V Proposed Algorithms
In this section, we propose our main algorithm for community detection in random hypergraphs, which is proved in Section VI to achieve the minimax risk in  asymptotically. The algorithm (Algorithm LABEL:alg:refine) comprises two major steps. In the first step, for each , it generates an estimated assignment of all nodes except by applying an initialization algorithm to the sub-hypergraph without vertex . For example, we can apply the hypergraph clustering method described in Subsection V-B to , the subtensor of with the th coordinate removed in each dimension. Then, in the second step, the label of under is determined by maximizing a local likelihood function described in Subsection V-A. Note that the parameters of the underlying  need not be known in advance, as the algorithm can conduct a parameter estimation before computing the local likelihood function if necessary. Finally, with the estimated assignments , the algorithm combines all of them and forms a consensus via majority neighbor voting.
V-A Refinement Scheme
Let us begin with the global likelihood function defined as follows. Let
(11) 
denote the loglikelihood of an adjacency tensor when the hidden community structure is determined by . For each , we use
(12) 
to denote those likelihood terms in (11) pertaining to the th node when its label is . It is not hard to see that is a sum of independent Bernoulli random variables. However, is not independent of for any , since the random hyperedges that enclose vertex and vertex simultaneously appear in the summands of both likelihood terms. The global likelihood function and the local likelihood functions are related by
This is because each likelihood term in (11) is counted exactly times when summing over all possible equations (12). For each node , based on the estimated assignment of the other nodes, we use the following local MLE method to predict the label of .
When the connectivity tensor that governs the underlying random hypergraph model  is unknown at the time of evaluating the likelihood, we use and to denote the global and local likelihood functions with the true replaced by its estimated counterpart . Since the presence of each edge is independent under our probabilistic model, we use the sample mean to estimate the true parameters. Note that the superscript indicates that the estimate is computed with node taken out. Finally, a consensus is drawn by majority neighbor voting. In fact, the consensus step looks for a consensus assignment among the possibly different community assignments obtained by the local MLE method in Algorithm LABEL:alg:refine. Since all these assignments will be close to the ground truth up to some permutation, this step combines them into a single community assignment as the final output.
algocf[htbp]
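To make the local MLE step concrete, the following toy Python sketch predicts the label of a single node in a 3-uniform hypergraph, assuming the relation probabilities and the other nodes' estimated labels are given. All names and the tiny parameter set are ours; the actual algorithm additionally estimates the probabilities from the data with node `i` held out.

```python
import math
from collections import Counter

def relation(labels):
    """Histogram (descending, zero-padded) of labels within a hyperedge."""
    d = len(labels)
    h = sorted(Counter(labels).values(), reverse=True)
    return tuple(h + [0] * (d - len(h)))

def local_mle(i, hyperedges, est_labels, probs, num_communities):
    """Assign node i the label maximizing the local log-likelihood.

    hyperedges: dict mapping node tuples to observed 0/1 adjacency values;
      only hyperedges incident to i contribute to the local likelihood.
    est_labels: current label estimates for all nodes except i.
    probs: dict mapping relation histograms to hyperedge probabilities.
    """
    best_label, best_ll = None, -math.inf
    for c in range(num_communities):
        ll = 0.0
        for edge, present in hyperedges.items():
            if i not in edge:
                continue
            labels = [c if v == i else est_labels[v] for v in edge]
            p = probs[relation(labels)]
            ll += math.log(p) if present else math.log(1 - p)
        if ll > best_ll:
            best_label, best_ll = c, ll
    return best_label
```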
V-B Spectral Initialization
In order to devise a good initialization algorithm , we develop a hypergraph version of unnormalized spectral clustering [26] with regularization [27]. In particular, a modified version of the hypergraph Laplacian described below is employed. Let be the incidence matrix, where each entry indicates whether or not node belongs to hyperedge . Note that the incidence matrix contains the same amount of information as the adjacency tensor . Let
denote the degree of the th node, and
be the average degree across the hypergraph. The unnormalized hypergraph Laplacian is defined as
(15) 
where is a diagonal matrix representing the degree distribution of the hypergraph with adjacency tensor , and denotes the usual matrix transpose. Note that can be thought of as an encoding of the higher-dimensional connectivity relationship into a two-dimensional matrix.
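Since the exact expression (15) is elided above, we sketch one common unnormalized construction from the incidence matrix, via clique expansion, under our own assumptions; it mirrors the graph Laplacian L = D − A:

```python
import numpy as np

def hypergraph_laplacian(H):
    """Clique-expansion Laplacian from the n-by-m incidence matrix H.

    H @ H.T counts, for each pair of nodes, the hyperedges containing
    both; removing its diagonal yields a co-occurrence adjacency matrix A,
    and L = D - A is the usual graph Laplacian of that expansion.
    This is our reconstruction, not necessarily the paper's exact (15).
    """
    A = H @ H.T
    np.fill_diagonal(A, 0)            # drop self-co-occurrence counts
    D = np.diag(A.sum(axis=1))        # weighted degrees of the expansion
    return D - A
```

As with any graph Laplacian, the result is symmetric with zero row sums, so the all-one vector lies in its kernel.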
Before applying the spectral method directly, high-degree outliers in the tensor are first trimmed to ensure the performance of the clustering algorithm. Specifically, we use to denote the modification of in which all coordinates pertaining to the set are replaced with all-zero vectors. Let and be the corresponding incidence matrix and degree matrix of , respectively. The spectrum we are looking for is the trimmed version of , denoted as
(16) 
where the operator represents the trimming process with a degree threshold . We use
to denote the
leading singular vectors generated from the singular value decomposition of the trimmed matrix
. Note that in a conventional spectral clustering algorithm, each node is represented by a dimension-reduced row vector . The spectral clustering algorithm is described in Algorithm LABEL:alg:spec_init.
algocf[htbp]
Similar to classical spectral clustering, we make use of the row vectors of to cluster the nodes. In each loop, we first choose the node that covers the most nodes within radius in as the cluster center. Then, we assign all nodes whose distance from this center is smaller than to this cluster. At the end of the loop, we remove all nodes within this cluster from . The final clean-up step (LABEL:eq:spec_init_cls) in the algorithm handles those nodes that deviate too much from all clusters: each remaining node is assigned to the cluster to which it has the minimum average distance.
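The greedy covering loop and the clean-up step can be sketched as follows (our simplified Python rendition operating on the rows of the singular-vector matrix; the radius `r` and plain Euclidean distances are our assumptions):

```python
import numpy as np

def greedy_cluster(U, k, r):
    """Greedy covering on the rows of U (n-by-k matrix of leading
    singular vectors): repeatedly pick the unassigned row covering the
    most unassigned rows within l2-radius r as a center, form that
    cluster, and remove it; leftover rows go to the cluster with the
    smallest average distance (assumes every cluster ends up nonempty).
    """
    n = U.shape[0]
    labels = -np.ones(n, dtype=int)
    remaining = set(range(n))
    for c in range(k):
        if not remaining:
            break
        rem = sorted(remaining)
        diffs = U[rem][:, None, :] - U[rem][None, :, :]
        dists = np.linalg.norm(diffs, axis=2)
        cover = (dists <= r).sum(axis=1)          # ball sizes per candidate
        center = rem[int(np.argmax(cover))]
        ball = {v for v in rem if np.linalg.norm(U[v] - U[center]) <= r}
        for v in ball:
            labels[v] = c
        remaining -= ball
    for v in sorted(remaining):                    # clean-up step
        avg = [np.mean([np.linalg.norm(U[v] - U[u])
                        for u in range(n) if labels[u] == c])
               for c in range(k)]
        labels[v] = int(np.argmin(avg))
    return labels
```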
Remark 5.1:
It is noteworthy that Algorithm LABEL:alg:spec_init is just one method that is eligible to serve as a qualified first-step estimator. As mentioned above, the minimax risk is asymptotically achievable with Algorithm LABEL:alg:refine
as long as the initialization algorithm does not misclassify too many nodes. The weak consistency requirement is stated explicitly in
Section VI when theoretical guarantees are discussed.
V-C Time Complexity
Algorithm LABEL:alg:spec_init has a time complexity whose bottleneck is the singular value decomposition step. Still, the SVD can be computed approximately with high probability [9] when only the leading singular vectors are of interest. As for the refinement scheme, the sparsity of the underlying hypergraph can be exploited to reduce the complexity, since the whole network structure can be stored in the incidence matrix $\mathcal{H}$ just as equivalently as in the adjacency tensor $\mathcal{A}$. As a result, the parameter estimation stage only requires time proportional to the total number of hyperedges realized, and analogous bounds hold for the calculation of the likelihood function and for the consensus step. Hence, the overall complexity of Algorithm LABEL:alg:refine and Algorithm LABEL:alg:spec_init combined is polynomial for a constant hypergraph order, and it further reduces in the sparse regime with high probability.
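The storage argument above can be made concrete: a sparse incidence structure costs space proportional to the number of realized incidences, versus the $n^{M}$ entries of a dense order-$M$ adjacency tensor. The sketch below is an illustration with an invented toy hypergraph, not the paper's data structure.

```python
# Sketch: storing the hypergraph as a sparse incidence list instead of
# the full adjacency tensor. A 3-uniform hypergraph on n nodes with m
# realized hyperedges costs O(m) storage here, versus O(n^3) dense entries.
n = 6
hyperedges = [(0, 1, 2), (1, 2, 3), (3, 4, 5)]   # toy example (assumed data)

incidence = {v: [] for v in range(n)}            # node -> incident hyperedge ids
for j, e in enumerate(hyperedges):
    for v in e:
        incidence[v].append(j)

# node degrees computed in time linear in the number of incidences
degrees = [len(incidence[v]) for v in range(n)]
```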
Remark 5.2:
It is possible to simplify our algorithm in the same way as in [10], where the SVD is performed only once. The time complexity of the simplified version of our algorithm in the sparse regime is comparable to that of other state-of-the-art min-cut algorithms. Although we are not able to provide any theoretical guarantee for this simplified version, as in [10], it empirically appears to have the same performance as the original algorithm. Proving its asymptotic optimality is left as future work.
VI Theoretical Guarantees
Combining the first-step and second-step algorithms in Section V, we have the following overall performance guarantee, which serves as the achievability part of the Main Theorem in Section IV.
Theorem 6.1:
In what follows, we first state the theoretical guarantees of Algorithm LABEL:alg:refine as well as Algorithm LABEL:alg:spec_init, and demonstrate how in combination they yield the upper bound result. The detailed proofs of the intermediate theorems are established later in Subsection VI-A and Subsection VI-B, respectively.
The algorithm proposed in Section V consists of two steps. We first obtain a rough estimate through the first step, which is a spectral clustering on the hypergraph Laplacian matrix. After that, for each node we perform a local maximum likelihood estimation, which serves as the second step, to further adjust its community assignment. It turns out that this refining mechanism is crucial in achieving the optimal minimax risk, as long as the first initialization step satisfies a certain weak consistency condition. Specifically, the first-step algorithm should meet the requirement stated below.
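The two-step structure just described can be sketched abstractly. This is a hedged skeleton only: the model-specific local log-likelihood is left as an assumed callback `log_lik` (its exact form is given by the refinement scheme in the paper, not here).

```python
import numpy as np

def refine(labels, hyperedges, k, log_lik):
    """Sketch of the second (refinement) step: for each node, holding all
    other labels fixed at the current estimate, reassign it to the
    community that maximizes a local log-likelihood.

    `log_lik(node, candidate, labels, hyperedges)` is an assumed callback
    implementing the model-specific likelihood (not specified here)."""
    new = labels.copy()
    for v in range(len(labels)):
        scores = [log_lik(v, c, labels, hyperedges) for c in range(k)]
        new[v] = int(np.argmax(scores))
    return new
```

In the full algorithm, the input `labels` would come from the spectral first step, so that the local maximization only needs to correct a vanishing fraction of nodes.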
Condition 6.1:
There exist a constant and a positive sequence such that
(18) 
for sufficiently large $n$.
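The displayed condition (18) did not survive extraction. In the two-step minimax literature (e.g., [10]), such weak-consistency conditions typically take the following shape, shown here only as an illustrative reconstruction: the symbols $\hat{\sigma}^{0}$ (initial estimate), $\sigma$ (true assignment), $\gamma_n \to 0$, and constants $C_0, \delta > 0$ are assumptions filling the lost notation.

```latex
\Pr\Big\{ \min_{\pi \in S_k} \frac{1}{n} \sum_{i=1}^{n}
  \mathbf{1}\big\{\hat{\sigma}^{0}(i) \neq \pi(\sigma(i))\big\}
  \le \gamma_n \Big\}
  \ge 1 - C_0\, n^{-\delta}
```

The minimum over label permutations $\pi$ reflects that communities are only identifiable up to relabeling.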
We have the following performance guarantee for our second-step algorithm.
Theorem 6.2:
As for the initialization algorithm, first recall that we assume the connecting probabilities to be of the same order. Also, we consider the $k$th largest singular value of the expectation of the hypergraph Laplacian (15). Note that each entry of this expected matrix is a weighted combination of the probability parameters. Stated in terms of this singular value, the following theorem characterizes the mismatch ratio of the first-step algorithm that we propose.
Theorem 6.3:
If
(23)
holds for some sufficiently small constant, apply Algorithm LABEL:alg:spec_init with a sufficiently small radius constant and a sufficiently large trimming constant. Then, for any constant, there exists a quantity depending only on these constants so that the stated mismatch-ratio bound holds
with high probability.
To remove this dependency, we use the observation below.
Lemma 6.1:
In the setting above, we have
(24) 