# On the Minimax Misclassification Ratio of Hypergraph Community Detection

Community detection in hypergraphs is explored. Under a generative hypergraph model called "d-wise hypergraph stochastic block model" (d-hSBM), which naturally extends the Stochastic Block Model from graphs to d-uniform hypergraphs, the asymptotic minimax mismatch ratio is characterized. To prove achievability, we propose a two-step polynomial time algorithm that achieves the fundamental limit. The first step of the algorithm is a hypergraph spectral clustering method which achieves partial recovery to a certain precision level. The second step is a local refinement method which leverages the underlying probabilistic model along with parameter estimation from the outcome of the first step. To characterize the asymptotic performance of the proposed algorithm, we first derive a sufficient condition for attaining weak consistency in the hypergraph spectral clustering step. Then, under the guarantee of weak consistency in the first step, we upper bound the worst-case risk attained in the local refinement step by an exponentially decaying function of the size of the hypergraph and characterize the decay rate. To prove the converse, the minimax mismatch ratio is lower bounded by finding a smaller parameter space which contains the most dominant error events, inspired by the analysis in the achievability part. It turns out that the minimax mismatch ratio decays exponentially fast to zero as the number of nodes tends to infinity, and the rate function is a weighted combination of several divergence terms, each of which is the Rényi divergence of order 1/2 between two Bernoulli distributions. The Bernoulli distributions involved in the characterization of the rate function are those governing the random instantiation of hyperedges in d-hSBM. Experimental results on synthetic data validate our theoretical finding that the refinement step is critical in achieving the optimal statistical limit.


## I Introduction

Community detection (clustering) has received great attention recently across many applications, including social science, biology, computer science, and machine learning, although it is usually an ill-posed problem due to the lack of ground truth. A prevalent way to circumvent the difficulty is to formulate it as an inverse problem on a graph, where each node is assigned a community (label) that serves as the ground truth. The ground-truth community assignment is hidden while the graph is revealed. Each edge in the graph models a certain kind of pairwise interaction between the two nodes. The goal of community detection is to determine the hidden community assignment from the observed graph, by leveraging the fact that different combinations of community relations lead to different likelihoods of edge connectivity. When the graph is passively observed, community detection can be viewed as a statistical estimation problem, where the community assignment is to be estimated from a statistical experiment governed by a generative model of random graphs. A canonical generative model is the stochastic block model (SBM) [3] (also known as the planted partition model [4]), which generates randomly connected edges from a set of labeled nodes. The presence of the edges is governed by independent Bernoulli random variables, and the parameter of each of them depends on the community assignments of the two nodes in the corresponding edge.

Through the lens of statistical decision theory, the fundamental statistical limits of community detection provide a way to benchmark various community detection algorithms. Under SBM, the fundamental statistical limits have been characterized recently. One line of work takes a Bayesian perspective, where the unknown labeling of nodes in SBM is assumed to be distributed according to a certain prior, and one of the most common assumptions is that the labels are i.i.d. over nodes. Along this line, the fundamental limit for exact recovery is characterized [5] in full generality, while partial recovery remains open in general. See the survey [6] and the references therein for more details. A second line of work takes a minimax perspective, and the goal is to characterize the minimax risk, which is typically the mismatch ratio between the true community assignment and the recovered one. In [7], a tight asymptotic characterization of the minimax mismatch ratio for community detection in SBM is found. Along with these theoretical results, several algorithms have been proposed to achieve these limits, including degree-profiling comparison [8] for exact recovery, spectral MLE [9] for almost-exact recovery, and a two-step mechanism [10] under the minimax framework.

However, graphs can only capture pairwise relational information, while such a dyadic measure may be inadequate in many applications, such as the task of 3-D subspace clustering [11] and the higher-order graph matching problem in computer vision [12]. Moreover, in a co-authorship network such as the DBLP bibliography database, where collaboration between scholars usually comes in a group-wise fashion, it seems more appropriate to represent the co-writing relationship in a single collective way rather than recording each pairwise interaction [13]. Therefore, it is natural to model such beyond-pairwise interaction by a hyperedge in a hypergraph and study the clustering problem in a hypergraph setting [14]. Hypergraph partitioning has been investigated in computer science, and several algorithms have been proposed, including spectral methods based on clique expansion [15], hypergraph Laplacian [16], game-theoretic approaches [17], and tensor methods [18][19], to name a few. Existing approaches, however, mainly focus on optimizing a certain score function entirely based on the connectivity of the observed hypergraph and do not view it as a statistical estimation problem.

In this paper, we investigate the community detection problem in hypergraphs through the lens of statistical decision theory. Our goal is to characterize the fundamental statistical limit and develop computationally feasible algorithms to achieve it. As for the generative model for hypergraphs, one natural extension of the SBM to a hypergraph setting is the hypergraph stochastic block model (hSBM), where the presence of each hyperedge, of any order up to the maximum edge cardinality, is governed by a Bernoulli random variable, and the presence of different hyperedges is mutually independent. Despite the success of the aforementioned algorithms on many practical datasets, it remains open how they perform in hSBM, since the fundamental limits have not been characterized and the probabilistic nature of hSBM has not been fully utilized.

As a first step towards characterizing the fundamental limit of community detection in hypergraphs, in this work we focus on the “d-wise hypergraph stochastic block model” (d-hSBM), where all the hyperedges generated in the hypergraph stochastic block model are of order $d$. Our main contributions are summarized as follows.

• First, we characterize the asymptotic minimax mismatch ratio in d-hSBM for any order $d$.

• Second, we propose a polynomial time algorithm which provably achieves the minimax mismatch ratio in the asymptotic regime, under mild regularity conditions.

To the best of our knowledge, this is the first result which characterizes the fundamental limit on the minimax risk of community detection in random hypergraphs, together with a companion efficient recovery algorithm. The proposed algorithm consists of two steps. The first step is a global estimator that roughly recovers the hidden community assignment to a certain precision level, and the second step refines the estimated assignment based on the underlying probabilistic model.

It is shown that the minimax mismatch ratio in d-hSBM converges to zero exponentially fast as $n$, the size of the hypergraph, tends to infinity. The rate function, which is the normalized exponent, turns out to be a linear combination of Rényi divergences of order 1/2. Each divergence term in the sum corresponds to a pair of community relations that would be confused with one another when there is only one misclassification, and the weighted coefficient associated with it indicates the total number of such confusing patterns. Probabilistically, there may well be two or more misclassifications, with each confusing relation pair pertaining to a Rényi divergence when analyzing the error probability. However, we demonstrate technically that these situations are all dominated by the error event with a single misclassified node, which leaves only the “neighboring” divergence terms in the asymptotic expression. The main technical challenge resolved in this work stems from the fact that the community relations become much more complicated as the order $d$ increases, meaning that more error events may arise compared to the dichotomy situation (i.e., same-community and different-community) in the graph case.

In the proof of achievability, we show that the second refinement step is able to achieve the fundamental limit provided that the first initialization step satisfies a certain weak consistency condition. The core of the second-step algorithm lies in a local version of the maximum likelihood estimation, where concentration inequalities are utilized to upper bound the probability of error. Here, an additional regularity condition is required to ensure that the probability parameters, which correspond to the appearance of various types of hyperedges, do not deviate too much from each other. We would like to note that this constraint can be relaxed as long as the number of communities considered does not scale with $n$. For the first step, we use tools from perturbation theory such as the Davis-Kahan theorem to prove the performance of the proposed spectral clustering algorithm. Since entries in the derived hypergraph Laplacian matrix are no longer independent, a union bound is applied here to make the analysis tractable with the concentration inequalities.

The converse part of the minimax mismatch ratio follows a standard approach in statistics by finding a smaller parameter space where we can analyze the risk. We first lower bound the minimax risk by the Bayesian risk with a uniform prior. Then, the Bayesian risk is transformed to a local one by exploiting the closed-under-permutation property of the targeted parameter space. Finally, we identify the local Bayesian risk with the risk function of a hypothesis testing problem and apply the Rozovsky lower bound in large deviation theory to obtain the desired converse result.

### Related Works

The hypergraph stochastic block model is first introduced in [20] as the planted partition model in random uniform hypergraphs where each hyperedge has the same cardinality. The uniform assumption is later relaxed in a follow-up work [21], and a more general hSBM with mixed edge orders is considered. In [22], the authors consider the sparse regime and propose a spectral method based on a generalization of the non-backtracking operator. Besides, a weak consistency condition is derived in [21] for hSBM by using the hypergraph Laplacian. Departing from hSBM, an extension of the censored block model to the hypergraph setting is considered in [23], where an information theoretic limit on the sample complexity for exact recovery is characterized. As for the proposed two-step algorithm, the refine-after-initialize concept has also been used in graph clustering [8, 9, 10] and ranking [24].

This paper generalizes our previous works in the two conference papers [1, 2] in three ways. First, [1] only explores the extension from the graph to the 3-hSBM case where the observed hyperedges are 3-uniform, as compared to the more general d-hSBM model for any order $d$ analyzed in [2] and here. In addition, the number of communities $k$ is allowed to scale with the number of vertices in this work, rather than being a constant as assumed in [2]. This slight relaxation actually leads to another regularity condition imposed on the connecting probabilities, which is a non-trivial technical extension. Finally, we also demonstrate that our proposed algorithms, namely the hypergraph spectral clustering algorithm and the local refinement scheme, are able to achieve the partial recovery and the exact recovery criteria, respectively.

The rest of the paper is organized as follows. We first introduce the random hypergraph model d-hSBM and formulate the community detection problem in Section II. Previous efforts on the minimax result in the graph SBM and 3-hSBM are reviewed in Section III, which motivates the key quantities that characterize the fundamental limit in d-hSBM. The main contribution of this work, the characterization of the optimal minimax mismatch ratio under d-hSBM for any general order $d$, is presented in Section IV. We propose two algorithms in Section V along with an analysis of the time complexity. Theoretical guarantees for the proposed algorithms as well as the technical proofs are given in Section VI, while the converse part of the main theorem is established in Section VII. In Section VIII, we implement the proposed algorithms on synthetic data and present some experimental results. The paper is concluded with a few discussions on the extendability of the two-step algorithm and the minimax result to a weighted d-hSBM setting in Section IX.

### Notations

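• Let $|S|$ denote the cardinality of the set $S$ and $[N] \triangleq \{1, 2, \dots, N\}$ for $N \in \mathbb{N}$.

• $S_k$ is the symmetric group of degree $k$, which contains all the permutations from $[k]$ to itself.

• The function $\sigma(\mathbf{l}) \triangleq (\sigma(l_1), \dots, \sigma(l_d))$ represents the community label vector associated with a labeling function $\sigma: [n] \to [k]$ for a node vector $\mathbf{l} = (l_1, \dots, l_d)$.

• For any community assignment $\sigma$ and permutation $\pi \in S_k$, $\sigma_\pi$ denotes the permuted assignment vector.

• The asymptotic equality between two functions $f(n)$ and $g(n)$, denoted as $f \asymp g$ (as $n \to \infty$), holds if $\lim_{n \to \infty} f(n)/g(n) = 1$.

• Also, two functions are said to be of the same order if their ratio is bounded above and below by positive constants independent of $n$, and one function is said to be asymptotically smaller than another if their ratio tends to zero. These notions are equivalent to the standard Big O notations $\Theta(\cdot)$, $o(\cdot)$, and $\omega(\cdot)$, which we also use in this paper interchangeably.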

• , is the and norm for a vector , respectively.

• $d_H(x, y)$ is the Hamming distance between two vectors $x$ and $y$.

• For a matrix $M$, we denote its operator norm by $\|M\|_{\mathrm{op}}$ and its Frobenius norm by $\|M\|_F$.

• For a -dimensional tensor , we denote its -th element by , where , and we write .

• Finally, let for be the set of all orthogonal matrices.

## II Problem Formulation

### II-A Community Relations

Before introducing the random hypergraph model d-hSBM, we first describe the community relations among nodes, which serve as the basic building block of our model. Let $\mathcal{K}_d$ be the set of all possible community relations under d-hSBM and let $|\mathcal{K}_d|$ denote the total number of them. In contrast to the dichotomy (same community or not) governing the appearance of an edge between two nodes in the usual symmetric SBM, the number of community relations in d-hSBM grows as the order $d$ increases. To keep them organized, we use the idea of majorization [25] to arrange $\mathcal{K}_d$, with each relation $r_i$ represented in the form of a histogram. Specifically, the histogram operator transforms a label vector into its histogram vector, which counts how many entries take each community label. For convenience, we sort the histogram vector in descending order and append zeros if necessary to make a length-$d$ vector. The notion of majorization is introduced as follows. For any $a, b \in \mathbb{R}^d$, we say that $a$ majorizes $b$, written as $a \succ b$, if $\sum_{i=1}^{m} a^{\downarrow}_i \ge \sum_{i=1}^{m} b^{\downarrow}_i$ for $m = 1, \dots, d$ and $\sum_{i=1}^{d} a_i = \sum_{i=1}^{d} b_i$, where the $a^{\downarrow}_i$'s are the elements of $a$ sorted in descending order. Observe that each community relation $r_i$ in $\mathcal{K}_d$ can be uniquely represented, when sorted in descending order, by a $d$-dimensional histogram vector. We arrange the elements in $\mathcal{K}_d$ in majorization (pre)order, so that relations with more concentrated histograms come first. For example, $r_1$ is relation all-same with the most concentrated histogram $(d, 0, \dots, 0)$ and $r_2$ is the only-1-different relation with $(d-1, 1, 0, \dots, 0)$. Likewise, the second-to-last relation is the only-2-same relation with $(2, 1, \dots, 1, 0)$ and the last one in $\mathcal{K}_d$, relation all-different, has a histogram vector being the all-one vector.

###### Example 2.1 (K4 in 4-hSBM):

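with histogram vectors being

| Relation | Histogram | Connecting Probability |
|---|---|---|
| all-same | $(4, 0, 0, 0)$ | |
| only-1-different | $(3, 1, 0, 0)$ | |
| only-2-same | $(2, 1, 1, 0)$ | |
| all-different | $(1, 1, 1, 1)$ | |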
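The histogram and majorization conventions above can be illustrated with a minimal Python sketch. It assumes that the sorted histograms of length-$d$ label vectors are exactly the integer partitions of $d$ into at most $k$ parts; the function names `partitions`, `community_relations`, and `majorizes` are hypothetical helpers introduced here for illustration, not notation from the paper.

```python
def partitions(total, max_part, max_len):
    """Yield partitions of `total` with parts <= max_part and at most max_len parts,
    in descending lexicographic order (most concentrated first)."""
    if total == 0:
        yield ()
        return
    if max_len == 0:
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, first, max_len - 1):
            yield (first,) + rest


def community_relations(d, k):
    """Sorted histogram vectors (zero-padded to length d) of d nodes drawn from k communities."""
    return [p + (0,) * (d - len(p)) for p in partitions(d, d, min(d, k))]


def majorizes(a, b):
    """Return True if vector a majorizes vector b (equal totals, dominating prefix sums)."""
    if sum(a) != sum(b):
        return False
    run_a = run_b = 0
    for x, y in zip(sorted(a, reverse=True), sorted(b, reverse=True)):
        run_a += x
        run_b += y
        if run_a < run_b:
            return False
    return True


if __name__ == "__main__":
    for hist in community_relations(3, 3):
        print(hist)
```

The descending lexicographic order used here is one linear arrangement compatible with majorization: whenever one histogram majorizes another, it is listed earlier. For $d = 3$ the sketch lists the all-same, only-1-different, and all-different histograms.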

### II-B Random Hypergraph Model: d-hSBM

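In a $d$-uniform hypergraph, the adjacency relation among the $n$ nodes can be equivalently represented by a $d$-dimensional random tensor $\mathbf{A}$ (the size of each dimension being $n$), where $\mathbf{l} = (l_1, \dots, l_d)$ is the access index of an element $A_{\mathbf{l}}$ in the tensor. The following two natural conditions on this adjacency tensor come from the basic properties of an undirected hypergraph: there are no self-loops, so $A_{\mathbf{l}}$ can be nonzero only if the indices in $\mathbf{l}$ are all distinct; and the tensor is symmetric, so $A_{\mathbf{l}}$ is invariant under any permutation of the indices in $\mathbf{l}$.

For each $\mathbf{l}$, $A_{\mathbf{l}}$ is a Bernoulli random variable with success probability $Q_{\mathbf{l}}$. The parameter tensor $\mathbf{Q}$ depends only on the community assignments of the associated nodes in the hyperedge and forms a block structure. The block structure is characterized by a symmetric $d$-dimensional community connection tensor $\mathbf{B}$, where $Q_{\mathbf{l}} = B_{\sigma(\mathbf{l})}$.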

To set up the parameter space considered in our statistical study, we first introduce some further notation. Let $n_t$ be the size of the $t$-th community for $t \in [k]$. Besides, let $\mathbf{p} = (p_1, p_2, \dots)$, where $p_i$ denotes the success probability of the Bernoulli random variable that corresponds to the appearance of a hyperedge with relation $r_i$. We make the natural assumption that the $p_i$'s are in decreasing order: the more concentrated a group is, the higher the chance that its members will be connected by a hyperedge.
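As a rough illustration of the generative model just described, the following Python sketch draws one $d$-uniform hypergraph in which each $d$-subset of nodes becomes a hyperedge independently, with a probability that depends only on the sorted histogram of its community labels. The function name `sample_d_hsbm`, the dictionary `probs` keyed by histogram, and the `seed` argument are assumptions made for this example; the exhaustive loop over all $d$-subsets is only practical for small $n$.

```python
import random
from collections import Counter
from itertools import combinations


def histogram(labels, d):
    """Sorted (descending) label counts, zero-padded to length d."""
    counts = sorted(Counter(labels).values(), reverse=True)
    return tuple(counts + [0] * (d - len(counts)))


def sample_d_hsbm(sigma, d, probs, seed=0):
    """Sample a d-uniform hypergraph.

    sigma : list of community labels, sigma[v] is the label of node v
    probs : dict mapping a histogram tuple to the hyperedge probability
    Returns a set of frozensets, one per generated hyperedge.
    """
    rng = random.Random(seed)
    edges = set()
    for nodes in combinations(range(len(sigma)), d):
        p = probs[histogram([sigma[v] for v in nodes], d)]
        # Independent Bernoulli(p) draw for this candidate hyperedge.
        if rng.random() < p:
            edges.add(frozenset(nodes))
    return edges


if __name__ == "__main__":
    sigma = [0, 0, 0, 1, 1, 1]
    probs = {(3, 0, 0): 0.8, (2, 1, 0): 0.1}
    print(sorted(sorted(e) for e in sample_d_hsbm(sigma, 3, probs, seed=1)))
```

Representing hyperedges as unordered sets reflects the symmetry of the adjacency tensor, and enumerating only subsets of distinct nodes rules out self-loops.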

###### Remark 2.1:

We would like to note that there is nothing peculiar about the assumption that the $p_i$'s are in decreasing order, and the condition can be relaxed. All that is required is that the connecting probabilities $p_i$ are well separated and that the differences between them are of the same order. See Section VI for a more formal statement of our main result.

The parameter space considered here is a homogeneous and approximately equal-sized case where each . Formally speaking (let ),

 (1)

where $\mathbf{B}$ has the property that $B_{\sigma(\mathbf{l})} = p_i$ if and only if the labels $\sigma(\mathbf{l})$ form relation $r_i$. In other words, only the histogram of the community labels within a group matters when it comes to connectivity. $\eta$ is a parameter that controls how much the community sizes $n_t$ could vary. We assume the more interesting case where the community sizes are not restricted to be exactly equal. Interchangeably, we write $\mathbf{l} \overset{\sigma}{\sim} r_i$ to indicate that the community relation within the nodes $l_1, \dots, l_d$ under the assignment $\sigma$ is $r_i$. Throughout the paper, we assume that the order $d$ of the observed hypergraph is a constant, while the other parameters, including the total number of communities $k$ and the hyperedge connection probabilities $\mathbf{p}$, can be coupled with $n$. Specifically, $k$ can either be a constant or scale with $n$. Moreover, as pointed out in [1], the regime where the hypergraph is weakly recoverable could be orderly lower than the one considered in the SBM for graphs [8]. To guarantee the solvability of weak recovery in d-hSBM, we require the probability parameters to be at least of the order of $1/n^{d-1}$. Therefore, we write $p_i = a_i / n^{d-1}$ where $a_i = \Omega(1)$ for all $i$. We would like to note that the probability regime considered here is first motivated in [1]. Under 3-hSBM, the authors in [1] consider the order $1/n^2$ for the probability parameter, which is orderly lower than the one ($1/n$) required for partial recovery in [8] and the minimax risk in [7] for the graph SBM. The motivation is that, since the total number of random variables in a random $d$-uniform hypergraph is roughly $n^{d-2}$ times larger than that in a traditional random graph, the underlying hypergraph is allowed to be $n^{d-2}$ times sparser and still retain a risk of the same order. In light of this, we relax the probability parameter from the order of $1/n$ to the order of $1/n^{d-1}$ in d-hSBM.

### II-C Performance Measure

To evaluate how good an estimator is, we use the mismatch ratio as the performance measure for the community detection problem. The un-permuted loss function is defined as

 $$\ell_0(\sigma_1, \sigma_2) \triangleq \frac{1}{n} d_H(\sigma_1, \sigma_2)$$

where $d_H$ is the Hamming distance. It directly counts the proportion of misclassified nodes between an estimator and the ground truth assignment. To account for possible re-labeling, the mismatch ratio is defined as the loss function which maximizes the agreement between an estimator and the ground truth after an alignment by label permutation.

 $$\ell(\hat{\sigma}, \sigma) \triangleq \min_{\pi \in S_k} \ell_0(\hat{\sigma}_\pi, \sigma) \tag{2}$$
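The mismatch ratio in (2) can be computed directly for small $k$ by trying every label permutation. The sketch below is a brute-force illustration only; the function name `mismatch_ratio` and the encoding of labels as integers in `range(k)` are assumptions of this example, and the $k!$ enumeration is not meant for large $k$.

```python
from itertools import permutations


def mismatch_ratio(estimate, truth, k):
    """Minimum fraction of disagreeing nodes over all relabelings of `estimate`.

    estimate, truth : sequences of labels in range(k), one entry per node
    """
    n = len(truth)
    best = n
    for perm in permutations(range(k)):
        # perm[t] is the label that estimated community t is renamed to.
        mismatches = sum(1 for e, s in zip(estimate, truth) if perm[e] != s)
        best = min(best, mismatches)
    return best / n


if __name__ == "__main__":
    print(mismatch_ratio([1, 1, 0, 0], [0, 0, 1, 1], k=2))
    print(mismatch_ratio([0, 0, 0, 1], [0, 0, 1, 1], k=2))
```

The first call illustrates why the alignment matters: the two assignments describe the same partition under swapped labels, so the un-permuted loss would be maximal while the mismatch ratio is zero.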

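By convention, we use $R_{\mathbf{B},\sigma}(\hat{\sigma}) \triangleq \mathbb{E}_{\mathbf{B},\sigma}\, \ell(\hat{\sigma}, \sigma)$ to denote the corresponding risk function. Finally, the minimax risk for the parameter space $\Theta_d^0$ under d-hSBM is denoted as

 $$R_d^* \triangleq \inf_{\hat{\sigma}} \sup_{(\mathbf{B},\sigma) \in \Theta_d^0} R_{\mathbf{B},\sigma}(\hat{\sigma}).$$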

###### Remark 2.2:

Notice that in a symmetric (homogeneous) SBM [6], the connectivity tensor $\mathbf{B}$ is uniquely determined by the labeling function $\sigma$. Therefore, we drop the subscript $\mathbf{B}$ in $\mathbb{P}_{\mathbf{B},\sigma}$ and write $\mathbb{P}_\sigma$ when it comes to the uncertainty arising from the random hypergraph model with the underlying assignment being $\sigma$. Similarly, we write $\mathbb{E}_\sigma$ instead of $\mathbb{E}_{\mathbf{B},\sigma}$ for ease of notation.

## III Prior Works

For the case $d = 2$, the asymptotic minimax risk $R_2^*$ is characterized in [7], and it decays to zero exponentially fast as $n \to \infty$. In addition, the (negative) exponent of $R_2^*$ is determined by the Rényi divergence of order $1/2$ between two Bernoulli distributions $\mathrm{Ber}(p)$ and $\mathrm{Ber}(q)$

 $$I_{pq} \triangleq -2 \log\left(\sqrt{pq} + \sqrt{1-p}\sqrt{1-q}\right) \tag{3}$$

where $p$ is the success probability of a same-community edge while $q$ stands for a different-community one. Extending from the traditional graph to a hypergraph setting, the authors in [1] generalize the minimax result obtained in [7] to the 3-hSBM model as follows

 $$-\log R_3^* \asymp \binom{n'}{2} I_{pq} + (k-2)(n')^2 I_{qr}$$

where the probability parameters $p$, $q$, and $r$ correspond to the community relations with histograms $(3, 0, 0)$, $(2, 1, 0)$, and $(1, 1, 1)$, respectively. Observe that the exponent of the minimax risk in 3-hSBM does not depend on the divergence term $I_{pr}$ explicitly. That is, $-\log R_3^*$ consists of only those neighboring divergence terms whose histogram vectors have a distance of $2$. Besides, associated with each divergence term is a weighted coefficient, i.e., $\binom{n'}{2}$ for $I_{pq}$ and $(k-2)(n')^2$ for $I_{qr}$. These coefficients appear in the hypothesis testing problem when deriving the lower bound of the minimax result. Essentially, each of them represents the total number of random variables that appear as a hyperedge of either of the two relations in the pair when the community label of the targeted node is being tested.
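To make the quantities above concrete, the following Python sketch evaluates the Rényi divergence of order 1/2 in (3) and the 3-hSBM exponent stated above. The function names `renyi_half` and `exponent_3hsbm` are hypothetical, and the numerical values in the usage example are arbitrary choices for illustration rather than parameters taken from the paper.

```python
from math import comb, log, sqrt


def renyi_half(p, q):
    """Renyi divergence of order 1/2 between Ber(p) and Ber(q), as in (3)."""
    return -2.0 * log(sqrt(p * q) + sqrt(1.0 - p) * sqrt(1.0 - q))


def exponent_3hsbm(n_prime, k, p, q, r):
    """Right-hand side of the 3-hSBM exponent: C(n',2) I_pq + (k-2) n'^2 I_qr."""
    return comb(n_prime, 2) * renyi_half(p, q) + (k - 2) * n_prime**2 * renyi_half(q, r)


if __name__ == "__main__":
    # Arbitrary illustrative parameters with p > q > r.
    print(renyi_half(0.3, 0.1))
    print(exponent_3hsbm(n_prime=50, k=4, p=0.03, q=0.02, r=0.01))
```

The divergence is symmetric in its two arguments and vanishes when they coincide, which matches the intuition that two relations with identical connecting probabilities cannot be told apart.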

It turns out that the optimal minimax risk in d-hSBM also decays to zero exponentially fast, given that the outcome of the initialization algorithm satisfies a certain condition. The exponent, as stated formally later, is a weighted combination of divergence terms. To specify the weights in this combination, we introduce further notation below. We use $\mathcal{N}_d$ to denote the collection of ordered pairs of relations in $\mathcal{K}_d$ that are neighbors to each other. Second, there is a combinatorial number associated with every pairwise divergence term. Precisely, let us consider a least favorable sub-parameter space of $\Theta_d^0$.

 $$\Theta_d^L(n, k, \mathbf{p}, \eta) \triangleq \left\{ (\mathbf{B}, \sigma) \in \Theta_d^0 \;\middle|\; n_t \in \{n'-1, n', n'+1\}\ \forall t \in [k],\ n_{\sigma(1)} = n'+1 \right\} \tag{4}$$

In $\Theta_d^L$, each community takes on only three possible sizes. In addition, there are exactly $n'+1$ members in the community where the first node belongs. We pick a $\sigma_0$ in $\Theta_d^L$ and construct a new assignment $\sigma[\sigma_0]$ based on $\sigma_0$:

 $$\sigma[\sigma_0](i) = \begin{cases} \arg\min_{2 \le t \le k} \{ n_t = n' \}, & \text{for } i = 1 \\ \sigma_0(i), & \text{for } 2 \le i \le n. \end{cases}$$

In other words, assignments $\sigma_0$ and $\sigma[\sigma_0]$ only disagree on the label of the first node. For each pair $(r_i, r_j)$ in $\mathcal{N}_d$, we define the weighted coefficient

 $$m_{r_i r_j} \triangleq \left| \left\{ \mathbf{l} = (1, l_2, \dots, l_d) \;\middle|\; \mathbf{l} \overset{\sigma_0}{\sim} r_i,\ \mathbf{l} \overset{\sigma[\sigma_0]}{\sim} r_j \right\} \right|$$

as the number of relation-$r_i$ hyperedges that we mistake as relation-$r_j$ hyperedges. Note that the above definition is independent of the choice of $\sigma_0$ due to the community size constraints.
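The weighted coefficients can be checked by brute force on a small instance. The sketch below builds one assignment in which the first community has $n'+1$ nodes and every other community has exactly $n'$ nodes (a simplifying assumption of this example), relabels the first node into the second community, and counts the hyperedges containing that node whose histogram changes. The function name `weighted_coefficients` is hypothetical, hyperedges are treated as unordered sets, and relations are identified by their sorted histograms.

```python
from collections import Counter
from itertools import combinations


def histogram(labels, d):
    """Sorted (descending) label counts, zero-padded to length d."""
    counts = sorted(Counter(labels).values(), reverse=True)
    return tuple(counts + [0] * (d - len(counts)))


def weighted_coefficients(n_prime, k, d):
    """Count, for each (before, after) histogram pair, the hyperedges through node 0
    whose relation changes when node 0 is moved from community 0 to community 1."""
    sigma0 = [0] * (n_prime + 1)          # community of the first node has n'+1 members
    for t in range(1, k):
        sigma0 += [t] * n_prime           # every other community has n' members
    sigma1 = list(sigma0)
    sigma1[0] = 1                         # only the first node is relabeled
    counts = Counter()
    for rest in combinations(range(1, len(sigma0)), d - 1):
        nodes = (0,) + rest
        before = histogram([sigma0[v] for v in nodes], d)
        after = histogram([sigma1[v] for v in nodes], d)
        if before != after:
            counts[(before, after)] += 1
    return counts


if __name__ == "__main__":
    for pair, m in sorted(weighted_coefficients(n_prime=4, k=3, d=3).items()):
        print(pair, m)
```

For $d = 3$ the counts agree with the coefficients $\binom{n'}{2}$ and $(k-2)(n')^2$ quoted for 3-hSBM, and for $d = 2$ the single coefficient equals $n'$.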

###### Example 3.1 (N4 in 4-hSBM):

with elements

Relation Pair Weighted Coefficient

Note that is the smallest while is the largest.

## IV Main Contribution

The optimal minimax risk for the homogeneous and approximately equal-sized parameter space under the probabilistic model d-hSBM is characterized as follows.

###### Main Theorem:

Suppose and . If

 ∑i

and

 ∑i

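as $n \to \infty$, then

 $$\log R_d^* \asymp -\sum_{i<j:\,(r_i, r_j) \in \mathcal{N}_d} m_{r_i r_j} I_{p_i p_j}. \tag{7}$$

If $k$ is a constant, then (7) holds without the further assumption (6).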

###### Remark 4.1:

In this work, we assume that the order $d$ of the hypergraph is a constant. More generally, one may also wonder how the characterized minimax risk changes when this order is also allowed to scale with $n$. Certainly, the expression above for the optimal minimax risk depends on the hypergraph order $d$, yet only implicitly. To obtain an explicit form of $R_d^*$ in terms of $d$, we have to get further estimates of the weighted coefficients $m_{r_i r_j}$ as well as the corresponding Rényi divergence terms $I_{p_i p_j}$. The latter can be estimated by $(\sqrt{p_i} - \sqrt{p_j})^2$ when the probabilities vanish, as assumed in the main theorem. On the other hand, as commented in Example 3.1, it is not hard to see that $m_{r_i r_j}$ achieves its minimum between the all-same and the only-1-different relations, while it attains its maximum between the only-2-same and the all-different relations. Therefore, when the differences are constant, the last term dominates the other terms in the summation, seeing that the parameter $k$ is coupled with $n$. In particular, the order of the error exponent for the optimal minimax risk in equation (7) carries a factorial term in its denominator. Surprisingly, the minimax risk would therefore decay more slowly as the order $d$ increases. However, we would like to note that this observation is valid only under the assumption that the considered hypergraphs are in the sparse regime, where the number of generated hyperedges remains of roughly the same order no matter how large the order $d$ is.

The minimax risk is provably achieved, through Theorem 6.1 in Section VI, by the proposed two-step algorithm. Roughly speaking, we first demonstrate that the second-step refinement is capable of obtaining an accurate parameter estimation as long as the first-step initialization satisfies a weak consistency condition. Then, the local MLE step is proved to achieve a mismatch ratio equal to the desired minimax risk, with which the local majority voting could recover the true community label for each node with the guaranteed risk. Finally, we show that our proposed spectral clustering algorithm with the hypergraph Laplacian matrix qualifies as a first-step initialization algorithm. We will compare our theoretical findings to those for the graph case [10] below as well as for the hypergraph setting [21] later in Section VI. On the other hand, the converse part is established through Theorem 7.1 in Section VII.

### IV-A Implications to Exact Recovery

Since we consider a minimax framework, the theoretical guarantees of our two-step algorithm are also sufficient to ensure partial recovery and exact recovery as considered under the Bayesian perspective [8]. Before presenting the theorems regarding community recovery in the Bayesian case, let us first review these two recovery notions. The definitions of the different recovery criteria discussed here can be found in the comprehensive survey [6]. We paraphrase them below, in terms of the mismatch ratio (2), for completeness. Please refer to the survey and the references therein for more details.

###### Definition 4.1 (Revised Definition 4 in [6]):

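Consider a pair $(\mathbf{B}, \sigma) \in \Theta_d^0$ and a corresponding random hypergraph $\mathbf{A}$. The following recovery requirements are solved if there exists an algorithm which takes $\mathbf{A}$ as input and outputs an estimate $\hat{\sigma}$ such that

• Partial Recovery: the mismatch ratio $\ell(\hat{\sigma}, \sigma)$ is at most a prescribed distortion level with probability $1 - o(1)$

• Exact Recovery: $\mathbb{P}\{\ell(\hat{\sigma}, \sigma) = 0\} = 1 - o(1)$

where the probability is taken over the random realizations of $\mathbf{A}$ and the asymptotic notation is with respect to the growth of $n$.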

Our proposed two-step algorithm in Section V can provably satisfy the exact recovery criterion.

###### Theorem 4.1:

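If

 $$\liminf_{n \to \infty} \frac{\sum_{i<j:\,(r_i, r_j) \in \mathcal{N}_d} m_{r_i r_j} I_{p_i p_j}}{\log n} > 1, \tag{8}$$

then the refinement scheme of Section V-A combined with the spectral initialization of Section V-B is able to solve the exact recovery problem.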

###### Proof.

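With (8), there exists a small constant $c > 0$ such that, for all sufficiently large $n$,

 $$\sum_{i<j:\,(r_i, r_j) \in \mathcal{N}_d} m_{r_i r_j} I_{p_i p_j} > (1 + c) \log n.$$

By the Markov inequality, we have

 $$\mathbb{P}\left\{ \ell(\hat{\sigma}, \sigma) \ge \frac{1}{n} \right\} \le n \cdot \mathbb{E}[\ell(\hat{\sigma}, \sigma)] \le n R_d^*,$$

and the right-hand side vanishes as $n \to \infty$ by the main theorem together with the bound above. Note that the event that the mismatch ratio is smaller than $1/n$ is equivalent to the event that it is identical to $0$, since the mismatch ratio only takes values that are multiples of $1/n$. ∎

We would like to note that partial recovery is immediate from exact recovery. Indeed, the required condition can be relaxed from (8) and depends on the extent of distortion allowed.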

### IV-B Comparison with [10]

We can recover the minimax result obtained in [10] by specializing to $d = 2$ in the main theorem above. In [10], the authors consider the traditional SBM under a homogeneous parameter space with connecting probabilities being $p = a/n$ and $q = b/n$. We would like to note that the parameter space considered in [10] is more general in the sense that it need not be nearly equal-sized. To be more specific, the size of each community is allowed to vary within a range controlled by an additional parameter. However, the parameter controlling this variation is itself restricted to a narrow range for technical reasons, which makes the attempted relaxation on the community size less interesting. In light of this, we compare only the minimax result in [10] for nearly equal-sized communities, which corresponds to our setting. The overall result for SBM, when combining the spectral initialization step with the local refinement step proposed therein (denoted as $\hat{\sigma}_{GM}$), can be summarized as follows: under the assumptions stated therein, if

 $$\frac{(a-b)^2}{a k^3 \log k} \to \infty \tag{9}$$

as $n \to \infty$, then there exists a sequence $\zeta_n \to 0$ such that

 $$\sup_{(\mathbf{B}, \sigma) \in \Theta_2^0} \mathbb{P}_\sigma \left\{ \ell(\hat{\sigma}_{GM}, \sigma) \ge \exp\left( -(1 - \zeta_n)\, n' I_{pq} \right) \right\} \to 0. \tag{10}$$

Indeed, the required condition (9) is exactly the same as (5) by using the approximation of $I_{pq}$ in the sparse regime. Note that there is only one community relation pair in $\mathcal{N}_2$ and the weighted coefficient is $n'$. In fact, the situation is very simple in SBM since there are only two possible community relations, i.e., intra-community (relation all-same) and inter-community (relation all-different). On the contrary, the relational information gets more and more complicated as $d$ increases. This inevitable “curse of dimension” is reflected in the second assumption (6) we made in the main theorem. First, recall that we set all of the probability parameters in the same order as the condition required in [10]. Apart from that, we also need to make sure that the differences associated with the pairs remain in the same order to successfully upper-bound the error probability in the proof of achievability. Under the traditional SBM, it is not hard to see that the assumption (6) is weaker than the assumption (5). Therefore, the overall requirement is equivalent to (9) made in [10] without any further assumption.

## V Proposed Algorithms

In this section, we propose our main algorithm for community detection in random hypergraphs, which is later proved in Section VI to achieve the minimax risk in d-hSBM asymptotically. The algorithm (the refinement scheme of Subsection V-A) comprises two major steps. In the first step, for each node $u \in [n]$, it generates an estimated assignment $\hat{\sigma}_u$ of all nodes except $u$ by applying an initialization algorithm on the sub-hypergraph without the vertex $u$. For example, we can apply the hypergraph clustering method described in Subsection V-B on the sub-tensor of $\mathbf{A}$ obtained when the $u$-th coordinate is removed in each dimension. Then, in the second step, the label of $u$ under $\hat{\sigma}_u$ is determined by maximizing a local likelihood function described in Subsection V-A. Note that the parameters of the underlying d-hSBM need not be known in advance, as the algorithm can conduct a parameter estimation before computing the local likelihood function if necessary. Finally, with the $n$ estimated assignments $\{\hat{\sigma}_u\}$, the algorithm combines all of them together and forms a consensus via majority neighbor voting.

### V-A Refinement Scheme

Let us begin with the global likelihood function defined as follows. Let

denote the log-likelihood of an adjacency tensor when the hidden community structure is determined by . For each , we use

to denote those likelihood terms in (11) pertaining to the -th node when its label is . It is not hard to see that is a sum of independent Bernoulli random variables. However, is not independent of for any since those random hyperedges that might enclose vertex and vertex simultaneously appear in both of the summands of the likelihood terms. The global likelihood function and the local likelihood function are related by

$$L(\sigma; A) = \frac{1}{d} \sum_{u \in [n]} L_u(\sigma, t; A).$$

This is because each likelihood term in (11) is counted exactly $d$ times when summing (12) over all nodes. For each node , based on the estimated assignment of the other nodes, we use the following local MLE method to predict the label of .

$$\hat{\sigma}_u(u) \triangleq \arg\max_{t \in [k]} L_u(\sigma, t; A)$$

When the connectivity tensor that governs the underlying random hypergraph model d-hSBM is unknown at the time the likelihood is evaluated, we will use and to denote the global and local likelihood function with the true replaced by its estimated counterpart . Since the presence of each edge is independent under our probabilistic model, we use the sample mean to estimate the real parameters. Note that the superscript indicates that the estimation is calculated with node taken out. Finally, consensus is drawn by majority neighbor voting. In fact, the consensus step looks for a consensus assignment among the possibly different community assignments obtained by the local MLE method in Algorithm LABEL:alg:refine. Since all these assignments will be close to the ground truth up to some permutation, this step combines all of them into a single community assignment as the final output.
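The local MLE step can be illustrated with a brute-force sketch. It is not the paper's algorithm listing: it enumerates every candidate hyperedge containing the node, which is only feasible for tiny hypergraphs, and the names `local_log_likelihood`, `local_mle_label` and `toy_edge_prob` are hypothetical. The toy connectivity function (0.9 for an all-same hyperedge, 0.1 otherwise) is an assumption chosen purely for illustration; in the d-hSBM the probability would depend on the community relation of the d nodes.

```python
from itertools import combinations
from math import log


def toy_edge_prob(group_labels):
    # Hypothetical connectivity: all-same hyperedges are likely, all others unlikely.
    return 0.9 if len(set(group_labels)) == 1 else 0.1


def local_log_likelihood(u, t, labels, hyperedges, d, edge_prob):
    """Sum of Bernoulli log-likelihood terms over candidate hyperedges containing u,
    assuming node u carries label t and the other nodes carry `labels`."""
    others = [v for v in range(len(labels)) if v != u]
    total = 0.0
    for rest in combinations(others, d - 1):
        p = edge_prob([t] + [labels[v] for v in rest])
        present = frozenset((u,) + rest) in hyperedges
        total += log(p) if present else log(1.0 - p)
    return total


def local_mle_label(u, labels, hyperedges, d, k, edge_prob):
    """Label in {0, ..., k-1} that maximizes the local log-likelihood of node u."""
    return max(
        range(k),
        key=lambda t: local_log_likelihood(u, t, labels, hyperedges, d, edge_prob),
    )
```

In this sketch `labels[u]` is ignored when node `u` is refined, which mirrors the idea that the estimate used for a node is computed with that node taken out.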


### V-B Spectral Initialization

In order to devise a good initialization algorithm , we develop a hypergraph version of the unnormalized spectral clustering [26] with regularization [27]. In particular, a modified version of the hypergraph Laplacian described below is employed. Let be the incidence matrix, where each entry indicates whether or not node belongs to the hyperedge . Note that the incidence matrix contains the same amount of information as the adjacency tensor . Let

$$d_u \triangleq \sum_{e \in \mathcal{E}} H_{ue}$$

denote the degree of the -th node, and

$$\bar{d} \triangleq \frac{1}{n} \sum_{u \in [n]} d_u$$

be the average degree across the hypergraph. The unnormalized hypergraph Laplacian is defined as

$$L(A) \triangleq H H^{T} - D \tag{15}$$

where $D$ is a diagonal matrix representing the degree distribution in the hypergraph with adjacency tensor and $(\cdot)^{T}$ is the usual matrix transpose. Note that $L(A)$ can be thought of as an encoding of the higher-dimensional connectivity relationship into a two-dimensional matrix.
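As a concrete illustration of definition (15), the following minimal NumPy sketch builds the incidence matrix from a list of hyperedges and forms the matrix $H H^{T} - D$. The function names `incidence_matrix` and `unnormalized_laplacian` are illustrative and not taken from the paper, and a dense matrix is used only for readability.

```python
import numpy as np


def incidence_matrix(n, hyperedges):
    """n x |E| 0/1 matrix with H[u, j] = 1 iff node u belongs to hyperedge j."""
    H = np.zeros((n, len(hyperedges)), dtype=int)
    for j, edge in enumerate(hyperedges):
        for u in edge:
            H[u, j] = 1
    return H


def unnormalized_laplacian(H):
    """H H^T - D, where D is the diagonal matrix of node degrees (row sums of H)."""
    degrees = H.sum(axis=1)
    return H @ H.T - np.diag(degrees)
```

Because the diagonal of $H H^{T}$ equals the node degrees, the resulting matrix has a zero diagonal, and its off-diagonal entry for a pair of nodes counts the hyperedges containing both.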

Before we directly apply the spectral method, abnormally high-degree nodes in the tensor are first trimmed to ensure the performance of the clustering algorithm. Specifically, we use to denote the modification of where all coordinates pertaining to the set are replaced with all-zero vectors. Let and be the corresponding incidence matrix and degree matrix of , respectively. The spectrum we are looking for is the trimmed version of , denoted as

$$T_\tau(L(A)) \triangleq H_\tau H_\tau^{T} - D_\tau \tag{16}$$

where the operator represents the trimming process with a degree threshold . We use

$$\mathrm{SVD}_k\big(T_\tau(L(A))\big) \triangleq \hat{U} = \begin{bmatrix} u_1^{T} & \cdots & u_n^{T} \end{bmatrix}^{T} \in \mathbb{R}^{n \times k}$$

to denote the $k$ leading singular vectors generated from the singular value decomposition of the trimmed matrix $T_\tau(L(A))$. Note that in a conventional spectral clustering algorithm, each node is represented by a reduced $k$-dimensional row vector. The spectral clustering algorithm is described in Algorithm LABEL:alg:spec_init.
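The trimming and SVD steps can be sketched as follows. This is a sketch under stated assumptions rather than the paper's listing: the trimmed set is taken to be the nodes whose degree exceeds the threshold `tau`, and every hyperedge touching a trimmed node is dropped, which is one reading of zeroing the corresponding tensor coordinates. The names `trimmed_laplacian` and `leading_singular_vectors` are hypothetical.

```python
import numpy as np


def trimmed_laplacian(H, tau):
    """H_tau H_tau^T - D_tau after removing hyperedges that touch a high-degree node."""
    H = np.asarray(H, dtype=float)
    degrees = H.sum(axis=1)
    trimmed = degrees > tau                    # assumed trimming rule
    keep_edge = H[trimmed].sum(axis=0) == 0    # hyperedges with no trimmed node
    H_tau = H * keep_edge[None, :]
    D_tau = np.diag(H_tau.sum(axis=1))
    return H_tau @ H_tau.T - D_tau


def leading_singular_vectors(M, k):
    """n x k matrix whose columns are the k leading left singular vectors of M."""
    U, _, _ = np.linalg.svd(M)
    return U[:, :k]
```

A full dense SVD is used here only for clarity; for a large sparse hypergraph a truncated solver would be the natural choice.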


Similar to classical spectral clustering, we make use of the row vectors of to cluster nodes. In each loop, we first choose the node which covers the most nodes with radius in to be the clustering center. Then, we assign all nodes whose distance from this center is smaller than to this cluster. At the end of the loop, we remove all nodes within this cluster from . The final cleanup step (LABEL:eq:spec_init_cls) in the algorithm handles those nodes that deviate too much from all clusters. It assigns each remaining node to the cluster to which it has the minimum average distance.
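The greedy procedure described above can be sketched in a few lines of NumPy. The function name `greedy_cluster`, the explicit radius argument `r`, and the tie-breaking rule (the lowest index wins) are assumptions made for this illustration; the paper's own choice of radius is not reproduced here.

```python
import numpy as np


def greedy_cluster(U, k, r):
    """Cluster the rows of U: repeatedly pick the row covering the most remaining rows
    within distance r, then assign leftovers by minimum average distance."""
    n = U.shape[0]
    dist = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=2)
    labels = -np.ones(n, dtype=int)
    remaining = np.ones(n, dtype=bool)
    for c in range(k):
        if not remaining.any():
            break
        # Number of remaining rows within radius r of each remaining candidate center.
        cover = ((dist < r) & remaining[None, :]).sum(axis=1)
        cover[~remaining] = -1
        center = int(np.argmax(cover))
        members = remaining & (dist[center] < r)
        labels[members] = c
        remaining &= ~members
    # Cleanup: rows far from every center join the cluster with the smallest
    # average distance to its core members.
    core = labels.copy()
    used = [c for c in range(k) if (core == c).any()]
    for v in np.flatnonzero(core < 0):
        averages = [dist[v, core == c].mean() for c in used]
        labels[v] = used[int(np.argmin(averages))]
    return labels
```

For example, with two tight groups of points and one stray point in between, the stray point is attached to whichever group is closer on average.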

###### Remark 5.1:

It is noteworthy that Algorithm LABEL:alg:spec_init is just one method which is eligible to serve as a qualified first-step estimator . As mentioned above, the minimax risk is asymptotically achievable with Algorithm LABEL:alg:refine as long as the initialization algorithm does not mis-classify too many nodes. The weak consistency requirement is stated explicitly in Section VI when theoretical guarantees are discussed.

### V-C Time Complexity

Algorithm LABEL:alg:spec_init has a time complexity of , the bottleneck of which is the step. Still, the computation of could be done approximately in time with high probability [9] if we are only interested in the first singular vectors. As for the refinement scheme, the sparsity of the underlying hypergraph can be utilized to reduce the complexity, since the whole network structure can be stored in the incidence matrix equivalently as in the -dimensional adjacency tensor . As a result, the parameter estimation stage only requires where is the total number of hyperedges realized. Similarly, the time complexity would be and for the calculation of the likelihood function and the consensus step, respectively. Hence, the overall complexity for Algorithm LABEL:alg:refine and Algorithm LABEL:alg:spec_init combined is for a constant order . It further reduces to in the sparse regime where with high probability.

###### Remark 5.2:

It is possible to simplify our algorithm in the same way as in [10], where the SVD is done only once. The time complexity of the simplified version of our algorithm will be in the sparse regime. This is comparable to other state-of-the-art min-cut algorithms, which usually exhibit time complexity of at least . Although we are not able to provide any theoretical guarantee for this simplified version, as in [10], empirically it seems to have the same performance as the original algorithm. Proving its asymptotic optimality is left as future work.

## VI Theoretical Guarantees

Combining the first-step and second-step algorithms in Section V, we have the following overall performance guarantee, which serves as the achievability part of the Main Theorem in Section IV.

###### Theorem 6.1:

Suppose and . If (5) and (6) hold as , then the combined estimator (Algorithm LABEL:alg:refine) along with the estimator (Algorithm LABEL:alg:spec_init) is able to achieve a risk of

 sup(B,σ)∈Θ0dRσ(ˆσ2)≤exp(−(1+ζn)∑i

for some vanishing sequence . If is a constant, then (17) holds without the further assumption (6).

In what follows, we first state the theoretical guarantees of Algorithm LABEL:alg:refine as well as Algorithm LABEL:alg:spec_init and demonstrate how they in combination aggregate to the upper bound result. The detailed proofs of the intermediate theorems are established later in Subsection VI-A and Subsection VI-B, respectively.

The algorithm proposed in Section V consists of two steps. We first get a rough estimation through the first step, which is a spectral clustering on the hypergraph Laplacian matrix. After that, for each node we perform a local maximum likelihood estimation, which serves as the second step, to further adjust its community assignment. It turns out that this refining mechanism is actually crucial in achieving the optimal minimax risk , as long as the first initialization step satisfies a certain weak consistency condition. Specifically, the first-step algorithm should meet the requirement stated below.

###### Condition 6.1:

There exist constants , and a positive sequence such that

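$$\inf_{(B,\sigma) \in \Theta_d^0} \mathbb{P}_\sigma\left\{ \ell(\hat{\sigma}_0, \sigma) \le \gamma_n \right\} \ge 1 - C_0 n^{-(1+\delta)} \tag{18}$$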

for sufficiently large .
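Since community labels are only identifiable up to a permutation, the loss appearing in Condition 6.1 is naturally evaluated after aligning the labels. The sketch below assumes that the loss is the fraction of mismatched nodes minimized over all permutations of the $k$ labels; the function name `mismatch_ratio` is hypothetical, and the exhaustive search over permutations is only practical for small $k$.

```python
from itertools import permutations


def mismatch_ratio(sigma_hat, sigma, k):
    """Fraction of nodes whose estimated label disagrees with the truth,
    minimized over all relabelings of {0, ..., k-1}."""
    n = len(sigma)
    best = n
    for perm in permutations(range(k)):
        errors = sum(1 for a, b in zip(sigma_hat, sigma) if perm[a] != b)
        best = min(best, errors)
    return best / n
```

Under this reading, an estimate that recovers the partition exactly but swaps the label names has mismatch ratio zero.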

We have the following performance guarantee for our second-step algorithm.

###### Theorem 6.2:

If

 ∑i

and Condition 6.1 is satisfied for

$$\gamma = o\left(\frac{1}{k \log k}\right) \tag{20}$$

and

$$\gamma = o\left(\frac{1}{k \max_{(i,j):(r_i,r_j) \in \mathcal{N}_d} \dfrac{p_1 - p_{\kappa_d}}{p_i - p_j}}\right). \tag{21}$$

Then, the estimator (Algorithm LABEL:alg:refine) is able to achieve a risk of

 sup(B,σ)∈Θ0dRσ(ˆσ2)≤exp(−(1+ζ′n)∑i

for some vanishing sequence . If is a constant, then (22) holds without further assuming (21).

As for the initialization algorithm, first recall that we assume the connecting probabilities are of the same order. Also, we use to denote the -th largest singular value of the matrix , which is the expectation of the hypergraph Laplacian (15). Note that each entry in the matrix is a weighted combination of the probability parameters ’s. Stated in terms of , the following theorem characterizes the mismatch ratio of the first-step algorithm that we propose.

###### Theorem 6.3:

Suppose

$$\frac{k a_1}{\lambda_k^2} \le C_1 \tag{23}$$

for some sufficiently small where . Apply Algorithm LABEL:alg:spec_init with a sufficiently small constant and for some sufficiently large constant . For any constant , there exists some depending only on , and so that

$$\ell(\hat{\sigma}, \sigma) \le C \frac{a_1}{\lambda_k^2}$$

with probability at least .

To take out the dependency on , we use the observation below.

###### Lemma 6.1:

For d-hSBM in , we have

 λk≳∑i
###### Proof of Theorem 6.1.

Finally, we only need to prove that the result of Theorem 6.3 does match Condition 6.1. Combining Theorem 6.3 with Lemma 6.1, we have

 ℓ(ˆσ,