Generative hypergraph clustering: from blockmodels to modularity

01/24/2021 · Philip S. Chodrow, et al. · Cornell University

Hypergraphs are a natural modeling paradigm for a wide range of complex relational systems. A standard analysis task is to identify clusters of closely related or densely interconnected nodes. Many graph algorithms for this task are based on variants of the stochastic blockmodel, a random graph with flexible cluster structure. However, there are few models and algorithms for hypergraph clustering. Here, we propose a Poisson degree-corrected hypergraph stochastic blockmodel (DCHSBM), a generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Approximate maximum-likelihood inference in the DCHSBM naturally leads to a clustering objective that generalizes the popular modularity objective for graphs. We derive a general Louvain-type algorithm for this objective, as well as a faster, specialized "All-Or-Nothing" (AON) variant in which edges are expected to lie fully within clusters. This special case encompasses a recent proposal for modularity in hypergraphs, while also incorporating flexible resolution and edge-size parameters. We show that AON hypergraph Louvain is highly scalable, illustrated by an experiment on a synthetic hypergraph of one million nodes. We also demonstrate through synthetic experiments that the detectability regimes for hypergraph community detection differ from those of methods based on dyadic graph projections. We use our generative model to analyze different patterns of higher-order structure in school contact networks, U.S. congressional bill cosponsorship, U.S. congressional committees, product categories in co-purchasing behavior, and hotel locations from web browsing sessions, finding interpretable higher-order structure. We then study the behavior of our AON hypergraph Louvain algorithm, finding that it is able to recover ground truth clusters in empirical data sets exhibiting the corresponding higher-order structure.


1 Introduction

Graphs are a fundamental abstraction for complex relational systems throughout the sciences Jackson [2008]; Easley and Kleinberg [2010]; Newman [2010]. A graph represents components of a system by a set of nodes, and interactions or relationships among these components using edges that connect pairs of nodes. Much of the structure in complex data, however, involves higher-order interactions and relationships between more than two entities at once Milo [2002]; Benson et al. [2016, 2018]; Lambiotte et al. [2019]; Battiston et al. [2020]; Torres et al. [2020]. People communicate and collaborate in large or small groups. Consumers often purchase multiple items in shopping trips. Chemical interactions occur between sets of molecules of varying size. Hypergraphs are now a burgeoning paradigm for modeling these and many other systems Li and Milenkovic [2018]; de Arruda et al. [2020]; Sahasrabuddhe et al. [2020]; Veldt et al. [2020b]. A hypergraph still represents the components by a set of nodes, but the edges (often called hyperedges) may connect arbitrary numbers of nodes. A graph is a special case of a hypergraph, in which each edge connects exactly two nodes.
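As a concrete illustration of the definitions above, the following sketch (not from the paper) stores a hypergraph as a list of hyperedges, each an arbitrary-size set of node ids; a dyadic graph is just the special case in which every edge has exactly two nodes.

```python
# A hypergraph as a list of hyperedges; each hyperedge is a set of node ids.
from collections import Counter

hyperedges = [
    {0, 1, 2},     # a three-way interaction
    {1, 2},        # an ordinary dyadic edge
    {0, 2, 3, 4},  # a four-way interaction
]

# Node degrees: the number of hyperedges in which each node appears.
degree = Counter(v for e in hyperedges for v in e)
```

Heterogeneous edge sizes and node degrees, both visible in this toy example, are exactly the features the models discussed below aim to capture.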

Given a large complex system, a common analytical task is to break down the system into a relatively small number of modules which efficiently summarize the system's structure Porter et al. [2009]; Benson et al. [2016]; Fortunato and Hric [2016]; Li and Milenkovic [2018]. This task goes under a wide variety of names, such as clustering, partitioning, community detection, and node classification. In each case, the high-level goal is to assign each node in the graph or hypergraph to one of a few discrete modules (clusters, sets, communities, classes, etc.) such that nodes in the same module are more closely related by the system structure than nodes in differing modules. Because many systems are well-represented by hypergraphs, the problem of identifying modules in hypergraph data has a wide range of applications. In computer science and engineering, hypergraph partitioning has been used to map computations to parallel computers in order to minimize communication costs Ballard et al. [2016]; Kabiljo et al. [2017], optimize circuit layouts Karypis et al. [1999], and segment images Agarwal et al. [2005]. Within machine learning and data science, hypergraph clustering and node prediction are used for semi-supervised learning Zhou et al. [2006]; Yadati et al. [2019] and variants of correlation clustering Li et al. [2019]; Amburg et al. [2020]; Veldt et al. [2020c]. In network science, hypergraph clustering and community detection have been used to analyze gene expressions Tian et al. [2009], food webs Li and Milenkovic [2017], and online social networks Neubauer and Obermayer [2009]; Tsourakakis et al. [2017]. For the remainder of this paper, we will usually refer to hypergraph clustering, keeping in mind the diversity of approaches and applications falling under this heading.

A key idea for data clustering is to postulate a probabilistic generative model. This is a probability distribution over counterfactual realizations of the data, whose unseen parameters include cluster labels for each observation. The problem of assigning nodes to clusters may then be recast as a statistical inference problem. In the context of graph clustering, the most common of these is the stochastic blockmodel (SBM), its variants, and other latent space models Hoff et al. [2002]; Airoldi et al. [2008]; Karrer and Newman [2011]; Yang and Leskovec [2013]; Peixoto [2014]; Athreya et al. [2017]. In the standard SBM, each node belongs to an unobserved cluster, or "block". The probability of an edge existing between two nodes depends only on the block labels of each node Nowicki and Snijders [2001]; Karrer and Newman [2011]. In the influential planted partition model Jerrum and Sorkin [1998]; Condon and Karp [2001], for example, nodes in the same block connect at higher rates than nodes in distinct blocks. In practice, we typically have access to neither the block labels nor the connection rates, and inference techniques are necessary to estimate each.

While generative modeling is a well-developed mainstay of graph clustering approaches, generative techniques for hypergraph clustering are largely lacking. Indeed, while a small number of generative hypergraph models have been proposed Ghoshdastidar and Dukkipati [2014]; Kim et al. [2018], these models typically generate hypergraphs with edges of only one size. With one recent exception Ke et al. [2019], these models also do not model degree heterogeneity between nodes. Heterogeneity in edge size and node degree are key features of empirical data Benson et al. [2018] that we aim to capture. An alternative to modeling hypergraph data directly is to transform the hypergraph to a dyadic graph. This is most commonly done by clique expansion, where a dyadic edge connects any pair of nodes that appear together in some hyperedge Zhou et al. [2006]; Benson et al. [2016]. While this enables the use of a wide array of existing models and algorithms for graphs, the higher-order structure that we want to capture is lost Chodrow [2020a], and generative models of the resulting dyadic graph that do not take into account the clique projection may rely on explicitly violated independence assumptions.
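The clique expansion mentioned above can be sketched in a few lines. This hypothetical helper (names are illustrative, not the paper's code) makes the information loss concrete: every co-occurring pair becomes a dyadic edge, and the record of which pairs came from the same large interaction is discarded.

```python
# Clique expansion: each hyperedge is replaced by a clique on its nodes.
from itertools import combinations

def clique_expansion(hyperedges):
    """Return the set of dyadic edges obtained by expanding each hyperedge."""
    return {frozenset(p) for e in hyperedges for p in combinations(sorted(e), 2)}

# The 3-edge {0, 1, 2} and the 2-edge {2, 3} collapse into four dyadic edges;
# nothing in the output records that {0, 1, 2} was a single group interaction.
edges = clique_expansion([{0, 1, 2}, {2, 3}])
```

Note also the combinatorial blow-up: a single hyperedge on k nodes produces k(k-1)/2 dyadic edges.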

Here we propose a direct generative approach to hypergraph clustering, based on a model of clustered hypergraphs with heterogeneous degree distributions and hyperedge sizes. Our proposed model generalizes the popular degree-corrected stochastic blockmodel (DCSBM) Karrer and Newman [2011], and is also related to a recent model for hypergraphs with uniform edge sizes and heterogeneous degrees Ke et al. [2019]. In our degree-corrected hypergraph stochastic blockmodel (DCHSBM), each node belongs to a latent cluster. The probability of a hyperedge appearing among a given set of nodes depends on the size of the set, the expected degrees of the nodes, and an affinity function which governs rates of connections between nodes based on their clusters. We outline a practical coordinate-ascent maximum-likelihood estimation scheme for fitting this model to hypergraph data. This scheme alternates between two steps. In the first, we estimate latent parameters governing node degrees and affinities between cluster labels. This step can be performed in closed form, using a single pass over the data. The second step involves a more difficult (in fact, NP-hard Brandes et al. [2007]) combinatorial optimization problem. We show that this problem directly generalizes the well-studied modularity objective for graphs. Additionally, when the affinity function has "All-Or-Nothing" (AON) structure, the resulting modularity objective encompasses a recently proposed notion of hypergraph modularity Kamiński et al. [2019] as an important special case. Our derivation additionally shows how to obtain resolution parameters from the cluster affinity function, which significantly enhances the applicability of this modularity objective to empirical data. For instance, these parameters can be used to down-weight the role of large hyperedges in cases (such as email) in which large interactions may provide weak or no evidence of cluster co-membership. Our alternating framework means that these parameters need not be set by the researcher a priori; they can be inferred from data.

In order to perform the second step, we propose an optimization heuristic that generalizes the broadly used Louvain algorithm for modularity maximization in graphs Blondel et al. [2008]. This algorithm can be used whenever the affinity function possesses symmetric structure. The considerable flexibility in specifying symmetric affinity functions connects our objective function to modeling ideas used for hypergraph partitioning and multiway cut problems Veldt et al. [2020a]. While the general form of this algorithm is computationally intensive, we derive a specialized algorithm for the AON affinity function that scales well to very large hypergraphs. We conduct synthetic experiments on hypergraphs ranging in size from one thousand to one million nodes, finding that runtime is comparable to fast dyadic methods. In another synthetic experiment, we study the ability of hypergraph AON Louvain to recover planted clusters. We show that hypergraph modularity methods can recover planted clusters in some regimes in which dyadic methods must necessarily fail due to known information-theoretic bounds. This experiment suggests that the theory of detectability in hypergraphs may be substantially more complex than the already-rich theory of detectability in graphs Abbe [2017].

We then study a range of empirical systems, including school contact networks, U.S. congressional bill cosponsorships, U.S. congressional committee memberships, co-purchasing, and online hotel browsing. In several of these experiments, our fast alternating modularity maximization scheme is able to find clusters substantially more correlated with supplied data labels than standard approaches based on dyadic modularity. In other experiments, dyadic methods appear weakly preferable. We show how our generative framework enables us to interpret these differing results: the data sets on which our fast algorithm performs well are the data sets in which the corresponding generative model is favored under a penalized likelihood criterion. These results highlight the importance of making explicit the assumptions of clustering algorithms, and matching these assumptions appropriately to data.

2 Related Work

2.1 Hypergraph Partitioning and Spectral Clustering

The need to separate the nodes of a hypergraph into clusters or communities arises in many settings. Applications in VLSI design and scientific computing typically require clusters that are roughly balanced in size; common tools for this task include multilevel partitioning methods such as hMetis Karypis et al. [1999]; Karypis and Kumar [2000] and KaHyPar Schlag et al. [2016]; Akhremtsev et al. [2017]. Various hypergraph generalizations of spectral graph clustering Chung [1992] have been considered in the machine learning and theoretical computer science literature. These perform clustering based on the eigenvectors of some notion of a hypergraph Laplacian. Numerous hypergraph generalizations of the graph Laplacian have been considered, including Laplacians based on graph projections Bolla [1993]; Li and Solé [1996]; Agarwal et al. [2006]; Zhou et al. [2006]; Benson et al. [2016]; Li and Milenkovic [2017]; Rodríguez [2003], Laplacian tensors Chen et al. [2017]; Chang et al. [2020], and nonlinear Laplacian and p-Laplacian operators Louis [2015]; Chan et al. [2018]; Yoshida [2019]; Li and Milenkovic [2018]. Other tensor methods for hypergraph community detection have also been developed Ke et al. [2019]. Although a number of techniques have already been considered, many of these methods either (i) focus on finding only a single cluster at a time, (ii) are viewed mainly as theoretical results and do not scale well to large data sets, or (iii) apply only to k-uniform hypergraphs. There remains a need for theoretically grounded global hypergraph clustering techniques that directly optimize a multicluster objective and scale to large data sets.

2.2 Generative Models for Hypergraph Clustering

There exist several generative models for hypergraphs with cluster or community structure. The simplest of these is the k-uniform hypergraph stochastic blockmodel, or k-HSBM Ghoshdastidar and Dukkipati [2014]. In a k-uniform hypergraph, all edges consist of exactly k nodes. In the k-HSBM, for each k-tuple of nodes, the probability of realizing an edge on those nodes is a function only of the community labels of those nodes. When this function is chosen to favor sets of nodes with identical labels, cluster structure emerges. The k-HSBM is an important testbed for studying the theoretical limitations of hypergraph partitioning algorithms [Kim et al., 2018]. It is, however, limited in its applicability to empirical hypergraph data sets, which are not usually uniform. Additionally, the k-HSBM tends to generate hypergraphs in which the degrees of nodes within the same cluster are tightly concentrated, in contrast to many real-world networks in which degrees are highly heterogeneous. A recent proposal by Ke et al. [2019] generates degree-heterogeneous hypergraphs and is closely related to our proposal below, but the authors do not directly use the statistical structure of their model in their accompanying algorithm.

Another generative approach to hypergraph clustering uses the correspondence between hypergraphs and bipartite graphs. To construct the bipartite graph G from a hypergraph H, we associate to each node i of H a bipartite node, and to each hyperedge e of H a bipartite node. An edge joining i and e exists in G if and only if i ∈ e in H. Larremore et al. [2014] developed a degree-corrected stochastic blockmodel for bipartite networks (biSBM), which has since been extended via Bayesian tools Yen and Larremore [2020]. In principle, one can use these models to perform inference in hypergraphs. One first obtains the bipartite graph G, constructs an estimate or posterior distribution over the labels of the nodes of H, and then returns this estimate. This approach, however, involves statistical assumptions which require examination. Bipartite edges in the biSBM are conditionally independent given the model parameters. As a result, the biSBM is unable to model higher-order dependence in edges, such as the "All-Or-Nothing" dependence which we discuss later. Bipartite methods may still be appropriate for certain hypergraph data sets, but, like any statistical model, their assumptions must be appropriately matched to data. There is considerable value in the presence of multiple generative models with clear statements of their statistical assumptions.
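The hypergraph-to-bipartite correspondence described above can be sketched as follows; the helper name and the (node, edge index) encoding are illustrative choices, not part of the biSBM literature.

```python
# Star (bipartite) expansion: one "left" node per hypergraph node, one "right"
# node per hyperedge, and a bipartite edge (v, j) whenever node v belongs to
# the j-th hyperedge.
def star_expansion(hyperedges):
    """Return bipartite edges (node, edge_index) for a list of hyperedges."""
    return [(v, j) for j, e in enumerate(hyperedges) for v in sorted(e)]

bip = star_expansion([{0, 1}, {1, 2, 3}])
```

Unlike clique expansion, this encoding is lossless: the original hypergraph can be recovered exactly from the bipartite edge list.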

2.3 Modularity Methods for Graphs and Hypergraphs

Modularity maximization is a popular heuristic for clustering dyadic graphs. In the original development Newman and Girvan [2004], the authors define a modularity objective Q. Let A be the (possibly weighted) adjacency matrix of an undirected graph, with A_ij being the number or total weight of edges linking nodes i and j. Let d = A1 be the vector of node degrees, where 1 is the vector of ones. Then, m = (1/2) 1^T d gives the total edge weight in the graph. Finally, let z be the vector of node labels, with z_i ∈ {1, …, ℓ}, where ℓ is the number of distinct clusters. The modularity is then defined as

Q(z) = (1/2m) Σ_ij [A_ij − d_i d_j / (2m)] δ(z_i, z_j).   (1)

The modularity is thus a sum of differences between the observed number of edges A_ij and the term d_i d_j / (2m), with the (i,j)th term only appearing if z_i = z_j. (Throughout this paper, we use the notation that δ(a, b) = 1 if the arguments a and b are the same and δ(a, b) = 0 otherwise.) The term d_i d_j / (2m) is intended to reflect the expected number of edges between i and j under a configuration-type null model, although the question of precisely which model is applied is often obscure in the literature Chodrow [2020b]. Subsequent developments Reichardt and Bornholdt [2006]; Fortunato and Barthelemy [2007] incorporated a resolution parameter γ premultiplying the second term, which can be systematically varied in order to gain insight into community structure at multiple scales Weir et al. [2017]. There is considerable latitude to make more dramatic alterations to the second term, such as accounting for spatial correlations in social networks Expert et al. [2011] or applying alternative null distributions over graphs Fosdick et al. [2018]; Chodrow [2020b].
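A direct transcription of the dyadic objective may help fix ideas. This sketch assumes the standard Newman-Girvan form Q = (1/2m) Σ_ij [A_ij − d_i d_j/(2m)] δ(z_i, z_j) for a small unweighted graph.

```python
# Dyadic modularity of a labeling z for an adjacency matrix given as lists.
def modularity(adj, z):
    n = len(adj)
    d = [sum(row) for row in adj]   # node degrees
    two_m = sum(d)                  # 2m = total degree
    return sum(
        adj[i][j] - d[i] * d[j] / two_m
        for i in range(n) for j in range(n) if z[i] == z[j]
    ) / two_m

# Two triangles joined by a single bridge edge, with the natural 2-cluster labels.
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
z = [0, 0, 0, 1, 1, 1]
```

For this graph the two-triangle labeling scores Q = 5/14, while placing all nodes in one cluster scores exactly zero, as the null term then cancels the observed edges.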

Theoretically, the modularity objective can be understood from either dynamical Delvenne et al. [2010] or statistical Zhang and Moore [2014]; Newman [2016] perspectives. In practice, the popularity of modularity maximization reflects the existence of highly scalable heuristics, of which the most popular is the Louvain algorithm Blondel et al. [2008] and variants Traag et al. [2019]. Modularity maximization, however, possesses a number of important limitations. First, it is an NP-hard problem Brandes et al. [2007]. Any given output partition of a heuristic is therefore not guaranteed to be the true global maximum of (1). There are often many uncorrelated partitions with nearly optimal modularity scores, complicating the interpretation of any single partition Good et al. [2010]. Modularity maximization, like all community-detection methodologies, must also grapple with the fact that in many empirical networks, ground truth partitions may be only weakly correlated—or entirely uncorrelated—with network structure Peel et al. [2017].

Despite these limitations, modularity maximization remains an important method for graph clustering. A natural question is how to extend modularity maximization to the setting of hypergraphs. One method is to apply the dyadic objective (1) to a dyadic graph derived from a hypergraph H. One forms the clique-expansion graph G, in which each hyperedge of H is represented by a clique in G, and then applies (1) or one of its variants. Such a direct approach effectively discards the polyadic structure in the original data, and can also lead to a combinatorial explosion of dyadic edges when the hyperedges have many nodes. An alternative which does preserve some higher-order information is again to use the correspondence between hypergraphs and bipartite graphs. There are several extant proposals for bipartite modularities Barber [2007]; Guimerà et al. [2007]; Murata [2009b, a]; Liu and Murata [2010]. To our knowledge, the limitations and implicit statistical assumptions of these modularities remain relatively unexplored.

It is also possible to define modularity objectives directly on hypergraphs. Kumar et al. [2020] consider an approach that alternates between Louvain on a weighted clique-expansion graph and re-weighting of the graph edges according to a polyadic cut penalty. Kamiński et al. [2019] instead define a modularity objective by comparing the observed hypergraph to the expectation of a generalized Chung-Lu model Chung and Lu [2002] for hypergraphs. They derive a modularity objective of the form

Q = (1/m) Σ_{c=1}^{ℓ} [ e_c − Σ_k m_k (vol(c) / vol(V))^k ],   (2)

where ℓ is the number of clusters, e_c is the number of hyperedges that fall fully within cluster c, m_k is the number of hyperedges of size k, m is the total number of hyperedges, vol(c) is the sum of degrees within cluster c, and vol(V) is the sum of degrees within the entire hypergraph. They also propose a stagewise greedy algorithm based on that of Clauset et al. [2004] for heuristically optimizing this objective. We will later generalize (2) via our generative model.
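The strict hypergraph modularity can be evaluated directly from its definition. This sketch assumes the form Q = (1/m) Σ_c [e_c − Σ_k m_k (vol(c)/vol(V))^k], with the symbols as just defined; the function name is illustrative.

```python
# Strict (all-or-nothing) hypergraph modularity: clusters are rewarded for
# hyperedges falling entirely inside them, against a degree-based null term
# computed separately for each edge size k.
from collections import Counter

def strict_modularity(hyperedges, z):
    m = len(hyperedges)
    deg = Counter(v for e in hyperedges for v in e)
    vol_V = sum(deg.values())
    m_k = Counter(len(e) for e in hyperedges)
    q = 0.0
    for c in set(z.values()):
        e_c = sum(1 for e in hyperedges if all(z[v] == c for v in e))
        vol_c = sum(d for v, d in deg.items() if z[v] == c)
        q += e_c / m - sum(mk / m * (vol_c / vol_V) ** k for k, mk in m_k.items())
    return q

# Two fully-internal 3-edges plus one bridging 2-edge.
H = [{0, 1, 2}, {3, 4, 5}, {0, 3}]
z = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = strict_modularity(H, z)
```

In this example each cluster contributes 1/3 − 1/6, so q = 1/3.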

3 The Degree-Corrected Hypergraph Stochastic Blockmodel

The degree-corrected stochastic blockmodel is a generative model of graphs with both community structure and heterogeneous degree sequences Karrer and Newman [2011]. We now extend this model to the case of hypergraphs.

First, we introduce some notation. Fix a number of nodes n, and let V = [n] be a set of nodes. (Here and elsewhere, we use the notation [n] = {1, 2, …, n}.) For each k, let R_k be the set of ordered k-tuples of nodes, and let R = ∪_{k ≤ K} R_k, so that R is the set of ordered tuples of nodes of length no longer than K. By definition, the same node may appear in an element of R multiple times; this choice is traditional in the stochastic blockmodel literature and significantly simplifies certain calculations below. It is notationally convenient, however, for sums to range over the set of unordered node tuples. Write r ∼ r' if tuples r and r' differ only by a permutation of their elements, and let R denote a generic unordered tuple, i.e., an equivalence class under ∼. Each unordered tuple R of size k corresponds to γ_R ordered tuples, where γ_R = k! / ∏_i m_i(R)! is the number of distinct ways to order the elements of R and m_i(R) is the number of times that node i appears in R. We regard each unordered tuple R as the location of a possible edge in a hypergraph. For any vector x ∈ R^n and tuple R, we define x_R = ∏_{i ∈ R} x_i; products over ordered tuples r are defined similarly. For the label vector z introduced below, we instead write z_R for the tuple of labels of the nodes in R. Finally, we let a_R be the number of hyperedges on the nodes specified by R.

We now parameterize our model. As in the dyadic degree-corrected SBM, we assign to each node i a parameter θ_i which governs its degree. These parameters are collected in a vector θ. To each node we assign one of ℓ groups, and collect these assignments in a vector z. In our notation, z_R and z_r give the labels of the nodes in R (unordered) or r (ordered). Let Ω be an affinity function which maps unordered tuples of node labels to nonnegative real numbers. If Ω(z_R) is large, this implies that nodes in the group configuration z_R connect at a relatively high rate. For convenience, we will also say that Ω(z_r) = Ω(z_R) when r ∈ R. We allow Ω to remain fully general at this stage.

For fixed θ, z, and Ω, we generate a hypergraph by sampling, for each ordered tuple r ∈ R,

a_r ∼ Poisson(θ_r Ω(z_r)),

and then setting

a_R = Σ_{r ∈ R} a_r.

Here, R, which is unordered, is a possible location for a weighted hyperedge, and it is realized by repeated attempts to "lay down" ordered tuples r onto R, for all r ∈ R. The value of a_R can be any positive integer, representing multiple hyperedges on R. This is a helpful modeling feature, as many empirical hypergraphs indeed contain multiple hyperedges between the same set of nodes. (Even in hypergraph data sets where we only know the presence or absence of hyperedges, but no weights, the Poisson-based model serves as a computationally convenient approximation to a Bernoulli-based model.) The probability of realizing a given edge set is the product of probabilities over the individual entries:

P(a | θ, z, Ω) = ∏_R P(a_R | θ, z, Ω).

We have used the fact that a_R is a sum of γ_R i.i.d. Poisson random variables, each with parameter θ_R Ω(z_R), where γ_R is the number of ordered tuples corresponding to R; a_R is therefore itself Poisson with parameter γ_R θ_R Ω(z_R).
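A toy sampler in the spirit of this generative process may be useful. The sketch below is hypothetical and restricted to a single edge size k and an all-or-nothing-style affinity (rate ω_in when all labels agree, ω_out otherwise); it is not the paper's sampler.

```python
# Illustrative DCHSBM-style sampler: each candidate node set receives a
# Poisson-distributed number of hyperedges with rate given by the product of
# its nodes' degree parameters times a label-dependent affinity.
import random
from itertools import combinations
from math import exp

def sample_aon_dchsbm(theta, z, omega_in, omega_out, k, rng):
    edges = []
    for R in combinations(range(len(theta)), k):
        prod = 1.0
        for v in R:
            prod *= theta[v]
        rate = prod * (omega_in if len({z[v] for v in R}) == 1 else omega_out)
        # Draw a Poisson variate by inversion (adequate for small rates).
        u, p, n = rng.random(), exp(-rate), 0
        cdf = p
        while u > cdf:
            n += 1
            p *= rate / n
            cdf += p
        edges += [set(R)] * n   # n >= 2 models repeated hyperedges on R
    return edges

rng = random.Random(0)
z = [0, 0, 0, 1, 1, 1]
edges = sample_aon_dchsbm([0.8] * 6, z, 1.0, 0.0, 3, rng)
```

With omega_out = 0, every realized hyperedge falls entirely within one planted cluster, mirroring the all-or-nothing special case discussed later.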

3.1 Estimation of Degree and Affinity Parameters

There are many methods for inference in stochastic blockmodels and their relatives, including variational coordinate ascent Airoldi et al. [2008], variational belief propagation Decelle et al. [2011]; Zhang and Moore [2014], and Markov chain Monte Carlo Peixoto [2019]. We use maximum-likelihood inference, a highly simplified approach. We do so in order to exploit a recent connection between maximum-likelihood inference in stochastic blockmodels and the popular modularity objective for graph clustering Newman [2016].

In the maximum-likelihood framework, we learn estimates ẑ of the node labels, Ω̂ of the affinity function, and θ̂ of the degree parameters by solving the optimization problem

(ẑ, Ω̂, θ̂) ∈ argmax_{z, Ω, θ} P(a | θ, z, Ω),   (3)

where a is a given data set represented by a collection of (integer-weighted) hyperedges. As usual, it is easier to work with the log-likelihood, which has the same local optima. The log-likelihood is

L(a; θ, z, Ω) = Q(z, Ω, θ) + K(θ) + C(a),   (4)

where

Q(z, Ω, θ) = Σ_R [a_R log Ω(z_R) − γ_R θ_R Ω(z_R)],   K(θ) = Σ_R a_R log θ_R,   C(a) = Σ_R [a_R log γ_R − log a_R!].

The first term is the only part of the log-likelihood that depends on the group assignments z and affinity function Ω. The second term depends on θ, while the third term depends only on the data and can be disregarded for inferential purposes.

When solving (3), there are two forms of statistical unidentifiability that prevent us from learning unique estimates. First, permuting the group labels in z and Ω does not alter L. Thus, we impose an arbitrary order on group labels. Another form of unidentifiability is slightly more subtle. Fix a group s and a constant c > 0. We construct a modified vector θ̃ given by

θ̃_i = c θ_i if z_i = s, and θ̃_i = θ_i otherwise.

Define the modified affinity function Ω̃(z_R) = c^{−m_s(z_R)} Ω(z_R), where m_s(z_R) is the number of times that label s appears in z_R. By construction, L(a; θ̃, z, Ω̃) = L(a; θ, z, Ω). To enforce identifiability, we must therefore place a joint normalization condition on θ and Ω. We do so by enforcing

Σ_{i : z_i = s} θ_i = Σ_{i : z_i = s} d_i for each group s,   (5)

where d_i is the (weighted) number of hyperedges in which node i appears, i.e., d_i = Σ_R m_i(R) a_R. The utility of (5) is that, when z is known or estimated, the estimates θ̂ and Ω̂ take simple, closed forms, as we show next. Later, in Section 4, we provide algorithms for estimating ẑ.

Using (5), it is possible to show that

Σ_R γ_R θ_R Ω(z_R) = Σ_R γ_R d_R Ω(z_R).   (6)

A proof is given in Appendix A. This identity implies that the second term of Q does not depend on θ once the normalization (5) has been enforced. Since the first term of Q does not involve θ, we can compute the estimate θ̂ by maximizing K alone. We write

θ̂ ∈ argmax_θ K(θ) = argmax_θ Σ_R a_R log θ_R.   (7)

The first-order optimality conditions on θ from the objective (7) with constraint (5) imply that K is maximized with respect to θ when θ_i = d_i for each node i. The maximum-likelihood estimate is thus θ̂ = d. Notably, though our derivation assumed that z was fixed, it is not necessary to know z in order to form the estimate θ̂.

We now consider the estimation of Ω. We focus on the case in which Ω takes a constant value ω_S on some set S of unordered tuples of labels. (In full generality, we can have one such set for every possible label arrangement for each hyperedge size in the data. Later, we will make natural restrictions on Ω.) Inserting (6) into the definition of Q and differentiating with respect to ω_S yields, as a consequence of the first-order optimality condition, the closed-form estimate

ω̂_S = (Σ_{R : z_R ∈ S} a_R) / (Σ_{R : z_R ∈ S} γ_R d_R).   (8)

Algebraic details are supplied in Appendix B. In contrast to the maximum-likelihood estimate θ̂, the estimate Ω̂ requires that we know or estimate z. We turn to this problem next.

4 Hypergraph Modularities

Our results from the previous section imply that the degree parameter θ and piecewise constant affinity function Ω can be efficiently estimated in closed form, provided an estimate of z. This naturally suggests a coordinate-ascent optimization scheme, in which we alternate between estimating these parameters and estimating z. We now discuss the latter step. From (4), it suffices to optimize Q with respect to z. To do so, it is helpful to impose some additional structure on Ω.

4.1 Symmetric Modularities

We obtain an important class of objective functions by stipulating that Ω is symmetric with respect to permutations of node labels. In this case, Ω(z_R) depends not on the specific labels in z_R, but only on the number of repetitions of each. Statistically, the corresponding DCHSBM generates hypergraphs in which all groups are statistically identical, conditioned on the degrees of their constituent nodes. Symmetric affinity functions thus lead to a flexible generalization of the planted partition stochastic blockmodel Jerrum and Sorkin [1998]; Condon and Karp [2001] to the setting of hypergraphs.

Define the function p(z_R) = (p_1, p_2, …), where p_j is the number of entries of z_R in the jth largest group of labels appearing in z_R, with ties broken arbitrarily. For example, if z_R contains two nodes with one label and one node with another, then p(z_R) = (2, 1). We call p a partition vector. The symmetry assumption implies that Ω(z_R) is a function of z_R only through p(z_R). Accordingly, we abuse notation by writing Ω(p) when p = p(z_R).
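The partition vector just described is simple to compute: sort the label multiplicities of a tuple in decreasing order. The helper name below is illustrative.

```python
# Partition vector of a tuple of labels: multiplicities in decreasing order.
from collections import Counter

def partition_vector(labels):
    return tuple(sorted(Counter(labels).values(), reverse=True))
```

Note that the result is invariant to which labels are involved, which is exactly the symmetry assumption: only the repetition pattern matters.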

It is convenient to now define generalized cuts and volumes corresponding to the partition vector p. Fix k = Σ_j p_j, the number of elements in a label vector with partition p. Then we define

cut_p(z) = Σ_{R : p(z_R) = p} a_R   and   vol_p(z) = Σ_{R : p(z_R) = p} γ_R d_R.

The function cut_p counts the number of edges that are split by z into the partition p, while the function vol_p is a sum-product of volumes over all grouping vectors that induce partition p. Let P_K be the set of partition vectors on sets of up to K elements. We show in Appendix C that the symmetric modularity objective can then be written as

Q(z) = Σ_{p ∈ P_K} [cut_p(z) log Ω(p) − vol_p(z) Ω(p)].   (9)

Direct calculation of vol_p(z) requires a summation over a number of tuples that grows rapidly, which can be impractical when either n or K is large. However, it is possible to evaluate these sums efficiently via a combinatorial identity. Define the group volumes

vol(s) = Σ_{i : z_i = s} d_i.

The quantity vol_p(z) is an order-corrected version of the moments of these group volumes, in which there is a single term for each distinct partition vector p. Let e_j denote the jth standard basis vector. For each j ≤ K, let M_j(z) = Σ_s vol(s)^j; these moments can be computed in O(n) time.

Proposition 1.

Fix a partition vector p, and let t be the number of nonzero elements of p. Then vol_p(z) can be expressed in closed form in terms of the group-volume moments defined above; the explicit formula is given in Appendix E.

A proof is given in Appendix E. We also give a formula for updating the entire set of moments when a candidate labeling is modified.

The objective function (9) is related in approach to the multiway hypergraph cut problem studied by Veldt et al. [2020a]. They formulate the hypergraph cut objective in terms of splitting functions, which associate penalties when edges are split between two or more clusters. One then aims to minimize the sum of penalties subject to constraints that certain nodes must not lie in the same cluster. Symmetric affinity functions in our framework correspond to signature-based splitting functions in their terminology. Table 1 lists four of many families of affinity functions. The All-Or-Nothing affinity function distinguishes only whether or not a given edge is contained entirely within a single cluster. This affinity function is especially important for scalable computation, and we discuss it further below. The Group Number affinity depends only on the number of distinct groups represented in an edge, regardless of the number of incident nodes in each one. Special cases of the Group Number affinity arise frequently in applications. For a suitable choice of the Group Number affinity, the first term of the modularity objective corresponds to a hyperedge cut penalty that is known in the scientific computing literature as the connectivity objective Deveci et al. [2015], the (λ − 1) metric Karypis and Kumar [2000], or the boundary cut Hendrickson and Kolda [2000]. It has also been called fanout in the database literature Kabiljo et al. [2017]. The related Sum of External Degrees penalty Karypis and Kumar [2000] is also a special case of the Group Number affinity. The Relative Plurality affinity considers only the relative difference between the size of the largest group represented in an edge and the next largest group. This rather specialized affinity function is especially appropriate in contexts where groups are expected to be roughly balanced, as we find, for example, in party affiliations in Congressional committees. Finally, the Pairwise affinity counts the number of pairs of nodes within the edge whose clusters differ. While this affinity function uses similar information to that used in dyadic graph models, there is no immediately apparent equivalence between any dyadic random graph and a DCHSBM with the Pairwise affinity function. There are many more symmetric affinity functions; see Table 3 of Veldt et al. [2020a] for several other splitting functions which can be used to define affinities.

All-Or-Nothing (AON): Ω(p) = ω_in if p has a single nonzero entry (the edge lies entirely in one cluster), and ω_out otherwise.
Group Number (GN): Ω(p) = g(t), a function of the number of distinct groups represented in the edge.
Relative Plurality (RP): Ω(p) = h(p_1 − p_2), a function of the difference between the largest and second-largest group sizes.
Pairwise (P): Ω(p) = f(number of pairs of nodes in the edge whose clusters differ).

Table 1: Symmetric affinity functions. Throughout, k is the number of nodes in partition p, t is the number of nonzero entries of p, ω_in and ω_out are scalars, and f, g, and h are arbitrary scalar functions.
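Two of these families can be sketched directly on partition vectors. The parameter names (omega_in, omega_out, g) are illustrative placeholders in the spirit of the table, not fixed notation from the paper.

```python
# Affinity sketches defined on partition vectors p (tuples of group sizes
# within a single edge, in decreasing order).

def all_or_nothing(p, omega_in, omega_out):
    # Distinguishes only whether the edge lies entirely in one cluster.
    return omega_in if len(p) == 1 else omega_out

def group_number(p, g):
    # Depends only on how many distinct groups the edge touches.
    return g(len(p))
```

Since a partition vector has one entry per distinct group in the edge, `len(p)` is exactly the group count t used by both families.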

4.2 All-Or-Nothing Modularity

As noted above, the All-Or-Nothing affinity function (Table 1) is of special interest for modeling and computation. This affinity function is a natural choice for systems in which the occurrence of an interaction or relationship depends strongly on homogeneity within groups.

Inserting the All-Or-Nothing affinity function from Table 1 into (9) yields, after a small amount of algebra, the objective

(10)

where β_k and γ_k are size-specific weight and resolution parameters derived from the affinity function, and C collects terms that do not depend on the partition z. We collect the β_k and γ_k into vectors β and γ. Let m_k be the (weighted) number of hyperedges of size k. We have also defined cut_k(z)

as the number of hyperedges of size k which contain nodes in two or more distinct clusters. This calculation is a direct generalization of that from Newman [2016] for graph modularity. Indeed, we recover the standard dyadic modularity objective by restricting to edges of size k = 2. We call (10) the All-Or-Nothing (AON) hypergraph modularity.
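To make the structure of (10) concrete, the following is a hypothetical sketch (the function name and data layout are ours, not the paper's) of the AON objective under the simplification described above: each size-k edge interior to a cluster contributes β_k, and β_k γ_k multiplies the sum of k-th powers of degree-based cluster volumes; partition-independent constants are dropped.

```python
from collections import Counter, defaultdict

def aon_modularity(edges, z, beta, gamma):
    """Hypothetical sketch of the AON objective (10), up to constants.
    `edges` is a list of node tuples, `z` maps node -> cluster label,
    `beta` and `gamma` map edge size k to the parameters beta_k, gamma_k."""
    # degree-corrected volumes: vol(c) = sum of hypergraph degrees in c
    degrees = Counter(v for e in edges for v in e)
    vol = defaultdict(float)
    for v, d in degrees.items():
        vol[z[v]] += d
    Q = 0.0
    sizes = {len(e) for e in edges}
    # cut-related term: beta_k per size-k edge interior to a cluster
    for e in edges:
        if len({z[v] for v in e}) == 1:
            Q += beta[len(e)]
    # volume term: beta_k * gamma_k * sum_c vol(c)^k
    for k in sizes:
        Q -= beta[k] * gamma[k] * sum(vc ** k for vc in vol.values())
    return Q
```

A partition that places edges inside clusters without inflating any cluster's volume scores higher under this sketch.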

Recently, Kamiński et al. [2019] proposed a “strict modularity” objective for hypergraphs. This strict modularity is a special case of (10), obtained by fixing the parameter vectors β and γ at particular constant values. However, leaving these parameters free lends important flexibility to our proposed AON objective (10). Tuning β allows one to specify which hyperedge sizes are considered most relevant for clustering. In email communications, for example, a very large list of recipients may carry minimal information about social relationships, and it may be desirable to down-weight large hyperedges by tuning β. Tuning γ modifies the sizes of the clusters favored by the objective, in a direct generalization of the resolution parameter in dyadic modularity Reichardt and Bornholdt [2006]; Veldt et al. [2018]. Importantly, it is not necessary to specify the values of these parameters a priori; instead, they can be adaptively estimated via (8).

5 Hypergraph Maximum-Likelihood Louvain

In order to optimize the modularity objectives (9) and (10), we propose a family of agglomerative clustering algorithms. These algorithms greedily improve the specified objective through local updates to the node label vector z. The structure of these algorithms is based on the widely used and highly performant Louvain heuristic for graphs Blondel et al. [2008]. The standard heuristic alternates between two phases. In the first phase, each node begins in its own singleton cluster. Then, each node is visited and moved to the cluster of the adjacent node that maximizes the increase in the objective. If no such move increases the objective, the node’s label is not changed. This process repeats until no moves exist that increase the objective. In the second phase, a “supernode” is formed for each label, representing the set of all nodes sharing that label. Then, the first phase is repeated, generating an updated labeling of supernodes, which are again aggregated in the second phase. The process repeats until no more improvement is possible. Since every step in the first phase improves the objective, the algorithm terminates with a locally optimal cluster vector z.

This heuristic generalizes naturally to the setting of hypergraphs. However, the incorporation of heterogeneous hyperedge sizes and general affinity functions considerably complicates implementation. Here we provide a highly general Hypergraph Maximum-Likelihood Louvain (HMLL) algorithm for optimizing the symmetric modularity objective (9). For the important case of the All-Or-Nothing (AON) affinity, the simplified objective (10) admits a much simpler and faster specialized Louvain algorithm, which we describe in Appendix F. As we show in subsequent experiments, this specialized algorithm is highly scalable and effective in recovering ground truth clusters in data sets with polyadic structure plausibly modeled by the AON affinity.

5.1 Symmetric Hypergraph Maximum-Likelihood Louvain

We first show pseudocode for a single step of Symmetric HMLL in Algorithm 1. In this step, we visit each cluster C of nodes sharing a label, and consider changing the label of all nodes in C to that of an adjacent cluster, i.e., one on which at least one edge incident to C also lies. We evaluate the change in the objective function associated with each such move, and then carry out the move that gives the largest change, provided that change is positive. After visiting each cluster, the process is repeated until no more improvements are possible. This process is wrapped in an outer loop given by Algorithm 2. After initializing each node in a singleton cluster, we alternate between consolidating nodes by their cluster assignments and modifying node labels via Algorithm 1.

Data: Hypergraph H, affinity function Ω, current label vector z
Result: Updated label vector z
L ← unique cluster labels in z
while improving do
       improving ← false
       for ℓ in L do
              C ← set of nodes with label ℓ
              A ← clusters adjacent to C  // clusters sharing an edge with C
              (ΔQ*, ℓ*) ← maximum and maximizer of the change in Q from moving C to an adjacent cluster in A
              if ΔQ* > 0 then  // update if improvement found
                    for i in C do
                          z_i ← ℓ*
                    end for
                    improving ← true
              end if
       end for
end while
return z
Algorithm 1 SymmetricHMLLstep(H, Ω, z)
Data: Hypergraph H, affinity function Ω
Result: Label vector z
z ← singleton labeling  // assign each node to its own cluster
do
       z′ ← z
       z ← SymmetricHMLLstep(H, Ω, z)
while z ≠ z′
return z
Algorithm 2 SymmetricHMLL(H, Ω)

Algorithm 1 relies on a function which computes the change in the objective associated with moving all nodes in one cluster to another. Changes to the second (volume) term in the objective can be computed efficiently using a simple volume-update identity. Evaluating changes to the first (cut) term requires summing across all hyperedges incident to a node or set of nodes. At each hyperedge, we must evaluate the affinity on the current partition induced on the edge, as well as the affinity associated with the candidate updated partition. This situation contrasts with the case of the dyadic graph Louvain algorithm, in which it is sufficient to simply check whether a given edge joins nodes in the same or different clusters. Because we need to store and update the partition vector for each hyperedge, we cannot collapse clusters of nodes into monolithic supernodes and recursively apply Algorithm 2 on a reduced data structure, as is customary in dyadic graph Louvain. Thus, while clusters of nodes move as a unit in Algorithm 1 as well, it is necessary in this case to operate on the full adjacency data at all stages of the algorithm. This implies that Algorithm 2 can be slow on hypergraphs of even modest size. The development of efficient algorithms for optimizing the general symmetric modularity objective or various special cases is an important avenue of future work.
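For the volume term specifically, the change from moving a set of nodes between clusters depends only on the volumes of the two affected clusters. A minimal sketch of this update (our notation, not the paper's pseudocode):

```python
def volume_term_delta(vol, src, dst, vol_moved, k):
    """Change in sum_c vol(c)**k when nodes with total volume `vol_moved`
    move from cluster `src` to cluster `dst`. Only two terms of the sum
    change, so the update is O(1) per edge size k."""
    new_src = vol[src] - vol_moved
    new_dst = vol[dst] + vol_moved
    return (new_src ** k - vol[src] ** k) + (new_dst ** k - vol[dst] ** k)
```

Because only the source and destination clusters appear, the full sum over clusters never needs to be recomputed during a move.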

5.2 All-Or-Nothing Hypergraph Maximum-Likelihood Louvain

When the affinity function is All-Or-Nothing, considerable simplification is possible. For each edge we need not compute the full partition vector, but only check whether all of the edge’s nodes share the same label. Rather than a general affinity function, we instead supply the parameter vectors β and γ appearing in (10). This allows us to compute objective updates on considerably simplified data structures. In particular, we are able to follow the classical Louvain strategy of collapsing clusters into single, consolidated “supernodes,” and to restrict attention to hyperedges that span multiple supernodes. Because we do not need to track the precise way in which these hyperedges span multiple supernodes, we can forget much of the original adjacency data and instead simply store the edge sizes of the hypergraph. These simplifications enable both significant memory savings and very rapid evaluation of the objective update function. We provide pseudocode exploiting these simplifications in Appendix F.
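The data reduction can be sketched as follows; the helper below is a hypothetical illustration (not the paper's implementation) that summarizes interior edges by their size counts and retains only coarsened label information for boundary edges:

```python
from collections import Counter

def collapse(edges, z):
    """Under the AON affinity, an edge interior to a cluster is fully
    described by its size; only edges spanning multiple clusters need
    any adjacency information, and even that can be coarsened to the
    set of cluster labels they touch."""
    interior = Counter()   # edge size -> count of within-cluster edges
    boundary = []          # (edge size, labels spanned) for split edges
    for e in edges:
        labels = {z[v] for v in e}
        if len(labels) == 1:
            interior[len(e)] += 1
        else:
            boundary.append((len(e), frozenset(labels)))
    return interior, boundary
```

After each aggregation phase, the interior counts feed directly into the first term of (10) while the boundary list drives candidate moves.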

6 Experiments with Synthetic Data

6.1 Runtime

Dyadic Louvain algorithms are known for being highly efficient on large graphs. Here, we show that AON HMLL can achieve similar performance on synthetic data to Graph MLL (GMLL), a variant of the standard dyadic Louvain algorithm in which we return the combination of resolution parameter and partition that yields the highest dyadic likelihood. For a fixed number of nodes n, we consider a DCHSBM-like hypergraph model with a fixed number of clusters and hyperedges with size uniformly distributed between 2 and 4. Each k-edge is, with probability p_k, placed uniformly at random on k nodes within a single cluster. Otherwise, with probability 1 − p_k, the edge is instead placed uniformly at random on any set of k nodes. We use this model rather than a direct DCHSBM to avoid the computational burden of sampling edges at each k-tuple of nodes, which is prohibitive when n is large. For the purpose of performance testing, we compute the maximum-likelihood estimates of the parameter vectors β and γ (in the case of AON HMLL) and the resolution parameter (in the case of GMLL). We emphasize that this is typically not possible in practical applications, since the ground truth labels are not known. We make this choice in order to focus on a direct comparison of the runtimes of each algorithm in a situation in which both can succeed. In later sections, we study the ability of HMLL and GMLL to recover known groups in synthetic and empirical data when affinities and resolution parameters are not known.
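A minimal generator for this benchmark might look as follows; all names and the round-robin cluster assignment are illustrative assumptions rather than the authors' exact code:

```python
import random

def sample_synthetic_hypergraph(n, n_clusters, m, p, sizes=(2, 3, 4), seed=0):
    """Sketch of the DCHSBM-like benchmark: each k-edge is, with
    probability p[k], placed uniformly at random within one cluster,
    and otherwise uniformly at random among all nodes."""
    rng = random.Random(seed)
    z = {v: v % n_clusters for v in range(n)}                  # planted labels
    clusters = [[v for v in range(n) if z[v] == c] for c in range(n_clusters)]
    edges = []
    for _ in range(m):
        k = rng.choice(sizes)
        if rng.random() < p[k]:
            pool = clusters[rng.randrange(n_clusters)]         # within-cluster edge
        else:
            pool = range(n)                                    # background edge
        edges.append(tuple(rng.sample(pool, k)))
    return edges, z
```

Sampling whole edges in this way avoids iterating over all k-tuples of nodes, which is what makes the benchmark tractable at large n.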

Runtime, Adjusted Rand Index, and number of clusters returned by GMLL and HMLL in a synthetic testbed with optimal affinity parameters. The within-cluster edge placement probabilities p_2, p_3, and p_4 are held fixed across trials. We also show (light gray) the results obtained by using GMLL as a preprocessing step, whose output partition is then refined by HMLL.

Section 6.1 shows the runtime, Adjusted Rand Index (ARI), and number of clusters returned by each algorithm. In these instances, hyperedges of size four are rarely (if ever) contained completely inside clusters; thus, hyperedges of different sizes provide different signal regarding ground truth clusters. For this experiment, we implemented Graph MLL by computing a normalized clique projection, in which each size-k hyperedge contributes weight 1/(k − 1) to the dyadic edge joining each pair of its nodes.

We also performed experiments on an unnormalized clique projection, in which each pair of nodes in a hyperedge contributes unit weight, but do not show these results because, in this experiment, the associated MLL algorithm consistently fails to recover labels correlated with the planted clusters.

On smaller instances, HMLL outperforms Graph MLL in recovering planted clusters, as measured by the ARI. On larger instances, the recovery results of the two methods are comparable. Interestingly, although GMLL and HMLL obtain similar accuracy in this experiment, they do so in different ways, with HMLL tending to generate more, smaller clusters than GMLL. Importantly, the runtimes are nearly indistinguishable, indicating that dyadic clique projections are necessary neither for accuracy nor for performance. We observed other choices of the model parameters for which HMLL substantially outperformed GMLL in cluster recovery and vice versa; however, in each case the algorithms’ respective runtimes tended to differ by only a small constant factor.

Interestingly, in this synthetic experiment, a combination of the two algorithms leads to the strongest recovery results. In addition to running each algorithm independently, we also ran a two-stage algorithm in which GMLL is used to generate an intermediate partition, which HMLL then refines. This procedure is similar to the warmstart approach of Kamiński et al. [2020]. We emphasize again that these results are obtained on synthetic hypergraphs with pre-optimized affinity parameters, and so the effectiveness of the refinement strategy may not generalize to real data sets. Indeed, in the experiments on empirical data shown in Section 7, we do not show results from the refinement procedure because the output partition was in each case essentially indistinguishable from the dyadic partition itself. This may reflect the fact that we did not allow the algorithms to learn the affinity parameters associated with the true data labels a priori. Further investigation into the performance of hybrid strategies would be of considerable practical importance.

6.2 Dyadic Projections and the Detectability Threshold

Informally, an algorithm is able to detect communities in a random graph model with fixed planted labels when the output labeling of that algorithm is, with probability bounded above zero, correlated with the planted labeling. Using arguments from statistical physics, Decelle et al. [2011] conjectured the existence of a regime in the graph stochastic blockmodel in which no algorithm can successfully detect communities. This conjecture has since been refined and proven in various special cases; see Abbe [2017] for a survey. In the dyadic stochastic blockmodel with two equal-sized planted communities, a necessary condition for detectability is that

c_in − c_out > √(c_in + c_out) ,  (11)

where c_in is the mean number of within-cluster edges attached to a node, and c_out is the mean number of between-cluster edges attached to a node. If this condition is not satisfied, no algorithm can reliably detect communities in the associated graph stochastic blockmodel. This bound limits both direct inferential methods, such as Bayesian or maximum-likelihood techniques, and methods based on maximization of modularity or other graph objectives Nadakuditi and Newman [2012]. Several recent papers have considered the detectability problem in the case of uniform hypergraphs Ghoshdastidar and Dukkipati [2014, 2017]; Angelini et al. [2016]. In our model, the presence of edges of multiple sizes complicates analysis. Here, we limit ourselves to an experimental demonstration that the regimes of detectability for the graph SBM and our DCHSBM can differ significantly.
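Condition (11) is straightforward to check numerically. The helper below assumes the two-block form just stated; the function name is ours, not the paper's:

```python
import math

def detectable(c_in, c_out):
    """Necessary condition (11) for detectability in the two-block
    dyadic SBM, where c_in and c_out are the mean within- and
    between-cluster degrees of a node."""
    return c_in - c_out > math.sqrt(c_in + c_out)
```

For example, a strongly assortative setting passes the check, while a nearly uniform one does not.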

Detectability experiments in synthetic hypergraphs. For k = 2, 3, p_k is the proportion of within-cluster edges of size k. Each pixel gives the mean ARI over 20 independently generated DCHSBMs of 500 nodes, where each node is incident to, on average, five 2-edges and five 3-edges. (Left) The recovered partition is obtained from GMLL. The dashed white line gives the detectability threshold for a dyadic SBM with the same within- and between-cluster averages as the clique projection. (Right) The recovered partition is obtained from AON HMLL (Algorithm 3). One dashed line gives the detectability threshold in p_2 for a dyadic algorithm that ignores 3-edges; the other gives the same threshold in p_3 for a dyadic algorithm that ignores 2-edges. The returned partition is the highest-likelihood partition from 20 alternations between updating the partition and inference of the affinity parameters.

Section 6.2 shows two experiments on a simple DCHSBM with two equal-sized communities of 250 nodes each. The affinity function is tuned so that:

  1. Each node is incident to, on average, five 2-edges and five 3-edges.

  2. A fraction p_2 of 2-edges join nodes in the same cluster, while the remaining fraction 1 − p_2 join nodes in different clusters.

  3. A fraction p_3 of 3-edges join nodes in the same cluster, while the remaining fraction 1 − p_3 join nodes in different clusters.

In the left panel, we show the Adjusted Rand Index (ARI) of the returned partition against the true partition when using the unnormalized variant of GMLL (results for the normalized variant are similar). There is a gap between the theoretical bound (dashed white line) given by (11) and the regime in which GMLL is able to find partitions correlated with the planted ones. This reflects the fact that the Louvain algorithm, as a stagewise greedy method, possesses no optimality guarantees.

In the right panel, we show the same experiment using AON HMLL. HMLL is able to detect the planted partition for a range of parameter values in which GMLL is not. These include cases in which edges of certain sizes are largely between-cluster, as shown in the top left (small p_2) and bottom right (small p_3). There is a regime (mid and bottom right) in which the algorithm appears to be constrained by a threshold in p_2, which represents the detectability threshold for a dyadic algorithm when 3-edges are ignored. This suggests that HMLL is effectively ignoring 3-edges in this regime. As p_3 increases, however, 3-edges become more informative and the partition can be detected for some values of p_2 (top right). There is also a broad regime (top left) in which the hypergraph algorithm is able to use both 2- and 3-edges effectively to detect clusters, even when 2-edges are disassortative. This regime is largely inaccessible to GMLL. Intriguingly, there are also values of p_2 and p_3 for which GMLL is able to detect the planted partition while HMLL is not. This may indicate that the pooling of edges of different sizes implied by the dyadic projection can be useful in some regimes. We note again that neither GMLL nor HMLL is an optimal inference algorithm. An optimal hypergraph algorithm might significantly extend the detectable regime in the right panel of Section 6.2. We pose the development of such algorithms, as well as their analysis, as highly promising avenues for future research.

7 Experiments with Empirical Data

Next, we analyze several hypergraphs derived from empirical data. The first two are hypergraphs of human close-proximity contact interactions Benson et al. [2018], obtained from wearable sensor data at a primary school Stehlé et al. [2011] and a high school Mastrandrea et al. [2015]. Nodes are students or teachers, and a hyperedge connects groups of people that were all jointly in proximity to one another. Node labels identify the classrooms to which each student belongs, and the primary school data also includes a teacher associated to each class. Next, we created two hypergraphs from U.S. Congressional bill cosponsorship data Fowler [2006a, b], where nodes correspond to congresspersons, and hyperedges correspond to the sponsor and all cosponsors of a bill in either the House of Representatives or the Senate. We constructed another pair of data sets from the U.S. Congress in the form of committee memberships Stewart and Woon [2021]. Each edge is a committee in a meeting of Congress, and each node again corresponds to a member of the House or a senator. A node is contained in an edge if the corresponding legislator was a member of the committee during the specified meeting of Congress. The 103rd through 115th Congresses are represented, spanning the years 1993–2017. There are again separate data sets for House and Senate members. In all congressional data sets, the node labels give the political parties of the members. We also used a hypergraph of Walmart purchases Amburg et al. [2020], where each node is a product and a hyperedge connects a set of products that were co-purchased by a customer in a single shopping trip. Each node has an associated product category label. Finally, we constructed a hypergraph where nodes correspond to hotels listed at trivago.com, and each hyperedge corresponds to a set of hotels whose web site was clicked on by a user of Trivago within a browsing session. 
This hypergraph was derived from data released for the 2019 ACM RecSys Challenge (https://recsys.acm.org/recsys19/challenge/). For each hotel, the node label gives the country in which it is located. The data sets vary in size in terms of the number of nodes, hyperedges, hyperedge sizes, and node labels (Table 7).

Summary of study data sets. Shown are the number of nodes n, number of hyperedges m, mean and standard deviation of node degree, mean and standard deviation of edge size, and number of data labels.

Data set                      n        m        degree (mean / sd)   edge size (mean / sd)   labels
contact-primary-school        242      12,704   127.0 / 55.3         2.4 / 0.6               11
contact-high-school           327      7,818    55.6 / 27.1          2.3 / 0.5               9
house-bills                   1,494    43,047   274.0 / 282.7        9.5 / 7.2               2
senate-bills                  293      20,006   493.4 / 406.3        7.3 / 5.5               2
house-committees              1,290    340      9.2 / 7.1            35.2 / 21.3             2
senate-committees             282      315      19.0 / 14.7          17.5 / 6.6              2
walmart-purchases             88,860   65,979   5.1 / 26.7           6.7 / 5.3               11
trivago-clicks                171,495  220,758  4.0 / 7.0            4.2 / 2.0               160

7.1 Model Comparison and Higher-Order Structure

It is often stated that higher-order features are important for understanding the structure and function of complex networks. It is less often clarified which kinds of higher-order features are relevant for which networks. Generative modeling provides one way to compare different kinds of higher-order structure. In the DCHSBM, this structure is specified by the affinity function. Comparison of the likelihoods obtained by each affinity can indicate which one is most plausible as a higher-order generative mechanism for the underlying data. We performed such a comparison using the symmetric affinity functions from Table 1 and the node labels described above; in this setup, we can compute the optimal estimate of each affinity, given its functional form. In order to make concrete comparisons, it is necessary to specify the functional forms of the Group Number (GN), Relative Plurality (RP), and Pairwise (P) affinities. We use the following parameterizations:

(Group Number)
(Relative Plurality)
(Pairwise)

The Group Number affinity function assigns a separate parameter to each combination of edge size and number of groups. The Relative Plurality affinity function assigns one parameter to the case in which the difference between the sizes of the largest and second-largest groups within an edge exceeds a specified fraction of the edge size, and another otherwise. The Pairwise affinity function assigns one parameter to the case in which the total number of dyadic pairs in differing groups exceeds half the possible number of such pairs. Different parameterizations of these symmetric affinities are possible and may generate slightly different results. Because these affinity functions possess different numbers of parameters, we measure the Bayesian Information Criterion (BIC) Schwarz [1978], which penalizes affinity functions with more parameters than are supported by the data.
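The BIC used for this comparison takes the standard form: a parameter-count penalty minus twice the maximized log-likelihood. The small helper below is a generic sketch; the appropriate sample-size term for hypergraph data is a modeling decision not pinned down here.

```python
import math

def bic(max_log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: n_params * ln(n_obs) - 2 * ln L.
    Lower values indicate a more plausible model."""
    return n_params * math.log(n_obs) - 2.0 * max_log_likelihood
```

Holding the fit fixed, a model with more parameters incurs a strictly larger penalty.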

Table 7.1 shows the BIC for the DCHSBM using each of these affinity functions. Importantly, no single affinity function is preferred across all of the study data sets, suggesting the presence of different kinds of polyadic structure. AON, GN, and P all prefer edges with homogeneous cluster labels, whereas RP prefers edges in which the two most common labels are roughly balanced in representation. In the two congressional committee data sets, RP achieves the optimal BIC, while in each of the other data sets, one of the three affinities that promotes edge homogeneity is instead preferred. There are also important differences between these three affinities. In house-bills, the Pairwise affinity function achieves the lowest BIC overall, while in walmart-purchases the Pairwise affinity is preferred over all but the Group Number affinity. This suggests that a model involving only pairwise comparison of node labels can provide relatively strong generative explanations of the data in these cases. This in turn suggests that dyadic algorithms may perform at least as well on these data sets as their polyadic counterparts. Indeed, as we will see below, in both of these data sets, dyadic algorithms can return clusterings more correlated with ground truth than those returned by AON HMLL.

AON GN RP P
contact-high-school
contact-primary-school
house-committees
senate-committees
house-bills
senate-bills
walmart-purchases
trivago-clicks

Bayesian Information Criteria (BIC) of the DCHSBM using the All-Or-Nothing (AON), Group Number (GN), Relative Plurality (RP), and Pairwise (P) affinity functions on our study data sets. Definitions of each affinity function are supplied in Table 1. Lower BIC indicates a more plausible model. The affinity function achieving the lowest BIC in each data set is shown in bold.

7.2 Recovering Classes in Contact Hypergraphs

To test the AON HMLL algorithm itself, we first studied its behavior in the contact-primary-school and contact-high-school networks. The comparison of BIC scores from Table 7.1 suggests that GN may be the most explanatory model of the data, but we instead use AON in order to take advantage of its considerable computational benefits. We performed 20 alternations between AON HMLL and estimation of the AON parameters, and returned the partition with the highest DCHSBM likelihood. We compare the results to two dyadic methods. Each step of the Graph Louvain algorithm alternates between using the standard Louvain algorithm Blondel et al. [2008] to infer clusters and estimating the resolution parameter using the maximum-likelihood framework of Newman [2016]. Graph Louvain returns the partition which maximizes the classical dyadic modularity objective. We also compare to Graph Maximum-Likelihood Louvain (GMLL), which carries out the same alternation, but instead returns the partition that maximizes the log-likelihood of the corresponding planted partition stochastic blockmodel.

Comparison of clustering algorithms in contact-primary-school and contact-high-school. For each data set, we show a partition obtained from the classical graph Louvain modularity maximization heuristic; a partition obtained from Graph Maximum-Likelihood Louvain (GMLL); and a partition obtained by AON HMLL. The partition shown is the one which attains the highest value of the corresponding objective function after 20 rounds of iterative likelihood maximization. Each box records the number of agents with the specified combination of inferred cluster and ground truth label. The bottom row visualizes the number of edges of each size k, the inferred size weights β_k, and the inferred resolution parameters γ_k as defined in (10).

Section 7.2 compares the performance of each of these algorithms. In the case of contact-primary-school, we consider the ground truth partition to be the one that assigns exactly one teacher to each class. Graph Louvain finds partitions of students that are clearly correlated with the given class labels, but it conflates two primary school classes and splits several high school classes (left column, top two rows). Graph MLL recovers the primary school class labels perfectly and misclassifies three high school students. Our proposed AON HMLL correctly recovers the given partitions in both data sets.

We can obtain some qualitative insight into the behavior of HMLL by studying the structure of the inferred affinity function. The most intuitive way to do so is by considering the derived parameters β and γ from (10). The bottom row of Section 7.2 shows these parameters, as well as the distribution of edge sizes. The dependence of β_k on the edge size k provides one explanation of why Graph MLL succeeds in contact-primary-school but makes several errors in contact-high-school. Under the standard dyadic projection, a k-hyperedge generates k(k − 1)/2 2-edges, and therefore appears in the dyadic modularity objective k(k − 1)/2 distinct times. In the case of contact-primary-school, the estimated importance parameter β_k is indeed approximately proportional to k(k − 1)/2 (bottom center panel of Section 7.2). At the optimal partition, the relative weights of edges are therefore distorted relatively little by the clique projection. On the other hand, the estimates of β_k in contact-high-school deviate considerably from this proportionality, especially for small k. Here, small edges feature much more prominently in the polyadic modularity objective than they do in the projected dyadic objective, implying that the latter is a poorer approximation to the former near the optimal partition. This difference may explain the small errors made by Graph MLL in contact-high-school. The bottom-right panel of Section 7.2 compares the inferred values of the size-specific resolution parameters γ_k to the implicit values used in Kamiński et al. [2019]. The inferred resolution parameters are consistently larger and increase with k, highlighting the value of adaptively estimating these parameters in our approach.

7.3 Cluster Recovery with Large Hyperedges

In Section 7.3, we study the ability of AON HMLL to recover ground truth communities in several more of our study data sets. Unlike the two contact networks, each of these data sets contains edges of size up to 25 nodes. We have excluded house-committees and senate-committees on the grounds that these data sets are disassortative, indicating that AON is clearly inappropriate. We compare AON HMLL to two variants of GMLL. In the unnormalized variant, we obtain a dyadic graph by replacing each k-edge with a k-clique, thus generating a total of k(k − 1)/2 dyadic edges per hyperedge. In the normalized variant, we weight each edge in the k-clique by a factor of 1/(k − 1). The normalized dyadic degree of each node is then equal to its degree in the original hypergraph. In either case, we then alternate between the dyadic Louvain algorithm for estimating clusters and maximum-likelihood inference of the resolution parameter. In each trial, we perform 20 iterations of AON HMLL as well as the two GMLL variants, returning from these the combination of group labels and parameters that achieves the highest likelihood. We then compare the clustering to the ground truth labels via the Adjusted Rand Index. We vary the maximum edge size in order to show how each algorithm responds to the incorporation of progressively larger edges. Because extreme sparsity poses issues for community detection algorithms in general Abbe [2017], we show experiments for progressively denser cores of trivago-clicks and walmart-purchases.
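The two dyadic baselines differ only in their projection weights. A sketch of both projections follows (a hypothetical helper, not the authors' code); in the normalized variant, each node's weighted dyadic degree equals its hypergraph degree:

```python
from itertools import combinations
from collections import defaultdict

def clique_project(edges, normalized=True):
    """Dyadic clique projection: each k-edge becomes a k-clique. In the
    normalized variant each clique edge gets weight 1/(k - 1), so each
    node receives total weight 1 from each incident hyperedge."""
    w = defaultdict(float)
    for e in edges:
        k = len(e)
        weight = 1.0 / (k - 1) if normalized else 1.0
        for u, v in combinations(sorted(e), 2):
            w[(u, v)] += weight
    return dict(w)
```

For a single 3-edge, each node touches two clique edges of weight 1/2, preserving its hypergraph degree of 1.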

Comparison of the Hypergraph All-Or-Nothing MLL algorithm (Algorithm 3) against dyadic likelihood Louvain in data with known clusters. Points give the Adjusted Rand Index of the highest-likelihood partition obtained after 20 alternations between partitioning and parameter estimation. The maximum edge size varies along the horizontal axis. In the panel titles, n is the number of nodes and m the number of edges. Note that the vertical axis limits vary between panels.

The results highlight the strong dependence of the performance of AON HMLL on the relative plausibility of the AON affinity function as a generative mechanism for the data (cf. Table 7.1). In trivago-clicks, the AON affinity function achieved the lowest BIC of all four candidates. Because AON is a more plausible generating mechanism, it is not surprising that AON HMLL is able to find partitions considerably more correlated with the supplied data labels than those returned by the dyadic variants. In walmart-purchases, on the other hand, the Pairwise affinity is preferred to AON. In this case, AON HMLL performs much worse, and in the 2-core even returns clusters which are anticorrelated with the supplied labels. As weakly-connected nodes are removed and the resulting data becomes denser, HMLL begins to return correlated clusters. However, the normalized GMLL variant is at least as effective in recovering the data labels. In the two Congressional bills data sets, the Pairwise affinity achieves a lower BIC than AON in the House and a comparable one in the Senate. It is therefore not surprising that a dyadic method outperforms AON HMLL in each of these cases. Interestingly, unnormalized GMLL performs best in house-bills and senate-bills, while normalized GMLL is preferable in walmart-purchases. In addition, HMLL is the worst algorithm only in the case of the 2-core of walmart-purchases with small maximum edge sizes. HMLL may therefore be the algorithm of choice in cases when it is not known whether normalized or unnormalized dyadic representations are more appropriate for the data.

When interpreting these recovery results, it is important to contextualize them within the limitations of community detection methods in general and of modularity maximization in particular. There is no “best algorithm” for community detection that does not make implicit assumptions about the structure of the data, and mismatch of algorithms to data sets can generate misleading results Peel et al. [2017]. (That said, a test similar to the BESTest of Peel et al. [2017] reveals that the likelihood under the DCHSBM is much greater than the likelihood under random label permutations in most data set configurations, implying significant correlation between network structure and the labels.) Even when the data-generating process matches algorithmic assumptions, as for a synthetic data set generated from a stochastic blockmodel, optimal algorithms may still fail to detect planted communities due to sparsity Decelle et al. [2011]; Abbe [2017]. Greedy modularity maximization, as in the Louvain variants considered here, finds only one of possibly many local optima Good et al. [2010], some of which may be largely uncorrelated with each other. These considerations imply that (a) we cannot rule out the existence of other local optima which might achieve higher scores under any of the three algorithms, and (b) the fact that an algorithm fails to recover a clustering close to the ground truth does not imply that it is “failing” in its stated objective, namely, local likelihood maximization. Overall, our results suggest that, when the assumptions of the DCHSBM with All-Or-Nothing affinity are appropriate to the data, AON HMLL can outperform dyadic approaches in recovering ground truth communities. In practice, since we often do not have access to ground truth labels, the question of whether the assumptions are appropriate should be informed by domain expertise.

8 Discussion

We have proposed a generative approach for clustering polyadic data, grounded in a degree-corrected hypergraph stochastic blockmodel. From this model we derived a symmetric, modularity-like objective, which includes the All-Or-Nothing (AON) modularity objective as an important special case. Theoretically, our approach connects hypergraph modularity objectives to concrete modeling assumptions, which can be tuned in response to domain expertise. We have also formulated Louvain-like algorithms for optimizing these objectives, which are highly scalable in the case of the AON affinity function. By embedding this heuristic within an alternating maximum-likelihood scheme, we are able to adaptively learn both node clusters and affinity parameters. We showed experimentally that hypergraph algorithms possess markedly different detectability regimes from dyadic algorithms. We also conducted experiments on empirical data, finding that hypergraph methods are preferred to dyadic ones in data sets where their modeling assumptions are well-founded.
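The alternating maximum-likelihood scheme described above can be sketched abstractly. Here `update_clusters` and `update_affinity` are placeholders for a Louvain-style partition step and a closed-form parameter update, respectively; this is a schematic of the coordinate-ascent structure, not our implementation:

```python
def alternating_ml(update_clusters, update_affinity, z0, theta0, max_iter=20):
    """Alternate between improving the partition with the affinity
    parameters held fixed, and re-estimating the affinity parameters
    given the current partition, until the partition stabilizes."""
    z, theta = z0, theta0
    for _ in range(max_iter):
        z_new = update_clusters(z, theta)
        theta = update_affinity(z_new)
        if z_new == z:  # fixed point: no node changed cluster
            break
        z = z_new
    return z, theta
```

In our setting, `update_affinity` has a closed form given the partition, so the expensive step is the partition update; the loop typically terminates after a small number of outer iterations.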

Our work points toward many directions of further research. One of these directions is algorithmic. Our greedy coordinate-ascent maximum-likelihood framework for inference in the DCHSBM has several important limitations. First, because we rely on an NP-hard optimization step, global maximization of the likelihood is never assured. Second, maximum-likelihood itself is limited as an inference paradigm, as it uses information contained only within a small part of the likelihood landscape. Third, the edgewise agglomerative approach embodied by Louvain-style algorithms is limited in applicability to affinity functions that promote homogeneity within edges. Alternative inference paradigms may ameliorate some or all of these limitations. Fully Bayesian treatments Peixoto [2019] provide one promising path, although these are sometimes limited in their computational scalability. Variational belief-propagation Decelle et al. [2011]; Zhang and Moore [2014] provides an intriguing compromise, achieving considerable scalability in exchange for several approximations. A derivation of such an algorithm, and an exploration into the applicability of the belief-propagation approximations in hypergraph data, would be of considerable interest. Methods enabling scalable inference with more general affinity functions would be of special practical importance.
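The limitations of the greedy step can be made concrete with a toy version of local coordinate ascent over an arbitrary objective. This is a sketch only: practical Louvain-style algorithms update the objective incrementally and agglomerate clusters, rather than recomputing the objective from scratch as done here:

```python
def greedy_pass(objective, z, n_clusters):
    """Repeatedly move each node to the cluster label that most increases
    the objective, stopping at a local optimum. Global maximization is
    not assured: the result depends on the starting partition."""
    improved = True
    while improved:
        improved = False
        for i in range(len(z)):
            old, best_c, best_val = z[i], z[i], objective(z)
            for c in range(n_clusters):
                if c == old:
                    continue
                z[i] = c  # tentatively relabel node i
                val = objective(z)
                if val > best_val:
                    best_c, best_val = c, val
            z[i] = best_c  # commit the best label found
            if best_c != old:
                improved = True
    return z
```

Running this from different initial partitions and keeping the best local optimum is a standard, if crude, hedge against the multiplicity of optima noted above.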

There are also several important directions of theoretical development. One of these is the question of detectability in the DCHSBM. Because the DCHSBM is more flexible than the dyadic DCSBM, the theory of detectability in this model may be substantially more complex. Another direction concerns the properties of the dyadic modularity objective that extend to the hypergraph modularity objectives discussed here. In addition to its role as a comparison against null models Newman [2006] and as a term in the DCSBM likelihood Newman [2016], the dyadic modularity also expresses the stability of diffusion processes on graphs Delvenne et al. [2010] and the energy of discrete surface tensions defined on graphs Boyd et al. [2019]. Extensions of these properties, or explanations of why they fail to generalize, would be helpful for both theorists and practitioners.
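For concreteness, the dyadic degree-corrected modularity referenced throughout this discussion can be written in its standard form, with resolution parameter $\gamma$; the notation below follows the common convention in the literature rather than any specific display in this paper:

```latex
Q(\mathbf{z}) \;=\; \frac{1}{2m} \sum_{i,j} \left[ A_{ij} \;-\; \gamma\, \frac{k_i k_j}{2m} \right] \delta(z_i, z_j)\;,
```

where $A$ is the adjacency matrix, $k_i$ the degree of node $i$, $m$ the number of edges, and $\delta$ the Kronecker delta; setting $\gamma = 1$ recovers the Newman–Girvan modularity. Each of the properties cited above (null-model comparison, DCSBM likelihood term, diffusion stability, surface tension) is a reinterpretation of this single quantity.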

Funding

This research was supported in part by ARO Award W911NF19-1-0057, ARO MURI, NSF Award DMS-1830274, and JP Morgan Chase & Co.

Software and Data

Software and data sufficient to reproduce and extend the experiments and analysis in this paper are available at the following repository: https://github.com/PhilChodrow/HypergraphModularity

References

  • Abbe [2017] E. Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18 (1), pp. 6446–6531, 2017.
  • Agarwal et al. [2006] S. Agarwal, K. Branson, and S. Belongie. Higher order learning with graphs. In Proceedings of the 23rd International Conference on Machine Learning, pp. 17–24. 2006. doi:10.1145/1143844.1143847.
  • Agarwal et al. [2005] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Volume 2, pp. 838–845. 2005. doi:10.1109/CVPR.2005.89.
  • Airoldi et al. [2008] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9 (Sep), pp. 1981–2014, 2008.
  • Akhremtsev et al. [2017] Y. Akhremtsev, T. Heuer, P. Sanders, and S. Schlag. Engineering a direct k-way hypergraph partitioning algorithm. In 19th Workshop on Algorithm Engineering and Experiments, (ALENEX 2017), pp. 28–42. 2017.
  • Amburg et al. [2020] I. Amburg, N. Veldt, and A. Benson. Clustering in graphs and hypergraphs with categorical edge labels. In Proceedings of The Web Conference 2020, pp. 706–717. 2020.
  • Angelini et al. [2016] M. C. Angelini, F. Caltagirone, F. Krzakala, and L. Zdeborova. Spectral detection on sparse hypergraphs. 2015 53rd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2015, pp. 66–73, 2016. arXiv:1507.04113, doi:10.1109/ALLERTON.2015.7446987.
  • Athreya et al. [2017] A. Athreya, D. E. Fishkind, M. Tang, C. E. Priebe, Y. Park, J. T. Vogelstein, K. Levin, V. Lyzinski, and Y. Qin. Statistical inference on random dot product graphs: a survey. The Journal of Machine Learning Research, 18 (1), pp. 8393–8484, 2017.
  • Ballard et al. [2016] G. Ballard, A. Druinsky, N. Knight, and O. Schwartz. Hypergraph partitioning for sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing, 3 (3), pp. 18:1–18:34, 2016. doi:10.1145/3015144.
  • Barber [2007] M. J. Barber. Modularity and community detection in bipartite networks. Physical Review E, 76 (6), p. 066102, 2007.
  • Battiston et al. [2020] F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J.-G. Young, and G. Petri. Networks beyond pairwise interactions: structure and dynamics. Physics Reports, 2020.
  • Benson et al. [2018] A. R. Benson, R. Abebe, M. T. Schaub, A. Jadbabaie, and J. Kleinberg. Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences, 2018. doi:10.1073/pnas.1800683115.
  • Benson et al. [2016] A. R. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 353 (6295), pp. 163–166, 2016. doi:10.1126/science.aad9029.
  • Blondel et al. [2008] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008 (10), p. P10008, 2008.
  • Bolla [1993] M. Bolla. Spectra, Euclidean representations and clusterings of hypergraphs. Discrete Mathematics, 117 (1-3), pp. 19–39, 1993. doi:10.1016/0012-365X(93)90322-K.
  • Boyd et al. [2019] Z. M. Boyd, M. A. Porter, and A. L. Bertozzi. Stochastic block models are a discrete surface tension. Journal of Nonlinear Science, pp. 1–34, 2019.
  • Brandes et al. [2007] U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner. On finding graph clusterings with maximum modularity. In International Workshop on Graph-Theoretic Concepts in Computer Science, pp. 121–132. 2007.
  • Chan et al. [2018] T.-H. H. Chan, A. Louis, Z. G. Tang, and C. Zhang. Spectral properties of hypergraph Laplacian and approximation algorithms. Journal of the ACM, 65 (3), 2018. doi:10.1145/3178123.
  • Chang et al. [2020] J. Chang, Y. Chen, L. Qi, and H. Yan. Hypergraph clustering using a new Laplacian tensor with applications in image processing. SIAM Journal on Imaging Sciences, 13 (3), pp. 1157–1178, 2020. doi:10.1137/19M1291601.
  • Chen et al. [2017] Y. Chen, L. Qi, and X. Zhang. The Fiedler vector of a Laplacian tensor for hypergraph partitioning. SIAM Journal on Scientific Computing, 39 (6), pp. A2508–A2537, 2017. doi:10.1137/16M1094828.
  • Chodrow [2020a] P. S. Chodrow. Configuration models of random hypergraphs. Journal of Complex Networks, 8 (3), p. cnaa018, 2020a.
  • Chodrow [2020b] ———. Moments of uniform random multigraphs with fixed degree sequences. SIAM Journal on Mathematics of Data Science, 2 (4), pp. 1034–1065, 2020b.
  • Chung and Lu [2002] F. Chung and L. Lu. Connected components in random graphs with given expected degree sequences. Annals of combinatorics, 6 (2), pp. 125–145, 2002.
  • Chung [1992] F. R. L. Chung. Spectral Graph Theory, American Mathematical Society, 1992.
  • Clauset et al. [2004] A. Clauset, M. E. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70 (6), p. 066111, 2004.
  • Condon and Karp [2001] A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model. Random Structures & Algorithms, 18 (2), pp. 116–140, 2001.
  • de Arruda et al. [2020] G. F. de Arruda, G. Petri, and Y. Moreno. Social contagion models on hypergraphs. Physical Review Research, 2 (2), p. 023032, 2020.
  • Decelle et al. [2011] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84 (6), p. 066106, 2011.
  • Delvenne et al. [2010] J.-C. Delvenne, S. N. Yaliraki, and M. Barahona. Stability of graph communities across time scales. Proceedings of the National Academy of Sciences, 107 (29), pp. 12755–12760, 2010.
  • Deveci et al. [2015] M. Deveci, K. Kaya, B. Uçar, and Ü. V. Çatalyürek. Hypergraph partitioning for multiple communication cost metrics: Model and methods. Journal of Parallel and Distributed Computing, 77, pp. 69–83, 2015. doi:10.1016/j.jpdc.2014.12.002.
  • Easley and Kleinberg [2010] D. Easley and J. Kleinberg. Networks, crowds, and markets: Reasoning about a highly connected world, Cambridge University Press, 2010.
  • Expert et al. [2011] P. Expert, T. S. Evans, V. D. Blondel, and R. Lambiotte. Uncovering space-independent communities in spatial networks. Proceedings of the National Academy of Sciences, 108 (19), pp. 7663–7668, 2011.
  • Fortunato and Barthelemy [2007] S. Fortunato and M. Barthelemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104 (1), pp. 36–41, 2007.
  • Fortunato and Hric [2016] S. Fortunato and D. Hric. Community detection in networks: A user guide. Physics Reports, 659, pp. 1–44, 2016.
  • Fosdick et al. [2018] B. K. Fosdick, D. B. Larremore, J. Nishimura, and J. Ugander. Configuring random graph models with fixed degree sequences. SIAM Review, 60 (2), pp. 315–355, 2018.
  • Fowler [2006a] J. H. Fowler. Connecting the congress: A study of cosponsorship networks. Political Analysis, pp. 456–487, 2006a.
  • Fowler [2006b] ———. Legislative cosponsorship networks in the U.S. House and Senate. Social Networks, 28 (4), pp. 454–465, 2006b.
  • Ghoshdastidar and Dukkipati [2014] D. Ghoshdastidar and A. Dukkipati. Consistency of spectral partitioning of uniform hypergraphs under planted partition model. In Advances in Neural Information Processing Systems, pp. 397–405. 2014.
  • Ghoshdastidar and Dukkipati [2017] ———. Consistency of spectral hypergraph partitioning under planted partition model. Annals of Statistics, 45 (1), pp. 289–315, 2017. arXiv:1505.01582, doi:10.1214/16-AOS1453.
  • Good et al. [2010] B. H. Good, Y.-A. De Montjoye, and A. Clauset. Performance of modularity maximization in practical contexts. Physical Review E, 81 (4), p. 046106, 2010.
  • Guimerà et al. [2007] R. Guimerà, M. Sales-Pardo, and L. A. N. Amaral. Module identification in bipartite and directed networks. Physical Review E, 76 (3), p. 036102, 2007.
  • Hendrickson and Kolda [2000] B. Hendrickson and T. G. Kolda. Graph partitioning models for parallel computing. Parallel computing, 26 (12), pp. 1519–1534, 2000.
  • Hoff et al. [2002] P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical association, 97 (460), pp. 1090–1098, 2002.
  • Jackson [2008] M. O. Jackson. Social and Economic Networks, Princeton University Press, 2008.
  • Jerrum and Sorkin [1998] M. Jerrum and G. B. Sorkin. The metropolis algorithm for graph bisection. Discrete Applied Mathematics, 82 (1-3), pp. 155–175, 1998.
  • Kabiljo et al. [2017] I. Kabiljo, B. Karrer, M. Pundir, S. Pupyrev, and A. Shalita. Social hash partitioner: a scalable distributed hypergraph partitioner. Proceedings of the VLDB Endowment, 10 (11), pp. 1418–1429, 2017.
  • Kamiński et al. [2019] B. Kamiński, V. Poulin, P. Prałat, P. Szufel, and F. Théberge. Clustering via hypergraph modularity. PLoS ONE, 14 (11), p. e0224307, 2019.
  • Kamiński et al. [2020] B. Kamiński, P. Prałat, and F. Théberge. Community detection algorithm using hypergraph modularity. In International Conference on Complex Networks and Their Applications, pp. 152–163. 2020.
  • Karrer and Newman [2011] B. Karrer and M. E. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83 (1), p. 016107, 2011.
  • Karypis et al. [1999] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 7 (1), pp. 69–79, 1999. doi:10.1109/92.748202.
  • Karypis and Kumar [2000] G. Karypis and V. Kumar. Multilevel k-way hypergraph partitioning. VLSI design, 11 (3), pp. 285–300, 2000.
  • Ke et al. [2019] Z. T. Ke, F. Shi, and D. Xia. Community detection for hypergraph networks via regularized tensor power iteration. arXiv preprint arXiv:1909.06503, 2019.
  • Kim et al. [2018] C. Kim, A. S. Bandeira, and M. X. Goemans. Stochastic block model for hypergraphs: Statistical limits and a semidefinite programming approach. arXiv:1807.02884, 2018.
  • Kumar et al. [2020] T. Kumar, S. Vaidyanathan, H. Ananthapadmanabhan, S. Parthasarathy, and B. Ravindran. Hypergraph clustering by iteratively reweighted modularity maximization. Applied Network Science, 5 (1), pp. 1–22, 2020.
  • Lambiotte et al. [2019] R. Lambiotte, M. Rosvall, and I. Scholtes. From networks to optimal higher-order models of complex systems. Nature Physics, 15 (4), pp. 313–320, 2019.
  • Larremore et al. [2014] D. B. Larremore, A. Clauset, and A. Z. Jacobs. Efficiently inferring community structure in bipartite networks. Physical Review E, 90 (1), p. 012805, 2014.
  • Li and Milenkovic [2017] P. Li and O. Milenkovic. Inhomogeneous hypergraph clustering with applications. In Advances in Neural Information Processing Systems, pp. 2308–2318. 2017.
  • Li and Milenkovic [2018] ———. Submodular hypergraphs: p-Laplacians, Cheeger inequalities and spectral clustering. In 35th International Conference on Machine Learning (ICML 2018), pp. 4690–4719. 2018.
  • Li et al. [2019] P. Li, G. J. Puleo, and O. Milenkovic. Motif and hypergraph correlation clustering. IEEE Transactions on Information Theory, 2019.
  • Li and Solé [1996] W.-C. W. Li and P. Solé. Spectra of regular graphs and hypergraphs and orthogonal polynomials. European Journal of Combinatorics, 17 (5), pp. 461 – 477, 1996. doi:10.1006/eujc.1996.0040.
  • Liu and Murata [2010] X. Liu and T. Murata. An efficient algorithm for optimizing bipartite modularity in bipartite networks. Journal of Advanced Computational Intelligence and Intelligent Informatics, 14 (4), pp. 408–415, 2010.
  • Louis [2015] A. Louis. Hypergraph Markov operators, eigenvalues and approximation algorithms. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pp. 713–722. 2015. doi:10.1145/2746539.2746555.
  • Mastrandrea et al. [2015] R. Mastrandrea, J. Fournet, and A. Barrat. Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE, 10 (9), p. e0136497, 2015.
  • Milo [2002] R. Milo. Network motifs: Simple building blocks of complex networks. Science, 298 (5594), pp. 824–827, 2002. doi:10.1126/science.298.5594.824.
  • Murata [2009a] T. Murata. Detecting communities from bipartite networks based on bipartite modularities. In 2009 International Conference on Computational Science and Engineering, pp. 50–57. 2009a.
  • Murata [2009b] ———. Modularities for bipartite networks. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, pp. 245–250. 2009b.
  • Nadakuditi and Newman [2012] R. R. Nadakuditi and M. E. Newman. Graph spectra and the detectability of community structure in networks. Physical Review Letters, 108 (18), p. 188701, 2012.
  • Neubauer and Obermayer [2009] N. Neubauer and K. Obermayer. Towards community detection in k-partite k-uniform hypergraphs. In Proceedings of the 2009 Workshop on Analyzing Networks and Learning with Graphs, pp. 1–9. 2009.
  • Newman [2006] M. E. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103 (23), pp. 8577–8582, 2006.
  • Newman [2016] ———. Equivalence between modularity optimization and maximum likelihood methods for community detection. Physical Review E, 94 (5), p. 052315, 2016.
  • Newman [2010] M. E. J. Newman. Networks: An Introduction, Oxford University Press, 2010.
  • Newman and Girvan [2004] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69, p. 026113, 2004. doi:10.1103/PhysRevE.69.026113.
  • Nowicki and Snijders [2001] K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96 (455), pp. 1077–1087, 2001.
  • Peel et al. [2017] L. Peel, D. B. Larremore, and A. Clauset. The ground truth about metadata and community detection in networks. Science Advances, 3 (5), p. e1602548, 2017.
  • Peixoto [2014] T. P. Peixoto. Hierarchical block structures and high-resolution model selection in large networks. Physical Review X, 4 (1), p. 011047, 2014.
  • Peixoto [2019] ———. Bayesian stochastic blockmodeling. Advances in Network Clustering and Blockmodeling, pp. 289–332, 2019.
  • Porter et al. [2009] M. A. Porter, J.-P. Onnela, and P. J. Mucha. Communities in networks. Notices of the AMS, 56 (9), pp. 1082–1097, 2009.
  • Reichardt and Bornholdt [2006] J. Reichardt and S. Bornholdt. Statistical mechanics of community detection. Physical Review E, 74 (1), p. 016110, 2006.
  • Rodríguez [2003] J. Rodríguez. On the Laplacian spectrum and walk-regular hypergraphs. Linear and Multilinear Algebra, 51 (3), pp. 285–297, 2003. doi:10.1080/0308108031000084374.
  • Sahasrabuddhe et al. [2020] R. Sahasrabuddhe, L. Neuhäuser, and R. Lambiotte. Modelling non-linear consensus dynamics on hypergraphs. Journal of Physics: Complexity, 2020.
  • Schlag et al. [2016] S. Schlag, V. Henne, T. Heuer, H. Meyerhenke, P. Sanders, and C. Schulz. k-way hypergraph partitioning via n-level recursive bisection. In 18th Workshop on Algorithm Engineering and Experiments, (ALENEX 2016), pp. 53–67. 2016.
  • Schwarz [1978] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6 (2), pp. 461–464, 1978.
  • Stehlé et al. [2011] J. Stehlé, N. Voirin, A. Barrat, C. Cattuto, L. Isella, J.-F. Pinton, M. Quaggiotto, W. V. den Broeck, C. Régis, B. Lina, and P. Vanhems. High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE, 6 (8), p. e23176, 2011. doi:10.1371/journal.pone.0023176.
  • Stewart and Woon [2021] C. Stewart and J. Woon. Congressional committee assignments, 103rd to 115th congresses, 1993–2017. 2021.
  • Tian et al. [2009] Z. Tian, T. Hwang, and R. Kuang. A hypergraph-based learning algorithm for classifying gene expression and arrayCGH data with prior knowledge. Bioinformatics, 25 (21), pp. 2831–2838, 2009. doi:10.1093/bioinformatics/btp467.
  • Torres et al. [2020] L. Torres, A. S. Blevins, D. S. Bassett, and T. Eliassi-Rad. The why, how, and when of representations for complex systems. arXiv:2006.02870, 2020.
  • Traag et al. [2019] V. A. Traag, L. Waltman, and N. J. van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9 (1), pp. 1–12, 2019.
  • Tsourakakis et al. [2017] C. E. Tsourakakis, J. Pachocki, and M. Mitzenmacher. Scalable motif-aware graph clustering. In Proceedings of the 26th International Conference on World Wide Web, pp. 1451–1460. 2017.
  • Veldt et al. [2020a] N. Veldt, A. R. Benson, and J. Kleinberg. Hypergraph cuts with general splitting functions. arXiv:2001.02817, 2020a.
  • Veldt et al. [2020b] ———. Minimizing localized ratio cut objectives in hypergraphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1708–1718. 2020b.
  • Veldt et al. [2018] N. Veldt, D. F. Gleich, and A. Wirth. A correlation clustering framework for community detection. In Proceedings of the 2018 World Wide Web Conference, pp. 439–448. 2018. doi:10.1145/3178876.3186110.
  • Veldt et al. [2020c] N. Veldt, A. Wirth, and D. F. Gleich. Parameterized correlation clustering in hypergraphs and bipartite graphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1868–1876. 2020c.
  • Weir et al. [2017] W. H. Weir, S. Emmons, R. Gibson, D. Taylor, and P. J. Mucha. Post-processing partitions to identify domains of modularity optimization. Algorithms, 10 (3), p. 93, 2017.
  • Yadati et al. [2019] N. Yadati, M. Nimishakavi, P. Yadav, V. Nitin, A. Louis, and P. Talukdar. HyperGCN: A new method for training graph convolutional networks on hypergraphs. In Advances in Neural Information Processing Systems, pp. 1511–1522. 2019.
  • Yang and Leskovec [2013] J. Yang and J. Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pp. 587–596. 2013.
  • Yen and Larremore [2020] T.-C. Yen and D. B. Larremore. Community detection in bipartite networks with stochastic blockmodels. arXiv:2001.11818, 2020.
  • Yoshida [2019] Y. Yoshida. Cheeger Inequalities for Submodular Transformations, pp. 2582–2601. Society for Industrial and Applied Mathematics, 2019. doi:10.1137/1.9781611975482.160.
  • Zhang and Moore [2014] P. Zhang and C. Moore. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proceedings of the National Academy of Sciences, 111 (51), pp. 18144–18149, 2014.
  • Zhou et al. [2006] D. Zhou, J. Huang, and B. Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 1601–1608. 2006.

Appendix A Proof of (6)

We prove the identity by direct calculation. We have

as was to be shown.

Appendix B Maximum-Likelihood Estimation of

Inserting (6) into the definition of yields

Focusing on the set on which takes constant value , we write

where captures terms that do not depend on . The first-order condition reads

Solving for the constant yields (8).

Appendix C Derivation of (9)

We have