1 Introduction
A large number of random graph models have been proposed (Nowicki and Snijders, 2001; Hoff et al., 2002; Handcock et al., 2007; Latouche et al., 2011) to describe complex interactions among objects of interest. Pairwise relationships among objects can be naturally represented as a graph, in which the objects are represented by the vertices, and two vertices are joined by an edge if a certain relationship exists between them. While graphs are capable of representing pairwise interactions between objects, they are inadequate for representing the higher-order and unary interactions that are typically observed in many real-world problems. Examples of higher-order and unary relationships include coauthorship on academic papers, coappearance in movie scenes, and songs performed in a concert.
For example, the study of coauthorship networks of scientists has attracted significant research interest in both the natural and social sciences (Newman, 2001a, b, 2004; Moody, 2004; Azondekon et al., 2018). Such networks are typically constructed by connecting two scientists if they have coauthored one or more papers together. However, as we will illustrate below, such a representation inevitably results in a loss of information, while a hypergraph representation naturally preserves all information. A hypergraph is a generalization of a graph in which hyperedges are arbitrary sets of vertices and can contain any number of vertices. As a result, hypergraphs are capable of representing relationships of arbitrary order.
We consider a simple example of a coauthorship network with 7 authors and 4 papers in order to illustrate the benefits of hypergraph modelling. A hypergraph representation of the network is given in Figure 1 where the vertices represent the authors while the hyperedges represent the papers. For example, the paper is written by four authors , and , and the paper is written by two authors and , while the paper has a single author .
On the other hand, a graph representation of this coauthorship network with edges between any two authors who have coauthored at least one paper results in the edge set . It is evident that much information is lost with this representation. In particular, this representation removes information about the number of authors that coauthored a paper. For example, one can only deduce from this edge set that has coauthored with and while unable to conclude that the coauthorship was for the same paper. Furthermore, the hyperedge which contains a singleton is left out in the graph representation.
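The loss of information described above can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the author labels `a1`, ..., `a7`, the paper labels, and the memberships are hypothetical placeholders (they are not the labels used in Figure 1), and `project_to_graph` is a helper name introduced here. It shows that two different collections of papers can produce exactly the same coauthorship graph, and that a single-author paper leaves no trace in that graph.

```python
from itertools import combinations

# Hypothetical toy data: 7 authors and 4 papers (labels are placeholders).
papers = {
    "p1": {"a1", "a2", "a3", "a4"},
    "p2": {"a4", "a5"},
    "p3": {"a5", "a6"},
    "p4": {"a7"},  # single-author paper
}


def project_to_graph(hyperedges):
    """Return the set of coauthor pairs implied by a collection of papers."""
    edges = set()
    for members in hyperedges.values():
        for pair in combinations(sorted(members), 2):
            edges.add(pair)
    return edges


graph_edges = project_to_graph(papers)

# A different (hypothetical) collection of papers with the same projection:
# the four-author paper is replaced by three smaller papers.
alternative = {
    "q1": {"a1", "a2", "a3"},
    "q2": {"a1", "a2", "a4"},
    "q3": {"a3", "a4"},
    "q4": {"a4", "a5"},
    "q5": {"a5", "a6"},
}

same_graph = project_to_graph(alternative) == graph_edges
authors_in_graph = {author for edge in graph_edges for author in edge}

print(sorted(graph_edges))
print("same projection:", same_graph)
print("a7 appears in graph:", "a7" in authors_in_graph)
```

The hyperedge sets `papers` and `alternative` differ, yet the graph cannot distinguish them, which is the sense in which the pairwise representation discards the paper-level structure.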
A number of random hypergraph models have been studied in the probability and combinatorics literature, where theoretical properties such as phase transitions and chromatic numbers were investigated (Karoński and Łuczak, 2002; Goldschmidt, 2005; de Panafieu, 2015; Dyer et al., 2015; Poole, 2015). A novel parametrization of distributions on hypergraphs based on the geometry of points is proposed in Lunagómez et al. (2017) and is used to infer Markov structure for multivariate distributions.
On the other hand, statistical modeling with random hypergraphs is less explored. Stasi et al. (2014) introduced the hypergraph beta model with three variants, which is a natural extension of the beta model for random graphs (Holland and Leinhardt, 1981). In their model, the probability of a hyperedge appearing in the hypergraph is parameterized by a vector that represents the “attractiveness” of each vertex. However, their model does not capture clustering among objects, which is a typical real-world phenomenon. In addition, the assumption of an upper bound on the size of hyperedges is violated by many real-world data sets.
One may equivalently represent a hypergraph using a bipartite network (also called a two-mode network or an affiliation network). Two-mode networks consist of two different kinds of vertices, and edges can only be observed between vertices of different types, not between vertices of the same type. A hypergraph can be represented as a two-mode network by considering the hyperedges as a second type of vertex. For example, an equivalent bipartite representation of the hypergraph shown in Figure 1 is provided in Figure 2, where the hyperedges are now replaced by the four green vertices.
Two-mode networks have been studied in various disciplines including computer science (Perugini et al., 2004), social sciences (Faust et al., 2002; Robins and Alexander, 2004; Koskinen and Edling, 2012; Friel et al., 2016) and physics (Lind et al., 2005). A number of approaches have been proposed to analyze and model two-mode network data (Borgatti and Everett, 1997; Robins and Alexander, 2004; Doreian and Batagelj, 2004; Latapy et al., 2008; Wang et al., 2009; Snijders et al., 2013). In particular, models originally developed for binary networks were extended to two-mode networks.
Doreian and Batagelj (2004) developed a blockmodeling approach for two-mode network data which aims to simultaneously partition the two types of vertices into blocks. Skvoretz and Faust (1999) proposed exponential random graph models (ERGMs) for two-mode networks, which model the logit of the probability of an actor belonging to an event as a function of actor- and event-specific effects and other graph statistics. A clustering algorithm for two-mode networks is developed in Field et al. (2006) based on the modelling framework in Skvoretz and Faust (1999). Several extensions to the ERGMs for bipartite networks have been proposed in recent years (Wang et al., 2009, 2013). Snijders et al. (2013) proposed a methodology for studying the coevolution of two-mode and one-mode networks. A network autocorrelation model for two-mode networks is introduced in Fujimoto et al. (2011).
Representing network observations using two-mode networks has the benefit of modelling vertices of both types jointly. However, in analyzing a two-mode network, one type of vertex may attract most of the interest. For example, in coauthorship networks, the main interest may lie in the collaborations rather than in the coauthored papers. In such scenarios, a hypergraph representation is most natural: one type of vertex is converted into hyperedges with no loss of information.
In this paper, we propose the Extended Latent Class Analysis (ELCA) model for random hypergraphs, which is a natural extension of the Latent Class Analysis (LCA) model (Lazarsfeld and Henry, 1968; Goodman, 1974; Celeux and Govaert, 1991) and includes the LCA model as a special case. The model is applied to two data sets: Star Wars movie scenes and Lady Gaga concerts in 2014.
2 Model and Motivation
2.1 Hypergraph
A hypergraph is represented by a pair consisting of a set of vertices and a set of hyperedges. A hyperedge is a subset of the vertex set, and we allow repetitions in the hyperedge set. Thus, the hypergraph can alternatively be represented by a binary incidence matrix, whose entry is 1 if the corresponding vertex appears in the corresponding hyperedge and 0 otherwise.
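As a small illustration of the matrix representation, the sketch below builds a binary incidence matrix from a list of hyperedges. The orientation (rows index hyperedges, columns index vertices), the helper name `incidence_matrix`, and the vertex labels are assumptions made for this example only.

```python
def incidence_matrix(hyperedges, vertices):
    """Build a 0/1 matrix: rows index hyperedges, columns index vertices."""
    column = {v: j for j, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in hyperedges]
    for i, members in enumerate(hyperedges):
        for v in members:
            matrix[i][column[v]] = 1
    return matrix


# Hypothetical vertex labels and hyperedges.
vertices = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
hyperedges = [
    {"a1", "a2", "a3", "a4"},
    {"a4", "a5"},
    {"a7"},        # a hyperedge may contain a single vertex
    {"a4", "a5"},  # repeated hyperedges are allowed
]

x = incidence_matrix(hyperedges, vertices)
sizes = [sum(row) for row in x]  # hyperedge sizes are the row sums

for row in x:
    print(row)
print("sizes:", sizes)
```

Because repetitions are allowed, identical hyperedges simply appear as identical rows, and the size of each hyperedge is the corresponding row sum.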
2.2 Latent Class Analysis Model for Random Hypergraphs
The binary latent class analysis (LCA) model (Lazarsfeld and Henry, 1968; Goodman, 1974) is a commonly used mixture model for high dimensional binary data. It assumes that each observation is a member of one and only one of the latent classes, and conditional on the latent class membership, the manifest variables are mutually independent of each other. The LCA model appears to be a natural candidate to model random hypergraphs where hyperedges are partitioned into latent classes, and the probability that a hyperedge contains a vertex depends only on its latent class assignment.
Let be the a priori latent class assignment probabilities where is the number of latent classes, and define the matrix and is the probability that vertex is contained in a hyperedge with latent class label . The likelihood function can be written as
By introducing the latent class membership matrix where if hyperedge has latent class label and otherwise, the complete data likelihood of and can be expressed as (1).
(1) 
In comparison to the hypergraph beta models introduced in Stasi et al. (2014), the LCA model is capable of capturing the clustering and heterogeneity of hyperedges. For example, academic papers can be naturally labelled according to subject area and, conditional on a paper being labelled mathematics, one would expect the probability that a mathematician coauthored the paper to be higher than the probability that a biologist did. The LCA model does not assume an upper bound on the size of hyperedges and can model hyperedges of any size. Furthermore, an efficient expectation maximization algorithm (Dempster et al., 1977) can be easily derived to perform parameter estimation.
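To make the EM algorithm for the LCA model concrete, the following is a minimal sketch of the textbook updates for a mixture of independent Bernoulli variables. The variable names (`pi` for class probabilities, `theta[g][j]` for the probability that vertex `j` belongs to a hyperedge of class `g`), the toy data, and the clipping constant `eps` are assumptions of this example and are not taken from the paper.

```python
import math
import random


def class_log_joint(row, pi, theta):
    """log(pi_g * P(row | class g)) for every latent class g."""
    out = []
    for g, pg in enumerate(pi):
        lp = math.log(max(pg, 1e-300))
        for xij, p in zip(row, theta[g]):
            lp += math.log(p) if xij else math.log(1.0 - p)
        out.append(lp)
    return out


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(x, pi, theta):
    """Observed-data log-likelihood of the incidence matrix x."""
    return sum(log_sum_exp(class_log_joint(row, pi, theta)) for row in x)


def em_step(x, pi, theta, eps=1e-6):
    n_edges, n_vertices, n_classes = len(x), len(x[0]), len(pi)
    # E-step: posterior class probabilities for each hyperedge.
    z = []
    for row in x:
        lj = class_log_joint(row, pi, theta)
        norm = log_sum_exp(lj)
        z.append([math.exp(v - norm) for v in lj])
    # M-step: weighted proportions, clipped to keep logarithms finite.
    weight = [sum(z[i][g] for i in range(n_edges)) for g in range(n_classes)]
    new_pi = [w / n_edges for w in weight]
    new_theta = []
    for g in range(n_classes):
        row_g = []
        for j in range(n_vertices):
            hits = sum(z[i][g] * x[i][j] for i in range(n_edges))
            p = hits / weight[g] if weight[g] > 0 else theta[g][j]
            row_g.append(min(max(p, eps), 1.0 - eps))
        new_theta.append(row_g)
    return new_pi, new_theta


# Hypothetical incidence matrix with two visible groups of hyperedges.
x = [
    [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1],
]
random.seed(0)
pi = [0.5, 0.5]
theta = [[random.uniform(0.25, 0.75) for _ in range(6)] for _ in range(2)]

history = []
for _ in range(25):
    history.append(log_likelihood(x, pi, theta))
    pi, theta = em_step(x, pi, theta)
print("log-likelihood: first", history[0], "last", history[-1])
```

The recorded log-likelihood values should be non-decreasing across iterations, which is a convenient sanity check for any EM implementation.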
2.3 Extended Latent Class Analysis for Random Hypergraphs
While the LCA model captures the clustering and heterogeneity of hyperedges in real-world data sets, it is quite restrictive in modeling the size of a hyperedge. The size of a hyperedge with a given latent class label follows the Poisson binomial distribution (Wang, 1993), whose parameters are the vertex inclusion probabilities for that class. Its expected value is the sum of these probabilities, and its variance is the sum over vertices of each probability multiplied by its complement. As we will illustrate in a few real-world data sets, the LCA model underestimates the variation in the sizes of hyperedges. Thus, we extend the LCA model by including an additional clustering structure to address this shortcoming.
We develop the Extended Latent Class Analysis (ELCA) model by introducing an additional clustering of the hyperedges. We assume that the two clusterings are independent, and each additional cluster has its own a priori assignment probability. Thus, the probability that a hyperedge has a given cluster label and a given additional cluster label is the product of the two corresponding a priori probabilities. We define the matrix and dimensional vector so that the probability that vertex is contained in a hyperedge with cluster label and additional cluster label is given by .
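The Poisson binomial size distribution of a single latent class can be computed exactly by a simple convolution. The sketch below, with hypothetical inclusion probabilities chosen only for illustration, computes the probability mass function and checks the mean and variance against the closed forms (the sum of the probabilities, and the sum of each probability times its complement).

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes in independent Bernoulli trials."""
    pmf = [1.0]  # distribution of the sum of zero trials
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for s, mass in enumerate(pmf):
            nxt[s] += mass * (1.0 - p)  # vertex not included
            nxt[s + 1] += mass * p      # vertex included
        pmf = nxt
    return pmf


# Hypothetical vertex inclusion probabilities for one latent class.
probs = [0.18, 0.00, 1.00, 0.75, 0.00, 0.12, 0.31, 0.19]

pmf = poisson_binomial_pmf(probs)
mean_from_pmf = sum(s * m for s, m in enumerate(pmf))
var_from_pmf = sum((s - mean_from_pmf) ** 2 * m for s, m in enumerate(pmf))

mean_closed = sum(probs)
var_closed = sum(p * (1.0 - p) for p in probs)

print("mean:", mean_from_pmf, mean_closed)
print("variance:", var_from_pmf, var_closed)
```

Since every term of the variance is at most the corresponding term of the mean, the within-class variance can never exceed the within-class mean, which gives some intuition for why a single clustering has limited ability to reproduce highly variable hyperedge sizes.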
Let denote the model parameters, the likelihood function can be written as
We define the additional cluster membership matrix where if hyperedge has additional cluster label and otherwise. The complete data likelihood function of , and is given as:
(2) 
We further impose the constraint to ensure that the model is identifiable. It is easy to see that the LCA model is a special case of the ELCA model by letting the number of additional clusters .
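The sketch below evaluates an ELCA-type log-likelihood and checks numerically that it reduces to the LCA log-likelihood when there is a single additional cluster. It rests on a loudly stated assumption: the vertex inclusion probability is taken to be the product `a[k] * theta[g][j]` of a scaling factor for additional cluster `k` and a class-specific probability, with the scale of the single additional cluster fixed at 1. The displayed formulas are not available here, so this parameterization, the symbol names, and the toy values are all hypothetical.

```python
import math


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def bernoulli_log_prob(row, probs):
    """Log-probability of a 0/1 row under independent Bernoulli variables."""
    return sum(math.log(p) if xij else math.log(1.0 - p)
               for xij, p in zip(row, probs))


def lca_log_likelihood(x, pi, theta):
    total = 0.0
    for row in x:
        terms = [math.log(pi[g]) + bernoulli_log_prob(row, theta[g])
                 for g in range(len(pi))]
        total += log_sum_exp(terms)
    return total


def elca_log_likelihood(x, pi, tau, theta, a):
    """Assumed form: P(vertex j in hyperedge | g, k) = a[k] * theta[g][j]."""
    total = 0.0
    for row in x:
        terms = []
        for g in range(len(pi)):
            for k in range(len(tau)):
                scaled = [a[k] * p for p in theta[g]]
                terms.append(math.log(pi[g]) + math.log(tau[k])
                             + bernoulli_log_prob(row, scaled))
        total += log_sum_exp(terms)
    return total


# Hypothetical data and parameter values.
x = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0], [1, 1, 1, 1]]
pi = [0.6, 0.4]
theta = [[0.9, 0.8, 0.1, 0.2], [0.2, 0.1, 0.7, 0.9]]

lca_value = lca_log_likelihood(x, pi, theta)
elca_one_cluster = elca_log_likelihood(x, pi, [1.0], theta, [1.0])
elca_two_clusters = elca_log_likelihood(x, pi, [0.7, 0.3], theta, [1.0, 0.4])

print(lca_value, elca_one_cluster, elca_two_clusters)
```

With one additional cluster the two functions agree, mirroring the statement that the LCA model is a special case of the ELCA model.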
2.4 Theoretical Properties
We compare the theoretical properties of the LCA and ELCA models developed above. Proposition 2.1 below shows that the size of a hyperedge simulated from the ELCA model has a larger variance than the size of a hyperedge simulated from the LCA model.
Proposition 2.1.
Suppose we are given the LCA model with parameters and the ELCA model with parameters and vertices. Suppose the condition holds for and .
Let denote the size of a random hyperedge generated under the LCA model. Similarly, let denote the size of a random hyperedge generated under the ELCA model. We have the following results.
Proof.
The proof is straightforward and is given in the Appendix. ∎
We now let be the probability mass functions of the size of a random hyperedge simulated from a cluster LCA model. Similarly, we let be the probability mass function of the size of a random hyperedge simulated from the ELCA model with clusters and additional clusters. The following result can be derived.
Proposition 2.2.

1. Under the specifications of an LCA model with parameters and , and suppose the following conditions hold for ,
as . We have
That is, the distribution of the size of a random hyperedge converges to a mixture of Poisson distributions with components.
2. Under the specification of an ELCA model with parameters , , , and . Further suppose the following conditions hold for , and .
as . We have
That is, the distribution of the size of a random hyperedge converges to a mixture of Poisson distributions with components.
Proof.
Conditional on the event that a random hyperedge is generated from cluster , (Wang, 1993, Theorem 3) implies that
Part 1 follows by marginalizing over the clusters. The second part of the proposition can be proved similarly. ∎
Proposition 2.2 implies that the size distribution of a random hyperedge generated under the ELCA model is far more flexible than for the LCA model.
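The Poisson limit for a single cluster can be illustrated numerically. The sketch below compares the exact Poisson binomial distribution of a hyperedge size with a Poisson distribution of the same mean, and checks the distance against Le Cam's inequality. Le Cam's inequality is a standard result used here only for illustration; it is not necessarily the argument used in Wang (1993) or in the proof above, and the number of vertices and the inclusion probabilities are hypothetical.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact PMF of a sum of independent Bernoulli variables."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for s, mass in enumerate(pmf):
            nxt[s] += mass * (1.0 - p)
            nxt[s + 1] += mass * p
        pmf = nxt
    return pmf


def poisson_pmf(lam, k):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def l1_distance_to_poisson(probs):
    """Sum over all k >= 0 of |P(size = k) - Poisson(k; sum of probs)|."""
    lam = sum(probs)
    pb = poisson_binomial_pmf(probs)
    inside = sum(abs(pb[k] - poisson_pmf(lam, k)) for k in range(len(pb)))
    # Beyond the number of vertices only the Poisson tail has mass.
    tail = 1.0 - sum(poisson_pmf(lam, k) for k in range(len(pb)))
    return inside + max(tail, 0.0)


# Hypothetical cluster: 200 vertices, expected hyperedge size 3.
n_vertices = 200
probs = [3.0 / n_vertices] * n_vertices

distance = l1_distance_to_poisson(probs)
le_cam_bound = 2.0 * sum(p * p for p in probs)

print("L1 distance:", distance)
print("Le Cam bound:", le_cam_bound)
```

As the number of vertices grows with the expected size held fixed, the bound shrinks, so each cluster contributes an approximately Poisson component to the overall size distribution.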
2.5 Co-Clustering
The concept of having two clustering structures is related to co-clustering or block clustering. In co-clustering, the objective is to simultaneously cluster the rows and columns of a data matrix. In particular, mixture models have been proposed, with EM algorithms developed, in the context of co-clustering (Govaert and Nadif, 2003, 2008). Co-clustering has also received significant attention in various applications such as text mining, bioinformatics and recommender systems (Dhillon et al., 2003; Cheng and Church, 2000; George and Merugu, 2005). In comparison, we aim to obtain two types of clustering structure for the rows of a data matrix.
In the work of Rau et al. (2015), a Poisson mixture model was proposed for clustering digital gene expression data to discover groups of co-expressed genes, where observations of biological entities under different conditions are collected. In order to model the variations in overall expression level among biological entities, a scaling parameter is introduced for each entity. In comparison, we explicitly model the size of a random hyperedge using clustering, which results in a more parsimonious model structure.
3 EM Algorithm
We estimate the parameters of the ELCA model using an EM algorithm (Dempster et al., 1977), which is a popular method for fitting mixture models. The E-step of the EM algorithm involves computing the expected value of the logarithm of the complete data likelihood (2) with respect to the distribution of the unobserved cluster memberships given the data and the current parameter estimates. The M-step involves maximizing the expected complete data log-likelihood.
Taking the logarithm of the complete data likelihood in (2), we obtain the complete data log-likelihood function below.
(3) 
For the E-step, we need to evaluate the expectation of (3) conditional on the data and the current parameter estimates.
That is, we need to evaluate the expectation . We have that
(4)  
In particular, the E-step has a computational complexity of for each pair . While the E-step of the EM algorithm is straightforward, the M-step involves a complicated maximization. Thus, we use the ECM algorithm (Meng and Rubin, 1993), which replaces the complex M-step by a series of simpler conditional maximizations. The conditional maximizations with respect to the parameters and do not have closed-form solutions. We resort to the MM algorithm (Lange et al., 2000; Hunter and Lange, 2004), which works by lower bounding the objective function by a minorizing function and then maximizing the minorizing function. Details of the M-step are given in the appendix and the EM algorithm is summarized in Algorithm 1. In particular, we note that the computational complexities for maximizing and are given by and , respectively, where is the number of iterations required for the MM algorithm.
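A minimal sketch of an E-step of this kind is given below. It computes, for each hyperedge, the joint posterior probability of every pair of cluster label and additional cluster label. It assumes, as a hypothetical parameterization not confirmed by the text available here, that the vertex inclusion probability is `a[k] * theta[g][j]`; the function name `e_step` and all parameter values are likewise introduced only for illustration.

```python
import math


def e_step(x, pi, tau, theta, a):
    """Return resp[i][g][k] = P(cluster g, additional cluster k | hyperedge i).

    Assumed model: P(vertex j in hyperedge | g, k) = a[k] * theta[g][j].
    """
    n_clusters, n_additional = len(pi), len(tau)
    resp = []
    for row in x:
        log_joint = []
        for g in range(n_clusters):
            for k in range(n_additional):
                lp = math.log(pi[g]) + math.log(tau[k])
                for xij, p in zip(row, theta[g]):
                    q = a[k] * p
                    lp += math.log(q) if xij else math.log(1.0 - q)
                log_joint.append(lp)
        # Normalize in a numerically stable way.
        m = max(log_joint)
        weights = [math.exp(v - m) for v in log_joint]
        total = sum(weights)
        resp.append([[weights[g * n_additional + k] / total
                      for k in range(n_additional)]
                     for g in range(n_clusters)])
    return resp


# Hypothetical data and current parameter estimates.
x = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
pi = [0.5, 0.5]
tau = [0.7, 0.3]
theta = [[0.9, 0.8, 0.1, 0.2], [0.2, 0.1, 0.7, 0.9]]
a = [1.0, 0.4]

resp = e_step(x, pi, tau, theta, a)
cluster_marginals = [[sum(r_g) for r_g in r_i] for r_i in resp]
print(cluster_marginals)
```

Summing the joint posterior over the additional cluster label gives the cluster assignment probabilities of each hyperedge, which is the kind of quantity plotted per scene or per concert in the applications.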
4 Model Selection
4.1 Cross Validated Likelihood
Given a fixed model, the cross-validated likelihood method (Smyth, 2000) works by repeatedly partitioning the observations into two disjoint sets: one is used to fit the model and obtain estimates of the model parameters by maximizing the log-likelihood, and the other is used to evaluate the model by computing its log-likelihood.
For each and , we define to be the ELCA model with clusters and additional clusters. To apply the cross validated likelihood method, we randomly partition the hyperedges into two sets and where each hyperedge in is included in with probability . In our applications we set . The EM algorithm developed in section 3 is then used to fit and obtain the parameter estimates . We then compute the loglikelihood of under the estimated parameters and obtain the test loglikelihood . The above procedure is then repeated times and the estimated cross validated loglikelihood is obtained by averaging over . The procedure above is summarized in Algorithm 2.
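The procedure described above can be sketched generically as follows. The functions `fit` and `log_likelihood` are placeholders for any model fitting routine and likelihood evaluation; the independent Bernoulli model used in the demonstration is a deliberately simple stand-in rather than the ELCA model, and the training probability of 0.5 and the 20 repetitions are arbitrary choices for this example, not the values used in the paper.

```python
import math
import random


def cross_validated_log_likelihood(hyperedges, fit, log_likelihood,
                                   train_prob, repeats, rng):
    """Average test log-likelihood over random train/test partitions."""
    scores = []
    for _ in range(repeats):
        train, test = [], []
        for row in hyperedges:
            (train if rng.random() < train_prob else test).append(row)
        if not train or not test:
            continue  # skip degenerate partitions
        params = fit(train)
        scores.append(log_likelihood(test, params))
    if not scores:
        raise ValueError("no usable partition was generated")
    return sum(scores) / len(scores)


# Stand-in model (an assumption for this demo): independent Bernoulli
# inclusion probabilities, one per vertex.
def fit_independent(rows, eps=0.01):
    n = len(rows)
    return [min(max(sum(r[j] for r in rows) / n, eps), 1.0 - eps)
            for j in range(len(rows[0]))]


def loglik_independent(rows, probs):
    return sum(math.log(p) if xij else math.log(1.0 - p)
               for r in rows for xij, p in zip(r, probs))


rng = random.Random(1)
data = [[1 if rng.random() < q else 0 for q in (0.9, 0.5, 0.1)]
        for _ in range(40)]

score = cross_validated_log_likelihood(
    data, fit_independent, loglik_independent,
    train_prob=0.5, repeats=20, rng=rng)
print("estimated cross-validated log-likelihood:", score)
```

In this sketch the test log-likelihood is not normalized by the size of the test set; whether to normalize is a design choice that should match the procedure in Algorithm 2.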
We perform a greedy search for the optimal combination of and which produces the largest estimated cross-validated log-likelihood . Starting with one cluster and one additional cluster, additional clusters are successively added to the model until the estimated cross-validated log-likelihood no longer increases. At this stage, we increment the number of clusters by 1 and repeat the above procedure provided that . The greedy search algorithm is summarized in Algorithm 3. The greedy search can be computationally intensive when the search space and the number of cross-validation repetitions are large.
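One possible reading of the greedy search is sketched below. Several details are assumptions made for this example: the number of additional clusters restarts at one whenever the number of clusters is incremented, the search stops when incrementing the number of clusters no longer improves the best score or when the hypothetical limits `max_clusters` and `max_additional` are reached, and `cv_score` stands for any function returning the estimated cross-validated log-likelihood. Algorithm 3 may differ in these respects.

```python
def greedy_search(cv_score, max_clusters, max_additional):
    """Greedy search over (clusters, additional clusters) using cv_score."""
    best = None  # (score, clusters, additional clusters)
    g = 1
    while g <= max_clusters:
        # For a fixed number of clusters, add additional clusters
        # until the score stops increasing.
        k = 1
        row_score = cv_score(g, k)
        while k < max_additional:
            candidate = cv_score(g, k + 1)
            if candidate <= row_score:
                break
            k += 1
            row_score = candidate
        # Stop when adding a cluster does not improve the best score.
        if best is not None and row_score <= best[0]:
            break
        best = (row_score, g, k)
        g += 1
    return best[1], best[2]


# Synthetic score surface with a single peak at 3 clusters and
# 2 additional clusters (purely illustrative).
def toy_score(g, k):
    return -((g - 3) ** 2 + (k - 2) ** 2)


selected = greedy_search(toy_score, max_clusters=6, max_additional=6)
print("selected (clusters, additional clusters):", selected)
```

In practice each call to `cv_score` involves refitting the model several times, so caching the scores of combinations that have already been evaluated is a natural refinement.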
5 Applications
5.1 Star Wars Movie Scenes
Our first application is modeling the coappearance of the main characters in the scenes of the movie “Star Wars: A New Hope”. We collected the scripts of the movie from The Internet Movie Script Database (movie script data are freely available at https://www.imsdb.com/) and constructed a hypergraph for the eight main characters. We define each scene in the movie as a hyperedge, giving a total of 178 hyperedges, and a character is contained in a scene if he/she speaks in the scene.
We first performed model selection using the greedy search algorithm and the cross validated likelihood method presented in Section 4.1 to select the optimal number of clusters and additional clusters for the ELCA model. The results of the greedy search are provided in Table A1 and the model with 3 clusters and 2 additional clusters is selected.
The results from fitting the ELCA model with and are provided in Table 1 and Table 2. We can see the variation in the size of hyperedges from the parameter estimates and with the majority () of hyperedges having size much smaller than the rest of the hyperedges. Thus, one can deduce that a small proportion of the movie scenes have far more characters.
Character    Cluster 1  Cluster 2  Cluster 3
Wedge        0.18       0.00       0.36
Han          0.00       1.00       0.00
Luke         1.00       1.00       0.00
C-3PO        0.75       0.30       0.00
Obi-Wan      0.00       0.00       1.00
Leia         0.12       0.48       0.07
Biggs        0.31       0.00       0.28
Darth Vader  0.19       0.35       0.06
The estimates in Table 2 reveal interesting clustering structure for the 8 main characters in the movie. For example, the lead character “Luke” has a strong tendency to appear in the two largest clusters. On the other hand, it is extremely unlikely for “Obi-Wan” and “Han” to appear in the same scene.
The estimated cluster assignment probabilities from the EM algorithm for each movie scene in the Star Wars movie are shown in chronological order in Figure 3. We can see from the plot that scenes in the early part of the movie are mainly associated with cluster 1, while cluster 2 contains most of the scenes from roughly scene 40 to scene 100. We can deduce from this, for example, that the character “Han” is very active in the middle part of the movie. On the other hand, there does not appear to be any obvious pattern for the third cluster. The clustering for many early and late movie scenes is relatively uncertain, as shown in the plot.
The uncertainties in clustering are also illustrated in a ternary plot in Figure 4. Each dot in the plot represents a movie scene, and the three corners of the plot represent the three clusters. The closer a dot is to a corner, the higher the probability that the corresponding movie scene belongs to the corresponding cluster. The ternary plot in Figure 4 shows significant uncertainty in assigning a number of movie scenes to one of the first two clusters. This is reasonable since, for a number of characters including the lead character “Luke”, the probabilities of scene appearance are similar for the first two clusters.
5.2 Lady Gaga Concerts 2014
As a second application of the ELCA model, we collected the list of songs that Lady Gaga performed in all concerts in 2014 (the Lady Gaga setlist data are available at http://www.setlist.fm/). The data set contains 96 concerts with a total of 51 distinct songs performed. The hypergraph is constructed by defining each concert as a hyperedge and each song as a vertex. A vertex is contained in a hyperedge if the corresponding song is performed in the corresponding concert. The results of performing model selection using the approach of Section 4.1 are presented in Table A2. We can see from Table A2 that ELCA models with more than one additional cluster significantly outperform standard latent class analysis models.
The model with 5 clusters and 2 additional clusters was chosen and fitted to the data set. The parameter estimates , and are given in Table 3. We can deduce from and that there are a small number of very short concerts of length approximately 14% of the rest of the “full” concerts.
Table 4 shows the parameter estimates describing the popularity of the 51 songs across the 5 clusters. One can see a small number of extremely popular songs which tend to be performed in most concerts, such as “Paparazzi”, “Bad Romance”, “Born This Way”, “G.U.Y.” and “Just Dance”. Among the least performed songs, “Fashion!” and “Cake Like Lady Gaga” tend to be performed in the same concert, while “Lush Life”, “It Don’t Mean a Thing (If It Ain’t Got That Swing)” and “But Beautiful” are more likely to be performed in the same concert.
Songs  Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5 
Monster for Life  0.04  0.00  0.00  0.00  0.00 
Fashion!  0.00  0.00  0.68  0.00  0.00 
Paparazzi  1.00  1.00  1.00  1.00  1.00 
Bad Romance  1.00  1.00  1.00  1.00  1.00 
What’s Up  0.12  0.00  0.00  0.00  0.44 
Sophisticated Lady  0.00  0.00  0.00  0.00  0.31 
Dance in the Dark  0.00  0.00  0.00  0.00  0.44 
Born This Way  1.00  1.00  1.00  1.00  1.00 
Judas  1.00  0.87  0.00  0.00  0.00 
Partynauseous  1.00  1.00  1.00  0.00  1.00 
Yo and I  1.00  0.13  0.00  1.00  1.00 
I Will Always Love You  0.00  0.03  0.00  0.00  0.00 
Monster  0.00  0.00  0.00  1.00  0.00 
Bang Bang (My Baby Shot Me Down)  0.72  0.00  0.00  0.00  1.00 
The Queen  0.00  0.00  0.09  0.00  0.00 
Dope  1.00  0.60  0.00  1.00  1.00 
Jewels N’ Drugs  1.00  1.00  1.00  0.00  1.00 
Hair  0.00  0.00  0.05  0.00  0.00 
Mary Jane Holland  1.00  0.00  0.95  0.00  1.00 
G.U.Y.  1.00  1.00  1.00  1.00  1.00 
MANiCURE  1.00  1.00  1.00  0.00  1.00 
Lush Life  0.00  0.00  0.00  0.10  0.51 
It Don’t Mean a Thing (If It Ain’t Got That Swing)  0.00  0.00  0.00  0.00  0.49 
You’ve Got a Friend  0.00  0.00  0.00  0.12  0.00 
The Edge of Glory  1.00  1.00  0.04  0.00  1.00 
Donatella  1.00  1.00  1.00  0.00  1.00 
But Beautiful  0.00  0.00  0.00  0.00  0.31 
Do What U Want  1.00  1.00  1.00  0.00  1.00 
Gypsy  1.00  1.00  1.00  0.00  1.00 
Applause  1.00  1.00  1.00  1.00  1.00 
Marry the Night  0.04  0.00  0.05  0.00  0.44 
Sexxx Dreams  1.00  0.97  1.00  1.00  1.00 
Another One Bites the Dust  0.00  0.00  0.00  0.12  0.00 
Just Dance  1.00  1.00  1.00  1.00  1.00 
Cake Like Lady Gaga  0.00  0.00  0.68  0.00  0.00 
I Can’t Give You Anything but Love, Baby  0.00  0.03  0.00  0.00  0.49 
Black Jesus Amen Fashion  0.00  0.00  0.00  1.00  0.00 
Ratchet  1.00  1.00  1.00  0.00  1.00 
Aura  1.00  0.87  0.91  0.00  0.00 
Poker Face  1.00  1.00  1.00  1.00  1.00 
Venus  1.00  1.00  1.00  0.00  1.00 
If I Ever Lose My Faith in You  0.00  0.00  0.00  0.12  0.00 
Bell Bottom Blues  0.04  0.00  0.00  0.00  0.00 
Whole Lotta Love  0.04  0.00  0.00  0.00  0.00 
Telephone  1.00  1.00  1.00  0.00  1.00 
Willkommen  0.16  0.00  0.00  0.00  0.00 
Brooklyn Nights  0.00  0.03  0.00  0.00  0.00 
Alejandro  1.00  1.00  1.00  0.00  1.00 
ARTPOP  1.00  1.00  1.00  1.00  1.00 
I’ve Got a Crush on You  0.00  0.00  0.05  0.00  0.00 
Swine  1.00  0.97  1.00  0.00  1.00 
The estimated cluster assignment probabilities for each concert performed by Lady Gaga in 2014 are shown in chronological order in Figure 5. There is a strong association between clusters and time of the year. For example, the first 30 concerts performed in 2014 are mainly associated with cluster 1 where songs such as “Bad Romance”, “Judas” and “Aura” are among the most popular ones. On the other hand, the next 30 concerts are strongly associated with cluster 2 where songs such as “The Edge of Glory”, “Venus” and “Ratchet” are popular. The last 10 concerts of 2014 are mostly clustered into cluster 4 where songs such as “Yo and I”, “Monster” and “Black Jesus Amen Fashion” are frequently performed.
Figure 6 shows the distribution of hyperedge sizes (or the number of songs performed in concerts) along with the estimated hyperedge sizes by the ELCA model with and , and the LCA model with 5 clusters. Adding an additional cluster to the model significantly improves the fit, especially on the tails of the distribution.
6 Conclusion
In this paper, we have proposed the Extended Latent Class Analysis model as a generative model for random hypergraphs. The model introduces two clustering structures for hyperedges, which capture the variation in the sizes of hyperedges.
An EM algorithm has been developed for model fitting, where the M-step is implemented using an MM algorithm. Model selection is performed using the cross-validated likelihood method to account for the small sample sizes relative to the number of vertices.
The model has been shown to give an improved fit relative to the Latent Class Analysis model for three illustrative examples. Furthermore, the fitted model reveals interesting and interpretable structure within the vertices and hyperedges.
References
 Azondekon et al. (2018) Azondekon, R., Harper, Z. J., Agossa, F. R., Welzig, C. M., and McRoy, S. (2018), “Scientific authorship and collaboration network analysis on malaria research in Benin: Papers indexed in the Web of Science (1996–2016),” Global Health Research and Policy, 3, 11.
 Borgatti and Everett (1997) Borgatti, S. P. and Everett, M. G. (1997), “Network analysis of 2-mode data,” Soc. Networks, 19, 243–269.
 Celeux and Govaert (1991) Celeux, G. and Govaert, G. (1991), “Clustering criteria for discrete data and latent class models,” Journal of Classification, 8, 157–176.
 Cheng and Church (2000) Cheng, Y. and Church, G. M. (2000), “Biclustering of expression data,” in Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, AAAI Press, pp. 93–103.
 de Panafieu (2015) de Panafieu, É. (2015), “Phase transition of random nonuniform hypergraphs,” J. Discrete Algorithms, 31, 26–39.
 Dempster et al. (1977) Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm,” J. Roy. Statist. Soc. Ser. B, 39, 1–38, with discussion.
 Dhillon et al. (2003) Dhillon, I. S., Mallela, S., and Modha, D. S. (2003), “Information-theoretic co-clustering,” in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 89–98.
 Doreian and Batagelj (2004) Doreian, P. and Batagelj, V. (2004), “Generalized blockmodeling of two-mode network data,” Soc. Networks, 29–53.
 Dyer et al. (2015) Dyer, M., Frieze, A., and Greenhill, C. (2015), “On the chromatic number of a random hypergraph,” J. Combin. Theory Ser. B, 113, 68–122.
 Faust et al. (2002) Faust, K., Willert, K., Rowlee, D., and Skvoretz, J. (2002), “Scaling and statistical models for affiliation networks: patterns of participation among Soviet politicians during the Brezhnev era,” Soc. Networks, 24, 231–259.
 Field et al. (2006) Field, S., Frank, K. A., Schiller, K., Riegle-Crumb, C., and Muller, C. (2006), “Identifying positions from affiliation networks: Preserving the duality of people and events,” Soc. Networks, 28, 97–123.
 Friel et al. (2016) Friel, N., Rastelli, R., Wyse, J., and Raftery, A. E. (2016), “Interlocking directorates in Irish companies using a latent space model for bipartite networks,” Proc. Natl. Acad. Sci. U.S.A., 113, 6629–6634.
 Fujimoto et al. (2011) Fujimoto, K., Chou, C.-P., and Valente, T. W. (2011), “The network autocorrelation model using two-mode data: Affiliation exposure and potential bias in the autocorrelation parameter,” Soc. Networks, 33, 231–243.
 George and Merugu (2005) George, T. and Merugu, S. (2005), “A scalable collaborative filtering framework based on co-clustering,” in Proceedings of the Fifth IEEE International Conference on Data Mining, pp. 625–628.
 Goldschmidt (2005) Goldschmidt, C. (2005), “Critical random hypergraphs: the emergence of a giant set of identifiable vertices,” Ann. Probab., 33, 1573–1600.
 Goodman (1974) Goodman, L. A. (1974), “Exploratory latent structure analysis using both identifiable and unidentifiable models,” Biometrika, 61, 215–231.
 Govaert and Nadif (2003) Govaert, G. and Nadif, M. (2003), “Clustering with block mixture models.” Pattern Recognition, 36, 463–473.
 Govaert and Nadif (2008) — (2008), “Block clustering with Bernoulli mixture models: comparison of different approaches,” Comput. Statist. Data Anal., 52, 3233–3245.
 Handcock et al. (2007) Handcock, M. S., Raftery, A. E., and Tantrum, J. M. (2007), “Model-based clustering for social networks,” J. Roy. Statist. Soc. Ser. A, 170, 301–354.
 Hoff et al. (2002) Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002), “Latent space approaches to social network analysis,” J. Amer. Statist. Assoc., 97, 1090–1098.

 Holland and Leinhardt (1981) Holland, P. W. and Leinhardt, S. (1981), “An exponential family of probability distributions for directed graphs,” J. Amer. Statist. Assoc., 76, 33–65.
 Hunter and Lange (2004) Hunter, D. R. and Lange, K. (2004), “A tutorial on MM algorithms,” Amer. Statist., 58, 30–37.
 Karoński and Łuczak (2002) Karoński, M. and Łuczak, T. (2002), “The phase transition in a random hypergraph,” J. Comput. Appl. Math., 142, 125–135.
 Koskinen and Edling (2012) Koskinen, J. and Edling, C. (2012), “Modelling the evolution of a bipartite network  Peer referral in interlocking directorates,” Soc. Networks, 34, 309 – 322, dynamics of Social Networks (2).
 Lange et al. (2000) Lange, K., Hunter, D. R., and Yang, I. (2000), “Optimization transfer using surrogate objective functions,” J. Comput. Graph. Statist., 9, 1–59.
 Latapy et al. (2008) Latapy, M., Magnien, C., and Vecchio, N. D. (2008), “Basic notions for the analysis of large two-mode networks,” Soc. Networks, 30, 31–48.
 Latouche et al. (2011) Latouche, P., Birmelé, E., and Ambroise, C. (2011), “Overlapping stochastic block models with application to the French political blogosphere,” Ann. Appl. Stat., 5, 309–336.
 Lazarsfeld and Henry (1968) Lazarsfeld, P. F. and Henry, N. W. (1968), Latent structure analysis, Houghton Mifflin, Boston, MA 02110, USA.
 Lind et al. (2005) Lind, P. G., González, M. C., and Herrmann, H. J. (2005), “Cycles and clustering in bipartite networks,” Phys. Rev. E, 72, 056127.
 Lunagómez et al. (2017) Lunagómez, S., Mukherjee, S., Wolpert, R. L., and Airoldi, E. M. (2017), “Geometric representations of random hypergraphs,” J. Amer. Stat. Assoc., 112, 363–383.
 Meng and Rubin (1993) Meng, X.-L. and Rubin, D. B. (1993), “Maximum likelihood estimation via the ECM algorithm: a general framework,” Biometrika, 80, 267–278.
 Moody (2004) Moody, J. (2004), “The structure of a social science collaboration network: disciplinary cohesion from 1963 to 1999,” Am. Sociol. Rev, 69, 213–238.
 Newman (2004) Newman, M. E. (2004), Who Is the Best Connected Scientist? A Study of Scientific Coauthorship Networks, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 337–370.
 Newman (2001a) Newman, M. E. J. (2001a), “Scientific collaboration networks. I. Network construction and fundamental results,” Phys. Rev. E, 64, 016131.
 Newman (2001b) — (2001b), “Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality,” Phys. Rev. E, 64, 016132.
 Nowicki and Snijders (2001) Nowicki, K. and Snijders, T. A. B. (2001), “Estimation and prediction for stochastic blockstructures,” J. Amer. Statist. Assoc., 96, 1077–1087.
 Perugini et al. (2004) Perugini, S., Gonçalves, M. A., and Fox, E. A. (2004), “Recommender systems research: a connection-centric survey,” Journal of Intelligent Information Systems, 23, 107–143.
 Poole (2015) Poole, D. (2015), “On the strength of connectedness of a random hypergraph,” Electron. J. Combin., 22, Paper 1.69, 16.
 Rau et al. (2015) Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., and Celeux, G. (2015), “Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models,” Bioinformatics, 31, 1420.
 Robins and Alexander (2004) Robins, G. and Alexander, M. (2004), “Small worlds among interlocking directors: network structure and distance in bipartite graphs,” Comput. Math. Organ. Theory, 10, 69–94.
 Skvoretz and Faust (1999) Skvoretz, J. and Faust, K. (1999), “Logit models for affiliation networks,” Sociol. Methodol, 29, 253–280.
 Smyth (2000) Smyth, P. (2000), “Model selection for probabilistic clustering using cross-validated likelihood,” Statistics and Computing, 10, 63–72.
 Snijders et al. (2013) Snijders, T. A., Lomi, A., and Torló, V. J. (2013), “A model for the multiplex dynamics of two-mode and one-mode networks, with an application to employment preference, friendship, and advice,” Soc. Networks, 35, 265–276.
 Stasi et al. (2014) Stasi, D., Sadeghi, K., Rinaldo, A., Petrovic, S., and Fienberg, S. (2014), “β models for random hypergraphs with a given degree sequence,” in Proceedings of COMPSTAT 2014—21st International Conference on Computational Statistics, pp. 593–600.
 Wang et al. (2013) Wang, P., Pattison, P., and Robins, G. (2013), “Exponential random graph model specifications for bipartite networks  A dependence hierarchy,” Soc. Networks, 35, 211–222.
 Wang et al. (2009) Wang, P., Sharpe, K., Robins, G., and Pattison, P. (2009), “Exponential random graph (p*) models for affiliation networks,” Soc. Networks, 31, 12–25.
 Wang (1993) Wang, Y. H. (1993), “On the number of successes in independent trials,” Statist. Sinica, 3, 295–312.
Appendix A Proof of Proposition 2.1
Proof.
We can write where if node appears in the hyperedge and otherwise. Similarly, we write . Let be the latent cluster assignment of where if is generated from cluster . Let and be the latent cluster and additional clusters assignments of , where and if is generated from cluster and additional clusters . We have
For the variance of the LCA model, we have that
where
Hence, we have that
Now,
We have
Now,
To show the quantity above is nonnegative, we have to show that
which follows from Jensen’s inequality. ∎
Appendix B Mstep of EM Algorithm
For the Mstep, we need to maximize with respect to the model parameters , , and .
B.1 Maximize w.r.t.
For fixed and , the objective function retaining terms involving can be written as
(5) 
Since an analytic expression for does not exist due to the term , we apply the MM (Minorization Maximization) algorithm (Hunter and Lange, 2004). We first apply a quadratic lower bound on the concave function for . We let
We then have