1 Introduction
As statistical modeling of network data has become a major topic of interest in diverse areas of study, there have been an increasing number of books [42, 57] and survey papers on random graphs and network models [12, 21, 73, 78, 83, 54]. These surveys provide a comprehensive overview of the historical development of statistical network modeling, covering both network models that are not latent variable models (e.g., exponential random graph models (ERGMs), the quadratic assignment procedure (QAP), and stochastic actor-oriented models (SAOMs)) and latent variable models (e.g., latent space models (LSM) and stochastic blockmodels (SBM)). The key idea behind introducing latent variables into network analysis is to capture various forms of dependence between edges and obtain conditional independence in the error terms, which is one of the most challenging parts of network modeling. Given the difference between latent variable models and the rest of the network literature, we aim to provide in-depth information on the unobserved or unmeasured structure of networks by presenting a selective review of dynamic network models with latent variables. This area has undergone significant developments in recent years, with emphasis on two classes of models: the latent space models and the stochastic blockmodels.
The latent space models (LSM) [35] assume that nodes are positioned in a $q$-dimensional latent space and tend to create edges to others that are closer in their latent positions. Owing to this simple geometry-based assumption, the latent space models provide useful visualization and interpretation of network or relational data, and thus have been widely used in numerous fields of study [10, 17, 25, 36, 43, 76, 77, 82].
The stochastic blockmodels (SBM) assume that the nodes of the network are partitioned into several unobserved (latent) classes (or blocks). The framework is first introduced by Holland et al. [37], who focus on the case of a priori specified blocks, where the memberships of nodes are known or assumed and the goal is to estimate a matrix of edge probabilities. A statistical approach to a posteriori block modeling for networks is introduced by [74] and [59], where the objective is to estimate the matrix of edge probabilities and the memberships simultaneously. Since then, the communities found in the stochastic blockmodels have been interpreted meaningfully in many research fields. For example, in citation and collaboration networks, such communities could be interpreted as scientific disciplines [39, 58], while the communities in food web networks could be interpreted as ecological subsystems [20].
As dynamic network analysis (the study of networks that evolve over time) has become an emergent scientific field in the last decade, there has been a growing number of dynamic network models that incorporate the latent space model or stochastic blockmodel framework. Since these models are relatively new, to the best of our knowledge there has not been any attempt to describe the progress in dynamic latent variable models. In this work, therefore, we outline some of the prominent dynamic latent space models and summarize their interconnections, and also describe the recent approaches in dynamic stochastic blockmodels.
We organize the review in the following way. For each class of models (LSM and SBM), we first introduce a family of static models as background, then describe the dynamic models motivated by the static ones, and discuss the applications to real-world data. In Section 5, we present two diagrams summarizing the relationships of all the models and mention some open problems and remaining challenges in dynamic network models with latent variables. We also provide a list of available data sources that have been used in past literature in the Appendix.
2 Notations
Throughout the paper, we consider the set of nodes labeled as $\{1, \ldots, N\}$, and assume the edges are represented by an $N \times N$ adjacency matrix $Y$, with $Y_{ij}$ being the edge value between nodes $i$ and $j$, which could be binary, discrete, or continuous. To achieve the goal of statistical network models (to understand the dependence of the edges using the observed and unobserved structure), we define the observed pair-specific covariates as $X_{ij}$ and the unobserved latent variables as $Z_i$. Specifically, $Z_i$ represents a $q$-dimensional latent position (or vector) of node $i$ in the latent space models (Section 3), and $Z_i$ represents the class of node $i$ in the stochastic blockmodels, where each $Z_i$ takes one value in the set of categories labeled as $\{1, \ldots, K\}$ (Section 4). When extending to the dynamic models for discrete time points $t = 1, \ldots, T$, we use a subscript $t$ on the variables (e.g., $Y_t$) to denote “at time $t$”. Generally, random variables are denoted by upper case letters and fixed quantities or realizations of random variables are denoted by lower case letters. Other than the ones specified here, additional notations specific to each model are introduced later.
3 Latent Space Models
In this section, we first describe the original latent space model introduced by Hoff et al. [35]. Then we introduce two lines of research: (i) the latent position model [23], which is built upon the Euclidean distance space, and (ii) the latent factor model [28], which stems from the projection model. We present the dynamic extension of these static models and demonstrate how the two different frameworks have evolved in the dynamic network literature, in chronological order.
3.1 Static Latent Space Models
3.1.1 Latent Space Approaches to Social Network Analysis
The idea of social space is first introduced to network modeling by Hoff et al. [35], with the intuition that each node $i$ can be seen as a point $z_i$ in a $q$-dimensional space that represents unobserved latent characteristics. The presence or absence of an edge between two nodes is independent of all other edges given the latent positions of the two nodes; thus the conditional probability of the adjacency matrix $Y$ is
$$P(Y \mid Z, X, \theta) = \prod_{i \neq j} P(y_{ij} \mid z_i, z_j, x_{ij}, \theta),$$
where $X = \{x_{ij}\}$ are observed pair-specific covariates, $Z = \{z_i\}$ are the latent positions, and $\theta$ is the set of regression parameters. Here, the latent positions can be interpreted as the random effects in linear models.
There exist two different ways to model $P(y_{ij} \mid z_i, z_j, x_{ij}, \theta)$: the distance model and the projection model. The main assumption here is that nodes are positioned in a latent space, and they tend to create edges to others with shorter distance (distance model) or narrower angle (projection model) between their latent positions. Both distance and projection models can be applied to various types of edges (e.g., binary, discrete, continuous) via the linked mean parameter of a generalized linear model. Without loss of generality, we introduce the model formulations for the case of binary networks.
Latent Distance Model
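For a binary network, the distance model takes the form of a logistic regression model and defines the log odds of an edge between nodes $i$ and $j$ as
$$\eta_{ij} = \log \frac{P(y_{ij} = 1 \mid z_i, z_j, x_{ij}, \theta)}{P(y_{ij} = 0 \mid z_i, z_j, x_{ij}, \theta)} = \beta_0 + \beta^\top x_{ij} - \|z_i - z_j\|,$$
where $\beta_0$ is an intercept term, $\beta$ is a vector of coefficients for covariate effects, and $\|z_i - z_j\|$ is the Euclidean distance between nodes $i$ and $j$ in the latent space. In other words, the existence of an edge in the adjacency matrix (i.e., $y_{ij} = 1$) is determined by the dyad attributes $x_{ij}$ and the distance between the pair of nodes $(i, j)$.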
Krivitsky et al. [47] propose the latent cluster random effects model, which includes the distance between latent space positions, model-based clustering of the latent positions, and random sender and receiver effects (discussed below), and introduce Bayesian estimation methods for both binary and non-binary network data.
Latent Projection Model
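The projection model posits a different assumption:
$$\eta_{ij} = \beta_0 + \beta^\top x_{ij} + \frac{z_i^\top z_j}{\|z_j\|},$$
where $\eta_{ij}$ is the log odds of an edge from node $i$ to node $j$, $\beta_0$ is an intercept, and $\beta$ is a vector of covariate coefficients. This changes the interpretation such that $i$ and $j$ are more likely to form an edge if $z_i$ and $z_j$ point in the same direction (i.e., $z_i^\top z_j > 0$), and less likely to form an edge if they point in opposite directions (i.e., $z_i^\top z_j < 0$). Due to the denominator term $\|z_j\|$, this also allows asymmetric edge probabilities (i.e., $P(y_{ij} = 1) \neq P(y_{ji} = 1)$) even when the dyadic covariates are symmetric.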
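As an illustration of the two specifications, the following Python sketch evaluates edge log odds under a distance-type and a projection-type model. It assumes the standard binary formulations (an intercept plus covariate effects, minus the Euclidean distance or plus the projection term); the function names and toy inputs are hypothetical and chosen only for illustration.

```python
import math


def logistic(eta):
    """Map log odds to a probability in a numerically stable way."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def covariate_effect(beta, x_ij):
    """Linear covariate term beta^T x_ij."""
    return sum(b * x for b, x in zip(beta, x_ij))


def distance_log_odds(z_i, z_j, x_ij, beta0, beta):
    """Assumed distance model: beta0 + beta^T x_ij - ||z_i - z_j||."""
    return beta0 + covariate_effect(beta, x_ij) - math.dist(z_i, z_j)


def projection_log_odds(z_i, z_j, x_ij, beta0, beta):
    """Assumed projection model: beta0 + beta^T x_ij + z_i^T z_j / ||z_j||."""
    inner = sum(a * b for a, b in zip(z_i, z_j))
    return beta0 + covariate_effect(beta, x_ij) + inner / math.hypot(*z_j)


# Toy latent positions (hypothetical values), no covariates.
z_a, z_b = (1.0, 0.0), (2.0, 0.0)
p_dist = logistic(distance_log_odds(z_a, z_b, [], 0.0, []))
p_ab = logistic(projection_log_odds(z_a, z_b, [], 0.0, []))
p_ba = logistic(projection_log_odds(z_b, z_a, [], 0.0, []))
```

In this toy example the distance model is symmetric in the two nodes, whereas the projection model gives different values for the two directions because the normalizing term depends on the receiver only.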
3.1.2 Latent Position Cluster Model
As identifying groups of similar nodes is often of interest to network researchers, Handcock et al. [23] extend the latent space models to allow clustering of nodes in networks. Referred to as the “latent position cluster model”, this approach combines the latent space models and the model-based clustering method [7]. The modeling framework is built upon the latent distance model
$$\log \frac{P(y_{ij} = 1 \mid z_i, z_j, x_{ij}, \theta)}{P(y_{ij} = 0 \mid z_i, z_j, x_{ij}, \theta)} = \beta_0 + \beta^\top x_{ij} - \|z_i - z_j\|,$$
where each position $z_i$ is drawn from a finite mixture of multivariate normal distributions from $G$ groups. Assuming different means and covariance matrices for each group $g$, the model proposes
$$z_i \sim \sum_{g=1}^{G} \lambda_g \, \mathrm{MVN}_q(\mu_g, \sigma_g^2 I_q),$$
where $\lambda_g$ is the probability that an actor belongs to group $g$, and $I_q$ is the $q \times q$ identity matrix.
Inference for the latent space model and the clustering model can be made with either maximum likelihood estimation (MLE) or the Bayesian approach via Markov chain Monte Carlo (MCMC), and those methods are implemented in the R package ‘latentnet’ [44, 46].
3.1.3 Bilinear Mixed-effects Model
Providing some computational and conceptual advantages over the latent space models, Hoff [28] develops the generalized bilinear mixed-effects models (GBME), a network regression model that is in line with the latent space models, especially the latent projection models. Starting from the traditional regression model on dyadic data
$$y_{ij} = \beta^\top x_{ij} + \epsilon_{ij},$$
this approach includes additive and multiplicative random effects in the error terms, i.e.,
$$\epsilon_{ij} = a_i + b_j + \gamma_{ij} + z_i^\top z_j.$$
This representation allows us to explain the second-order dependence often seen in dyadic data: a sender effect $a_i$, a receiver effect $b_j$, and a within-dyad effect $\gamma_{ij}$. The variance parameters $\sigma_a^2$ and $\sigma_b^2$ represent the dependence of observations having a common sender and a common receiver, respectively, with the correlation between sender and receiver effects described by $\sigma_{ab}$. Moreover, reciprocity and within-actor correlation are measured by $\sigma_\gamma^2$ and the correlation parameter $\rho$. Finally, the bilinear effect in the product term $z_i^\top z_j$ is known to capture the third-order dependence in dyadic data such as transitivity (i.e., if $i \to j$ and $j \to k$, then we tend to have $i \to k$), balance (i.e., $y_{ij} y_{jk} y_{ki} > 0$ for signed unordered edges), and clusterability (i.e., a triad can be divided into groups).
The GBME framework has been further studied and modified in [29, 30, 55], and has been referred to as the “latent factor models”, the “eigenmodel”, and the “additive and multiplicative effects (AME) models” [27, 32, 55]. Throughout this paper, we use the name “latent factor models” to emphasize the objective of the paper in understanding latent variable models. Hoff [29] shows that the latent factor model generalizes the latent distance model and the latent class model. For the effective use of the latent factor models, the R package “amen” [32] provides estimation and inference for a class of AME models for ordinal, continuous, binary and other types of dyadic data.
3.2 Dynamic Latent Distance Models
In this section, we summarize some dynamic latent distance models which extend the latent distance models with Markovian properties.
3.2.1 Dynamic Social Network in Latent Space Model
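The first dynamic latent distance model appeared in the machine learning literature, in the work of Sarkar and Moore [67], who named it the “dynamic social network in latent space (DSNL) model”. This model assumes that $Y_t$ (i.e., the observed pairwise adjacency matrix at time $t$) depends only on the current latent positions $Z_t$ (“observation model”), while allowing the latent positions to move over time with standard Markovian assumptions (“transition model”).
The observation model for the graph can be written as
$$P(Y_t \mid Z_t) = \prod_{i \sim j} p_{ijt} \prod_{i \nsim j} (1 - p_{ijt}),$$
where $i \sim j$ and $i \nsim j$ denote the existence and absence of a link, respectively, and $p_{ijt}$ is the probability of a link between $i$ and $j$ at time $t$. Specifically, it has the following form:
$$p_{ijt} = \frac{1}{1 + e^{(d_{ijt} - r_{ijt})}} K(d_{ijt}) + \rho \, (1 - K(d_{ijt})),$$
where $d_{ijt}$ is the Euclidean distance between $i$ and $j$ in the latent space at time $t$, as in the original latent distance model, $r_{ijt}$ is the maximum of the radii of $i$ and $j$ at time $t$, defined as $r_{ijt} = c \times (\max(\delta_{it}, \delta_{jt}) + 1)$ with $c$ a constant and $\delta_{it}$ being the degree of node $i$ at time $t$, $K$ is a biquadratic kernel with $K(d_{ijt}) = (1 - (d_{ijt}/r_{ijt})^2)^2$ when $d_{ijt} \le r_{ijt}$ and $0$ otherwise, and $\rho$ is a constant noise.
The transition model assumes a Gaussian random walk:
$$Z_t \mid Z_{t-1} \sim N(Z_{t-1}, \sigma^2 I),$$
and the resulting log posterior becomes
$$\log P(Z_t \mid Y_t, Z_{t-1}) = \log P(Y_t \mid Z_t) + \log P(Z_t \mid Z_{t-1}) + \text{const},$$
such that we want to estimate the positions using
$$\hat{Z}_t = \arg\max_{Z_t} P(Z_t \mid Y_t, Z_{t-1}).$$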
For efficient optimization of the likelihood, the authors develop a two-stage learning algorithm: (i) generalized multidimensional scaling (MDS) to find the initial latent positions across time and (ii) nonlinear conjugate gradient (CG) optimization starting from the initial estimates. Additionally, a Procrustean transformation is applied to the coordinates from MDS so that the configuration at time $t$, $Z_t$, maintains the same orientation as $Z_{t-1}$.
The DSNL model is applied to the NIPS conference paper co-authorship dataset (network size up to 11,000) by separating the 12 years of data into three discrete time points: 1987–1990, 1991–1994, and 1995–1998. The analysis is mainly focused on investigating the dynamics of some well-connected researchers in the machine learning community, and the authors identify some noticeable changes in the network embeddings by examining the latent positions. Sarkar et al. [68] further extend the model to dynamic bipartite or two-mode networks, such as author-word data built from the text corpora of the same NIPS data.
3.2.2 Dynamic Latent Distance Model with Popularity and Activity Effects
Sewell and Chen [69] propose a model for longitudinal networks which allows each node to have a temporal trajectory in a $q$-dimensional latent Euclidean space, with additional features to capture the nodes’ popularity and activity effects.
For a binary network $Y_t$, the model formulation relies on the log odds $\eta_{ijt} = \log\{P(Y_{ijt} = 1) / P(Y_{ijt} = 0)\}$:
$$\eta_{ijt} = \beta_{IN} \left(1 - \frac{d_{ijt}}{r_j}\right) + \beta_{OUT} \left(1 - \frac{d_{ijt}}{r_i}\right),$$
where the distance between $i$ and $j$ at time $t$ is defined as $d_{ijt} = \|Z_{it} - Z_{jt}\|$ and $r = (r_1, \ldots, r_N)$ is a vector of positive actor-specific parameters with the constraint $\sum_{i=1}^{N} r_i = 1$. Specifically, $r_i$ can be thought of as the $i$th actor’s social reach or radius, which does not vary over time. In addition, this formulation separates the two parameters $\beta_{IN}$ and $\beta_{OUT}$ to measure the global popularity and activity effects, respectively. Note that this model deals with directed networks (i.e., $Y_{ijt} \neq Y_{jit}$) as well as undirected ones (i.e., $Y_{ijt} = Y_{jit}$), while the DSNL model in Section 3.2.1 only allows undirected edges.
Similar to the DSNL model, the latent positions at time $t$ are modeled by a Markov process with a transition equation for $t = 2, \ldots, T$:
$$Z_{it} \mid Z_{i(t-1)} \sim N(Z_{i(t-1)}, \sigma^2 I_q),$$
where the initial positions independently follow $Z_{i1} \sim N(0, \tau^2 I_q)$ for $i = 1, \ldots, N$. To estimate the parameters of interest, the authors adopt a Bayesian approach and implement an MCMC algorithm using Metropolis-Hastings within Gibbs. They also perform a Procrustes transformation in order to reorient the sampled trajectories.
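The generative side of this model can be sketched in a few lines of Python. The sketch assumes the formulation in which the log odds of a directed edge are $\beta_{IN}(1 - d_{ijt}/r_j) + \beta_{OUT}(1 - d_{ijt}/r_i)$ and the latent positions follow a Gaussian random walk; all function names, parameter values, and the seed are hypothetical, and the code only simulates data rather than reproducing the authors' MCMC procedure.

```python
import math
import random


def logistic(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def edge_log_odds(z_i, z_j, r_i, r_j, beta_in, beta_out):
    """Assumed log odds of a directed edge i -> j given positions and radii."""
    d = math.dist(z_i, z_j)
    return beta_in * (1.0 - d / r_j) + beta_out * (1.0 - d / r_i)


def simulate(n_nodes, n_steps, dim, radii, beta_in, beta_out, tau, sigma, seed=0):
    """Simulate directed binary networks from random-walk latent positions."""
    rng = random.Random(seed)
    # Initial positions: independent Gaussian draws with standard deviation tau.
    pos = [[rng.gauss(0.0, tau) for _ in range(dim)] for _ in range(n_nodes)]
    networks = []
    for t in range(n_steps):
        if t > 0:
            # Transition step: each coordinate moves by Gaussian noise.
            pos = [[c + rng.gauss(0.0, sigma) for c in z] for z in pos]
        y = [[0] * n_nodes for _ in range(n_nodes)]
        for i in range(n_nodes):
            for j in range(n_nodes):
                if i != j:
                    eta = edge_log_odds(pos[i], pos[j], radii[i], radii[j],
                                        beta_in, beta_out)
                    y[i][j] = 1 if rng.random() < logistic(eta) else 0
        networks.append(y)
    return networks


# Hypothetical settings: 5 actors with equal radii summing to one.
nets = simulate(n_nodes=5, n_steps=3, dim=2, radii=[0.2] * 5,
                beta_in=1.0, beta_out=2.0, tau=0.1, sigma=0.05)
```

When two actors share the same position the distance vanishes and the log odds reduce to the sum of the popularity and activity coefficients, which is the largest value the log odds can take when both coefficients are positive.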
This dynamic latent distance model is applied to two datasets. The Dutch classroom dataset includes directed relationships among 25 students, surveyed over four time points. The bill cosponsorship dataset consists of the cosponsorship history of 644 members of Congress who served during the 97th to 101st Congresses in the U.S. House of Representatives (five time points). The model provides useful insights into the networks, such as the overall gender and ethnic separation in the Dutch classroom example and the reflection of political ideology in the latent positions in the bill cosponsorship example. In their follow-up work, Sewell and Chen [70] adjust the model for weighted edges, including rank-ordered, count, and non-negative continuous edges, via link functions and data augmentation. The applicability and interpretability of this extended model are demonstrated with mobile phone call log data and world exports/imports data, which are examples of count and continuous edges, respectively.
3.2.3 Dynamic Latent Distance Model for Bipartite Network
A bipartite graph is a graph in which vertices are divided into two groups and links only exist across groups. Friel et al. [19] develop a statistical model for bipartite temporal networks by extending the DSNL model, with different assumptions from the bipartite network model in Sarkar et al. [68]. While Sarkar et al. [68] model the co-occurrence counts of authors and words from their empirical distribution, Friel et al. [19] directly model the evolution of edges through three Markov processes: one on the parameters, one on the latent positions, and one on the edges.
Define the binary edge $y_{ijt} = 1$ if node $i$ in the first group is connected to node $j$ in the second group at time $t$ (e.g., director $i$ sitting on board $j$ in year $t$), and $y_{ijt} = 0$ otherwise. The chance of forming an edge at time $t$ depends on the previous state $y_{ij(t-1)}$ and the latent distance between the two nodes in the two different groups. The model specifies this probability through an edge persistence parameter and a non-edge persistence parameter, which are separately defined to capture the difference in the persistence of edges and non-edges, together with the $q$-dimensional latent positions of node $i$ in the first group and node $j$ in the second group at time $t$. While the general framework assumes time-varying latent positions for both groups, Friel et al. [19] fixed the first group’s latent positions over time for their application.
The latent positions in the first group are assumed to have independent Gaussian priors, while the latent positions in the second group are modeled using random walk processes (given the initial positions at the first time point) governed by precision parameters. In the same way as for the latent positions of the second group, the model further assumes that the edge persistence and non-edge persistence parameters follow Markov processes with their own precision parameters. As in the inference procedures of other dynamic latent distance models, MCMC sampling is used to sample from the joint posterior distribution, and a Procrustes transformation is applied to fix the orientation.
Using the model, the authors analyze the dynamic evolution of the leading Irish companies and their directors from 2003 to 2013. The network data contain directors (first group), companies (second group), and 3,855 edges. Mainly focused on understanding the persistence of links and the heterogeneity in the latent positions, the analysis reveals an increasing level of interlocking board behavior before and during the financial crisis, and stabilization thereafter.
3.3 Dynamic Latent Factor Models
As mentioned before, the multiplicative form in the projection model [35] has been continuously studied under the name of “latent factor models”. As shown in Hoff [29], latent factor models have some advantages over the latent distance model when both homophily (a pattern in which similar nodes are more likely to attach to each other than dissimilar ones) and stochastic equivalence (a pattern in which the nodes can be divided into groups such that members of the same group have similar patterns of relationships) are present in the network. In this section, we describe the development of dynamic latent factor models along with motivating examples that highlight the models’ capability of capturing higher-order dependence such as reciprocity (the tendency to form reciprocal edges in a directed network) and stochastic equivalence.
3.3.1 Dynamic Gravity Model
Gravity models [49] in social sciences are used to model bilateral relationships that can be predicted by the mass and distance of the pair. In modeling international trade and conflicts, Ward and Hoff [81] and Ward et al. [82] are the earlier works that combine the bilinear mixed-effects model (GBME) in Section 3.1.3 with the gravity models. However, their applications to temporal networks are done by separately fitting the static model to each time point’s network. Later, Ward et al. [80] improve their approach by including the previous year’s fitted positions. The resulting model uses a covariate vector $x_{ijt}$ consisting of the three covariates in the gravity models: the log of gross domestic product (GDP) of country $i$ and of country $j$ at time $t$, respectively, and the log of the geographic distance between countries $i$ and $j$ at time $t$. Different from the GBME, this model separates the latent positions into sending activity and receiving activity, and also includes a lagged position term to estimate the effect of positions at time $t-1$ on the edges at time $t$.
This model is applied to bilateral trade data from 1990 to 2008. After several goodness-of-fit tests, including the checks and out-of-sample predictions, the authors confirm the presence of strong second-order (i.e., reciprocity) and third-order dependencies in the world trade network. They demonstrate that the GBME model outperforms conventional ones in the international trade literature in predicting the observed values and understanding unobserved dyadic relationships.
3.3.2 Hierarchical Multilinear Model
By treating dynamic network data as an example, Hoff [31, 34] develops a general modeling framework for array data via reduced-rank decompositions, in which the array can be expressed as products of low-dimensional latent factors. Specifically, for symmetric dynamic network data $\{Y_t\}$, the model represents the systematic part of the network at time $t$ through a product of the form $U \Lambda_t U^\top$, where $U$ is a matrix consisting of the time-invariant eigenvectors, assumed to have a matrix normal prior distribution, and $\Lambda_t$ is the time-varying eigenvalue matrix. This can be seen as a generalization of the latent factor models, using reduced-rank approximations to matrices. Further extensions to arrays can be found in Hoff [33], with longitudinal networks serving as an example of the general model. This model is implemented in the R package “amen” [32].
In Hoff [34], the model is applied to analyze the international cooperation and conflict data in Ward et al. [82]. The dataset includes the records of militarized conflict and cooperation of 66 countries in every five years from 1950 to 1985, along with some economic and political characteristics of the countries. The model discovers that a majority of the conflicts over the Cold War period involve economically large countries, and also reveals different conflict and cooperation patterns across countries over time.
3.3.3 Dynamic Bilinear Effects Model
Gaussian processes (GPs) have long been used in modeling temporally or spatially dependent data. Durante and Dunson [13, 15] propose a model that extends the bilinear component in Section 3.1.3 to the dynamic case by assuming that the mean parameters and the latent factors evolve in time via Gaussian processes. In particular, the model assumes
$$\theta_{ijt} = \mu_t + z_{it}^\top z_{jt},$$
where $\theta_{ijt}$ is the linked mean parameter of a generalized linear model, $\mu_t$ is the mean process, and $z_{it}$ and $z_{jt}$ are the $q$-dimensional latent positions at time $t$. Here, $\mu_t$ and each coordinate of the latent positions vary over time under GP priors whose covariance matrices are constructed using the squared exponential covariance function, that is, a function of the distance between two time points $t$ and $t'$ that decays exponentially in the squared distance $(t - t')^2$. The GP priors involve length-scale parameters for the mean process and for the latent positions, and a variance parameter with hierarchical Gamma priors.
The main advantage of this model is that it relaxes the Markovian assumption in most of the previously mentioned work and allows an unequal spacing of the observed time points.
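To make the role of the squared exponential covariance concrete, the sketch below builds a covariance matrix over a set of observation times. The parameterization $\text{variance} \times \exp(-\kappa (t - t')^2)$, the function name, and the numerical values are assumptions made for illustration and need not match the exact parameterization used by the authors.

```python
import math


def squared_exponential_cov(times, kappa, variance=1.0):
    """Covariance matrix with entries variance * exp(-kappa * (s - t)^2).

    `kappa` plays the role of a length-scale parameter (assumed
    parameterization): larger values make the process less smooth.
    """
    return [[variance * math.exp(-kappa * (s - t) ** 2) for t in times]
            for s in times]


# Unequally spaced time points are handled without any modification.
times = [0.0, 1.0, 3.0]
cov = squared_exponential_cov(times, kappa=0.5)
```

Because the covariance depends only on the distance between time points, the same function applies to irregularly spaced observations, which is the property the preceding paragraph highlights.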
Durante and Dunson [15] apply the method to analyze the co-movements of 23 National Stock Market Indices from 2004 to 2013. The results successfully reflect the global financial crisis and Greek debt crisis periods. The time-varying predictors help to discover the existence of international financial contagion effects and the opposite effects of verbal and material cooperation efforts on financial co-movements.
4 Stochastic Blockmodels
In this section, we briefly summarize a statistical approach to a posteriori block modeling for networks introduced by [74] and [59], where the memberships of nodes are unknown, as well as two relevant models, the mixed membership stochastic blockmodel [2] and the degree-corrected stochastic blockmodel [41], and then introduce several dynamic extensions of these static models.
4.1 Static Stochastic Blockmodels
The main assumption of the stochastic blockmodels is that the nodes of the network are partitioned into several latent classes (or blocks). To be specific, Snijders and Nowicki [74] and Nowicki and Snijders [59] assume the set of nodes is partitioned into $K$ categories labeled $1, \ldots, K$. The class of node $i$ is denoted by $Z_i$ and the classes are collected in the vector $Z = (Z_1, \ldots, Z_N)$. The set of classes is denoted by $\mathcal{C} = \{1, \ldots, K\}$. The models further assume that the probability distribution of the edge between two nodes depends only on the classes to which they belong. Thus the edges are conditionally independent given the class vector.
Stochastic blockmodels inherit the philosophy of finite mixture models, and assume that the unobserved classes are i.i.d. random variables with the probability
$$P(Z_i = k) = \theta_k$$
for class $k \in \mathcal{C}$. Therefore, the joint distribution of $Z$ is defined by
$$P(Z = z) = \prod_{k=1}^{K} \theta_k^{n_k},$$
where $n_k$ denotes the number of vertices with class $k$.
The model for the edges between the nodes depends on the node classes in the following way. Given the vector of node classes $Z = z$, the random vectors $(Y_{ij}, Y_{ji})$ for $i < j$ are independent, and the probability that such a pair takes a given vector of edge values depends only on the classes $z_i$ and $z_j$ through class-dependent edge probabilities, which are required to satisfy a condition specified in the original papers [74, 59]. The conditional distribution of $Y$ given the vector of classes is then a product, over pairs of classes, of these class-dependent edge probabilities raised to the corresponding edge counts for each block.
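The generative structure just described can be sketched as follows. For simplicity the sketch assumes a binary, undirected network without self-loops, so that each pair of classes has a single edge probability; this is a special case of the general formulation, and the function and argument names are hypothetical.

```python
import random


def sample_sbm(n_nodes, class_probs, edge_probs, seed=0):
    """Draw latent classes i.i.d., then draw edges independently given classes.

    class_probs[k] is the probability of class k; edge_probs[k][l] is the
    probability of an edge between a node in class k and a node in class l
    (assumed symmetric here because the sketch is undirected).
    """
    rng = random.Random(seed)
    n_classes = len(class_probs)
    classes = rng.choices(range(n_classes), weights=class_probs, k=n_nodes)
    y = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            p = edge_probs[classes[i]][classes[j]]
            y[i][j] = y[j][i] = 1 if rng.random() < p else 0
    return classes, y


# Hypothetical two-class example with dense within-class connections.
classes, y = sample_sbm(
    n_nodes=10,
    class_probs=[0.5, 0.5],
    edge_probs=[[0.9, 0.1], [0.1, 0.9]],
)
```

Conditional on the sampled classes, every edge is an independent draw, which is the conditional independence property stated above.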
For the inference of the stochastic blockmodels, both Bayesian and frequentist approaches have been proposed. In the Bayesian approach, the prior distribution for the parameters is taken to be a product of Dirichlet distributions for the class distribution and for the edges between and within classes given the memberships. The posterior distribution is estimated using the Gibbs sampler. In the frequentist approach, several algorithms have been proposed to estimate the classes: Rohe et al. [65] propose spectral clustering, Guédon and Vershynin [22] and Amini and Levina [4] formulate class estimation as a semidefinite optimization problem, and Amini et al. [3] propose a pseudo-likelihood algorithm that provides consistent estimates of the classes.
Despite the simplicity of the model formulation, stochastic blockmodels provide a powerful tool for modeling networks. They allow one to represent the effect of unobserved heterogeneity of individual positions or preferences on the pattern of pairwise relations. Since the heterogeneity is modeled by the stochastic membership of the classes, with regard to cluster analysis the approach is more similar to a mixture model than to a discrete classification model.
4.1.1 Mixed Membership Stochastic Blockmodel
Many realworld networks are multifaceted. However, stochastic blockmodels suffer from a limitation that each node can only belong to one group, or in other words, play a single latent role. To overcome this issue, Airoldi et al. [2] relax the assumption of a single latent role for nodes and develop the mixed membership stochastic blockmodel.
In this paper [2], the authors focus on directed networks and assume the observed network is generated according to node-specific distributions of community membership and edge-specific indicator vectors denoting the membership in one of the $K$ communities. Each node $i$ is associated with a randomly drawn vector $\pi_i$, where $\pi_{ik}$ denotes the probability of node $i$ belonging to group $k$. That is, each node can simultaneously belong to multiple groups with different degrees of affiliation. The probabilities of edges between different groups are defined by the $K \times K$ matrix of Bernoulli rates $B$, where $B(g, h)$ represents the probability of having an edge between a node from group $g$ and a node from group $h$. The mixed membership stochastic blockmodel posits that the $Y_{ij}$ are drawn from the following generative process.
- For each node $i$:
  - Draw a $K$-dimensional mixed membership vector $\pi_i \sim \mathrm{Dirichlet}(\alpha)$.
- For each possible edge variable $Y_{ij}$:
  - Draw membership indicator for the initiator, $z_{i \to j} \sim \mathrm{Multinomial}(\pi_i)$.
  - Draw membership indicator for the receiver, $z_{i \leftarrow j} \sim \mathrm{Multinomial}(\pi_j)$.
  - Sample the edge, $Y_{ij} \sim \mathrm{Bernoulli}(z_{i \to j}^\top B \, z_{i \leftarrow j})$.
Here, the indicator vector $z_{i \to j}$ denotes the group membership of node $i$ when node $i$ has an outgoing edge to node $j$, and $z_{i \leftarrow j}$ denotes the group membership of node $j$ when node $i$ has an outgoing edge to node $j$.
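The generative process above can be sketched in Python as follows. The sketch assumes Dirichlet-distributed membership vectors and Bernoulli edges as in the usual statement of the model; the function names and the hyperparameter values are hypothetical, and the Dirichlet draw is obtained by normalizing independent Gamma variables.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_mmsb(n_nodes, alpha, block_rates, seed=0):
    """Simulate a directed binary network from the mixed membership process."""
    rng = random.Random(seed)
    n_groups = len(alpha)
    # One mixed membership vector per node.
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_nodes)]
    y = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue
            # Pair-specific roles for the initiator i and the receiver j.
            g = rng.choices(range(n_groups), weights=memberships[i])[0]
            h = rng.choices(range(n_groups), weights=memberships[j])[0]
            y[i][j] = 1 if rng.random() < block_rates[g][h] else 0
    return memberships, y


# Hypothetical example with two groups and assortative Bernoulli rates.
memberships, y = sample_mmsb(
    n_nodes=6,
    alpha=[0.5, 0.5],
    block_rates=[[0.8, 0.05], [0.05, 0.8]],
)
```

Note that a node may act in different roles across its pairs, since the role indicators are redrawn for every ordered pair.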
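Under the mixed membership stochastic blockmodel assumptions, the joint probability of the data and the latent variables can be written in the following form:
$$P(Y, \pi_{1:N}, Z_{\to}, Z_{\leftarrow} \mid \alpha, B) = \prod_{i, j} P(Y_{ij} \mid z_{i \to j}, z_{i \leftarrow j}, B) \, P(z_{i \to j} \mid \pi_i) \, P(z_{i \leftarrow j} \mid \pi_j) \prod_{i} P(\pi_i \mid \alpha),$$
where $\pi_{1:N}$ is the set of mixed membership vectors, and $Z_{\to}$ and $Z_{\leftarrow}$ are the sets of membership indicator vectors.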
A nested variational inference algorithm is used for posterior inference on the per-node mixed membership vectors and per-pair roles. To compute the empirical Bayes estimates of the model parameters, a variational expectation-maximization (EM) algorithm is used. Moreover, in recent years, Mao et al. [52] and Jin et al. [40] propose consistent algorithms for inferring the mixed memberships of nodes in the mixed membership stochastic blockmodel.
4.1.2 Degree-Corrected Stochastic Blockmodel
Although the stochastic blockmodels have been widely used as a tool for detecting community structure in networks, they fail to capture the heterogeneity of node degrees (i.e., the number of edges a node has to other nodes) within communities, which is often observed in real-world networks. To solve this problem, Karrer and Newman [41] relax the assumption that the stochastic blockmodels treat all nodes within a community as stochastically equivalent, and propose the degree-corrected stochastic blockmodel that can consider node covariates.
This model focuses on undirected networks and allows networks to contain both multi-edges and self-edges, even though many real-world networks have no such edges. Let $G$ be an undirected multigraph on $N$ nodes, possibly including self-edges, and let $Y_{ij}$ be an element of the adjacency matrix of the multigraph, with the convention that each self-edge contributes 2 to $Y_{ii}$. Also, there is a new set of parameters $\theta_i$ controlling the expected degrees of nodes $i$. The model assumes that the number of edges between each pair of nodes is independently Poisson distributed, and defines $\theta_i \theta_j \omega_{z_i z_j}$ to be the expected value of the adjacency matrix element $Y_{ij}$ for nodes $i$ and $j$ lying in groups $z_i$ and $z_j$, respectively. Then, $G$ has the probability
$$P(G \mid \theta, \omega, z) = \prod_{i < j} \frac{(\theta_i \theta_j \omega_{z_i z_j})^{Y_{ij}}}{Y_{ij}!} \exp(-\theta_i \theta_j \omega_{z_i z_j}) \times \prod_{i} \frac{(\frac{1}{2} \theta_i^2 \omega_{z_i z_i})^{Y_{ii}/2}}{(Y_{ii}/2)!} \exp\left(-\tfrac{1}{2} \theta_i^2 \omega_{z_i z_i}\right).$$
To estimate the parameters and infer the membership of nodes, the authors suggest a novel method where the log-likelihood is maximized in two stages. First, they find maximum likelihood values of the model parameters $\theta$ and $\omega$. Then, they propose information-theoretic quantities for community detection or clustering. Similar to Rohe et al. [65], Qin and Rohe [61] propose spectral clustering under the degree-corrected stochastic blockmodel. Also, Zhao et al. [87] generalize the consistency framework of the stochastic blockmodel to the degree-corrected stochastic blockmodel and obtain a general theorem for community detection consistency.
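The Poisson edge model described above can be illustrated with a short simulation. The sketch assumes that the number of edges between two distinct nodes is Poisson with mean equal to the product of the two degree parameters and the group-level rate, and that a self-edge count is Poisson with half that mean and contributes 2 to the diagonal; the names `theta`, `omega`, and `groups` are illustrative labels, and the Poisson sampler is a simple textbook method rather than anything from the original paper.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    count = 0
    prod = rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def sample_dcsbm(groups, theta, omega, seed=0):
    """Simulate an undirected multigraph with degree-corrected block rates."""
    rng = random.Random(seed)
    n = len(groups)
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            rate = theta[i] * theta[j] * omega[groups[i]][groups[j]]
            if i == j:
                # Each self-edge adds 2 to the diagonal entry (assumed convention).
                y[i][i] = 2 * sample_poisson(0.5 * rate, rng)
            else:
                y[i][j] = y[j][i] = sample_poisson(rate, rng)
    return y


# Hypothetical example: node 0 has a much larger degree parameter than the rest.
adjacency = sample_dcsbm(
    groups=[0, 0, 0, 1, 1, 1],
    theta=[3.0, 0.5, 0.5, 1.0, 1.0, 1.0],
    omega=[[1.0, 0.1], [0.1, 1.0]],
)
```

Nodes in the same group can therefore have very different expected degrees, which is the heterogeneity the degree parameters are meant to capture.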
4.2 Dynamic Stochastic Blockmodels
4.2.1 Dynamic Stochastic Blockmodel with Varying Community Memberships
To analyze dynamic communities, Yang et al. [86] propose a model that captures the evolution of communities by explicitly modeling the transition of community memberships for individual nodes in the network.
Let $Y_t$ be the snapshot of a network at a given time step $t$, with $N$ the number of nodes in the network. Each element $y_{ijt}$ in $Y_t$ is the weight assigned to the edge between nodes $i$ and $j$. Although this dynamic stochastic blockmodel can handle both the frequency of interactions (i.e., a natural number) and the binary number indicating presence or absence of interactions, the authors’ main focus is on binary edges. For a dynamic network, they use $\mathcal{Y}_T = \{Y_1, \ldots, Y_T\}$ to denote a collection of snapshots for the network over $T$ discrete time steps. They also use $z_i \in \{1, \ldots, K\}$, where $K$ is the total number of communities, to denote the community membership of node $i$. In addition, they introduce $z_{ik} = [z_i = k]$ to indicate if node $i$ is in the $k$th community, where $[x]$ outputs 1 if $x$ is true and zero otherwise. The community matrix $Z_t = (z_{ikt})$ then indicates the community assignments of all nodes in the network at a given time step $t$. Finally, they set $\mathcal{Z}_T = \{Z_1, \ldots, Z_T\}$ to represent the collection of community assignments of all nodes over $T$ time steps.
Assuming that the community matrix $Z_{t-1}$ for time step $t-1$ is given, the model uses a transition matrix $A$ to model the community matrix $Z_t$ at time step $t$. Moreover, $\pi = (\pi_1, \ldots, \pi_K)$ is used as the initial probability, where $\pi_k$ is the probability for a node to be assigned to community $k$. Given the community memberships in $Z_t$, the edge between nodes is determined stochastically by the probabilities $P = (P_{kl})$, where $P_{kl}$ is the probability of an edge between a node in community $k$ and a node in community $l$.
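The joint probability of the data and the latent variable can be written in the following form:

$$\Pr(W_T, Z_T \mid \pi, P, A) = \prod_{t=1}^{T} \Pr(W^{(t)} \mid Z^{(t)}, P) \, \prod_{t=2}^{T} \Pr(Z^{(t)} \mid Z^{(t-1)}, A) \, \Pr(Z^{(1)} \mid \pi),$$

where the three factors are, respectively,

$$\Pr(W^{(t)} \mid Z^{(t)}, P) = \prod_{i \sim j} \prod_{k,l} \left( P_{kl}^{\,w_{ij}^{(t)}} (1 - P_{kl})^{1 - w_{ij}^{(t)}} \right)^{z_{ik}^{(t)} z_{jl}^{(t)}},$$

$$\Pr(Z^{(t)} \mid Z^{(t-1)}, A) = \prod_{i=1}^{n} \prod_{k,l} A_{kl}^{\,z_{ik}^{(t-1)} z_{il}^{(t)}},$$

and

$$\Pr(Z^{(1)} \mid \pi) = \prod_{i=1}^{n} \prod_{k} \pi_k^{\,z_{ik}^{(1)}}.$$

Note that self-loops are not considered in their model, so $\prod_{i \sim j}$ in the above equations means the product over all $i$'s and $j$'s such that $i \neq j$.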
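The generative structure described above (initial memberships, Markov transitions of memberships, and Bernoulli edges given memberships) can be sketched in a few lines. The following is only an illustrative simulation under stated assumptions, not the authors' implementation: the function names `sample_dynamic_sbm` and `log_joint` are hypothetical, edges are binary, every ordered pair of distinct nodes is treated as an independent draw, and all entries of `pi`, `A`, and `P` are assumed to lie strictly between 0 and 1.

```python
import math
import random


def sample_dynamic_sbm(n, T, pi, A, P, seed=0):
    """Sample memberships z[t][i] and binary snapshots W[t][i][j] without self-loops.

    pi: initial community probabilities, A: K x K membership transition matrix,
    P: K x K edge probabilities between communities (all hypothetical inputs).
    """
    rng = random.Random(seed)
    labels = list(range(len(pi)))
    z, W = [], []
    for t in range(T):
        if t == 0:
            # Initial memberships are drawn from pi.
            zt = [rng.choices(labels, weights=pi)[0] for _ in range(n)]
        else:
            # Each node moves according to the row of A indexed by its previous community.
            zt = [rng.choices(labels, weights=A[z[t - 1][i]])[0] for i in range(n)]
        Wt = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j and rng.random() < P[zt[i]][zt[j]]:
                    Wt[i][j] = 1
        z.append(zt)
        W.append(Wt)
    return z, W


def log_joint(W, z, pi, A, P):
    """Log of the joint probability of snapshots and memberships (product over i != j)."""
    n = len(z[0])
    lp = sum(math.log(pi[k]) for k in z[0])  # initial term
    for t in range(1, len(z)):  # transition terms
        lp += sum(math.log(A[z[t - 1][i]][z[t][i]]) for i in range(n))
    for t in range(len(z)):  # emission terms
        for i in range(n):
            for j in range(n):
                if i != j:
                    p = P[z[t][i]][z[t][j]]
                    lp += math.log(p) if W[t][i][j] else math.log(1.0 - p)
    return lp
```

Evaluating `log_joint` on a sampled pair `(W, z)` gives the quantity that an inference routine would work with; the sketch does not attempt the variational EM or Gibbs sampling steps themselves.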
For inference, the authors introduce a point estimation approach using a variational EM algorithm. They also propose a method that combines a probabilistic simulated annealing algorithm with a Gibbs sampling algorithm to infer the parameters.
This model is applied to three datasets. The southern women dataset records the attendance of 18 women at 14 social events over a 9-month period in the 1930s in Natchez, Mississippi, collected as part of a study of social class in both black and white societies. The blog dataset was collected by NEC Labs and includes 148,681 entry-to-entry links among 407 blogs during 15 months. The paper coauthorship (DBLP) dataset contains the coauthorship information of papers in 28 conferences in three areas (data mining (DM), database (DB), and artificial intelligence (AI)) over ten years (1997-2006). For the first two datasets, the model can detect changes of community membership over the study period. In the coauthorship dataset, the model discovers the trend that, over time, the DB community gets smaller, the DM community gets larger, and the AI community remains relatively stable. Moreover, the authors find that some highly productive researchers changed community membership many times.
4.2.2 Dynamic Stochastic Blockmodel with Varying Community Memberships and Connectivity Parameters
Both Xu and Hero [85] and Matias and Miele [53] propose methods that relax the constraint of fixed connectivity probabilities in Yang et al. [86] and allow both community memberships and connectivity parameters to vary over time. The main difference between the two papers is that the former entirely relaxes the constraint of fixed connectivity parameters, while the latter keeps a weak constraint on the connectivity parameters to handle label switching across time steps. Here we first introduce Xu and Hero [85], followed by Matias and Miele [53].
Xu and Hero [85] propose a state space model through time on the probability of connection between groups and focus on directed networks with no self-edges. Let $W^t$ denote the adjacency matrix of the network observed at time step $t$. Let $W^{(s)}$ denote the set of all snapshots up to time $s$, i.e., $W^{(s)} = \{W^1, \dots, W^s\}$. The notation $i \in a$ indicates that node $i$ belongs to class $a$, and the classes of all nodes at time $t$ are given by a vector $c^t$ with $c_i^t = a$ if $i \in a$ at time $t$. The matrix $\Theta^t = [\theta_{ab}^t]$ denotes the connectivity parameters between groups, where $\theta_{ab}^t$ denotes the probability of forming an edge between a node in class $a$ and a node in class $b$, and $k$ denotes the number of classes. The set of parameters $\{\theta_{ab}^t\}$ can be viewed as the states of a dynamic system that generates the noisy observation sequence. Since $\theta_{ab}^t$ is a probability that must be bounded between 0 and 1, they work with the matrix $\Psi^t = [\psi_{ab}^t]$, where $\psi_{ab}^t = \log \theta_{ab}^t - \log(1 - \theta_{ab}^t)$ is the logit of $\theta_{ab}^t$.
The following linear dynamic system gives a model for the state evolution:

$$\psi^t = F^t \psi^{t-1} + v^t,$$

where $F^t$ is the state transition model applied to the previous state, $\psi^t$ is the vector representation of the matrix $\Psi^t$, and $v^t$ is a random vector of zero-mean Gaussian entries, commonly referred to as process noise, with covariance matrix $\Gamma^t$. They assume this covariance matrix to be time-invariant and not necessarily diagonal, because states could evolve in a correlated manner.
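As a rough illustration of this kind of state evolution, the sketch below simulates a linear dynamic system on the logit scale and maps the states back to edge probabilities. It is a simplification under stated assumptions: the process noise is drawn independently for each entry with a common standard deviation (that is, a diagonal covariance, whereas the text allows a non-diagonal one), the transition matrix is fixed over time and defaults to the identity (a random walk), and the names `simulate_logit_states`, `theta0`, and `noise_sd` are hypothetical.

```python
import math
import random


def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of the logit: map a real number back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_logit_states(theta0, T, F=None, noise_sd=0.1, seed=0):
    """Simulate psi_t = F psi_{t-1} + v_t and return the edge probabilities at each step.

    theta0: initial block edge probabilities, flattened into a vector.
    F: state transition matrix (identity if omitted).
    noise_sd: standard deviation of the independent Gaussian process noise (assumption).
    """
    rng = random.Random(seed)
    d = len(theta0)
    if F is None:
        F = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    psi = [logit(p) for p in theta0]
    path = [[sigmoid(x) for x in psi]]
    for _ in range(1, T):
        psi = [
            sum(F[r][c] * psi[c] for c in range(d)) + rng.gauss(0.0, noise_sd)
            for r in range(d)
        ]
        path.append([sigmoid(x) for x in psi])
    return path
```

Working on the logit scale is what keeps every simulated probability strictly inside (0, 1) regardless of how large the Gaussian noise is.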
Since the class memberships $c^t$ are not known and should be estimated along with $\psi^t$, label-switching methods are used as in [41] and [87], and the posterior state density given the entire sequence of observations up to time $t$ is maximized in order to account for the prior information. The posterior state density is given by

$$f(\psi^t \mid W^{(t)}) \propto f(W^t \mid \psi^t, W^{(t-1)}) \, f(\psi^t \mid W^{(t-1)}).$$
Their inference is based on an online iterative estimation procedure that alternates between two steps. First, a label-switching method is used to explore the space of node group configurations. Then, an extended Kalman filter (EKF) is used to optimize the likelihood when the group memberships are known.
This model is applied to two datasets. The MIT reality mining dataset records cell phone activity of 94 students and staff at MIT over a year. The authors use the participant affiliations as ground-truth class memberships and compare the class estimation accuracy of their model with that of the static stochastic blockmodel. The Enron email dataset consists of about 500,000 email messages among 184 Enron employees from 1998 to 2002. The model discovers a steadily increasing trend in edge probabilities from Enron CEOs to presidents as Enron's financial situation worsened. On the other hand, edge probabilities between other employees remained at their baseline levels until Enron fell under federal investigation.
Matias and Miele [53] focus on detecting communities characterized by a stable within-group connectivity behavior by adding some constraints on the varying connectivity parameters. They consider weighted interactions between $N$ nodes recorded over time in a set of data matrices $Y = (Y^t)_{1 \le t \le T}$. For each $t$, the adjacency matrix $Y^t = (Y_{ij}^t)_{1 \le i \ne j \le N}$ contains real values measuring interactions between the nodes $i, j \in \{1, \dots, N\}$. Their model can also handle undirected networks without self-loops. They assume that the $N$ nodes are split into $Q$ latent communities that vary through time, as encoded by the random variables $Z = (Z_i^t)_{1 \le t \le T, 1 \le i \le N}$. They use independent Markov chains to model the evolution of the node memberships over time. For each node $i$, the process $Z_i = (Z_i^t)_{1 \le t \le T}$ is an irreducible, aperiodic, stationary Markov chain with transition matrix $\pi = (\pi_{qq'})_{1 \le q, q' \le Q}$ and initial stationary distribution $\alpha = (\alpha_1, \dots, \alpha_Q)$. They consider $Z_i^t$ as a random vector $Z_i^t = (Z_{i1}^t, \dots, Z_{iQ}^t) \in \{0, 1\}^Q$ constrained to $\sum_{q} Z_{iq}^t = 1$.
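To account for possibly sparse weighted networks, they assume that

$$Y_{ij}^t \mid \{Z_{iq}^t Z_{jl}^t = 1\} \sim (1 - \beta_{ql}^t)\, \delta_0(\cdot) + \beta_{ql}^t\, F(\cdot, \gamma_{ql}^t),$$

where $\delta_0$ is the Dirac mass at 0 and $\{F(\cdot, \gamma), \gamma \in \Gamma\}$ is a parametric family of distributions with no mass at 0, with densities denoted by $f(\cdot, \gamma)$. Here $\beta^t = (\beta_{ql}^t)$ is the sparsity parameter, which satisfies $\beta_{ql}^t \in [0, 1]$. Moreover, $\gamma^t = (\gamma_{ql}^t)$ is the connectivity parameter, whose form depends on the choice of the parametric family. They also let $\phi(\cdot\,; \beta, \gamma)$ denote the density of the resulting mixture distribution.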
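To make the sparse-weight assumption concrete, the sketch below draws a weight that is exactly zero with some probability and otherwise comes from a continuous distribution, and evaluates the corresponding log-density. The Gaussian choice for the continuous part is only one possible parametric family and is an assumption made here for illustration, as are the names `sample_weight` and `log_phi` and the convention that `gamma` is a `(mean, sd)` pair; the sparsity value `beta` is assumed to lie strictly between 0 and 1.

```python
import math
import random


def sample_weight(beta, gamma, rng):
    """Draw a weight: 0 with probability 1 - beta, else Gaussian(mean, sd) with gamma = (mean, sd)."""
    if rng.random() >= beta:
        return 0.0
    mean, sd = gamma
    return rng.gauss(mean, sd)


def log_phi(y, beta, gamma):
    """Log-density of the zero-inflated mixture.

    The density is taken with respect to a point mass at 0 plus Lebesgue measure,
    so an exact zero contributes log(1 - beta) and any other value contributes
    log(beta) plus the Gaussian log-density.
    """
    if y == 0.0:
        return math.log(1.0 - beta)
    mean, sd = gamma
    log_f = -0.5 * math.log(2.0 * math.pi * sd * sd) - (y - mean) ** 2 / (2.0 * sd * sd)
    return math.log(beta) + log_f
```

Summing `log_phi` over node pairs, with `beta` and `gamma` selected by the communities of the two endpoints, would give the observation part of a complete-data log-likelihood for such a model.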
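The joint probability of the data and the latent variable can then be written in the following form (shown here for the undirected case without self-loops):

$$P(Y, Z) = \prod_{i=1}^{N} \prod_{q=1}^{Q} \alpha_q^{Z_{iq}^1} \times \prod_{t=2}^{T} \prod_{i=1}^{N} \prod_{1 \le q, q' \le Q} \pi_{qq'}^{Z_{iq}^{t-1} Z_{iq'}^{t}} \times \prod_{t=1}^{T} \prod_{1 \le i < j \le N} \prod_{1 \le q, l \le Q} \phi(Y_{ij}^t; \beta_{ql}^t, \gamma_{ql}^t)^{Z_{iq}^t Z_{jl}^t}.$$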
To infer the model parameters and to cluster the nodes, a variational expectation-maximization (VEM) algorithm is employed. Furthermore, the authors propose an integrated classification likelihood (ICL) criterion for estimating the number of communities.
This model is applied to reveal social structure and organization in a French high school dataset and two animal interaction datasets. The French high school dataset consists of face-to-face encounters of 31 students in a class during four days in December 2011. The model finds four distinct groups showing different interaction patterns and also finds evidence of some gender homophily. The migratory birds (sparrows) dataset is composed of 3 time steps with 69 birds in total. The model confirms the analysis in Shizuka et al. [71] that the sparrow community is stable. The model is also applied to the Indian equids (onagers) dataset by aggregating interactions of 23 onagers monthly from February 2003 to May 2003, and it reveals a hierarchical social integration process.
On the other hand, under the setting of constant community membership across time steps and varying connectivity parameters, Bhattacharyya and Chatterjee [8] propose a spectral clustering algorithm for the dynamic stochastic blockmodel that guarantees consistency of community detection.
4.2.3 Dynamic Mixed Membership Stochastic Blockmodel
Xing et al. [84] and Ho et al. [26] propose dynamic extensions of the static mixed membership stochastic blockmodel using a state space model for the time-varying parameters of the priors, both for the mixed membership vector of a node and for the connectivity behavior. Their methods model the role of each node as a dynamic mixed membership vector, which allows nodes to behave differently over time as well as to carry out different roles when interacting with different nodes.
Xing et al. [84] consider a temporal series of networks $\{G^{(1)}, \dots, G^{(T)}\}$, where $G^{(t)} = (V, E^{(t)})$ and $E^{(t)}$ is the set of edges at time $t$ between a fixed set $V$ of $N$ nodes. In the static mixed membership stochastic blockmodel, Airoldi et al. [2] employ a simple Dirichlet prior because it is conjugate to the multinomial distribution over every latent membership label $z$ defined by the relevant $\pi_i$. However, to model the temporal dynamics of the node roles and to capture nontrivial correlations between different roles at the same time, the authors employ the logistic normal distribution. A logistic normal distribution, $\mathcal{LN}(\mu^{(t)}, \Sigma^{(t)})$, is used as the prior for the mixed membership vectors, and another logistic normal distribution, with mean parameter $\eta^{(t)}$, is used as the prior for the entries of the connectivity matrix $B^{(t)}$.
Their basic model structure is based on the state space model, which defines a linear dynamic transformation of the mixed membership priors over adjacent time points:

$$\mu^{(t)} = A \mu^{(t-1)} + w^{(t)},$$

where $\mu^{(t)}$ represents the mean parameter of the prior distribution of the transformed mixed membership vectors of all vertices at time $t$, $w^{(t)} \sim N(0, \Phi)$ represents normal transition noise for the mixed membership prior, and the transition matrix $A$ shapes the trajectory of the temporal transformation of the prior. The model makes a similar assumption for $\eta^{(t)}$, such that

$$\eta_{kl}^{(t)} = b\, \eta_{kl}^{(t-1)} + \xi^{(t)},$$

where $b$ is a scalar and $\xi^{(t)} \sim N(0, \psi)$.
The dynamic mixed membership stochastic blockmodel thus consists of three components: a state space model for the mixed membership vector, a state space model for the entries of the connectivity matrix, and a logistic normal mixed membership stochastic blockmodel for the networks. The first two components explain the temporal dynamics, while the third component models the generative process of the network at each time point. The generative process is outlined as follows.

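State space model for mixed membership prior:

$\mu^{(1)} \sim N(\nu, \Phi)$, sample the means of the mixed membership prior at $t = 1$.

$\mu^{(t)} \sim N(A \mu^{(t-1)}, \Phi)$, sample the means of the mixed membership prior at time points $t = 2, \dots, T$.


State space model for connectivity matrix:

$\eta_{kl}^{(1)} \sim N(\iota, \psi)$, sample the means of the entries of the connectivity matrix prior between roles $k$ and $l$ at $t = 1$.

$\eta_{kl}^{(t)} \sim N(b\, \eta_{kl}^{(t-1)}, \psi)$, sample the means of the entries of the connectivity matrix prior between roles $k$ and $l$ at $t = 2, \dots, T$.

$B_{kl}^{(t)} \sim$ logistic normal with mean parameter $\eta_{kl}^{(t)}$, sample the entries of the connectivity matrix between roles $k$ and $l$ at $t = 1, \dots, T$.


Logistic normal mixed membership stochastic blockmodel:

$\pi_i^{(t)} \sim \mathcal{LN}(\mu^{(t)}, \Sigma^{(t)})$, sample a $K$-dimensional mixed membership vector for each node $i$, at $t = 1, \dots, T$.

$z_{i \to j}^{(t)} \sim \mathrm{Multinomial}(\pi_i^{(t)}, 1)$, sample the membership indicator for the initiator for each pair of nodes $(i, j)$, at $t = 1, \dots, T$.

$z_{i \leftarrow j}^{(t)} \sim \mathrm{Multinomial}(\pi_j^{(t)}, 1)$, sample the membership indicator for the receiver for each pair of nodes $(i, j)$, at $t = 1, \dots, T$.

$e_{ij}^{(t)} \sim \mathrm{Bernoulli}\big( (z_{i \to j}^{(t)})^{\top} B^{(t)} z_{i \leftarrow j}^{(t)} \big)$, sample the edges between nodes $i$ and $j$ for $i, j \in V$, at $t = 1, \dots, T$.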
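The generative outline above can be turned into a small simulation to see how the three components fit together. The sketch below is a simplified illustration rather than the authors' model: scalar coefficients `a` and `b` stand in for the transition dynamics, all noise terms are independent Gaussians with the standard deviations given as arguments, the mixed membership vector is obtained by a softmax of a Gaussian draw (one common way to realize a logistic normal), and the connectivity entries are obtained by applying the logistic function directly to their state-space means. The function name `sample_dmmsb` and all default values are hypothetical.

```python
import math
import random


def softmax(g):
    """Map a real vector to the probability simplex."""
    m = max(g)
    e = [math.exp(x - m) for x in g]
    s = sum(e)
    return [x / s for x in e]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_dmmsb(n, K, T, a=1.0, b=1.0, phi_sd=0.3, psi_sd=0.3, sigma_sd=0.5, seed=0):
    """Simulate T directed binary networks on n nodes with K roles (illustrative only)."""
    rng = random.Random(seed)
    roles = list(range(K))
    # Initial means of the membership prior and of the connectivity entries.
    mu = [rng.gauss(0.0, phi_sd) for _ in range(K)]
    eta = [[rng.gauss(0.0, psi_sd) for _ in range(K)] for _ in range(K)]
    networks, memberships = [], []
    for t in range(T):
        if t > 0:
            # State space updates: linear transition plus Gaussian noise.
            mu = [a * m + rng.gauss(0.0, phi_sd) for m in mu]
            eta = [[b * x + rng.gauss(0.0, psi_sd) for x in row] for row in eta]
        # Connectivity probabilities via the logistic transform (assumption).
        B = [[sigmoid(x) for x in row] for row in eta]
        # One mixed membership vector per node, centred on the current mean.
        pi = [softmax([m + rng.gauss(0.0, sigma_sd) for m in mu]) for _ in range(n)]
        E = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                zi = rng.choices(roles, weights=pi[i])[0]  # role of the initiator
                zj = rng.choices(roles, weights=pi[j])[0]  # role of the receiver
                E[i][j] = 1 if rng.random() < B[zi][zj] else 0
        networks.append(E)
        memberships.append(pi)
    return networks, memberships
```

Because roles are redrawn for every ordered pair, the same node can act in different roles toward different partners within a single time step, which is the mixed membership behavior the model is meant to capture.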

The joint probability of the data and the latent variables can be written in a form similar to that of the static mixed membership stochastic blockmodel.
For posterior inference, a Laplace variational approximation scheme based on the generalized mean field (GMF) approximation is used to infer the latent variables and estimate the model parameters.
This model is applied to three datasets. Sampson's monk dataset contains the liking relationships among 18 monks at three time points. The result of the model is consistent with previous work except for one controversial person, Mark. The model is also applied to a subset of the Enron email dataset covering 151 persons in 2001 and discovers five distinct roles of actors. The authors visualize and track the trajectory of each individual's mixed membership vector to understand how his or her role evolves. The third example studies the evolving gene network of fruit flies. The Drosophila melanogaster gene network dataset contains 22 networks at different time points across various developmental stages. The model discovers that many genes exhibit a sharp transition in their roles near the end of the embryonic stage. The authors further examine 45 ontological groups and find an overall pattern in which each role consists of genes with a variety of functions, and the functional composition of each role varies across time.
Unlike Xing et al. [84], who employ a single time-evolving logistic normal distribution for all network nodes, Ho et al. [26] generalize the prior on nodes to be a mixture of time-evolving logistic normal distributions. This mixture prior is multimodal and captures correlations between roles, allowing it to fit complex data densities that the unimodal prior cannot.
The state space model, which defines a linear dynamic transformation of the mixed membership priors, now contains $C$ distinct trajectories over adjacent time points,

$$\mu_h^{(t)} = A \mu_h^{(t-1)} + w_h^{(t)}, \quad h = 1, \dots, C,$$

where $A$ is a transition matrix and $w_h^{(t)}$ represents the normal transition noise. Now each mixed membership vector $\pi_i^{(t)}$ is drawn from one of the $C$ trajectories $\mu_h^{(t)}$. The choice of trajectory for $\pi_i^{(t)}$ is given by the indicator