I A dynamic stochastic block model
The stochastic block model (SBM) is a classic model of community structure in static networks. Here, we use a variant of the SBM in which the community labels of nodes change over time, but where edges are independent, which is a special case of several models previously introduced for community detection in dynamic networks Kim and Leskovec (2013); Xu and Hero (2014); Han et al. (2014); Yang et al. (2009); Xing et al. (2010). Crucially, our variant captures the important behavior of changing community labels and is analytically tractable.
Under the SBM a graph $G = (V, E)$ is generated as follows. Using a prior distribution $q$ over $k$ group or community labels, we assign each of the $n$ nodes $i \in V$ to a group $g_i \in \{1, \dots, k\}$. We then generate the edges $E$ according to the probability specified by a $k \times k$ community interaction matrix $p$ and the group assignments $g$. In the sparse case, where $p_{rs} = O(1/n)$, the resulting network is locally treelike and the number of edges between groups is Poisson distributed with parameter $c_{rs} = n p_{rs}$.
In a dynamic network, we have a sequence of graphs $G(t) = (V, E(t))$ with $1 \le t \le T$, where each graph has its own group assignment vector $g(t)$. To generate these assignments, we draw the initial assignment $g(1)$ from the prior $q$, where each node has probability $q_r$ of being in community $r$. With probability $\eta$, each node keeps its label from one time step to the next, $g_i(t) = g_i(t-1)$, and otherwise it chooses a new label from the prior $q$. Formally, the transition probability for community memberships is
$$P\big(g(t) \mid g(t-1)\big) = \prod_{i=1}^{n} \Big[ \eta\, \delta_{g_i(t),\, g_i(t-1)} + (1 - \eta)\, q_{g_i(t)} \Big], \tag{1}$$
where $\delta_{rs} = 1$ if $r = s$ and 0 otherwise. The edges $E(t)$ are then generated independently for each $t$ according to the community interaction matrix $p$, by connecting each pair of nodes $i$ and $j$ at the same time $t$ with probability $p_{g_i(t)\, g_j(t)}$. Note that while the group assignments $g(t)$ may change over time, the matrix $p$ remains constant. Subsequently, we use $A(t)$ to denote the adjacency matrix for the graph at time $t$, and $D(t)$ to denote the diagonal matrix of node degrees at time $t$, i.e., $D_{ii}(t) = \sum_j A_{ij}(t)$.
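At successive times in this model, edges are correlated only through the group assignments $\{g(t)\}$. Given these, the full likelihood of a graph sequence under this dynamic SBM is
$$P\big(\{G(t)\} \mid \{g(t)\}, p\big) = \prod_{t=1}^{T} \prod_{i<j} p_{g_i(t)\, g_j(t)}^{A_{ij}(t)} \big(1 - p_{g_i(t)\, g_j(t)}\big)^{1 - A_{ij}(t)}. \tag{2}$$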
For our subsequent analysis, we focus on the common choices of a uniform prior $q_r = 1/k$, and where $p$ has two distinct entries: $p_{rs} = c_{\mathrm{in}}/n$ if $r = s$ and $p_{rs} = c_{\mathrm{out}}/n$ if $r \neq s$. In this setting the average degree of each graph is then $c = [c_{\mathrm{in}} + (k-1)\, c_{\mathrm{out}}]/k$. We are interested in the sparse regime where $c = O(1)$, because most real-world networks of interest are sparse (e.g., the Facebook social network), and sparsity allows us to carry out asymptotically optimal inference. Note that the case where every group has distinct average degrees is easier than the equal-average-degree case that we consider, because distinct average degrees give prior information about group memberships.
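The generative process described above can be sketched in a few lines of Python. This is only an illustration under the stated choices (uniform prior, within-group edge probability $c_{\mathrm{in}}/n$, between-group probability $c_{\mathrm{out}}/n$, label persistence $\eta$), not the authors' code. The function name `generate_dynamic_sbm` and its arguments are hypothetical, and dense adjacency matrices are used purely to keep the example short.

```python
import numpy as np

def generate_dynamic_sbm(n, T, k, c_in, c_out, eta, seed=0):
    """Sample group labels (T x n) and a list of T adjacency matrices."""
    rng = np.random.default_rng(seed)

    # Community interaction matrix p: c_in/n on the diagonal, c_out/n elsewhere.
    p = np.full((k, k), c_out / n)
    np.fill_diagonal(p, c_in / n)

    labels = np.empty((T, n), dtype=int)
    labels[0] = rng.integers(k, size=n)       # initial labels from the uniform prior
    for t in range(1, T):
        keep = rng.random(n) < eta            # keep the old label with probability eta
        fresh = rng.integers(k, size=n)       # otherwise redraw from the prior
        labels[t] = np.where(keep, labels[t - 1], fresh)

    graphs = []
    for t in range(T):
        # Edge probability for every pair, given the labels at time t.
        probs = p[labels[t][:, None], labels[t][None, :]]
        upper = np.triu(rng.random((n, n)) < probs, k=1)   # one draw per pair i < j
        graphs.append((upper | upper.T).astype(int))       # symmetric, no self-loops
    return labels, graphs
```

Note that a node that "redraws" may pick its old label again, so the probability of an unchanged label is $\eta + (1-\eta)/k$, in agreement with the transition probability given above.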
II The detectability threshold in dynamic networks
The fundamental question we now consider is, under what conditions can we detect, better than chance, the correct time-evolving labeling of the latent communities in this model?
Previous work on community detection in static networks has shown that there exists a sharp threshold below which no algorithm can perform better than chance in recovering the latent community structure Decelle et al. (2011); Mossel et al. (2012), at least in the case $k = 2$. This threshold occurs at positive values of the difference in the internal and external group connection probabilities, meaning that the community structure may still exist, but is undetectable. In terms of the SBM’s parameters, this phase transition occurs at
$$|c_{\mathrm{in}} - c_{\mathrm{out}}| = k \sqrt{c}. \tag{3}$$
In a dynamic network where community memberships correlate across time, we will exploit these correlations to improve upon the static detectability threshold. In the worst case, when these temporal correlations are absent, i.e., $\eta = 0$, we should do no worse than the static threshold. To facilitate our analysis, we define an extended graph structure, called a spatiotemporal graph, in which we take the graphs $G(1), \dots, G(T)$ and add special “temporal” edges that connect each node $i(t)$ with its time-adjacent versions $i(t-1)$ and $i(t+1)$. Under our model, the “spatial” edges are independent and sparse, implying that this spatiotemporal graph is locally treelike.
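Consider a particular node $i(t)$ as $n \to \infty$ and $T \to \infty$. Moving outward in space and time, inference becomes a tree reconstruction problem, with a stochastic transition matrix $\sigma$ along each spatial edge,
$$\sigma = \lambda\, \mathbb{1} + (1 - \lambda)\, \frac{J}{k}, \tag{4}$$
where $\mathbb{1}$ is the identity matrix, $J$ is the matrix of all 1s, and
$$\lambda = \frac{c_{\mathrm{in}} - c_{\mathrm{out}}}{k c}. \tag{5}$$
Similarly, along each temporal edge we have a stochastic matrix
$$\tau = \eta\, \mathbb{1} + (1 - \eta)\, \frac{J}{k}. \tag{6}$$
Thus, moving along a spatial or temporal edge copies a community label with probability $\lambda$ or $\eta$ respectively, and otherwise randomizes it according to the prior. That is, these edges multiply the distribution of labels by the stochastic matrices $\sigma$ and $\tau$, whose eigenvalues are $\lambda$ and $\eta$, other than the trivial eigenvalue 1 corresponding to the uniform distribution.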
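The "copy with some probability, otherwise randomize from the uniform prior" description can be checked numerically. The sketch below assumes the spatial copy probability is $\lambda = (c_{\mathrm{in}} - c_{\mathrm{out}})/(kc)$ with $c = [c_{\mathrm{in}} + (k-1) c_{\mathrm{out}}]/k$, which follows from normalizing the edge rates over the $k$ equally likely neighbor labels. The helper name `copy_or_randomize` and the parameter values are illustrative only.

```python
import numpy as np

def copy_or_randomize(prob_copy, k):
    """Matrix that copies a label with prob_copy, else draws uniformly from k labels."""
    return prob_copy * np.eye(k) + (1 - prob_copy) * np.ones((k, k)) / k

k, c_in, c_out, eta = 3, 9.0, 3.0, 0.7
c = (c_in + (k - 1) * c_out) / k            # average degree
lam = (c_in - c_out) / (k * c)              # copy probability along a spatial edge

sigma = copy_or_randomize(lam, k)           # spatial-edge transition matrix
tau = copy_or_randomize(eta, k)             # temporal-edge transition matrix

# sigma can equivalently be read off from the edge rates: c_rs / (k c).
rates = np.full((k, k), c_out)
np.fill_diagonal(rates, c_in)
sigma_from_rates = rates / (k * c)

# Both matrices are symmetric, so eigvalsh applies; eigenvalues come back sorted.
sigma_eigs = np.linalg.eigvalsh(sigma)      # lam (k-1 times), then the trivial 1
tau_eigs = np.linalg.eigvalsh(tau)          # eta (k-1 times), then the trivial 1
```

The nontrivial eigenvalue has multiplicity $k - 1$, and its eigenvectors are the deviations from the uniform distribution that sum to zero.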
Since each node in the spatiotemporal graph has $\mathrm{Poi}(c)$ (a Poisson-distributed random variable with mean $c$) spatial edges but exactly two temporal edges, the tree is generated by a two-type branching process. Each spatial edge gives rise to two temporal edges (to each of the time-adjacent versions of its end point), and each temporal edge gives rise to one temporal edge (continuing in the same direction in time), and both give rise to $\mathrm{Poi}(c)$ spatial edges. Thus the matrix describing the expected number of children (where we multiply a column vector of populations on the left) is $\begin{pmatrix} c & c \\ 2 & 1 \end{pmatrix}$. Using the results of Ref. Janson and Mossel (2004), the detectability threshold occurs when the largest eigenvalue of the matrix $\begin{pmatrix} c\lambda^2 & c\lambda^2 \\ 2\eta^2 & \eta^2 \end{pmatrix}$ exceeds unity, which yields
$$c \lambda^2 = \frac{1 - \eta^2}{1 + \eta^2}. \tag{7}$$
When $\eta = 0$, i.e., when there is no temporal correlation in community assignments over time, Eq. (7) recovers the static detectability threshold $c\lambda^2 = 1$, which is equivalent to Eq. (3). On the other hand, when $\eta = 1$, i.e., when the community assignments are fixed across time, we may simply integrate the graph over $t$, making it arbitrarily dense. We then have detectability for any $\lambda > 0$, implying that any amount of community structure can be detected. At intermediate values of $\eta$, the detectability threshold falls between these two extremes.
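The eigenvalue condition and its two limits can be verified numerically. The sketch below assumes the weighted two-type branching matrix has rows $(c\lambda^2, c\lambda^2)$ and $(2\eta^2, \eta^2)$, i.e., each spatial child is weighted by $\lambda^2$ and each temporal child by $\eta^2$, and that setting its largest eigenvalue to one gives $c\lambda^2 = (1-\eta^2)/(1+\eta^2)$. The function names are hypothetical.

```python
import numpy as np

def branching_eigenvalue(c, lam, eta):
    """Largest eigenvalue of the weighted two-type branching matrix."""
    M = np.array([[c * lam**2, c * lam**2],    # spatial children of spatial / temporal edges
                  [2 * eta**2, eta**2]])       # temporal children of spatial / temporal edges
    return max(np.linalg.eigvals(M).real)

def critical_lambda(c, eta):
    """Value of lambda at which the largest eigenvalue equals one."""
    return np.sqrt((1 - eta**2) / (c * (1 + eta**2)))

def is_detectable(c, lam, eta):
    return branching_eigenvalue(c, lam, eta) > 1
```

For example, `critical_lambda(c, 0.0)` equals $1/\sqrt{c}$ (the static threshold), while `critical_lambda(c, 1.0)` is zero, so any $\lambda > 0$ is detectable when labels never change.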
This analysis corresponds to robust reconstruction on trees, where we are given noisy information at the leaves of a tree and we want to propagate this information to the root Janson and Mossel (2004). For $k = 2$ groups, it is known rigorously in the static case Mossel et al. (2012) that detecting the communities below this bound is information-theoretically impossible. We conjecture that the same is true in the dynamic case. For $k \geq 5$ groups, it has been conjectured Decelle et al. (2011) that it is information-theoretically possible to succeed beyond the Kesten-Stigum bound, but that doing so takes exponential time.
III Bayesian inference of the model
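Given an observed graph sequence $\{G(t)\}$, we use Bayesian inference to learn the posterior distribution of latent community assignments, where $P(\{g(t)\})$ denotes the Markov prior over label sequences defined by $q$ and $\eta$:
$$P\big(\{g(t)\} \mid \{G(t)\}\big) = \frac{P\big(\{G(t)\} \mid \{g(t)\}\big)\, P\big(\{g(t)\}\big)}{\sum_{\{g'(t)\}} P\big(\{G(t)\} \mid \{g'(t)\}\big)\, P\big(\{g'(t)\}\big)}. \tag{8}$$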
This distribution is hard to compute in general because the summation runs over an exponential number of terms. However, when the spatiotemporal graph is sparse, as generated by our model, we may make a controlled Bethe approximation (also known as belief propagation (BP) in machine learning and as the “cavity method” in statistical physics) that allows us to carry out Bayesian inference in an efficient and asymptotically optimal way. We now describe a BP algorithm for learning our model from data, which we then linearize to obtain a fast spectral approach, based on a dynamic version of the non-backtracking matrix. This yields two inference algorithms that perform accurately all the way down to the transition.
III.1 Belief propagation
Instead of inferring the joint posterior distribution, we use belief propagation to compute posterior marginal probabilities of node labels over time. Belief propagation assumes conditional independence of these marginals, which is exact when the graph is a tree and is a good approximation when the graph is locally treelike, as in our spatiotemporal graph. In our setting, nodes update their current belief about marginals according to the marginals of both their spatial and temporal neighbors. That is, we define two types of messages: spatial messages that pass along spatial edges and temporal messages that pass along temporal edges. Fig. 1 illustrates this message passing scheme for a spatiotemporal graph.
A spatial message $\psi_r^{i \to j}(t)$ gives the marginal probability of a node $i$ at time $t$ being in community $r$, when we consider node $j$ to be absent at time $t$. This message is computed as
where $Z^{i \to j}(t)$ is the normalization. The temporal message $\psi_r^{i(t) \to i(t+1)}$ (or $\psi_r^{i(t) \to i(t-1)}$) represents the marginal probability of node $i$ at time $t$ being in community $r$, when we consider node $i$ to be absent at time $t+1$ (or at $t-1$) and has a similar form:
When $t = 1$ or $t = T$, we remove the term corresponding to the temporal edge coming from outside the domain of $t$. Furthermore, following past work on BP for the static SBM Decelle et al. (2011); Aicher et al. (2015), we exploit these networks’ sparsity to reduce the computational complexity of the spatial updates at the cost of introducing corrections in sparse graphs. Specifically, we let the message $\psi_r^{i \to j}(t)$ be the same for all of $i$’s non-neighbors $j$. We then model the effects of all such non-edges as an adaptive external field on each node, which depends on the current estimated marginals. That is, we replace the product over non-neighbors by a factor $e^{-h_r(t)}$, where $h_r(t) = \frac{1}{n} \sum_{j} \sum_{s} c_{rs}\, \psi_s^{j}(t)$, which has the effect of preventing belief propagation from putting all the nodes at a given time into the same community. The adaptive fields only need to be updated after each BP iteration. This approximation yields a significant improvement in efficiency, reducing the computational complexity to be proportional to the total number of edges in the spatiotemporal graph, rather than growing quadratically with $n$.
Once the BP messages converge, we compute the marginal probability $\psi_r^{i}(t)$ that node $i$ belongs to group $r$ at time $t$. This is identical to Eqs. (9) and (10), except that we take all incoming edges into account. We then obtain a partition $\hat{g}$ by marginalization, which assigns each node to its most-likely group:
$$\hat{g}_i(t) = \operatorname*{arg\,max}_{r}\; \psi_r^{i}(t).$$
It is well known in Bayesian inference Iba (1999) that if the marginals are exact, then the marginalized partition is the optimal estimator of the latent community labels. Because spatiotemporal graphs under our model are sparse, we know that with $n \to \infty$, the marginals given by BP are asymptotically correct. Thus, our BP algorithm succeeds all the way down to the detectability threshold given by Eq. (7), and gives an asymptotically optimal partition in terms of accuracy.
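The marginalization step itself is simple once the marginals are available. As a small illustration, assume the converged BP marginals are stored in an array `marginals[t, i, r]` (a hypothetical layout, not one prescribed by the text); the partition is then the per-node, per-time argmax.

```python
import numpy as np

def marginal_partition(marginals):
    """Assign each node at each time to its most likely group.

    marginals[t, i, r] is the estimated probability that node i is in
    group r at time t; the result has shape (T, n).
    """
    marginals = np.asarray(marginals, dtype=float)
    return marginals.argmax(axis=-1)

# Two time steps, two nodes, two groups.
example = [[[0.9, 0.1], [0.2, 0.8]],
           [[0.4, 0.6], [0.3, 0.7]]]
partition = marginal_partition(example)
```

Ties are broken toward the lowest group index by `argmax`; this only matters when the marginals are exactly uniform, as at the factorized fixed point.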
III.2 Spectral clustering
The BP equations described above can be linearized to obtain a fast spectral approach for detecting community structure in dynamic networks. It is easy to verify that in our setting, when $q_r = 1/k$, the average degree in each group is $c$. This implies that the BP equations will always have a solution
$$\psi_r^{i \to j}(t) = \psi_r^{i(t) \to i(t \pm 1)} = \frac{1}{k},$$
which we call a factorized fixed point. This fixed point only reflects the permutation symmetry in the system, and could be unstable due to random perturbations. If we use the correct parameters in the BP equations, i.e., the same parameters used to generate the observed network, then in the language of physics we would say that the system is on the Nishimori line Iba (1999). That is, if the BP messages deviate from the factorized solution, then they are correlated with the latent community labels and we say that there is no spin glass phase in the system Iba (1999). This allows us to simplify the BP equations by studying how the messages deviate from the factorized solution, which results in a linearized version of BP. In the static SBM, this linearization is equivalent to a spectral clustering algorithm using the non-backtracking matrix Krzakala et al. (2013).
To do this, we rewrite the BP messages as the uniform fixed point plus deviations away from it. The vector of deviations is given by
and the linearized BP equations are then
where means neighbors of , and denote derivatives evaluated at the factorized fixed point:
Solving the linearized BP equations amounts to finding eigenvectors of the Jacobian matrix composed of derivatives of the BP messages. However, this matrix is relatively large for an eigenvector problem. Using the non-backtracking matrix approach Krzakala et al. (2013), we convert this problem into a smaller eigenvector problem by defining
where denotes the -dimensional identity matrix; is the adjacency matrix of temporal edges with ; is the diagonal matrix of temporal degrees with if , and if or ; is the -dimensional matrix consisting of all the spatial edges, i.e., meaning ; is the diagonal matrix of spatial degrees where .
We now obtain a spectral clustering algorithm in the following way: given a spatiotemporal graph, we construct this matrix, then take the vectors composed of the first entries of the eigenvectors associated with the largest (absolute) eigenvalues, and finally perform $k$-means clustering on the matrix composed of these vectors. This yields a partition of the nodes; if the desired number of clusters is two, then we simply use the signs of the entries of the vector to separate the nodes into two communities.
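To make the recipe concrete, the sketch below applies the same steps to the simpler static case, which is a stand-in and not the dynamic matrix of this section. It uses the $2n \times 2n$ reduced non-backtracking matrix with blocks $0$, $D - \mathbb{1}$, $-\mathbb{1}$, $A$ from Krzakala et al. (2013) for a single graph with two groups. The function name is hypothetical, and reading the partition from the last $n$ entries of the eigenvector is a choice made for this static form.

```python
import numpy as np

def nonbacktracking_partition(A):
    """Two-way split of one static graph via the reduced non-backtracking matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))                       # diagonal degree matrix
    I = np.eye(n)
    B_reduced = np.block([[np.zeros((n, n)), D - I],
                          [-I, A]])                  # 2n x 2n, same nontrivial spectrum

    vals, vecs = np.linalg.eig(B_reduced)
    real = np.flatnonzero(np.abs(vals.imag) < 1e-8)  # indices of the real eigenvalues
    order = real[np.argsort(-vals[real].real)]       # largest real eigenvalue first

    # The leading eigenvector has entries of one sign and carries no community
    # information, so the second real eigenvector is used instead.
    second = vecs[:, order[1]].real
    return (second[n:] > 0).astype(int)              # sign of the last n entries
```

A dense eigendecomposition is used here for brevity; for large sparse graphs one would compute only the few leading eigenpairs with an iterative solver.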
From the principle of linearization, we know that real eigenvalues of the non-backtracking matrix describe stability properties of fixed points of the BP equations, i.e., if there is a real-valued eigenvalue larger than unity, it represents a stable fixed point in the equations. Moreover, if the BP equations have a stable fixed point, then the matrix should have a real eigenvalue that is larger than unity, denoting a partition of the nodes that correlates with the latent community labels. Thus, our spectral clustering algorithm should work as long as BP works, implying that it also works all the way down to the detectability transition in sparse networks.
In Fig. 2 (left) we show the spectrum of this matrix in the complex plane for a network in the detectable regime, generated by the model. As with existing non-backtracking approaches Krzakala et al. (2013), most of the eigenvalues are confined to a disk, while several real eigenvalues fall outside this disk. In this example, entries of the eigenvector associated with the largest real eigenvalue have the same sign, hence the leading or “ferromagnetic” eigenvector does not yield information about the latent community structure. In practice, we can perform regularizations to push such ferromagnetic eigenvectors back into the bulk, thereby lifting the eigenvectors correlated with the latent community structure to the top positions. Eigenvectors associated with other real eigenvalues outside the bulk are correlated with the latent community structure. In this case, because we have two groups, we obtain the inferred partition by using the signs of the entries of the second real eigenvector.
IV Numerical verification
To verify our claims of the detectability transition in dynamic networks, and the accuracy of our algorithms, we conduct the following numerical experiment. Using our generative model of dynamic networks with community structure, we generate a number of dynamic networks for various choices of $\epsilon = c_{\mathrm{out}}/c_{\mathrm{in}}$. When $\epsilon = 0$, communities are maximally strong, with every edge being located within a community, while at $\epsilon = 1$, we have Erdős-Rényi random graphs with no community structure. We then use our BP or spectral algorithm to infer the group assignments, assuming within each sequence that parameters are known. For each choice of $\epsilon$, we average our results over 100 dynamic networks with $T$ graphs and $n$ nodes each (for $nT$ nodes total), with an average degree $c$, divided into $k$ latent communities.
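For readers reproducing such an experiment, it helps to convert between the ratio $\epsilon$ and the model parameters. The sketch below assumes $\epsilon = c_{\mathrm{out}}/c_{\mathrm{in}}$ (the reading consistent with the two limits described above), $c = [c_{\mathrm{in}} + (k-1)c_{\mathrm{out}}]/k$, and the threshold $c\lambda^2 = (1-\eta^2)/(1+\eta^2)$ with $\lambda = (c_{\mathrm{in}} - c_{\mathrm{out}})/(kc)$. All function names are hypothetical.

```python
import math

def rates_from_epsilon(c, k, eps):
    """Return (c_in, c_out) with average degree c and ratio c_out / c_in = eps."""
    c_in = k * c / (1 + (k - 1) * eps)
    return c_in, eps * c_in

def lambda_from_epsilon(k, eps):
    """Copy probability along a spatial edge, (c_in - c_out) / (k c)."""
    return (1 - eps) / (1 + (k - 1) * eps)

def critical_epsilon(c, k, eta):
    """Largest eps that is still detectable according to the threshold."""
    lam_c = math.sqrt((1 - eta**2) / (c * (1 + eta**2)))
    if lam_c >= 1:
        return 0.0            # the graph is too sparse for any eps to be detectable
    return (1 - lam_c) / (1 + (k - 1) * lam_c)
```

With $\eta = 0$ this reduces to the static value $(\sqrt{c} - 1)/(\sqrt{c} - 1 + k)$, and the critical $\epsilon$ grows toward 1 as $\eta \to 1$.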
We measure the accuracy of the inferred community labels by the overlap between the latent partition $g$ and the inferred one $\hat{g}$. This is the fraction of nodes labeled correctly, maximized over all permutations of the groups, normalized so that it is 1 if $\hat{g} = g$ and 0 if $\hat{g}$ is uniformly random. In Fig. 2 (right) we show the overlap obtained by BP for dynamic networks as a function of $\epsilon$ for several choices of $\eta$. The detectability threshold for each $\eta$, from Eq. (7), is shown as vertical lines in the lower panel. When $\eta = 0$, we recover the static detectability threshold given by Eq. (3). As we increase $\eta$, the phase transition occurs at increasing values of $\epsilon$, as predicted, with the largest increase occurring as $\eta$ approaches 1.
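The overlap just described can be computed as follows. This sketch assumes equal-probability groups, so that a uniformly random guess is correct on a fraction $1/k$ of nodes in expectation, and it enumerates all $k!$ permutations, which is only practical for small $k$. The function name `overlap` is illustrative.

```python
from itertools import permutations
import numpy as np

def overlap(true_labels, inferred_labels, k):
    """Normalized agreement between two labelings, maximized over relabelings."""
    true_labels = np.asarray(true_labels).ravel()          # flatten (T, n) arrays
    inferred_labels = np.asarray(inferred_labels).ravel()

    # Fraction of correctly labeled nodes under the best permutation of group names.
    best = max(
        np.mean(np.asarray(perm)[inferred_labels] == true_labels)
        for perm in permutations(range(k))
    )
    # Rescale so that perfect recovery gives 1 and chance level (1/k) gives 0.
    return (best - 1 / k) / (1 - 1 / k)
```

Because of the maximization over permutations, a finite random labeling scores slightly above zero; this bias vanishes as the number of nodes grows.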
Similar results are obtained for other choices of and , with better agreement for larger networks. The slight deviation between numerical and analytic transition points observable in Fig. 2 right is a finite-size effect, which we numerically estimated to decrease like .
Figure 3 shows the overlap throughout the $(\epsilon, \eta)$-plane, using both BP and spectral algorithms, along with the line of the threshold given by Eq. (7). Notably, both algorithms perform similarly: they have large overlap with small $\epsilon$, indicating that the learned partition is highly correlated with the latent community structure. As $\epsilon$ increases (weaker community structure), both algorithms encounter a second-order phase transition in which the overlap decreases from a finite value to zero. Separate numerical experiments indicate that the convergence time of BP diverges in the vicinity of the phase transition, which agrees with past work on the detectability threshold in static networks Decelle et al. (2011). We also find that at each point in the $(\epsilon, \eta)$-plane, the accuracy of BP is always larger than that of the spectral algorithm, especially away from the transition, reflecting the optimality of our BP algorithm.
We have derived a mathematically precise and general limit to the detectability of communities in dynamic networks. This threshold assumes a probabilistic model of community structure that is a special case of several previously developed methods to detect dynamic communities: specifically, where nodes may change their community membership over time, but where edges are generated independently at each time step. We also gave two efficient algorithms for learning latent community structure that are optimal in the sense that they succeed all the way down to the detectability threshold in dynamic networks.
A simple extension of our algorithm is to apply our BP equations to a dense network consisting of all spatial edges from all graphs projected to the time , handling the message passing over time steps by using a damping factor . This approach extends our analysis to networks that evolve in continuous time rather than in discrete time steps.
For larger numbers of groups, such as $k \geq 5$, it has been conjectured Decelle et al. (2011) that there is a “hard but detectable” regime where the factorized fixed point described in Section III.2 is locally stable, but where one or more accurate fixed points exist as well. In such a regime, community detection is information-theoretically possible, but we believe that it takes exponential time (though see Kanade et al. (2014) for the case where the number of groups grows with $n$). We propose this as a direction for further work.
Other directions for future work include handling cases where the community interaction matrix may also change over time (a situation similar to change-point detection in networks Peel and Clauset (2015)), where edges are not generated independently at each time step, or where networks have edge weights Aicher et al. (2015) or node annotations.
Acknowledgements. The authors thank Elchanan Mossel and Andrey Lokhov for helpful conversations. Financial support for this research was provided in part by Grant No. IIS-1452718 (AG, AC) from the National Science Foundation, Grant #FA9550-12-1-0432 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA) (LP), and the John Templeton Foundation (PZ, CM). Author order is joint first-authorship for AG and PZ, with the remaining authors appearing alphabetically.
- Clauset and Eagle (2007) A. Clauset and N. Eagle, in DIMACS Workshop on Computational Methods for Dynamic Interaction Networks (2007) arXiv:1211.7343.
- Berger-Wolf et al. (2010) T. Berger-Wolf, C. Tantipathananandh, and D. Kempe, in Link Mining: Models, Algorithms, and Applications (Springer, 2010) pp. 307–336.
- Gauvin et al. (2014) L. Gauvin, A. Panisson, and C. Cattuto, PLOS ONE 9, e86028 (2014).
- Kim and Leskovec (2013) M. Kim and J. Leskovec, in Advances in Neural Information Processing Systems (2013) pp. 1385–1393.
- Mucha et al. (2010) P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J. Onnela, Science 328, 876 (2010).
- Rossi et al. (2013) R. Rossi, B. Gallagher, J. Neville, and K. Henderson, in Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM) (2013).
- Xing et al. (2010) E. P. Xing, W. Fu, and L. Song, Annals of Applied Statistics 4, 535 (2010).
- Zhu et al. (2014) L. Zhu, G. Steeg, and A. Galstyan, arXiv preprint, arXiv:1411.3675 (2014).
- Von Luxburg et al. (2012) U. Von Luxburg, R. Williamson, and I. Guyon, in ICML Unsupervised and Transfer Learning (2012) pp. 65–80.
- Bassett et al. (2013) D. Bassett, M. Porter, N. Wymbs, S. Grafton, J. Carlson, and P. Mucha, Chaos 23, 013142 (2013).
- Bazzi et al. (2015) M. Bazzi, M. Porter, S. Williams, M. McDonald, D. Fenn, and S. Howison, arXiv preprint, arXiv:1501.00040 (2015).
- Acar et al. (2009) E. Acar, D. Dunlavy, and T. Kolda, in Data Mining Workshops, 2009. ICDMW’09. IEEE International Conference on (IEEE, 2009) pp. 262–269.
- Dunlavy et al. (2011) D. Dunlavy, T. Kolda, and E. Acar, ACM Transactions on Knowledge Discovery from Data (TKDD) 5, 10 (2011).
- Sun et al. (2007) J. Sun, C. Faloutsos, S. Papadimitriou, and P. Yu, in Proceedings of the 13th ACM SIGKDD (ACM, 2007) pp. 687–696.
- Rosvall and Bergstrom (2010) M. Rosvall and C. Bergstrom, PLOS ONE 5, e8694 (2010).
- Yang et al. (2009) T. Yang, Y. Chi, S. Zhu, Y. Gong, and R. Jin, in SDM, Vol. 2009 (SIAM, 2009) pp. 990–1001.
- Xu and Hero (2014) K. Xu and A. Hero, Selected Topics in Signal Processing, IEEE Journal of 8, 552 (2014).
- Han et al. (2014) Q. Han, K. Xu, and E. Airoldi, arXiv preprint, arXiv:1410.8597 (2014).
- Peixoto (2015) T. Peixoto, arXiv preprint, arXiv:1504.02381 (2015).
- Valles-Catala et al. (2014) T. Valles-Catala, F. Massucci, R. Guimera, and M. Sales-Pardo, arXiv preprint, arXiv:1411.1098 (2014).
- Aggarwal and Subbian (2014) C. Aggarwal and K. Subbian, ACM Computing Surveys (CSUR) 47, 10 (2014).
- Hartmann et al. (2014) T. Hartmann, A. Kappes, and D. Wagner, arXiv preprint, arXiv:1401.3516 (2014).
- Holland et al. (1983) P. Holland, K. Laskey, and S. Leinhardt, Social Networks 5, 109 (1983).
- Nowicki and Snijders (2001) K. Nowicki and T. A. B. Snijders, Journal of the American Statistical Association 96 (2001).
- Decelle et al. (2011) A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Physical Review E 84, 066106 (2011).
- Mossel et al. (2012) E. Mossel, J. Neeman, and A. Sly, Probability Theory and Related Fields , 1 (2012).
- Massoulié (2014) L. Massoulié, in Proc. of the 46th Annual ACM Symposium on Theory of Computing (STOC) (ACM, 2014) pp. 694–703.
- Mossel et al. (2014) E. Mossel, J. Neeman, and A. Sly, in Proceedings of The 27th Conference on Learning Theory, COLT 2014, Barcelona, Spain, June 13-15, 2014 (2014) pp. 356–370.
- Krzakala et al. (2013) F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, Proc. Natl. Acad. Sci. USA 110, 20935 (2013).
- Bordenave et al. (2015) C. Bordenave, M. Lelarge, and L. Massoulié, arXiv preprint arXiv:1501.06087 (2015).
- Janson and Mossel (2004) S. Janson and E. Mossel, Annals of Probability , 2630 (2004).
- Aicher et al. (2015) C. Aicher, A. Z. Jacobs, and A. Clauset, Journal of Complex Networks 3, 221 (2015).
- Iba (1999) Y. Iba, Journal of Physics A: Mathematical and General 32, 3875 (1999).
- Kanade et al. (2014) V. Kanade, E. Mossel, and T. Schramm, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2014, September 4-6, 2014, Barcelona, Spain (2014) pp. 779–792.
- Peel and Clauset (2015) L. Peel and A. Clauset, in 29th AAAI Conference on Artificial Intelligence (2015).