I Methods
Many networks can be partitioned into communities, which are cohesive (and often dense) groups of vertices with sparse connections between distinct groups [1]. Perhaps the most popular way of detecting communities algorithmically is to optimize the quality function known as modularity [2]:

$$Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta(g_i, g_j), \tag{1}$$

which measures how well a network can be partitioned into disjoint groups of nodes. In (1), $A_{ij}$ are the elements of the graph’s adjacency matrix $A$, the sum of all of the edge weights in the network is $m$, $k_i$ is the strength (i.e., weighted degree) of node $i$, $g_i$ is the community assignment of node $i$, $\delta$ is the Kronecker delta, and the resolution parameter $\gamma$ [3] enables us to uncover community structure at different scales. The modularity of a network partition measures the fraction of total edge weight within communities minus that expected if edges were placed randomly according to the null model $k_i k_j / (2m)$, which preserves a network’s expected strength distribution. Finding a network partition that attempts to maximize $Q$ allows one to probe a network’s community structure. In contrast to traditional forms of spectral clustering, modularity optimization requires no knowledge of the number or sizes of communities, and it also allows one to segment a network into communities of disparate sizes (even for a fixed value of $\gamma$) [1, 2].

Optimization of modularity was recently generalized to “multislice” networks [4], which are represented using adjacency tensors and consist of layers of ordinary networks. The framework of multislice networks can thereby be used to represent time-dependent or multiplex networks. In Fig. 1(a), we show a schematic of a multislice network. Using this framework, we define a generalized modularity function [4]:

$$Q = \frac{1}{2\mu} \sum_{i,j,s,r} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right] \delta(g_{is}, g_{jr}), \tag{2}$$

where $g_{is}$ indicates the community assignment of node $i$ from slice $s$, the intraslice edge strength of node $j$ in slice $s$ is $k_{js} = \sum_i A_{ijs}$, the corresponding interslice edge strength is $c_{js} = \sum_r C_{jsr}$, the total edge weight in slice $s$ is $m_s = \frac{1}{2} \sum_j k_{js}$, and $2\mu = \sum_{j,s} (k_{js} + c_{js})$. In (2), one can use a different resolution parameter $\gamma_s$ in each slice. For a given slice $s$, the quantity $A_{ijs}$ gives the edge weight between nodes $i$ and $j$. For a given node $j$, the quantity $C_{jsr}$ gives the interslice coupling between the $r$th and $s$th slices.
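To make the multislice modularity (2) concrete, the following is a minimal sketch that evaluates it by direct summation for a given assignment of nodes to communities. It assumes the standard form of (2) from [4]; the function name, the nested-list layout of the inputs, and the toy graph are illustrative choices that do not come from the original work, and the triple loops are meant for small examples only.

```python
def multislice_modularity(A, C, g, gamma):
    """Evaluate the multislice modularity (2) by direct summation.

    A[s][i][j] : intraslice weight between nodes i and j in slice s (symmetric)
    C[j][s][r] : interslice coupling of node j between slices s and r
    g[s][i]    : community label of node i in slice s
    gamma[s]   : resolution parameter of slice s
    """
    S, N = len(A), len(A[0])
    k = [[sum(A[s][i]) for i in range(N)] for s in range(S)]  # strengths k_{is}
    two_m = [sum(k[s]) for s in range(S)]                     # 2 m_s per slice
    # 2 mu = total intraslice strength + total interslice strength
    two_mu = sum(two_m) + sum(C[j][s][r] for j in range(N)
                              for s in range(S) for r in range(S))
    total = 0.0
    # Intraslice term: (A_ijs - gamma_s k_is k_js / (2 m_s)) for s == r.
    for s in range(S):
        for i in range(N):
            for j in range(N):
                if g[s][i] == g[s][j]:
                    total += A[s][i][j] - gamma[s] * k[s][i] * k[s][j] / two_m[s]
    # Interslice term: C_jsr, counted when node j keeps its label across slices.
    for j in range(N):
        for s in range(S):
            for r in range(S):
                if g[s][j] == g[r][j]:
                    total += C[j][s][r]
    return total / two_mu


# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
N = 6
A0 = [[0.0] * N for _ in range(N)]
for i, j in edges:
    A0[i][j] = A0[j][i] = 1.0
labels = [0, 0, 0, 1, 1, 1]

# One slice and no coupling: (2) reduces to the ordinary modularity (1).
q_single = multislice_modularity([A0], [[[0.0]] for _ in range(N)], [labels], [1.0])

# Two identical slices, each node coupled to itself with weight omega = 1.
omega = 1.0
C2 = [[[0.0, omega], [omega, 0.0]] for _ in range(N)]
q_double = multislice_modularity([A0, A0], C2, [labels, labels], [1.0, 1.0])
```

With a single slice and no interslice coupling, the function returns the ordinary modularity (1) of the partition, which is a convenient sanity check before adding slices.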
Optimization of the ordinary modularity function (1) has been used to study community structure in myriad networks [1], and it has recently been used in the analysis of hyperspectral images [5]. In our work, we optimize multislice modularity (2) to examine community structure in social networks and segmentation of images. In each case, we start with a static graph, and each layer of the multislice network uses the same adjacency matrix but associates it with a different resolution-parameter value $\gamma_s$. We include interslice edges between each node in adjacent slices only, so $C_{jsr} = 0$ unless $|s - r| = 1$. We set all nonzero interslice edges to a constant value $\omega$. This setup, which was illustrated using the infamous Zachary Karate Club network in [4], allows one to detect communities using a range of resolution-parameter values while enforcing some consistency in clustering identical nodes similarly across slices. The strength of this enforcement becomes larger as one increases $\omega$. To optimize multislice modularity (2), we use a Louvain-like locally greedy algorithm [6, 7].
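The multiresolution setup described above can be sketched as follows: one adjacency matrix is reused in every slice, and each node is coupled to itself in adjacent slices only, with a constant weight. The function name, the nested-list layout `C[j][s][r]`, and the numerical values below are assumptions made for illustration; they are not the values or the code used in this work.

```python
def build_multiresolution_slices(A, gammas, omega):
    """Replicate one adjacency matrix across slices and couple adjacent slices.

    Returns (slices, C), where slices[s] refers to the same matrix A for
    every slice s, and C[j][s][r] equals omega if |s - r| == 1 and 0
    otherwise, for every node j.
    """
    S, N = len(gammas), len(A)
    slices = [A for _ in range(S)]
    coupling = [[omega if abs(s - r) == 1 else 0.0 for r in range(S)]
                for s in range(S)]
    C = [[row[:] for row in coupling] for _ in range(N)]  # same pattern per node
    return slices, C


# Hypothetical example: a 3-node path graph with four slices.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
gammas = [0.5, 1.0, 1.5, 2.0]  # illustrative resolution values only
omega = 0.5                    # illustrative coupling strength only
slices, C = build_multiresolution_slices(A, gammas, omega)

# Interslice strength of node 0 in each slice (sum of its couplings).
c_node0 = [sum(C[0][s]) for s in range(len(gammas))]
```

In this construction, a node in the first or last slice has interslice strength equal to the coupling weight, whereas a node in an interior slice has twice that, because it is coupled to copies of itself in two neighboring slices.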
II Data and Results
II-A LAPD Field Interview Data
In [11], we used data with both geographic and social information about stops involving street gang members in the Los Angeles Police Department (LAPD) Division of Hollenbeck [8]. We optimized multislice modularity (2) as a means of unsupervised clustering of individual gang members without prior knowledge of the number of gangs or the affiliations of their members. We subsequently examined network diagnostics over slices to attempt to estimate the number of gangs that is stable across multiple resolution-parameter values and that also corresponds roughly to the number expected by the LAPD.
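One simple diagnostic of this kind is the number of communities found in each slice, together with the longest run of consecutive slices over which that number stays the same. The sketch below illustrates only this idea. It is not necessarily the diagnostic used in [11], and the function names and label arrays are hypothetical.

```python
def communities_per_slice(g):
    """Count the distinct community labels in each slice; g[s][i] is the
    label of node i in slice s."""
    return [len(set(labels)) for labels in g]


def most_persistent_count(counts):
    """Return (count, run_length) for the longest run of consecutive slices
    that share the same number of communities (first such run on ties)."""
    best, best_len = counts[0], 1
    current, current_len = counts[0], 1
    for c in counts[1:]:
        if c == current:
            current_len += 1
        else:
            current, current_len = c, 1
        if current_len > best_len:
            best, best_len = current, current_len
    return best, best_len


# Hypothetical labels for six nodes over five slices (increasing resolution).
g = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 2, 3, 4, 4],
]
counts = communities_per_slice(g)
stable_count, run_length = most_persistent_count(counts)
```

A long run suggests a number of groups that is robust to the choice of resolution parameter, although the count alone does not establish that the groups themselves are the same from slice to slice.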
II-B Cow Image
We segment the cow image in Fig. 1(b) (which contains about pixels) without specifying the number of image components. We build a graph of this image in which each node corresponds to a pixel and each edge indicates the similarity between a pair of pixels. We associate a pixel-neighbor patch with each pixel in the image. Let $d_{ij}$ denote the norm of the difference of the patches corresponding to nodes $i$ and $j$. The adjacency matrix $A$ that we use in each layer of the multislice network has elements

$$A_{ij} = \exp\left( -\frac{d_{ij}^2}{\sigma_i \sigma_j} \right),$$

where $\sigma_i$ is the 30th smallest value of $d_{ij}$ between pixel $i$ and the other pixels $j$ [9].
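The graph construction can be sketched in a few lines for a tiny grayscale image. The sketch assumes a self-tuning affinity of the form $\exp(-d_{ij}^2 / (\sigma_i \sigma_j))$ in the style of [9], a Euclidean norm on patches, a 3-by-3 patch with border clamping, and a zero diagonal. These choices, the function names, and the toy image are assumptions for illustration rather than the settings used for the cow image, and the dense all-pairs computation is practical only for very small images.

```python
import math


def patch(img, r, c, half=1):
    """Flatten the (2*half+1)-by-(2*half+1) neighborhood of pixel (r, c),
    clamping coordinates at the image border (an assumed boundary rule)."""
    rows, cols = len(img), len(img[0])
    return [img[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
            for dr in range(-half, half + 1)
            for dc in range(-half, half + 1)]


def self_tuning_affinity(img, K=3, half=1):
    """Dense affinity matrix A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), where
    d_ij is the Euclidean distance between patches and sigma_i is the K-th
    smallest distance from pixel i to the other pixels."""
    rows, cols = len(img), len(img[0])
    patches = [patch(img, r, c, half) for r in range(rows) for c in range(cols)]
    n = len(patches)
    d = [[math.dist(p, q) for q in patches] for p in patches]
    sigma = [sorted(d[i][j] for j in range(n) if j != i)[K - 1] for i in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # No self-loops; skip degenerate scales to avoid dividing by zero.
            if i != j and sigma[i] > 0 and sigma[j] > 0:
                A[i][j] = math.exp(-d[i][j] ** 2 / (sigma[i] * sigma[j]))
    return A


# Toy 4x4 image (hypothetical): dark left half, bright right half.
img = [[0.0, 0.1, 0.9, 1.0],
       [0.1, 0.0, 1.0, 0.9],
       [0.0, 0.1, 0.9, 1.0],
       [0.1, 0.0, 1.0, 0.9]]
A = self_tuning_affinity(img, K=3)
```

Pixels are indexed in row-major order, so `A[0][4]` compares the top-left pixel with the pixel directly below it, and `A[0][3]` compares it with the top-right pixel. The local scales make the affinity adapt to how tightly each pixel's nearest patches are clustered.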
We construct a multislice network that consists of six copies of $A$. We associate the resolution-parameter value $\gamma_s$ with slice $s$. We then optimize multislice modularity and obtain the image segmentations shown in Fig. 2. (Color indicates group assignments.) With this procedure, we are able to identify all four components of the image. As indicated in panel (a), we obtain smaller-scale communities (i.e., groups of pixels) as we increase the value of the resolution parameter. Importantly (see the discussion in Section I), the coupling between slices enforces some consistency in clustering identical nodes similarly across slices. In panels (b) and (c), we observe a good segmentation of the two cows, the sky, and the background grass. As indicated in panel (d), the three groups corresponding to the two cows and the sky stay relatively stable, but the group corresponding to the grass breaks down by the sixth slice.
This application to image segmentation is computationally expensive because of the large number of pixels. Running the optimization with more slices, which we would like to do in order to investigate how the segmentation evolves over a larger range of resolution values, requires substantial memory and computation time. Computational improvements will be necessary to conduct a more detailed analysis.
III Future Directions
As mentioned above, optimization of multislice modularity can be computationally expensive. Because the size of network data has increased tremendously, it is crucial to develop efficient algorithms for clustering network nodes in order to gain insight into applications such as social networks and images. To do this, one needs to take advantage of data sparsity to help speed up the optimization. Aside from the computational cost, it is also important to characterize and analyze the performance of modularity optimization.
Acknowledgments
We are grateful to the LAPD Division of Hollenbeck and to Megan Halvorson, Shannon Reid, Matt Valasik, James Wo, and George E. Tita of the Department of Criminology, Law, and Society at UCI for the collection, digitization, and cleaning of the LAPD Field Interview data. We also thank P. Jeffrey Brantingham for educating us about the anthropology of gangs. This work is supported by ONR grant N000141210838, ONR grant N000141210040, AFOSR MURI grant FA9550-10-1-0569, NSF grant DMS-0968309, and ONR grant N000141010221. MAP acknowledges a research award (#220020177) from the James S. McDonnell Foundation, and he thanks Andrea L. Bertozzi for hosting his visit to UCLA.
References
- [1] M. A. Porter, J.-P. Onnela, and P. J. Mucha, “Communities in networks,” Notices of the American Mathematical Society, vol. 56, no. 9, pp. 1082–1097, 1164–1166, 2009.
- [2] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences, vol. 103, no. 23, pp. 8577–8582, 2006.
- [3] J. Reichardt and S. Bornholdt, “Statistical mechanics of community detection,” Physical Review E, vol. 74, p. 016110, 2006.
- [4] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela, “Community structure in time-dependent, multiscale, and multiplex networks,” Science, vol. 328, no. 5980, pp. 876–878, 2010, with supplementary material available online. [Online]. Available: http://dx.doi.org/10.1126/science.1184819
- [5] R. A. Mercovich, A. Harkin, and D. Messinger, “Automatic clustering of multispectral imagery by maximization of the graph modularity,” in Proceedings of SPIE, vol. 8048, pp. 80480Z–80480Z-12, 2011.
- [6] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, p. P10008, 2008.
- [7] I. S. Jutla, L. G. S. Jeub, and P. J. Mucha, “A generalized Louvain method for community detection implemented in MATLAB,” 2011–2012. [Online]. Available: http://netwiki.amath.unc.edu/GenLouvain
- [8] Y. van Gennip, B. Hunter, R. Ahn, P. Elliott, K. Luh, M. Halvorson, S. Reid, M. Valasik, J. Wo, G. Tita, A. L. Bertozzi, and P. Brantingham, “Community detection using spectral clustering on sparse geosocial data,” submitted. [Online]. Available: http://arxiv.org/abs/1206.4969
- [9] L. Zelnik-Manor and P. Perona, “Self-tuning spectral clustering,” Advances in Neural Information Processing Systems, vol. 17, pp. 1601–1608, 2004.
- [10] Microsoft Research Cambridge Object Recognition Image Database, Version 1.0, 2005. [Online]. Available: http://research.microsoft.com/downloads
- [11] Y. van Gennip, H. Hu, B. Hunter, and M. A. Porter, “Geosocial Graph-Based Community Detection,” Proceedings of IEEE International Conference on Data Mining Workshop 2012, to appear.