1. Introduction and the problem statement
1.1. Overview
Community detection is a fundamental problem in social network analysis consisting, roughly speaking, in the unsupervised division of social actors into densely knit and highly related groups, with each group well separated from the others. Classical approaches to community detection deal mainly with the structure of social networks and ignore the features of the social network actors. There exists a variety of methods for this task (see [Fortunato2010]) which have shown their efficiency in multiple experiments (see [Leskovec2010, Lancichinetti2009]). However, real-world networks clearly provide more information about social actors than just the connections between them. Usually, the actors fill in their profiles with age, gender, interests and other information that is traditionally called node attributes. According to [Wasserman1994], attributes form the second dimension, besides the structural one, in social network representation. There are other classical approaches, such as $k$-means, which use only node attributes to detect communities but completely ignore links between social actors, thus not exploiting all available information. A reasonable generalisation of the methods of both types is given by those that take into account both network structure and actors' attributes. This is a relatively novel direction [Bothorel2015] in social network analysis, and a promising one, as the simultaneous use of structure and attributes may clarify and enrich the knowledge about the social actors, give meaning to the detected communities and describe the forces that form them.
During the last decade many methods based on different ideas and techniques have appeared in this direction. Although there exist some partial overviews of them, especially in Related Works sections of published papers and in the survey [Bothorel2015] published in 2015, a recent summary of the subject is a necessity as the growing number of the methods may cause uncertainty in practice.
In this paper we aim to clarify the overall situation by proposing a clear classification of the methods and providing a comprehensive survey of the available results in the area. We not only group and analyse the corresponding methods but also focus on practical aspects, including information on which methods outperform others and which datasets and quality measures are used for evaluation.
To make the exposition more formal, we first provide the reader with necessary notation and state the problem of community detection in social networks whose actors are equipped with attributes.
1.2. Notation and the problem statement
Traditionally, social networks such as online social networks or citation networks are modelled as graphs whose vertices (nodes) represent social actors (users or authors) and whose edges represent the relations between the actors (friendships, subscriptions or coauthorships). Actors' attributes (also known as node features or semantic vectors) may be thought of as multidimensional node-attached vectors whose elements contain certain features describing the actors. In what follows, we call node-attributed, or simply attributed, the networks whose actors' features are available. Clearly, edges as relations between actors may be of different types in real networks. Such networks can be represented via multilayer graphs (each layer containing a certain relation type) but in this paper, for the sake of simplicity, we confine ourselves to one-layer networks (graphs), i.e. to those with edges of one type. We do, however, mention some papers considering community detection in attributed multilayer networks in remarks below.
Let us move to the required definitions. We represent a node-attributed social network as the triple (called a node-attributed graph) $G = (V, E, A)$, where $V = \{v_i\}$ is the set of nodes (vertices) representing social actors, $E = \{e_{ij}\}$ the set of edges representing the existing relations between the actors ($e_{ij}$ is an edge between nodes $v_i$ and $v_j$) such that $E \subseteq V \times V$, and $A$ the set of attribute vectors $A(v_i) = (a_1(v_i), \dots, a_d(v_i))$ associated with nodes in $V$ and describing their features. The size of $V$ is denoted by $|V|$ or $n$. The size of $E$ is denoted by $|E|$ or $m$. The dimension of attribute vectors is $d$. The domain of $a_k$, i.e. the set of possible values of this attribute, is denoted by $\mathrm{dom}(a_k)$. In these terms, the $k$th attribute of node $v_i$ is referred to as $a_k(v_i)$. The notation introduced is summarised in Figure 1.
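As an illustration of the definitions above, a node-attributed graph can be held in a small container. The following is a minimal sketch, not part of any surveyed method: the class name `AttributedGraph`, its field names and the helper `observed_domain` are hypothetical, edges are assumed undirected, and the "domain" is approximated by the values actually observed in the data.

```python
from dataclasses import dataclass


@dataclass
class AttributedGraph:
    """Node-attributed graph: nodes, undirected edges and one attribute vector per node."""
    nodes: list        # node identifiers (social actors)
    edges: set         # undirected edges stored as frozenset({u, v})
    attributes: dict   # node -> tuple of attribute values (same length for all nodes)

    def __post_init__(self):
        # Every node needs an attribute vector, and all vectors must share one dimension.
        missing = [v for v in self.nodes if v not in self.attributes]
        if missing:
            raise ValueError(f"nodes without attributes: {missing}")
        if len({len(vec) for vec in self.attributes.values()}) > 1:
            raise ValueError("attribute vectors must have the same dimension")

    @property
    def n(self):
        """Number of nodes."""
        return len(self.nodes)

    @property
    def m(self):
        """Number of edges."""
        return len(self.edges)

    @property
    def d(self):
        """Dimension of the attribute vectors."""
        return len(next(iter(self.attributes.values()))) if self.attributes else 0

    def attribute(self, node, k):
        """Value of the k-th attribute (0-based) of the given node."""
        return self.attributes[node][k]

    def observed_domain(self, k):
        """Values of the k-th attribute that actually occur (a subset of its domain)."""
        return {vec[k] for vec in self.attributes.values()}


# Toy example: three actors described by (age, gender).
g = AttributedGraph(
    nodes=["ann", "bob", "eve"],
    edges={frozenset(("ann", "bob")), frozenset(("bob", "eve"))},
    attributes={"ann": (25, "f"), "bob": (31, "m"), "eve": (25, "f")},
)
```

Storing edges as `frozenset` pairs makes the graph undirected by construction; a directed or weighted network would need a different edge representation.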
By community detection in a node-attributed network (graph) $G$ (also known as attributed graph clustering) we mean the unsupervised division of the attributed graph into disjoint or overlapping subgraphs (equivalently, clusters or communities) $G_k$ with node sets $V_k$, $k = 1, \dots, N$, such that all vertices are included in the resulting division, i.e.
$$\bigcup_{k=1}^{N} V_k = V,$$
where a certain balance between the following two properties is achieved:

structural closeness, i.e. nodes within a community are structurally close to each other, while nodes in different communities are not;

attribute homogeneity, i.e. nodes within a community have similar attributes, while nodes in different communities do not.
The basis for these properties is discussed in the forthcoming subsection.
Measures for structural closeness and attribute homogeneity may vary from method to method. If no ground truth is available, one can evaluate the quality of community detection via different structure- and attribute-aware measures; otherwise, one can compare the results with the ground truth. We will mention the corresponding measures below.
Structure-attribute fusion techniques and community detection methods for node-attributed social networks are the subject of this paper. Related datasets and quality measures are also of interest to us.
In what follows, the structural (topological) information contained in the node and edge sets $(V, E)$ is referred to as network structure or topology. The attribute (semantic) information contained in the nodes and their attribute vectors $(V, A)$ is referred to as network attributes or semantics.
1.3. Basis for structural closeness and attribute homogeneity. Fusing topology and semantics
Structural closeness is related to the classical concept of a (structural) community in terms of the density of structural connections. According to [Girvan2002], communities are thought of as subsets of nodes with dense connections within the subsets and sparse connections in between. [NewmanGirvan2004] adopts the intuition that nodes within the same community should be better connected than they would be by chance to create the famous Modularity measure, which became an influential tool for topology-based community detection in social networks [Bothorel2015]. Multiple Modularity modifications and other structural measures have been proposed to overcome several limitations of Modularity [Chakraborty:2017:MCA:3135069.3091106], but Modularity is still a de facto standard in community detection. [NewmanGirvan2004] observes that in social networks Modularity generally belongs to $[0.3, 0.7]$, but there is no particular value separating good and bad community structure. In fact, any positive Modularity may indicate the presence of a structural community [Clauset2004], in contrast to zero Modularity, which corresponds to a random graph. High Modularity implies the structural closeness of the nodes within communities.
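To make the notion concrete, Modularity of a given disjoint partition can be computed directly from the edge list. The sketch below assumes a simple undirected, unweighted graph without self-loops, with each edge listed once; the function name `modularity` and the toy graph are illustrative only. It uses the standard form in which each community contributes its fraction of internal edges minus the squared fraction of edge ends attached to it.

```python
def modularity(edges, community_of):
    """Modularity of a disjoint partition of a simple undirected graph.

    edges        -- iterable of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every node to its community label
    """
    edges = list(edges)
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Observed fraction of internal edges minus the fraction expected by chance.
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy example: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_two = modularity(toy_edges, two_groups)   # positive: the triangles are communities
q_one = modularity(toy_edges, one_group)    # zero: a single community carries no signal
```

On this toy graph the two-triangle partition gives a positive value while the trivial one-community partition gives zero, which matches the interpretation of positive versus zero Modularity discussed above.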
The attribute homogeneity requirement is based on the social science finding (see e.g. [Marsden1993, McPherson2001, FioreDonath2005, KossinetsWatts2009]) that node attributes in social networks can reflect and affect community structure. The well-known principle of homophily in social networks states that like-minded social actors have a higher likelihood of being connected [McPherson2001]. Thus a community detection process that takes attribute homogeneity into account may provide results of better quality [Bothorel2015].
According to many experiments, e.g. [Moser2009, Ye2017, Sheikh2019, Cohn2001, Getoor2003] and many other papers cited in this survey, topology and semantics often provide complementary information, and thus combining them usually leads to better performance in community detection. For example, the semantics may compensate for the sparseness of a real network [Jia2017]. At the same time, topological information may be helpful if there are missing or noisy attributes [Sheikh2019]. As observed by [Ding2011], topology-only or semantics-only community detection is often not as effective as when both sources of information are used. On the other hand, some experiments (see e.g. [Akbas2017, Zhou2009]) suggest that this is not always true and that topology and semantics may be orthogonal and contradictory in some cases. Moreover, the relations between network topology and semantics may be highly nonlinear [Wang2016]. Consequently, how one should use network topology and semantics together is a challenging problem.
1.4. Applications of community detection in attributed social networks
Community detection in node-attributed networks not only has obvious applications in marketing (recommender systems, targeted advertisements and user profiling) [Alamsyah2014], but can also effectively support many other advanced applications. First of all, it may be used for search engine optimisation and spam detection [Ruan2013, Muslim2016]. Furthermore, community detection methods may help in counter-terrorist activities and in disclosing fraudulent schemes [Muslim2016]. There also exist applications related to the analysis of networks of a different nature: protein-protein interactions, genes, epidemics and other biological networks [Muslim2016].
Another area where the ideas of community detection in attributed networks are generally applied is document network clustering. Note that this direction historically precedes community detection and is methodologically rich. For example, in [Neville2003], one of the first papers on community detection in attributed social networks, the following document clustering methods are mentioned: HyPursuit [Weiss1996, Modha2000, He2001], PLSA-PHITS [Cohn2001], Community-User-Topic model [Zhou2006] and Link-PLSA-LDA [Nallapati2008]. Since that time many others have appeared, see e.g. the surveys [Nail2016, Aggarwal2012, Saiyad2016].
Clearly, methods from document network clustering can be adapted for community detection in attributed social networks; however, social communities, although formally described similarly to document clusters, are formed and driven by inner and more complicated forces. What is more, it has been shown that some methods for community detection in attributed social networks outperform preceding methods for document network clustering. In particular, Inc-Cluster [Zhou2010] has been shown to outperform k-SNAP [Tian2008]; PCL-DC [Yang2009] to outperform PLSA-PHITS [Cohn2001], LDA-Link-Word [Erosheva2004] and Link-Content-Factorization [Zhu2007]; and CESNA [Yang2013] and ASCD [Qin2018] to outperform Block-LDA [Balasubramanyan2011]. (Throughout the text, methods and datasets covered by the survey are written in bold.) Taking this into account, we do not consider methods focused on document network clustering in the present survey.
1.5. Note on multilayer networks
Generally speaking, we do not aim at considering community detection methods for attributed multilayer networks (see e.g. [Kivela2014]), where different types of vertices and edges may be present at different layers. However, we mention some such methods from time to time in the corresponding remarks. Although node-attributed single-layer networks may be considered as a particular case of multilayer ones (or, generally, of feature-rich networks [Interdonato2019]), the latter require special analysis to take into account the heterogeneity of attributes, edges and vertices on different layers. A separate survey and an extensive comparative study of such methods is an independent and useful task (see partial overviews e.g. in [Boutemine2017, Interdonato2019, Kivela2014]).
1.6. Note on subspace-based clustering
In line with the above-mentioned definition of community detection in attributed social networks, we mainly confine ourselves in the survey to the methods that can use the full attribute space and find communities covering the whole network. However, there is a big class of special methods that explore subspaces of attributes and/or find significant subgraphs of the network graph, e.g. GAMer [Gunnemann2010, Gunnemann2014], DB-CSC [Gunnemann2011], SSCG [Gunnemann2013], FocusCO [Perozzi20142] and ACM [Wu2018]. The main idea behind subspace-based (also known as projection-based) attributed graph clustering is that not all available semantic information is relevant for obtaining good-quality communities [Gunnemann20131, GunnemannBoden2013]; therefore one has to somehow choose the appropriate attribute subspace to avoid the so-called curse of dimensionality (see [Bothorel2015, Section 3.2]) and reveal significant communities that would not be detected if all available attributes were considered.
To be precise, some of the methods that we discuss below partly use this idea, e.g. WCru [Cruz2011, CruzBathorelPoulet2012] (cf. the definition of a point of view in the papers), DVil [Villavialaneix2013], SCMAG [Huang2015], UNCut [Ye2017], DCM [Pool2014], etc., but can still work with the full attribute space. In any case, a separate survey on subspace-based attributed graph clustering methods would be a very valuable complement to the current survey.
2. Related works and main problems in the area
There is a variety of surveys and comparative studies considering community detection in social networks without attributes, in particular, [SCHAEFFER2007, Yang2016Survey, Fortunato2010, Coscia2011]. In contrast, the survey [Bothorel2015] seems to be the only one on community detection in attributed social networks. Since it was published in 2015, many new methods adapting different techniques have appeared in the area. Furthermore, a large number of the methods that had been available before 2015 are not covered by [Bothorel2015], in particular, some based on objective function modification, non-negative matrix factorisation, probabilistic models, ensembles, etc. The technique-based classification of attributed graph clustering methods in [Bothorel2015] is also sometimes confusing. For example, CODICIL [Ruan2013], a method based on assigning attribute-aware weights on graph edges, is included not in [Bothorel2015, Section 3.2. Weight modification according to node attributes] but in [Bothorel2015, Section 3.7. Other methods]. Although [Bothorel2015] is a valuable and highly cited survey in the area, a recent survey of community detection methods for attributed social networks is clearly required.
Besides [Bothorel2015], almost every paper on the topic contains a Related Works section. It typically has a short survey of preceding approaches and an attempt to classify them. We observed that many authors are only partly aware of the corresponding bibliography, and this sometimes leads to repetitions in approaches. Furthermore, the multiple classifications (usually technique-based) are mostly incomplete and sometimes even contradict each other.
Another big problem in the area is the comparative study of known methods (in terms of scalability, complexity and quality). Individual papers contribute little to this (as they usually compare their own method with only a few known ones), see Figures 2 and 3, and the whole picture is unclear. In fact, we are unaware of any comprehensive unified comparison of different attributed graph clustering methods. One more issue, related to the previous one, is that authors use different datasets (of various size and nature) and quality measures to evaluate their methods, so that any direct comparison becomes impossible. What is more, datasets and source code remain unavailable for comparison experiments in the majority of cases.
Facing the above-mentioned problems, in the current survey we not only collect the existing methods but also propose a unified classification of them based on the moment when topology and semantics of the network are fused and used in the corresponding algorithm. We also focus on the experimental part so that one can see which networks (with the corresponding dataset link) and quality measures are used in each paper and which methods were compared in each study. Besides this, we also provide the reader with a short description of the most influential and interesting methods for community detection in attributed social networks.
The survey covers the papers published in journals and conference proceedings before the middle of 2019. As an exception, we sometimes note preprints available on arXiv.
3. Classification of community detection methods for attributed social networks
In previous works, the classification of methods for community detection in attributed social networks was done mostly with respect to the techniques used (e.g. distance-based or random walk-based). We partly follow this methodology at a lower level, but at the upper level we group the methods by the moment when topology and semantics are fused and used in the method, relative to the community detection (clusterisation) step, see Figure 4. Namely, we distinguish

early fusion methods that fuse topology and semantics before the clusterisation step,

simultaneous fusion methods that fuse topology and semantics during the clusterisation step,
and 
late fusion methods that fuse topology and semantics after the clusterisation step.
Within each fusion type, we also divide the methods into subclasses according to the technique used.
A further subclassification, applied to some subclasses of early fusion methods, is by the modification of the initial network topology (structure). In fact, the existing topology may be preserved or modified depending on the heuristics used; therefore we distinguish

fixed topology methods that use the existing network topology without modifying it with respect to the semantics,
and 
non-fixed topology methods that modify the existing network topology with respect to the semantics, in particular, add/erase edges and/or vertices.
It is important to distinguish the cases as each one leads to certain advantages or disadvantages. For example, if one assigns edge weights between all nodes in the network, even if there are no structural connections (i.e. considers non-fixed topology), and then removes edges with tiny weights, then in the social network setting this may lead to the following: (a) nodes representing social actors who are highly related in terms of semantics may have vanishing social connections, so that the resulting connection may seem unrealistic; (b) one may erase too many important connections. At the same time, the initial “fixed” topological structure of a network may be sparse or noisy, so that some kind of enrichment is required. In any case, a proper balance between purely non-fixed and fixed topologies is usually necessary.
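The two effects (a) and (b) can be reproduced on a toy example. The sketch below is only an illustration under simplifying assumptions: weights are taken purely from a user-supplied semantic similarity, the function names `fixed_topology_weights` and `nonfixed_topology_weights` and the pruning `threshold` are hypothetical, and the toy similarity is 1 for equal attributes and 0 otherwise.

```python
from itertools import combinations


def fixed_topology_weights(edges, similarity):
    """Weight only the edges that already exist in the network."""
    return {(u, v): similarity(u, v) for u, v in edges}


def nonfixed_topology_weights(nodes, similarity, threshold):
    """Weight every pair of nodes, then drop pairs whose weight is below the threshold."""
    weights = {}
    for u, v in combinations(sorted(nodes), 2):
        w = similarity(u, v)
        if w >= threshold:
            weights[(u, v)] = w
    return weights


# Toy network: a and c are friends; a and b share an interest but are not connected.
interest = {"a": "jazz", "b": "jazz", "c": "rock"}
toy_nodes = list(interest)
toy_edges = [("a", "c")]


def same_interest(u, v):
    return 1.0 if interest[u] == interest[v] else 0.0


fixed = fixed_topology_weights(toy_edges, same_interest)
nonfixed = nonfixed_topology_weights(toy_nodes, same_interest, threshold=0.5)
```

With the fixed topology the only weighted edge is the existing friendship (with weight 0 here), whereas the non-fixed variant creates a new connection between `a` and `b`, who have no social tie, and erases the real friendship between `a` and `c`.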
As we have already mentioned, the lowest level of classification is by fusion technique. For example, by “weight-based methods” we mean those which form a weighted graph while fusing topology and semantics. Some of these methods then use clusterisation algorithms for weighted graphs (which is reasonable), while others transform the graph into a distance matrix and use distance-based methods for clusterisation; the latter are nevertheless still called “weight-based”. On the other hand, “distance-based methods” are so called because they produce a distance matrix at the fusion step.
4. Most used attributed social networks and quality measures
4.1. Attributed social networks
It can be observed that “social networks” in many papers mean not only real social networks (like Google+, Facebook, Twitter) but also citation networks (like DBLP and CiteSeer). In fact, citations and blogs are the most popular examples in experiments, while real social networks (say, with friendship connections between users) are not.
By small, medium and large networks we mean those with fewer than $10^3$, from $10^3$ to $10^5$, and more than $10^5$ nodes, respectively. The most popular datasets used in experiments on community detection in node-attributed social networks are collected in Tables 1, 2 and 3. (An interested reader can find other attributed network datasets at Mark Newman page, HPI Information Systems Group, LINQS Statistical Relational Learning Group, Stanford Large Network Dataset Collection, Laboratory of Cell Trafficking and Signal Transduction, University of Verona, Marc Plantevit page, Tore Opsahl page, UCINET networks, Interactive Scientific Network Data Repository, Citation Network Dataset.) Below, the datasets used for the evaluation of each method are shown in the Dataset columns. Recall that if a dataset name is written in bold, its description can be found in Tables 1, 2 or 3. Note also that different papers may in fact use other versions of the networks from Tables 1, 2 or 3; we mark such datasets by *. For example, a DBLP dataset with the number of nodes and edges different from the described DBLP10K and DBLP84K is denoted by DBLP*.
In most cases, the attributes suitable for the methods discussed in the survey are represented by continuous numerical vectors. If one deals, say, with nominal, textual or graphical attributes, it is common to use TF-IDF or other similar frameworks to obtain continuous numerical vectors instead.
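For textual attributes, such a conversion can be sketched as follows. This is a minimal illustration of one common TF-IDF variant (relative term frequency multiplied by the logarithm of the inverse document frequency); whitespace tokenisation and the function name `tfidf_vectors` are assumptions of the sketch, and other weighting or smoothing schemes are equally possible.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn raw text attributes into numerical vectors, one per document.

    Returns the sorted vocabulary and a list of vectors aligned with it.
    """
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for doc in tokenised for term in doc})
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in tokenised for term in set(doc))
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([
            (counts[term] / length) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


# Toy profiles of three social actors.
profiles = ["graph clustering", "graph mining", "music rock"]
vocab, vecs = tfidf_vectors(profiles)
```

Each actor is then described by a real-valued vector of the same dimension (the vocabulary size), which is the form most of the surveyed methods expect.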
Network  Description  Source 

Political Books  All books in this dataset are about U.S. politics, were published during the 2004 presidential election and were sold by Amazon.com. An edge between two books means that the two books are always bought together by customers. Each book has only one attribute, termed political persuasion, with three values: 1) conservative; 2) liberal; and 3) neutral.  Link 
WebKB  A classified network of 877 webpages (nodes) and 1608 hyperlinks (edges) gathered from the Web sites of four different universities (Cornell, Texas, Washington, and Wisconsin). Each web page is associated with a binary vector whose elements take the value 1 if the corresponding word from the vocabulary is present in that webpage, and 0 otherwise. The vocabulary consists of 1703 unique words. Nodes are classified into five classes: course, faculty, student, project, or staff.  Link [Craven1998] 
Twitter  A collection of several tweet networks: 1) PoliticsUK dataset is collected from Twitter accounts of 419 Members of Parliament in the United Kingdom in 2012. Each user has 3614-dimensional attributes, including a list of words repeated more than 500 times in their tweets. The accounts are assigned to five disjoint communities according to their political affiliation. 2) PoliticsIE dataset is collected from 348 Irish politicians and political organizations; each user has 1047-dimensional attributes. The users are distributed into seven communities. 3) Football dataset contains 248 English Premier League football players active on Twitter, who are assigned to 20 disjoint communities, each corresponding to a Premier League club. 4) Olympics dataset contains the accounts of 464 athletes and organizations involved in the London 2012 Summer Olympics. The users are grouped into 28 disjoint communities, corresponding to different Olympic sports.  Link 1 Link 2 [Greene2013] 
Lazega  A corporate law partnership in a Northeastern US corporate law firm; possible attributes: status (1: partner; 2: associate), office (1: Boston; 2: Hartford; 3: Providence); 71 nodes and 575 edges  [Lazeda2001] 
Research  A research team of employees in a manufacturing company; possible attributes: location (1: Paris; 2: Frankfurt; 3: Warsaw; 4: Geneva), tenure (1: 1–12 months; 2: 13–36 months; 3: 37–60 months; 4: 61+ months); 77 nodes and 2228 edges  [Cross2004] 
Consult  The relationships between employees in a consulting company; possible attributes: organisational level (1: Research Assistant; 2: Junior Consultant; 3: Senior Consultant; 4: Managing Consultant; 5: Partner), gender (1: male; 2: female); 46 nodes and 879 edges  [Cross2004] 
Network  Description  Source 

Political Blogs  A non-classified network of 1,490 weblogs (nodes) on US politics with 19,090 hyperlinks (edges) between the weblogs. Each node has an attribute describing its political leaning as either liberal or conservative (represented by 0 and 1).  Link [Adamic2005] 
DBLP10K  A non-classified coauthor network extracted from DBLP Bibliography (four research areas of database, data mining, information retrieval and artificial intelligence) with 10,000 authors (nodes) and their coauthor relationships (edges). Each author is associated with two relevant categorical attributes: prolific and primary topic. For the attribute “prolific”, authors are labelled as highly prolific, prolific or low prolific depending on the number of their papers. Node attribute values for “primary topic” (100 research topics) are obtained via topic modelling. Each extracted topic consists of a probability distribution of the keywords which are most representative of the topic.  Link [Zhou2010] 
DBLP84K  A larger non-classified coauthor network extracted from DBLP Bibliography (15 research areas of database, data mining, information retrieval, artificial intelligence, machine learning, computer vision, networking, multimedia, computer systems, simulation, theory, architecture, natural language processing, human-computer interaction, and programming language) with 84,170 authors (nodes) and their coauthor relationships (edges). Each author is associated with two relevant categorical attributes: prolific and primary topic, defined in a similar way as in DBLP10K.  Link [Zhou2010] 
Cora  A classified network of machine learning papers with 2,708 papers (nodes) and 5,429 citations (edges). Each node is attributed with a 1,433-dimensional binary vector indicating the absence/presence of words from the dictionary of 1,433 words collected from the corpus of papers. The papers are classified into 7 subcategories: case-based reasoning, genetic algorithms, neural networks, probabilistic methods, reinforcement learning, rule learning and theory.  Link 1 Link 2 [Sen2008] 
CiteSeer  A classified citation network in the field of machine learning with 3,312 papers (nodes) and 4,732 citations (edges). Each node is attributed with a binary vector indicating the absence/presence of the corresponding words from the dictionary of the 3,703 words collected from the corpus of papers. Papers are classified into 6 classes.  Link 1 Link 2 [Sen2008] 
Sinanet  A classified microblog user relationship network extracted from the Sina microblog website (http://www.weibo.com) with 3,490 users (nodes) and 30,282 relationships (edges). Each node is attributed with 10-dimensional numerical attributes describing the interests of the user.  Link [Jia2017] 
PubMed Diabetes  A classified citation network extracted from the PubMed database pertaining to diabetes. It contains 19,717 publications (nodes) and 44,338 citations (edges). Each node is attributed with a TF-IDF weighted word vector from a dictionary that consists of 500 unique words.  Link 
Facebook100  A non-classified Facebook users network with 6,386 users (nodes) and 435,324 friendships (edges). The network is gathered from Facebook users of 100 colleges and universities (e.g. Caltech, Princeton, Georgetown and UNC Chapel Hill) in September 2005. Each user has the following attributes: ID, a student/faculty status flag, gender, major, second major/minor (if applicable), dormitory (house), year and high school.  Link [Traud2012, Traud2011] 
ego-Facebook  This dataset consists of ’circles’ (’friends lists’) from Facebook with 4,039 nodes and 88,234 edges. Facebook data was collected from survey participants using a Facebook app. The dataset includes node features (profiles), circles, and ego networks.  Link [Leskovec2012] 
LastFM  A network gathered from the online music system Last.fm with 1,892 users (nodes) and 12,717 friendships on Last.fm (edges). Each node has 11,946dimensional attributes, including a list of most listened music artists, and tag assignments.  Link 
Delicious  A network of 1,861 nodes, 7,664 edges and 1,350 attributes. This is a publicly available dataset from the HetRec 2011 workshop that has been obtained from the Delicious social bookmarking system. Its users are connected in a social network generated from Delicious mutual fan relations. Each user has bookmarks, tag assignments, that is, [user, tag, bookmark] tuples, and contact relations within the social network. The tag assignments were transformed to attribute data by taking all tags that a user ever assigned to any bookmark and assigning those to the user.  Link 
Wiki  A network with nodes as web pages. The link among different nodes is the hyperlink in the web page. 2,405 nodes, 12,761 edges, 4,973 attributes, 17 labels  Link 
ego-Twitter  This dataset consists of ’circles’ (or ’lists’) from Twitter. Twitter data was crawled from public sources. The dataset includes node features (profiles), circles, and ego networks. 81,306 nodes and 1,768,149 edges.  Link [Leskovec2012] 
Network  Description  Source 

Flickr  A network with 100,267 nodes, 3,781,947 edges and 16,215 attributes collected from the internal database of the popular Flickr photo sharing platform. The social network is defined by the contact relation of Flickr. Two vertices are connected with an undirected edge if at least one directed edge exists between them. Each user has an associated list of the tags that he/she used at least five times. Tags are limited to those used by at least 50 users. Users are limited to those having a vocabulary of more than 100 and fewer than 5,000 tags.  [Ruan2013] A version of the dataset 
Patents  A patent citation network with vertices representing patents and edges depicting the citations between them. The network is a subgraph containing all the patents from the year 1988 to 1999. Each patent has six attributes: grant year, number of claims, technological category, technological subcategory, assignee type, and main patent class. There are 1,174,908 vertices and 4,967,216 edges in the network.  Link Larger dataset 
ego-G+  This dataset consists of ’circles’ from Google+. Google+ data was collected from users who had manually shared their circles using the ’share circle’ feature. The dataset includes node features (profiles), circles, and ego networks. 107,614 nodes and 13,673,453 edges. Each node has four features: job title, current place, university, and workplace. A user pair (edge) is compared using knowledge graphs based on the following categories: Occupations, Companies by country and industry, Countries, and Universities and colleges by country.  Link [Leskovec2012] 
4.2. Measures for community detection quality
Given the set of detected communities (overlapping or not), one needs to evaluate their quality. There are two possible options depending on the network under consideration. If the network has no ground truth, one can measure structural closeness and attribute homogeneity directly. According to our observations, the most popular quality measures in this case are Modularity and Density for the former and Entropy for the latter. Many others such as Conductance, Within Cluster Sum of Squares, Intra-cluster distance, etc., are also possible. If there is ground truth, it is sometimes reasonable to compare the detected communities with the known ones. This can be done, for instance, with the following popular measures: Accuracy, Normalised Mutual Information (denoted below by NMI), Adjusted Rand Index or Rand Index (denoted below by ARI and RI, respectively) and $F$-measure.
Due to space limitations, we refer the reader to the comprehensive survey [Chakraborty:2017:MCA:3135069.3091106] and to [Bothorel2015, Sections 2.2 and 4], where all the above-mentioned evaluation metrics and many others are precisely defined and discussed in detail.
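For orientation only, two of the measures named above can be sketched in a few lines for disjoint communities. The sketch assumes one particular normalisation of NMI (twice the mutual information divided by the sum of the two entropies; other normalisations are in use) and natural logarithms; the function names `entropy` and `nmi` are illustrative. The same `entropy` function, applied to the attribute values of the nodes of one community, gives a simple attribute homogeneity score (lower is more homogeneous).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural logarithm) of a list of labels or attribute values."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Normalised Mutual Information between two disjoint partitions.

    Both arguments list one community label per node, in the same node order.
    """
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_sum = entropy(labels_true) + entropy(labels_pred)
    # Two trivial one-community partitions are treated as identical.
    return 1.0 if h_sum == 0 else 2 * mutual_info / h_sum


ground_truth = [0, 0, 1, 1]
same_up_to_renaming = [1, 1, 0, 0]   # identical partition, labels swapped
unrelated = [0, 1, 0, 1]             # carries no information about the ground truth

score_same = nmi(ground_truth, same_up_to_renaming)
score_unrelated = nmi(ground_truth, unrelated)
homogeneous = entropy(["jazz", "jazz", "jazz"])
mixed = entropy(["jazz", "rock", "jazz", "rock"])
```

NMI is invariant to renaming of community labels, which is why the swapped partition scores as a perfect match, while the unrelated partition scores zero.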
5. Early fusion methods
These methods aim to fuse topological and semantic information before the clusterisation step so that the data obtained at the fusion step is suitable for conventional clusterisation methods.
5.1. Weight-based methods
The main characteristic of these methods is that the semantics is used to assign weights on edges of the network graph (the topology may be fixed or non-fixed), see Figure 10, so that the resulting weighted graph can be further clustered, e.g. by a clusterisation algorithm for weighted graphs such as Weighted Louvain [Blondel2008] (requiring the adjacency matrix of the weighted graph as an input). There are also several algorithms that instead find the distance matrix for the weighted graph and apply distance-based clusterisation algorithms such as $k$-means and $k$-medoids. In other words, weight-based methods dispose of the separate attribute information by storing it inside the structure, namely, on the edges of the graph.
The weights are usually assigned on edges as follows:
(5.1) 
where and are the chosen topological similarity function and semantic similarity function for nodes and , respectively. The parameter controls the balance between the topological and semantic components so that corresponds to the pure topological case and to the pure semantic one. Generally speaking, one may introduce a nonlinear fusing function instead of (5.1); however, such a choice clearly complicates the model and thus requires reasonable justification.
The fixed topology assumes that only existing edges are assigned the weight (5.1), while the non-fixed one assigns weights to the edges of the complete graph based on . In the former case, usually for that did not exist in the initial graph , and is thus generated only by the semantic similarity component.
A very popular approach in the fixed-topology case is assuming in (5.1), see Table 4. This effect may also be achieved by assuming for all . In effect, this means that the weights in are based only on the semantic similarity. Clearly, this may lead to the dominance of the semantic component and the loss of the initial structural connection between nodes with dissimilar attributes.
The approach with in (5.1), see Table 5, seems more adequate with respect to as it explicitly allows controlling the impact of both components. For unweighted graphs , it is common to put if the edge exists in , and otherwise.
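The fixed-topology weighting scheme can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of any surveyed method: we assume that (5.1) is a convex combination governed by a balance parameter, here called `alpha`, with `alpha = 1` giving the pure topological case and `alpha = 0` the pure semantic one; the names `fuse_weights` and `matching` are hypothetical, and the topological similarity of every existing edge of an unweighted graph is taken to be 1.

```python
def matching(x, y):
    """Share of attribute positions on which x and y coincide."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def fuse_weights(edges, attrs, alpha, sem_sim=matching):
    """Weight existing edges only (fixed topology).

    Assumed convention: alpha = 1 is pure topology, alpha = 0 pure semantics.
    """
    weights = {}
    for u, v in edges:
        w_topo = 1.0  # unweighted graph: each existing edge has topological weight 1
        w_sem = sem_sim(attrs[u], attrs[v])
        weights[(u, v)] = alpha * w_topo + (1 - alpha) * w_sem
    return weights


edges = [("a", "b"), ("b", "c")]
attrs = {"a": ("f", 20), "b": ("f", 30), "c": ("m", 40)}
weights = fuse_weights(edges, attrs, alpha=0.5)
```

With `alpha=0.0` the edge between `b` and `c` gets weight 0, which illustrates how a purely semantic weighting can break a structural connection between nodes with dissimilar attributes.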
As for , there are several popular measures. Assume that we are given two attribute vectors and . One can define based on:

the (normalised) matching coefficient
(5.2)

the cosine similarity
(5.3)

the Jaccard similarity coefficient
(5.4) where and are thought of as sets in the former case and as vectors with nonnegative real values in the latter one,

the Minkowski similarity
(5.5) where one gets the city block norm in the denominator if and the Euclidean norm if .
The choice of is usually unclear and is determined by the authors' preferences. Moreover, we are unaware of any systematic comparison of the abovementioned measures for semantic similarity.
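For orientation, the four similarity measures listed above can be sketched in a few lines. The code uses the textbook forms of the matching coefficient, cosine similarity and set-based Jaccard coefficient; for the Minkowski similarity we assume the form 1 / (1 + ||x - y||_p), which is only our reading of "the norm in the denominator" and may differ from (5.5) in detail. All function names are ours.

```python
from math import sqrt


def matching_coefficient(x, y):
    """Normalised number of coinciding attribute values."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def cosine_similarity(x, y):
    """Cosine of the angle between two numeric attribute vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def jaccard_similarity(x, y):
    """Set-based Jaccard coefficient |x & y| / |x | y|."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def minkowski_similarity(x, y, p=2):
    """Assumed form 1 / (1 + ||x - y||_p); p=1 city block, p=2 Euclidean."""
    dist = sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)
    return 1 / (1 + dist)


s_match = matching_coefficient(("f", 20), ("f", 30))
s_jacc = jaccard_similarity({"ml", "db", "ir"}, {"db", "ir", "nlp"})
s_mink = minkowski_similarity([0, 0], [3, 4], p=2)
```

The measures are not interchangeable: the matching coefficient and the set-based Jaccard coefficient suit categorical attributes, while the cosine and Minkowski variants need numeric vectors.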
Algorithm | Input for / Method of clusterisation | Number of clusters as input / Clusters overlap | Network size | Evaluation | Topology | Databases | Other attributed network clusterisation algorithms compared with
WNev [Neville2003] | Weighted graph / MinCut [Karger1993], MajorClust [SteinNiggemann1999], Spectral [ShiMalik2000] | No/No | Small | Accuracy | Fixed | Synthetic | -
WSte1 [Steinhaeuser] | Weighted graph / Threshold | No/No | Large | Modularity | Fixed | Phone Network [Madey2007] | -
WSte2 [Steinhaeuser2010] | Similarity matrix (via weighted graph and random walks) / Hierarchical clustering [Johnson1967, Fred2002] | No/No | Large | Modularity | Fixed | Phone Network [Madey2007] | -
WCom1 [Combe2012] | Weighted graph / Weighted Louvain [Blondel2008] | Yes/No | Small | Accuracy | Fixed | DBLP* | WCom2 [Combe2012], DCom [Combe2012]
WCom2 [Combe2012] | Distance matrix (via weighted graph) / Hierarchical agglomerative clustering | Yes/No | Small | Accuracy | Fixed | DBLP* | WCom1 [Combe2012], DCom [Combe2012]
AACluster [Akbas2017, Akbas2019] | Node embeddings (via weighted graph) / k-medoids | Yes/No | Small, Medium, Large | Density, Entropy | Fixed | Political Blogs, DBLP*, Patents*, Synthetic | SACluster [Zhou2009], BAGC [Xu2014], CPIP [Liu2015]
PWMAMILP [Alinezhad2019] | Weighted graph / Linear programming, MILP [Alinezhad2019] | No/No | Small | RI, NMI | Fixed | WebKB | -
KDComm [Bhatt2019] | Weighted graph / Iterative Weighted Louvain | No/No | Small, Medium, Large | F-measure, Jaccard measure, Rank Entropy measure | Fixed | egoG+, Twitter*, DBLP* [Jia2017], Reddit (link) | CPIP [Liu2015], JCDC [Zhang2016], UNCut [Ye2017], SI [Newman2015]
Algorithm | in (5.1) | Input for / Method of clusterisation | Number of clusters as input / Clusters overlap | Network size | Evaluation | Topology | Databases | Other attributed network clusterisation algorithms compared with
WWan [Wang2010] | in theory; in experiments | Edge similarity matrix (via weighted graph) / EdgeCluster [Tang2009] (k-means variant) | Yes | Small | NMI, Micro-F1, Macro-F1 | Non-fixed: removing edges | Synthetic, BlogCatalog, Delicious | Non-overlapping co-clustering [Dhillon2003]
SAC2 [DangViennet2012] | | kNN (unweighted) graph (via weighted graph) / (Unweighted) Louvain [Blondel2008] | No/No | Small, Medium | Density, Entropy | Non-fixed: removing edges | Political Blogs, Facebook100, DBLP10K | SAC1 [DangViennet2012], WSte2 [Steinhaeuser2010], Fast greedy [Clauset2004] for weighted graph
WCru [Cruz2011, CruzBathorelPoulet2012] | in theory; not specified in experiments | Weighted graph / Weighted Louvain [Blondel2008] | No | Medium | Modularity, Intracluster distance | Fixed | - |
CODICIL [Ruan2013] | in theory; in some experiments | Weighted graph / Metis [Karypis1998], Markov Clustering [Satuluri2009] | No | Small, Medium, Large | F-measure | Non-fixed: adding and removing edges | CiteSeer*, Flickr*, Wikipedia* | IncCluster [Zhou2010], PCLDC [Yang2009], LinkPLSALDA [Nallapati2008]
WMen [Meng2018] | Not specified | Weighted graph or distance matrix for the weighted graph / SLPA [Xie2012], Weighted Louvain [Blondel2008], k-medoids [Yu2018] | Yes or No / Yes or No | Small, Medium | NMI, F-measure, Accuracy | Fixed | Lazega, Research, Consult, LFR benchmark [Lancichinetti2008] | CODICIL [Ruan2013], SACluster [Zhou2009]
PLCAMILP [Alinezhad2019] | | Weighted graph / Linear programming, MILP [Alinezhad2019] | No/No | Small | RI, NMI | Non-fixed: adding and removing edges | WebKB | SCD [Li2017], ASCD [Qin2018], SCI [Wang2016], PCLDC [Yang2009], BlockLDA [Balasubramanyan2011]
kNNenhance [Jia2017] | May be thought of as , kNN by semantics | Distance matrix (of the augmented graph) / kNN, k-means | No/No | Medium | Accuracy, NMI, F-measure, Modularity, Entropy | Non-fixed: adding edges | Cora, Citeseer, Sinanet, PubMed Diabetes, DBLP* | PCLDC [Yang2009], PPLDC [Yang2010], PPSBDC [Chai2013], CESNA [Yang2013], cohsMix [Zanghi2010], BAGC [Xu2012], GBAGC [Xu2014], SACluster [Zhou2009], IncCluster [Zhou2010], CODICIL [Ruan2013], GLFM [Li2011]
IGCCSM [Nawaz2015] (source) | in theory; in comparison experiments | Distance matrix for the weighted graph / k-medoids | Yes/No | Medium | Density, Entropy | Fixed | Political Blogs, DBLP10K | SACluster [Zhou2009], SAClusterOpt [Cheng2011]
AGPFC [He2019] | in theory, manually tuned in experiments | Fuzzy equivalent matrix / cut set method | No/Yes | Small, Medium | Density, Entropy | Fixed | Political Blogs, CiteSeer, Cora, WebKB | SACluster [Zhou2009], BAGC [Xu2012]
NMLPA [Huang2019] | | Weighted graph / A multi-label propagation algorithm | Yes/Yes | Medium | F-score, Jaccard Similarity | Fixed | egoFacebook, Flickr* [Ruan2013], egoTwitter | CESNA [Yang2013], SCI [Wang2016], CDE [Li2018]
Now let us describe the most influential weight-based methods, CODICIL [Ruan2013] and SAC2 [DangViennet2012], according to Figure 3.
5.1.1. CODICIL
The method CODICIL [Ruan2013] assigns semantic weights between all the nodes in , i.e. it employs the non-fixed topology scheme. To decrease complexity, nodes with the highest cosine similarity values with are selected as the top neighbours of , so that the semantic similarity for and is essentially (5.3). The topological similarity weight for two nodes is defined through the relative overlap of their respective structural neighbours; approximations of (5.3) and (5.4) are used for this purpose. Then the topological and semantic weights are combined similarly to (5.1). After that, a biased edge sampling procedure retains the edges that are locally relevant to each node (in other words, the edges with the highest similarity values). This makes the weighted graph sparse and enables both better runtime performance and lower memory usage in the subsequent community detection step, which is performed by Metis [Karypis1998] or Multilayer Regularised Markov Clustering [VanDongen2000, Satuluri2009]. The complexity of CODICIL is .
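The local edge retention idea can be illustrated with a short sketch. It is a simplification and not the sampling rule of [Ruan2013]: we assume that every node simply keeps a fixed number `keep` of its highest-weight incident edges (both the name `sparsify` and the parameter `keep` are hypothetical), and an edge survives if at least one of its endpoints retains it.

```python
def sparsify(weights, keep):
    """Keep, for every node, its `keep` highest-weight incident edges."""
    incident = {}
    for (u, v), w in weights.items():
        incident.setdefault(u, []).append((w, u, v))
        incident.setdefault(v, []).append((w, u, v))
    retained = set()
    for node_edges in incident.values():
        # Rank the edges of this node by similarity and keep the best ones.
        for _, u, v in sorted(node_edges, reverse=True)[:keep]:
            retained.add((u, v))
    return retained


weights = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.5, ("c", "d"): 0.2}
sparse_edges = sparsify(weights, keep=1)
```

Ranking edges per node rather than globally keeps weakly connected regions from losing all their edges, which is the motivation for a local criterion.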
5.1.2. SAC2
The method SAC2 [DangViennet2012] uses (5.2) as for discrete attributes and (5.5) with if they are continuous. Textual attributes are first transformed into numeric values by the TF-IDF procedure. Furthermore, the corresponding is (5.3) or (5.4). The obtained are then used to assign the weights (5.1), where if and are directly connected and otherwise. After that, the weight is used to construct an (unweighted) k-nearest neighbour graph as a directed graph in which each node has exactly edges connecting it to its most similar neighbours in (thus the topology is non-fixed). The parameter is set equal to the average node degree in . The version [Dong2011] of the kNN algorithm with the empirical cost is applied to reduce complexity. At the community detection step, the Louvain algorithm [Blondel2008] is applied to find communities in .
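The construction of the directed nearest neighbour graph can be sketched as below. The sketch is exhaustive (it ranks all candidate neighbours), whereas SAC2 relies on the approximate kNN construction of [Dong2011]; the names `knn_graph` and `average_degree`, and the nested-dictionary layout of the fused weights, are our assumptions.

```python
def average_degree(adj):
    """Rounded mean node degree of an undirected graph given as adjacency lists."""
    return round(sum(len(nbrs) for nbrs in adj.values()) / len(adj))


def knn_graph(weights, k):
    """Directed kNN graph: node -> list of its k most similar nodes.

    `weights[u][v]` is the fused similarity between u and v.
    """
    graph = {}
    for u, sims in weights.items():
        # Sort by decreasing similarity; ties are broken by node name.
        ranked = sorted(sims, key=lambda v: (-sims[v], str(v)))
        graph[u] = ranked[:k]
    return graph


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
weights = {
    "a": {"b": 0.9, "c": 0.2, "d": 0.5},
    "b": {"a": 0.9, "c": 0.7, "d": 0.1},
    "c": {"a": 0.2, "b": 0.7, "d": 0.6},
    "d": {"a": 0.5, "b": 0.1, "c": 0.6},
}
k = average_degree(adj)
graph = knn_graph(weights, k)
```

Note that the result may contain edges such as `a -> d` that are absent from the initial graph, which is exactly why the topology is called non-fixed here.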
Remark 1.
The authors of [Berlingerio2011] consider attributed multilayer networks with different types of edges and use a similarity measure similar to (5.2) to flatten the network and put the corresponding weights on a single-layer network. After this fusion, any weighted graph clustering algorithm, e.g. Weighted Louvain [Blondel2008], is suitable for community detection. In [Papadopoulos2015], the method called CAMIR is proposed for clustering attributed multilayer networks; it assigns different weights to each attribute and edge type. In particular, it ranks vertex properties by exploiting the information from edge types and attributes and then constructs a unified similarity matrix (taking into account all edge types and attributes). The clusterisation step is performed via spectral clustering.
Remark 2.
SANS [Parimala2015] works with weighted directed graphs (which are out of the scope of the present survey), using the matching coefficient (5.2) for semantic similarity and the so-called Weight Index (the sum of the weights of incoming and outgoing edges) for topological similarity in a version of (5.1). SANS automatically determines the number of clusters via centroids and uses the threshold algorithm for clustering.
Remark 3.
Edge weighting similar to (5.1) is also applied in FocusCO [Perozzi20142]. Although it is not a purely unsupervised clustering approach (it requires the user's preferences on focus attributes), it allows one to solve two interesting problems simultaneously: the extraction of focused local clusters and the detection of outliers in an attributed network.
Remark 4.
Let us mention that there exist approaches that are ideologically similar (attributes → edge weights → embeddings → k-means) but precede the recent algorithm AACluster [Akbas2017, Akbas2019]. For example, for a given network with numerical vector attributes, GraphEncoder [Tian2014] and GraRep [Cao2015] first obtain edge weights (5.1) with and being the cosine similarity (5.3) and then apply different techniques (a sparse autoencoder in [Tian2014] and matrix factorization in [Cao2015], partly based on skip-gram [Mikolov2013] and DeepWalk [Perozzi2014] ideas) to obtain embeddings (low-dimensional vector representations) for the nodes of the weighted graph. The resulting embeddings are then fed to the k-means algorithm to detect communities. However, in contrast to [Akbas2017, Akbas2019], [Tian2014] and [Cao2015] mostly focus on embedding techniques suitable for a weighted graph and consider their different applications, e.g. to classification and visualisation.
5.2. Distance-based methods
Algorithm | in (5.6) | Input for / Method of clusterisation | Number of clusters as input / Clusters overlap | Network size | Evaluation | Topology | Databases | Community detection methods for attributed graphs compared with
DCom [Combe2012] | | Distance matrix / Hierarchical agglomerative clustering | Yes/No | Small | Accuracy | Non-fixed: added edges | DBLP* | WCom1 [Combe2012], WCom2 [Combe2012]
DVil [Villavialaneix2013, Olteanu2013] | | Distance (or similarity) matrix / Stochastic kernel SOM algorithm [Villavialaneix2013, Olteanu2013] | No/No | Small, Medium | NMI | Non-fixed: added edges | Synthetic, Medieval Notarial Deeds | -
SToC [Baroni2017] | Formally but controlled via and | Distance matrix / close clustering [Baroni2017] | No/No | Medium, Large | Modularity, Within-Cluster Sum of Squares | Non-fixed: added edges | DBLP10K, DIRECTORS*, DIRECTORSgcc* | IncCluster [Zhou2010], GBAGC [Xu2014]
@NetGA [Pizzuti2018] | in general; in experiments | Distance matrix / Genetic algorithm | No/No | Medium | NMI | Non-fixed: added edges | Synthetic | SACluster [Zhou2009], CSPA [Strehl2003, Elhadi2013], Selection [Elhadi2013]
ANCA [FalihGrozavu2018, Falih2018] | May be thought of as for summing eigenvectors of distance and similarity matrices | Distance and similarity matrices / k-means for the sum of eigenvectors of the distance and similarity matrices | Yes/No | Medium | Adjusted Rand Index, NMI, Density, Modularity, Conductance, Entropy | Fixed | Synthetic, DBLP10K, Anonymized Enron email corpus | SACluster [Zhou2009], SAC1 and SAC2 [DangViennet2012], IGCCSM [Nawaz2015], WSte1 [Steinhaeuser], ILouvain [Combe2015]
Methods considered in the previous subsection exchange the node attributes for edge weights so that one obtains a weighted graph with semantic information incorporated. Thus the topology of the network is to some extent preserved at the fusion step. Methods from this subsection intentionally remove the network so that the topological and semantic information is fused by a distance function between nodes and stored in a distance matrix, see Figure 6. Distance-based clusterisation methods such as k-means and k-medoids can then be applied. The user of such methods should be aware that, in general, the resulting clusters may contain disconnected portions of the initial graph, as the graph structure is removed at the fusion step [Akbas2017, Section 3.3].
The usual form of the distance fusion function is
(5.6) 
where and are a topological distance function and a semantic distance function for nodes and , respectively. Clearly, one can introduce a more complicated fusion function based on distances. The parameter influences the balance between topological and semantic information so that corresponds to the pure topological case and to the pure semantic one. It is common to define as the shortest path length between and . The possible options for are as follows if we are given attribute vectors and :

Jaccard distance
(5.7) where and are thought to be sets in the former case and vectors with nonnegative real values in the latter one,

Minkowski distance
(5.8) where one gets the city block norm if and the Euclidean norm if .
The distance-based methods are summarised in Table 6. Note that ANCA [FalihGrozavu2018, Falih2018] employs an approach slightly different from (5.6) but nevertheless still deals with distance matrices (with respect to certain chosen seed nodes).
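The distance fusion described in this subsection can be sketched as follows, assuming that (5.6) is a convex combination with a balance parameter (called `alpha` here, with `alpha = 1` for the pure topological case), that the topological distance is the shortest path length and that the semantic distance is the set-based Jaccard distance. Using `len(adj)` as the distance between unreachable nodes is our own stand-in, chosen only to keep the result finite.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """BFS hop distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def jaccard_distance(x, y):
    """1 - |x & y| / |x | y| for two attribute sets."""
    union = x | y
    return 1 - len(x & y) / len(union) if union else 0.0


def fused_distance(adj, attrs, u, v, alpha):
    """Assumed convention: alpha = 1 is pure topology, alpha = 0 pure semantics."""
    # Unreachable nodes get len(adj), an assumed finite stand-in distance.
    d_topo = shortest_path_lengths(adj, u).get(v, len(adj))
    d_sem = jaccard_distance(attrs[u], attrs[v])
    return alpha * d_topo + (1 - alpha) * d_sem


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
attrs = {"a": {"x", "y"}, "b": {"y"}, "c": {"y", "z"}}
d_ac = fused_distance(adj, attrs, "a", "c", alpha=0.5)
```

Since the two components live on different scales (hop counts versus values in [0, 1]), a practical implementation would normalise them before fusing; the sketch omits this step.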
There are no highly influential methods among the distance-based ones according to Figure 3, so we describe those that are most interesting to us.
5.2.1. DVil
In DVil [Villavialaneix2013], (5.6) with and being different kernel functions is used to combine semantic information of different types (graph, numerical variables, factors, textual variables, etc.) and network topology between all the nodes in the network. In one of the experiments, is the shortest path length between two nodes. Moreover, the topology is non-fixed there. The obtained distance matrix is then used in a stochastic kernel SOM algorithm at the community detection step (Footnote 4: throughout the text, SOM stands for self-organising maps). The usage of SOM allows one to simultaneously solve the problem of visualisation by projecting the nodes onto a grid of small dimension. The method DVil [Villavialaneix2013] is further developed in [Olteanu2013], where the balance between topological and semantic information is tuned automatically.
5.2.2. SToC
Semantic-Topological Clustering SToC [Baroni2017] has time complexity , where and are the numbers of nodes and edges in the network, respectively. SToC uses a fusing function different from (5.6), namely,
(5.9) 
One can formally think that in this scheme, however the impact of the semantic and topological components is still controlled by the parameters involved in and (see below).
The topological distance in SToC is defined via (5.7) and the notion of neighbourhood:
where the neighbourhood of is the set of nodes reachable from by a path of length at most (a parameter), see [Gunnemann2011]. To reduce complexity, the Jaccard distance in is approximated with a bounded error (one more parameter) by bottom-k sketch vectors [Cohen2007], i.e. compressed representations of the neighbourhoods in this case. The semantic distance is calculated using the Euclidean distance (5.8) for quantitative attributes (normalised to ) and the Jaccard distance (5.7) for categorical attributes. The resulting distance as in (5.9) is defined to be in .
Using , a cluster is defined by considering nodes that are within a maximum distance from a given node. Namely, for a given threshold , a close cluster is a subset of the nodes in such that there exists a node such that for all , . A close clustering of is defined as a partition of its nodes into close clusters. At the clusterisation step, SToC iteratively extracts close clusters from starting from random seeds (chosen through a select node function) by partial traversal of . Take into account that is contained in the set of nodes such that . Nodes assigned to a cluster are not used in further iterations, thus the clusters formed are not overlapping. Moreover, the approach does not require the number of clusters as input.
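The extraction of close clusters can be illustrated by the following simplified sketch. It scans all unassigned nodes instead of partially traversing the graph as SToC does, takes the fused distance as an arbitrary callable `dist`, and names the threshold `tau`; these names, and the deterministic seed rule `select_node=min` used in place of random seeds, are our assumptions.

```python
def close_clustering(nodes, dist, tau, select_node=min):
    """Partition `nodes` into clusters of nodes within `tau` of a seed."""
    unassigned = set(nodes)
    clusters = []
    while unassigned:
        seed = select_node(unassigned)
        # Every still-unassigned node close enough to the seed joins its cluster.
        cluster = {v for v in unassigned if dist(seed, v) <= tau}
        cluster.add(seed)
        clusters.append(cluster)
        # Assigned nodes are never reused, so clusters cannot overlap.
        unassigned -= cluster
    return clusters


# Toy example: nodes are numbers and the distance is their absolute difference.
clusters = close_clustering([0, 1, 2, 10, 11], lambda u, v: abs(u - v), tau=2)
```

The number of clusters is not an input; it emerges from the threshold and the order in which seeds are selected.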
As the choice of the parameters and in SToC can be nontrivial, the authors propose an autotuning procedure. It computes optimal and via approximating the cumulative distribution of and , taking into account parameters and , provided by the user and controlling the importance of semantic and topological component, respectively.
Remark 5.
There exist distance-based methods for multilayer networks. For example, CLAMP (CLustering Attributed MultigraPhs) [Papadopoulos2017] is an approach for clustering attributed networks with heterogeneous (numerical and categorical) attributes and multiple types of edges that uses a unified distance measure similar, in a sense, to (5.6). The distance measure takes into account the importance of the node properties and the balance between the sets of attributes and edges by assigning a different weight to each of them. The clustering process adopts gradient descent to produce fuzzy clusters. It is also worth mentioning that CLAMP is highly parallelisable.
Algorithm | Graph augmentation | Input for / Method of clusterisation | Number of clusters as input / Clusters overlap | Network size | Evaluation | Topology | Databases | Community detection methods for attributed graphs compared with
SACluster [Zhou2009], IncCluster [Zhou2010, Cheng2012], SAClusterOpt [Cheng2011] | Semantic vertexes and structure-semantics edges | Distance matrix (via neighbourhood random walks) / Modified k-medoids [Zhou2009] | Yes/No | Small, Medium | Density, Entropy | Non-fixed: adding edges | Political Blogs, DBLP10K, DBLP84K | WCluster [Zhou2009] (based on (5.6)), SACluster [Zhou2009], IncCluster [Zhou2010, Cheng2012], SAClusterOpt
SCMAG [Huang2015] | Semantic vertexes and structure-semantics edges | Distance matrix (via neighbourhood random walks) / Subspace clustering algorithm based on ENCLUS [Cheng1999] | No/Yes | Medium | Density, Entropy | Non-fixed: adding edges | IMDB, Arnetminer bibliography | SACluster [Zhou2009], GAMer [Gunnemann2014]

5.3. Node-augmented graph distance-based methods
Methods from this class transform the initial graph into another node-augmented graph with new semantic nodes representing distinct node attributes, see Figure 7 and Table 7. Edges between structural and semantic nodes are added according to the node attributes in (thus the topology is non-fixed). Note that the resulting graph is much larger than (especially if the dimension and the sets of possible values of the node attributes are large), which considerably increases the time complexity of the methods.
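The augmentation step can be sketched as below for categorical attributes. The data layout (attribute records as dictionaries, attribute nodes encoded as `(name, value)` tuples, edges tagged with their type) and the function name `augment_graph` are assumptions made for illustration only.

```python
def augment_graph(edges, attrs):
    """Return all edges of the node-augmented graph and its attribute nodes.

    Each distinct (attribute name, value) pair becomes an extra node, and
    every structural node is linked to the pairs that describe it.
    """
    structural = [(u, v, "structure") for u, v in edges]
    attribute = [
        (node, (name, value), name)
        for node, record in sorted(attrs.items())
        for name, value in sorted(record.items())
    ]
    attribute_nodes = {target for _, target, _ in attribute}
    return structural + attribute, attribute_nodes


edges = [("a", "b")]
attrs = {"a": {"topic": "db"}, "b": {"topic": "db"}, "c": {"topic": "ml"}}
all_edges, attribute_nodes = augment_graph(edges, attrs)
```

Here `a` and `b` become connected both directly and through the shared attribute node `("topic", "db")`, while the isolated node `c` gains a link to `("topic", "ml")`. The growth in size is also visible: one structural edge turns into four edges in total.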
According to Figure 3, the SACluster family is among the most influential methods for community detection in attributed graphs, and therefore we now give a short description of it.
5.3.1. SACluster
The method SACluster [Zhou2009] transforms into with new attribute nodes representing distinct node attribute values. Namely, an attribute node represents an attribute-value pair . If for , then an attribute edge is added between and (this, however, cannot be applied to continuous attributes). In , two vertices are close if they are connected through many structural and/or attribute edges. The neighbourhood random walk model is further used in SACluster to estimate the node closeness in .
To proceed, we recall several definitions from [Zhou2009]. Let be the (one-step) transition probability matrix of a graph. Given as the length of a random walk, as the restart probability, the neighbourhood random walk distance from to from the graph is defined as
where is a path from to whose length is with transition probability . Moreover,
where is the neighbourhood random walk distance matrix. One can measure then the closeness between vertices and as
(5.10) 
If , the neighbourhood random walk is the same as the random walk with restart defined in [Tong2006].
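A small numerical sketch may help to see how such a random walk closeness behaves. We assume the matrix form in which the closeness matrix is the truncated series of terms c(1 - c)^g P^g for g = 1, ..., l, where P is the transition matrix, l the walk length and c the restart probability; this is our reading of the model in [Zhou2009] and should be treated as an assumption. The helper names are ours.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_walk_distance(P, length, c):
    """Sum over g = 1..length of c * (1 - c)**g * P**g (assumed form)."""
    n = len(P)
    total = [[0.0] * n for _ in range(n)]
    power = P  # holds P**g at the start of iteration g
    for g in range(1, length + 1):
        coeff = c * (1 - c) ** g
        for i in range(n):
            for j in range(n):
                total[i][j] += coeff * power[i][j]
        power = matmul(power, P)
    return total


# Transition matrix of the path graph a - b - c.
P = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
R = random_walk_distance(P, length=2, c=0.5)
```

In this toy example the direct neighbour `b` of `a` gets a larger value than the two-hop node `c`, so despite the name "distance" larger values indicate closer nodes.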
To combine the structural closeness and attribute similarity in , [Zhou2009] constructs the transition probability matrix of the graph and computes the corresponding distances (5.10). Notice that at this step weights on edges in are assigned: topological edge has a weight of , semantic edges corresponding to have an edge weight of , respectively.
The resulting distance matrix, based on (5.10) for , is then fed into a k-medoids-type clustering algorithm. First, good initial centroids from the density point of view [Hinneburg1998] are chosen. Then a converging iterative process for the optimisation of an objective function (in order to maximize intracluster similarity and minimize intercluster similarity) is performed, with the corresponding adjustments of the edge weights .
As expected from the construction of , SACluster is computationally expensive, namely, its time complexity is . In order to improve the efficiency and scalability of SACluster, the methods IncCluster [Zhou2010, Cheng2012] and SAClusterOpt [Cheng2011] have been proposed. The main idea behind them is to reduce the number and the complexity of random walk distance computations.
5.4. Embedding-based (early fusion) methods
As is well known, a graph as a traditional representation of a network brings several difficulties to network analysis. As mentioned in [Cui2019], graph algorithms suffer from high computational complexity, low parallelisability and inapplicability of machine learning methods. Novel network embedding techniques aim to tackle this by learning low-dimensional continuous vector representations (also known as embeddings) for the network nodes so that the main network information is efficiently encoded. Additionally, the embeddings aim not only at reconstructing the initial network but also at supporting network inference such as link prediction, node classification and clustering (for more details, see [Cui2019, Cai2018]). (Footnote 5: the embedding approach is an algorithmic framework for learning continuous feature representations for nodes in networks, initially proposed as node2vec [Grover2016]. Node2vec learns a mapping of nodes to a low-dimensional space of features by maximizing the likelihood of preserving network neighbourhoods of nodes. As a result, embeddings reflect the structural equivalence or homophily between network nodes [Grover2016].)
In the context of node-attributed social networks, the objective of network embedding is an efficient low-dimensional encoding that combines both the network topology and semantics while preserving proximities of different orders [Tang2015, Cao2015, Gao2018]. Having an embedding representation for the nodes, one can theoretically use traditional distance-based clusterisation methods such as k-means and k-medoids to further tackle the clusterisation problem, see Table 8.
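Once node embeddings are available, the clusterisation step reduces to clustering points in a vector space. The following is a minimal sketch of plain Lloyd-style k-means in pure Python, with the simplifying assumption that the first k points serve as initial centroids; practical runs would rather use a careful initialisation and several restarts, since the result depends on the starting centroids. The embeddings in the example are made up for illustration.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations; the first k points are the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical two-dimensional node embeddings forming two groups.
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0)]
labels = kmeans(embeddings, k=2)
```

The number of clusters has to be supplied, which matches the "Yes" entries in the "Number of clusters as input" column for the k-means-based methods.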
Undoubtedly, there exists a rich bibliography on embedding techniques for networks with side information (node- and edge-attributed, heterogeneous in node and edge types) [Cui2019, Cai2018], but in fact not all of them are reliable for the community detection task. It is worth mentioning that the task of classification (i.e. a supervised learning task) is typically considered. At the same time, some authors use in their clusterisation performance experiments embedding techniques that were used only for classification in the original papers; e.g. in [Gao2018] the attributed network embedding methods compared include TADW [Yang2015], LANE [Huang2017b], GAE [Kipf2016], VGAE [Kipf2016] and GraphSAGE [Hamilton2017]. Taking all these facts into account, we confine ourselves in this survey to the methods that work with node-attributed social or citation networks, have been applied to community detection and have been compared with other clusterisation methods.
Algorithm | Embeddings | Input for / Method of clusterisation | Number of clusters as input / Clusters overlap | Network size | Evaluation | Databases | Community detection methods for attributed graphs compared with
PLANE [Le2014] | Via a generative model and EM [Dempster1977] | Node embeddings / k-means | Yes/No | Small, Medium | Accuracy | Cora* | Relational Topic Model [Chang2009] + Topic Distributions Embedding [Iwata2007]
DANE [Gao2018] | Autoencoder | Node embeddings / k-means | Yes/No | Medium | Accuracy | Cora, Citeseer, PubMed Diabetes, Wiki | Embeddings obtained via TADW [Yang2015], LANE [Huang2017b], GAE [Kipf2016], VGAE [Kipf2016], GraphSAGE [Hamilton2017]
CDE [Li2018] | Topology embedding matrix | Topology embedding matrix and attribute matrix / Nonnegative matrix factorisation | Yes/(Yes/No) | Small, Medium | Accuracy, NMI, Jaccard similarity, F1-score | Cora, Citeseer, WebKB, Flickr*, Philosophers [Hunter2004], egoFacebook | PCLDC [Yang2009], Circles [Leskovec2012], CESNA [Yang2013], SCI [Wang2016]
MGAE [Wang2017] | Autoencoder | Node embeddings / Spectral clustering | Yes/No | Medium | Accuracy, NMI, F-score, Precision, Recall, Average Entropy, Adjusted Rand Index | Cora, CiteSeer, Wiki | Circles [Leskovec2012], RTM [Chang2009], RMSC [Xia2014], embeddings obtained via TADW [Yang2015], VGAE [Kipf2016]
There are no highly influential methods among the embedding-based ones according to Figure 3, but we provide a short description of each one from Table 8 due to the novelty and importance of embedding techniques in the clusterisation task.
5.4.1. PLANE
Probabilistic LAtent Document Network Embedding PLANE [Le2014] is a topic-based embedding method that aims to combine the following representations of each node with text attributes (e.g. in a citation network): the high-dimensional representations based on word occurrences and network topology, the representation in terms of a topic distribution (based on the Relational Topic Model [Chang2009]) and the low-dimensional representation for nodes. The representations are joined through a generative model, with the parameters (including the corresponding node embeddings) estimated via maximum a posteriori estimation with the EM algorithm [Dempster1977]. Interestingly, not only observed positive links are incorporated but also virtual negative ones. For each node, the authors form a dimensional embedding to simultaneously solve the visualisation problem. To perform community detection, the embeddings obtained are fed to k-means.
5.4.2. DANE
Deep Attributed Network Embedding DANE [Gao2018] is an embedding-based algorithm using a deep model to preserve the first-order, high-order and semantic proximities in the attributed network. There are two branches, each composed of a multilayer nonlinear function, which capture the network topology and semantics, respectively, and map them into a low-dimensional space. Each branch is an autoencoder, i.e. an unsupervised deep model widely used in machine learning [Jiang2016]. The autoencoders aim to minimize the reconstruction loss between the input vectors and the output embeddings to preserve the abovementioned proximities. In addition, the consistency and complementarity of topology and semantics are preserved simultaneously in order to obtain a better structure-attribute fusion. Note that the loss function exploits an efficient most negative sampling strategy (with complexity ). The resulting output is the concatenation of the embeddings obtained by each branch. As is typical for the methods from this class, community detection is performed by k-means on the embeddings.
5.4.3. CDE
The method CDE (Community Detection in attributed graphs: an Embedding approach) [Li2018] uses a special function to measure community membership similarity. Its values are then the input for a procedure based on skip-gram with negative sampling [Mikolov2013] that produces a community structure embedding matrix, which encodes the latent densely connected subgraphs and explores inherent community structures. After this, the embedding matrix is used instead of the adjacency matrix of the network. Given the structure embedding matrix and the attribute matrix, the actual community detection is performed via a nonnegative matrix factorization procedure (with a unified topology- and semantics-aware objective function) that optimizes community membership with suitable iterative updating rules based on the Majorization-Minimization framework [Hunter2004] (cf. [Wang2016]). The impact of topology and semantics may be varied. The resulting communities (their number is an input) may overlap or not depending on the community membership rule chosen.
5.4.4. MGAE
Marginalized Graph Autoencoder for Graph Clustering MGAE [Wang2017] takes an attributed graph as input and learns a topology and semantics with an augmented autoencoder upon them, with the graph convolutional network as a base. The authors propose to corrupt the semantics with noise with further marginalization in order to obtain a better representation from the autoencoder. By stacking multiple layers of the autoencoder, MGAE results in a deep representation for network nodes that is later fed into the spectral clustering algorithm.
Remark 6.
Community detection in heterogeneous and multilayer networks is considered e.g. in [Chang2015, Huang2017, Pei2018]. Other embedding approaches for different heterogeneous networks (in particular, node-attributed ones), which are used mostly for classification but can theoretically be applied to clusterisation, are discussed e.g. in the comprehensive surveys [Cui2019, Cai2018].
Algorithm | Attribute types | Patterns | Number of clusters as input / Clusters overlap | Network size | Evaluation | Databases | Community detection methods for attributed graphs compared with
AHMotif [Li2018Motif] | Binary, Numerical | Motif | Yes/No | Medium | NMI, Accuracy | Cora, WebKB | -
5.5. Pattern mining-based (early fusion) methods
Recall that a motif is an interconnection pattern occurring in real-world networks at numbers that are significantly higher than those in random networks [Milo2002] (note that a spanning tree pattern and a clique are examples of motifs). Motifs are considered as building blocks for complex networks [Milo2002], and they may help to uncover useful information hidden in the network topology and semantics. We found just one community detection method for attributed social networks based on this idea, namely, AHMotif (Attribute Homogenous Motif-based method) [Li2018Motif], see Table 9. This method equips the structural motifs identified in the network with a so-called homogeneity value based on the attributes of the nodes involved in the motif. This information is then stored in a special adjacency matrix. Subsequently, the matrix is the input to existing community detection algorithms such as Permanence [Chakraborty2014] and Affinity Propagation [Frey2007].
6. Simultaneous fusion methods
In contrast to the early fusion methods, the simultaneous fusion ones use and fuse topology and semantics in a unified process together with community detection. Some of them are based on modifications of known clusterisation algorithms such as Louvain, Normalised Cut, k-means, k-medoids and , or on attribute-aware adaptations of heuristic approaches such as evolutionary and genetic algorithms. A big subclass of simultaneous fusion methods uses the nonnegative matrix factorisation framework to detect communities in attributed social networks, while another subclass, generative probabilistic models, aims to statistically infer a model of the attributed network under the assumption that topology and semantics are generated according to some parametric distributions.
Algorithm  Modified method  Number of clusters as input/Clusters overlap  Network size  Evaluation  Databases  Other attributed network clusterisation methods compared with
OCru [Cruz2011]  Louvain [Blondel2008]; added attribute Entropy minimisation  No/No  Medium  Modularity; Entropy  Facebook100  none
SAC1 [DangViennet2012]  Louvain [Blondel2008]; added attribute similarity maximisation  No/No  Small; Medium  Density; Entropy  Political Blogs; Facebook100; DBLP10K  SAC2 [DangViennet2012]; WSte2 [Steinhaeuser2010]; Fast greedy [Clauset2004] for weighted graph
ILouvain [Combe2015] (code)  Louvain [Blondel2008]; added maximisation of the attribute-based measure Inertia  No/No  Small; Medium  NMI; Accuracy  DBLP+Microsoft Academic Search; Synthetic  ToTeM [Combe2012] (see footnote 6)
LAA/LOA [Asim2017]  Louvain [Blondel2008]; Modularity gain depends on attributes  No/No  Small  Density; Modularity  London gang [Grund2015]; Italy gang; Polbooks; Adjnoun [Newman2006]; Football [Girvan2002]  none
UNCut [Ye2017]  Normalised Cut; added attribute homogeneity-aware measure Unimodality Compactness  Yes/No  Small; Medium  NMI; ARI  Disney [Muller2013]; DFB [Gunnemann2013]; ARXIV [Gunnemann2013]; Political Blogs; 4area [Perozzi20142]; Patents  SACluster [Zhou2009]; SSCG [Gunnemann2013]; NNM [Shiga2007]
DAEGC [Wang2019]  Graph attention network [Velickovic2018] + $k$-means for node embeddings + Stochastic Gradient Descent  Yes/No  Medium  ACC; NMI; F-measure; ARI  Cora; Citeseer; Pubmed  RMSC [Xia2014]; TADW [Yang2015] + $k$-means; VGAE and GAE [Kipf2016] + $k$-means
NetScan [Ester2006, Ge2008]  An approximation algorithm for the Connected $k$-Center optimisation problem  Yes/Yes  Small; Medium  Accuracy  Professors*; Synthetic; DBLP*; BioGRID+Spellman  none
JointClust [Moser2007]  An approximation algorithm for the Connected X Clusters problem  No/No  Medium  Accuracy  DBLP*; CiteSeer*; Corel stock photo collection  none
MAM [Sanchez2015] (code)  Louvain-type algorithm with attribute-aware Modularity + Outlier detection  No/No  Small; Medium; Large  F1-score; Attribute-aware Modularity  Synthetic; Disney [Muller2013]; DFB [Gunnemann2013]; ARXIV [Gunnemann2013]; IMDB [Gunnemann2013]; DBLP*; Patents*; Amazon [Sanchez2013]  CODA [Gao2010]
SSCluster [Farzi2018]  $k$-medoid-based clustering algorithm with structural and attribute objective functions  Yes/No  Medium  Density; Entropy  Political Blogs; DBLP10K  SACluster [Zhou2009, Cheng2011]; Wcluster [Cheng2011]; SNAP [Tian2008]
AdaptSA [Li2019]  Weighted $k$-means for $d$-dimensional representations of structure and attributes  Yes/No  Medium  Accuracy; NMI; F-measure; Modularity; Entropy  Synthetic; WebKB; Cora; Political Blogs; CiteSeer; DBLP10K  CODICIL [Ruan2013]; SACluster [Zhou2006]; IncCluster [Zhou2010]; PPSBDC [Chai2013]; PCLDC [Yang2009]; BAGC [Xu2012]
kNAS [Boobalan2016]  $k$NN with added Semantic Similarity Score  Yes/Yes  Medium  Density; Tanimoto Coefficient  DBLP*; Facebook*; Twitter*  SAClusterOpt [Cheng2011]; CODICIL [Ruan2013]; NISE [Whang2016]
Footnote 6: The authors claim that they compare ILouvain with ToTeM [Combe2012], “another community detection method designed for attributed graphs which exploits the two types of information”. However, this reference appears to be inaccurate, as [Combe2012] does not contain any method called ToTeM.
6.1. Methods modifying Louvain, Normalised Cut, $k$-means, $k$-medoids and $k$NN algorithms
The methods are listed in Table 10. According to Figure 3, SAC1 is one of the most influential methods for community detection in attributed social networks. Besides SAC1, we also briefly describe several other interesting methods.
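6.1.1. SAC1
The method SAC1 [DangViennet2012] is based on a modification of Newman’s Modularity [Clauset2004]. For a given partition $\mathcal{C}$ of the node set into clusters, the structure modularity is
$$Q_S=\sum_{C\in\mathcal{C}}\ \sum_{v_i,v_j\in C} S(v_i,v_j),\qquad S(v_i,v_j)=\frac{1}{2m}\Big(A_{ij}-\frac{k_i k_j}{2m}\Big),$$
where the normalised link strength $S(v_i,v_j)$ between nodes $v_i$ and $v_j$ is measured by comparing the existing network connection $A_{ij}$ with the expected number of connections $k_i k_j/(2m)$ ($k_i$ is the degree of $v_i$ and $m$ is the number of links). To deal with the attributes, SAC1 uses the attribute modularity of a partition:
$$Q_A=\sum_{C\in\mathcal{C}}\ \sum_{v_i,v_j\in C} \mathrm{sim}_A(v_i,v_j),$$
where $\mathrm{sim}_A$ is an attribute similarity function. As for the above-mentioned SAC2 [DangViennet2012], if the attributes are discrete, (5.2) is used for $\mathrm{sim}_A$, while if they are continuous, (5.5) is applied with a fixed value of its parameter. Textual attributes are first transformed into numeric values by the TF-IDF procedure; the similarity between the resulting representations is then (5.3) or (5.4).
Next, a composite modularity is introduced as a weighted combination of the structure modularity and the attribute modularity,
$$Q=\alpha\,Q_S+(1-\alpha)\,Q_A,$$
where $\alpha\in[0,1]$ is a fusion parameter. This function is then maximised in a way similar to that in Louvain [Blondel2008].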
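To make the composite modularity concrete, the following minimal Python sketch evaluates it for a fixed partition. It is an illustration under stated assumptions rather than the implementation of [DangViennet2012]: the function names, the choice to sum over ordered pairs of nodes within each cluster, the exclusion of self-pairs from the attribute term, and the simple matching similarity are ours.

```python
from collections import defaultdict

def matching_similarity(x, y):
    """Hypothetical similarity for discrete attributes: share of equal entries."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def composite_modularity(edges, attrs, labels, alpha, sim=matching_similarity):
    """Return alpha * Q_S + (1 - alpha) * Q_A for a given partition.

    edges  : list of undirected links (u, v), each listed once
    attrs  : dict node -> attribute tuple
    labels : dict node -> cluster id
    alpha  : fusion parameter in [0, 1]
    """
    m = len(edges)                      # number of links
    degree = defaultdict(int)
    linked = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        linked.add((u, v))
        linked.add((v, u))

    q_s = 0.0                           # structure modularity
    q_a = 0.0                           # attribute modularity
    for i in labels:
        for j in labels:
            if labels[i] != labels[j]:
                continue
            a_ij = 1.0 if (i, j) in linked else 0.0
            # observed link minus expected number of links, normalised by 2m
            q_s += (a_ij - degree[i] * degree[j] / (2 * m)) / (2 * m)
            if i != j:                  # assumption: self-pairs are skipped
                q_a += sim(attrs[i], attrs[j])
    return alpha * q_s + (1 - alpha) * q_a
```

With `alpha = 1` the function reduces to Newman's Modularity of the partition, and with `alpha = 0` only attribute similarity within clusters is rewarded. Note that in this sketch the two terms live on different scales, so in practice the choice of `alpha` (or a normalisation of the attribute term) matters.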
6.1.2. ILouvain
The method ILouvain [Combe2015] (source code and datasets) is based on a local optimisation of a global criterion that includes Modularity [Newman2006] and a new measure called Inertia. Inertia is defined as the sum of squared Euclidean distances between the attribute vectors and their centre of gravity, i.e. the average of all attribute vectors in the network. Using this notion, the authors define an Inertia-based modularity $Q_{Inertia}$ of a partition, which compares, for each pair of elements from the same community, the expected distance with the observed distance between their attributes. While Newman's Modularity $Q_{NG}$ considers the strength of the links between nodes in order to cluster strongly connected nodes, $Q_{Inertia}$ aims at clustering nodes whose attributes are the most similar. As in [DangViennet2012], the community detection process consists in optimising a linear combination of $Q_{NG}$ and $Q_{Inertia}$ in a Louvain-like manner.
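The Inertia-based measure can be sketched in a few lines of Python. The sketch assumes squared Euclidean distances and the following form of the Inertia-based modularity, which is our reading of [Combe2015] and is not spelled out in the text above: for each ordered pair of nodes in the same community, the expected term is $I(V,v)\,I(V,v')/(2n\,I(V))^2$ and the observed term is $\|v-v'\|^2/(2n\,I(V))$, where $I(V)$ is the inertia through the centre of gravity and $I(V,v)$ the inertia through the node $v$. All function names are hypothetical.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def inertia(vectors):
    """Inertia of the vectors through their centre of gravity."""
    n = len(vectors)
    centre = [sum(column) / n for column in zip(*vectors)]
    return sum(squared_distance(v, centre) for v in vectors)

def inertia_modularity(vectors, labels):
    """Assumed Inertia-based modularity of a partition (see lead-in).

    Requires that not all attribute vectors coincide (non-zero inertia).
    """
    n = len(vectors)
    # inertia through each node: sum of squared distances to all nodes
    through = [sum(squared_distance(v, w) for w in vectors) for v in vectors]
    norm = 2 * n * inertia(vectors)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                expected = through[i] * through[j] / norm ** 2
                observed = squared_distance(vectors[i], vectors[j]) / norm
                q += expected - observed
    return q
```

Under these assumptions the measure behaves like Modularity: it vanishes when all nodes are put into one community, and it is larger for partitions that group nodes with similar attributes.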
6.1.3. DAEGC
Deep Attentional Embedded Graph Clustering (DAEGC) [Wang2019], in contrast to the embedding-based early fusion methods, uses a goal-directed deep learning approach with a unified framework for producing embeddings and clustering. Namely, DAEGC fuses network topology and semantics via an attentional autoencoder (a variant of the graph attention network [Velickovic2018] that takes high-order proximity into account) to obtain node embeddings. Based on the embeddings, soft labels are then generated to guide a self-training graph clustering component. These two procedures are performed jointly and iteratively to improve both the embedding and the clusterisation quality. DAEGC produces non-overlapping clusters, and the number of clusters is an input.
6.1.4. kNAS
The method kNAS [Boobalan2016] starts by identifying centroids for the clusters (their number is an input) as nodes with a high Local Outlier Factor [Breunig2000], meaning that the node is core (a low Local Outlier Factor refers to outliers). Initial clusters are formed by the $k$NN algorithm (i.e. topological similarity is achieved). Next, the so-called Similarity Score, responsible for the semantic similarity of nodes within clusters, is measured. Taking into account the Similarity Score obtained, the clusters are merged and the centroids are updated. The process of achieving topological and semantic similarity is repeated until the Similarity Score is maximised.
6.2. Metaheuristic-based methods
These methods adapt metaheuristic algorithms (in particular, evolutionary algorithms and tabu search) to the optimisation of an objective function that quantifies the structural closeness and attribute homogeneity of an attributed network partition. The methods are listed in Table 11.
Algorithm  Modified method  Number of clusters as input/Clusters overlap  Network size  Evaluation  Databases  Other attributed network clusterisation methods compared with
MOEASA [Li20172]  Multiobjective evolutionary algorithm (Modularity and Attribute Similarity are maximised)  No/No  Small; Medium  Density; Entropy  Political Books; Political Blogs; Facebook100; egoFacebook  SAC1, SAC2 [DangViennet2012]; SACluster [Zhou2009]
MOGA@Net [Pizzuti2019]  Multiobjective genetic algorithm (optimising Modularity, Community score, Conductance, attribute similarity)  No/No  Small; Medium  NMI; Cumulative NMI; Density; Entropy  Synthetic; Cora; Citeseer; Political Books; Political Blogs; egoFacebook  SACluster [Zhou2009]; BAGC [Xu2012]; OCru [Cruz2011Entropy]; Selection [Elhadi2013]; HGPA, CSPA [Elhadi2013, Strehl2003]
JCDC [Zhang2016]  Tabu search and gradient ascent for a structure-attribute-aware loss function  Yes/No  Small; Medium  NMI  Synthetic; World trade network [Nooy2004]; Lazega  CASC [Binkiewicz2017]; CESNA [Yang2013]; BAGC [Xu2012]
6.3. Non-negative matrix factorisation and matrix compression
Non-negative matrix factorisation (NMF) is a family of algorithms that approximate a non-negative matrix of high rank by a product of non-negative matrices of lower rank so that the approximation error, measured in the Frobenius norm (denoted below by $\|\cdot\|_F$), is minimal. As is well known, NMF has an inherent clustering property, i.e. it is able to find clusters in the input data [Lee2001]. The approximating product usually contains two factors, but some algorithms [Ding2006] include three or more. NMF is often regularised (e.g. by Lasso-type conditions) to avoid bad behaviour of the approximating matrices.
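As a point of reference for the attribute-aware variants discussed in this section, the sketch below implements plain two-factor NMF with the classical multiplicative update rules commonly attributed to Lee and Seung. It is a generic, dependency-free illustration and not one of the surveyed methods; the helper names, the number of iterations and the small constant `eps` that guards against division by zero are our choices.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(m, w, h):
    """Squared Frobenius norm of m - w h."""
    wh = matmul(w, h)
    return sum((m[i][j] - wh[i][j]) ** 2
               for i in range(len(m)) for j in range(len(m[0])))

def nmf(m, rank, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix m by w h with w, h non-negative."""
    rng = random.Random(seed)
    rows, cols = len(m), len(m[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    for _ in range(iters):
        # h <- h * (w^T m) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, m), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(cols)]
             for i in range(rank)]
        # w <- w * (m h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(m, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(rows)]
    return w, h
```

For a matrix with two diagonal blocks, the rows of `w` returned by `nmf(m, 2)` tend to concentrate on one column per block, which is the clustering property mentioned above. Since the updates are multiplicative, non-negativity of the factors is preserved automatically.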
For node-attributed social networks, NMF requires a proper adaptation to fuse both topology and semantics, and this has been done in several papers, see Table 12. To proceed, we need additional notation. In what follows, $\mathbf{A}\in\mathbb{R}_{+}^{n\times n}$ denotes the adjacency matrix for the initial network topology (as before, $n$ is the number of nodes), $\mathbf{X}\in\mathbb{R}_{+}^{n\times d}$ the node attribute matrix for the initial network semantics ($d$ is the dimension of the attribute vectors), $N$ the number of required clusters (it is an input in NMF approaches), $\mathbf{U}\in\mathbb{R}_{+}^{n\times N}$ the cluster membership matrix whose elements indicate the association of nodes with communities, and finally $\mathbf{V}\in\mathbb{R}_{+}^{d\times N}$ the cluster membership matrix whose elements indicate the association of the attributes with the communities. Other auxiliary matrices will be introduced below.
The general idea of NMF methods for attributed networks is to use the known matrices $\mathbf{A}$ and $\mathbf{X}$ and the number of clusters $N$ to determine the unknown matrices $\mathbf{U}$ and $\mathbf{V}$ in an iterative optimisation procedure, and thus to obtain simultaneously a community partition and the corresponding semantic description of each community. Note that each element of the normalised $\mathbf{U}$ in fact contains the probability that a node belongs to a particular community (communities may overlap in this setting). Alternatively, one can assign each node to the community with the highest probability to obtain non-overlapping communities [Wang2016].
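The last step described above, turning a non-negative membership matrix into communities, can be sketched as follows. The row normalisation, the `threshold` parameter for overlapping communities and the function names are assumptions made for illustration; individual NMF-based methods may post-process the membership matrix differently.

```python
def normalise_rows(u):
    """Scale each row of a non-negative matrix to sum to one."""
    result = []
    for row in u:
        total = sum(row)
        if total > 0:
            result.append([x / total for x in row])
        else:  # a node with no membership signal: spread it uniformly
            result.append([1.0 / len(row)] * len(row))
    return result

def hard_assignment(u):
    """Non-overlapping communities: most probable community per node."""
    return [max(range(len(row)), key=row.__getitem__)
            for row in normalise_rows(u)]

def overlapping_assignment(u, threshold):
    """Overlapping communities: all communities with probability >= threshold."""
    return [[c for c, p in enumerate(row) if p >= threshold]
            for row in normalise_rows(u)]
```

For example, a node with membership row `[1.0, 1.0]` is placed in both communities by `overlapping_assignment` with `threshold = 0.4`, whereas `hard_assignment` breaks the tie in favour of the first community.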
The “matrix compression” technique is discussed below in the description of the PICS algorithm [Akoglu2012].
Note that SCI [Wang2016] is one of the most influential methods for community detection in attributed social networks according to Figure 3. We give a short description of SCI and several other NMF-based methods below.
Algorithm  Factorisation/compression type  Number of clusters as input/Clusters overlap  Network size  Evaluation  Databases  Community detection methods for attributed graphs compared with
NPei [Pei2015]  3-factor NMF  Yes/Yes  Small; Medium  Purity  Twitter; DBLP*  Relational Topic Model [Chang2009] (for documents)
3NCD [Nguyen2015]  NMF  Yes/Yes  Medium; Large  F1-score; Jaccard similarity  egoFacebook; egoTwitter; egoG+  CESNA [Yang2013]
SCI [Wang2016]  2-factor NMF  Yes/Yes  Medium  ACC; NMI; GNMI; F-measure; Jaccard similarity  Citeseer; Cora; WebKB; LastFM  PCLDC [Yang2009]; CESNA [Yang2013]; DCM [Pool2014]
JWNMF [Huang2015]  2-factor NMF  Yes/Yes  Small; Medium  Modularity; Entropy; NMI  Amazon Fail dataset; Disney dataset; Enron dataset; DBLP4AREA dataset; WebKB; Citeseer; Cora  BAGC [Xu2012]; PICS [Akoglu2012]; SANS [Parimala2015]
SCD [Li2017]  2- and 3-factor NMF  Yes/(Yes/No)  Small; Medium  Accuracy; NMI  Twitter; WebKB  SCI [Wang2016]
ASCD [Qin2018]  2-factor NMF  Yes/(Yes/No)  Small; Medium  ACC; NMI; F-measure; Jaccard similarity  LastFM; WebKB; Cora; Citeseer; egoTwitter*; egoFacebook*  BlockLDA [Balasubramanyan2011]; PCLDC [Yang2009]; SCI [Wang2016]; CESNA [Yang2013]; Circles [Mcauley2014]
CFOND [Guo2019]  NMF  Yes/(Yes/No)  Medium  Accuracy; NMI  Cora; CiteSeer; PubMed; Attack; Synthetic  GNMF [Cai2008]; DRCC [Gu2009]; LPNMTF [WangNie2011]; iTopicModel [Sun2009]
MVCNMF [He2017]  NMF  Yes/Yes  Small; Medium  Density; Entropy  Political Blogs; CiteSeer; Cora; WebKB; ICDM (DBLP*)  FCAN [Hu2016]; SACTL [Xu2016]; kNAS [Boobalan2016]
PICS [Akoglu2012]  Matrix compression (finding rectangular blocks)  No/No  Small; Medium  Anecdotal and visual study  Youtube [Mislove2007]; Twitter*; Phonecall [Eagle2009]; Device [Eagle2009]; Political Books; Political Blogs  none
6.3.1. NPei
The method NPei [Pei2015] uses a constrained non-negative matrix tri-factorisation framework [Ding2006] to cluster Twitter users and messages by fusing the relations between users (i.e. topology) and content (i.e. semantics). The starting point is a user-word-tweet tripartite network represented by several adjacency matrices similar to the adjacency and attribute matrices introduced above. The optimisation problem proposed by the authors includes, however, not only the user-user, user-word and word-tweet adjacency matrices but also three types of network regularisation [Smola2003] to model user similarity, message similarity and user interaction. The similarities are measured by a version of PageRank [Page1999] based on the cosine similarity of messages and the adjacency matrix for users. The optimisation is then performed by an iterative update algorithm [Ding2006] to obtain the user cluster matrix and the message cluster matrix. The authors also estimate the complexity of their approach with respect to the number of nodes in the network.
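6.3.2. SCI
Semantic Community Identification (SCI) [Wang2016] adapts NMF for fusing topology and semantics as follows. In the notation above ($\mathbf{A}$ is the adjacency matrix, $\mathbf{X}$ the node attribute matrix, $\mathbf{U}$ the node membership matrix and $\mathbf{V}$ the attribute membership matrix), the consistency in topology is modelled as $\min_{\mathbf{U}\ge 0}\|\mathbf{A}-\mathbf{U}\mathbf{U}^T\|_F^2$, while the consistency in semantics as $\min_{\mathbf{V}\ge 0}\|\mathbf{U}-\mathbf{X}\mathbf{V}\|_F^2$. The authors also propose to select the most relevant attributes for each community by adding an $\ell_1$-norm sparsity term to each column of the matrix $\mathbf{V}$. This, together with the models for topology and semantics, leads to the following unified optimisation problem:
$$\min_{\mathbf{U}\ge 0,\ \mathbf{V}\ge 0}\ \|\mathbf{U}-\mathbf{X}\mathbf{V}\|_F^2+\alpha\,\|\mathbf{A}-\mathbf{U}\mathbf{U}^T\|_F^2+\beta\sum_{j=1}^{N}\|\mathbf{V}(:,j)\|_1^2,$$
where $\alpha$ controls the topology impact and $\beta$ the sparsity penalty. Within SCI, a local minimum is found by the Majorisation-Minimisation framework [Hunter2004]. In particular, the algorithm iteratively updates $\mathbf{U}$ with $\mathbf{V}$ fixed and then $\mathbf{V}$ with $\mathbf{U}$ fixed, so that the process is guaranteed to converge. Note that, instead of using $\mathbf{U}$ directly, the authors consider $\mathbf{X}\mathbf{V}$ as the final community membership matrix.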
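6.3.3. JWNMF
The Joint Weighted Non-negative Matrix Factorisation method for clustering attributed graphs (JWNMF) [Huang2015] models topology in the same way as [Wang2016] but uses a weighted factorisation for semantics, where the weights are automatically determined and updated to reduce the influence of uninformative attributes. Namely, with $\mathbf{A}$, $\mathbf{X}$, $\mathbf{U}$ and $\mathbf{V}$ denoting the adjacency, attribute, node membership and attribute membership matrices, a normalised diagonal matrix $\mathbf{\Lambda}$ is introduced to assign a weight to each attribute; it is then used in the approximation $\mathbf{X}\mathbf{\Lambda}\approx\mathbf{U}\mathbf{V}^T$ in the Frobenius norm, inspired by SymNMF [Kuang2015]. The corresponding optimisation problem thus takes the form:
$$\min_{\mathbf{U}\ge 0,\ \mathbf{V}\ge 0,\ \mathbf{\Lambda}\ge 0}\ \|\mathbf{A}-\mathbf{U}\mathbf{U}^T\|_F^2+\lambda\,\|\mathbf{X}\mathbf{\Lambda}-\mathbf{U}\mathbf{V}^T\|_F^2,$$
where $\lambda$ is the fusion parameter. The optimisation is performed iteratively [Ding2006]. Finally, a $k$-means variant is applied to the resulting membership matrix to identify clusters. The authors also provide a complexity estimate for JWNMF.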
6.3.4. SCD
The Semantic Community Detection method SCD [Li2017] introduces an additional community relationship indicator matrix, whose elements describe the relationships between the corresponding communities, and imposes regularisation conditions on it that aim to ensure the consistency of the community structure with respect to topology and semantics. The resulting optimisation problem, which involves several fusion parameters, is then solved iteratively [Ding2006].
6.3.5. ASCD
Adaptive Semantic Community Detection (ASCD) [Qin2018] follows the general line of NMF modelling discussed above but additionally employs an adaptive parameter to control the mismatch between the topology and semantics components. According to the authors, the mismatch, i.e. the effect occurring when topology is not compatible with semantics, may happen for some networks and negatively affect the clustering performance (several of their experiments confirm this). For this reason, they consider an optimisation problem in which a matching coefficient, updated at each iteration, controls the trade-off between topology and semantics according to the mismatch degree. There are two versions of the matching coefficient: one is defined by an explicit function with a single parameter, and the other is based on the NMI between the network topology and semantics. The optimisation problem is solved by two-step block coordinate descent (one factor matrix is updated while the other is fixed, and vice versa).
Remark 7.
In [Ito2018], NMF-based community detection in multilayer attributed networks is considered. Let us also mention the method from [Maekawa2018], which captures the complicated relationship between topology and semantics using a nonlinear projection function between the different cluster assignments for topology and semantics, and adopts positive unlabelled learning [Liu2003] to account for the effect of partially observed positive edges on the cluster assignment.
6.3.6. PICS
The method PICS [Akoglu2012] (source) is a parameter-free algorithm that not only finds clusters but also detects anomalies and bridges. It is worth mentioning, however, that the nodes in a cluster found by PICS are not necessarily densely connected, due to the definition of clusters in [Akoglu2012]. As for the community detection process, PICS simultaneously “compresses” the network adjacency matrix and the binary attribute matrix by finding homogeneous rectangular blocks (considered further as clusters) of low and high densities in the matrices. The MDL principle [Grunwald2007], a criterion based on lossless compression, is adapted for this procedure.
6.4. Pattern mining-based (simultaneous fusion) methods
Pattern mining in attributed social networks focuses on finding and extracting patterns, e.g. subsets of specific attributes or connections, in network topology and semantics [Atzmueller2019]. This in turn helps to make sense of a network and to understand why the corresponding connections could be formed. Pattern mining methods for community detection typically use local patterns and optimisation criteria for finding informative communities not in the whole network but only in a part of it (e.g. [Pool2014, Atzmueller2016]). Note that there are many papers devoted to pattern and semantic subgraph mining in social networks (see the survey in [Atzmueller2019]), but the majority of them do not deal with the task of community detection.
Community detection may also be based on cliques, according to the natural assumption that a community is a subset of well-connected nodes [Bothorel2015, Khediri2017]. Recall that in graph theory, a clique is a subset of nodes in an undirected graph such that every two nodes are adjacent, i.e. the corresponding subgraph is complete. A clique is called maximal if there is no other clique that contains it.
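To illustrate the notion of a maximal clique, the sketch below enumerates all maximal cliques of a small undirected graph with the classical Bron-Kerbosch algorithm with pivoting. This is a generic textbook procedure and not the ACDC method itself; the graph representation (a dictionary mapping each node to the set of its neighbours) and the function name are our choices.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph as a sorted list.

    adj : dict mapping each node to the set of its neighbours
          (symmetric, without self-loops)
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield sorted(clique)        # nothing can extend the clique
            return
        # the pivot limits branching to nodes outside its neighbourhood
        pivot = max(candidates | excluded,
                    key=lambda v: len(adj[v] & candidates))
        for v in list(candidates - adj[pivot]):
            yield from expand(clique | {v},
                              candidates & adj[v],
                              excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(adj), set())
```

For a triangle on nodes 0, 1, 2 with a pendant link between 2 and 3, the procedure yields the two maximal cliques `[0, 1, 2]` and `[2, 3]`; the link between 0 and 1 alone is a clique but not a maximal one.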
The corresponding methods are listed in Table 13.
Algorithm  Patterns/Cliques  Number of clusters as input/Clusters overlap  Network size  Evaluation  Databases  Community detection methods for attributed graphs compared with
DCM [Pool2014]  Semantic patterns (queries)  Yes/Yes  Small; Medium  Evaluation  Delicious; LastFM; Flickr  none
COMODO [Atzmueller2016]  Semantic patterns  Yes/Yes  Small; Medium  Description complexity; Community size  BibSonomy [Benz2010]; Delicious; LastFM  DCM [Pool2014]
ACDC [Khediri2017]  Maximal cliques  Yes/Yes  Medium  Density  Political Blogs  SACluster [Zhou2009]; SAC1, SAC2 [DangViennet2012]
Note that DCM [Pool2014] is a rather influential method according to Figure 3, and we therefore outline its main ideas below.
6.4.1. DCM
Description-Driven Community Detection (DCM) [Pool2014] (code source) searches for patterns in binary attributes to form overlapping communities together with their proper descriptions. More precisely, each iteration of the algorithm consists of two steps that reshape the community by optimising, first, a certain structural quality function (the so-called community score based on local topology) and, second, a description complexity function based on concise patterns mined in the attributes that best describe the community (the patterns are called queries). Pattern mining is based on the ReMine algorithm [Zimmermann2010], which recursively splits the data into the most informative patterns. The authors underline that DCM is able to grow communities starting from small seeds of nodes or from preliminary descriptions (depending on what information is available at the beginning). At the same time, DCM was not designed to cover the whole network.
Remark 8.
In [Berlingerio2013], ABACUS (frequent pAttern mining-BAsed Community discoverer in mUltidimensional networkS) is proposed; it detects communities by extracting frequent patterns from multilayer attributed social networks.
6.5. Probabilistic model-based methods
Algorithm  Model features  Number of clusters as input/Clusters overlap  Network size  Evaluation  Databases  Community detection methods for attributed graphs compared with
PCLDC [Yang2009]  Conditional Link Model; Discriminative Content model  Yes/No  Medium  NMI; Pairwise F-measure; Modularity; Normalised cut  Cora; CiteSeer  PHITSPLSA [Cohn2001]; LDALinkWord [Erosheva2004]; LinkContentFactorization [Zhu2007]
CohsMix [Zanghi2010]  MixNet model [Snijders1997]  Yes/No  Small  Rand Index  Synthetic; Exalead.com search engine dataset  Multiple view learning [Zhang2006]; Hidden Markov Random Field [Ambroise1997]
BAGC [Xu2012]; GBAGC [Xu2014]  A Bayesian treatment on distribution parameters  Yes/No  Medium  Modularity; Entropy  Political Blogs; DBLP10K; DBLP84K  IncCluster [Zhou2010]; PICS [Akoglu2012]
VEMBAGC [Cao2014]  Based on BAGC [Xu2012]  Yes/No  Medium  Modularity; Entropy  Political Blogs; Synthetic networks  BAGC [Xu2012]
PPSBDC [Chai2013]  Popularity-productivity stochastic block model and discriminative content model  Yes/No  Medium  NMI; Pairwise F-measure (PWF); Accuracy  Cora; CiteSeer; WebKB  PCLDC [Yang2009]; PPLDC [Yang2010]
CESNA [Yang2013]  A probabilistic generative model assuming communities generate network structure and attributes  No/Yes  Medium; Large  Evaluation  egoFacebook; egoG+; egoTwitter; Wikipedia* (philosophers); Flickr  CODICIL [Ruan2013]; Circles [Mcauley2014]; BlockLDA [Balasubramanyan2011]
Circles [Mcauley2014]  A generative model for friendships in social circles  Yes/Yes  Medium; Large  Balanced Error Rate  egoFacebook; egoG+; egoTwitter  BlockLDA [Balasubramanyan2011]; Adapted LowRank Embedding [Yoshida2010]
SI [Newman2015]  A modified version of a stochastic block model [Holland1983]  Yes/No  Small; Medium  NMI  Synthetic; High school friendship network; Food web of marine species in the Weddell Sea; Harvard Facebook friendship network; malaria HVR 5 and 6 gene recombination network  none
NEMBP [HeFeng2017]  A generative model with learning method using a nested EM algorithm with belief propagation  Yes/(Yes/No)  Small; Medium  Accuracy; NMI; GNMI; F-score; Jaccard  WebKB; egoTwitter*; egoFacebook*; CiteSeer; Cora; Wikipedia*; Pubmed  BlockLDA [Balasubramanyan2011]; PCLDC [Yang2009]; CESNA [Yang2013]; DCM [Pool2014]; SCI [Wang2016]
NBAGC, FABAGC [Xu2017]  A nonparametric and asymptotic Bayesian model selection method based on BAGC [Xu2012]  No/No  Medium  NMI; Modularity; Entropy  Synthetic; Political Blogs; DBLP10K; DBLP84K  PICS [Akoglu2012]
Methods from this class statistically infer a model of a clustered attributed network under the assumption that its structure and attributes are generated according to certain parametric distributions. Generative models and stochastic block models are mainly used [Alinezhad2019]. Note that properly choosing a priori distributions for topology and semantics is a non-trivial task [Akbas2017].
According to [Yang2009], there are many probabilistic models combining topology and semantics: PHITSPLSA combines PHITS with PLSA for community detection [Cohn2001]; [Erosheva2004] combines LDA with LDALink for network analysis to obtain the LDALinkWord model; and [Nallapati2008] combines the mixed membership stochastic block model with LDA and extends the LDALinkWord model by separating the citing documents from the cited ones, with the LDALinkWord model applied to the citing documents and the PLSA model to the cited documents. However, the majority of the methods that appeared before [Yang2009] focus on document clustering, which is generally out of the scope of the present survey. For this reason, we consider only community detection methods published since the seminal paper [Yang2009], see Table 14. We also describe PCLDC [Yang2009], BAGC [Xu2012], GBAGC [Xu2014], CESNA [Yang2013] and Circles [Mcauley2014] as the most influential methods for community detection in attributed social networks according to Figure 3.
6.5.1. PCLDC
The method PCLDC (Popularity-based Conditional Link Model-Discriminative Content) [Yang2009] is based on a discriminative model that combines topology and semantics for community detection. It adapts a conditional model for network structure analysis that takes into account the popularity of the nodes. The impact of irrelevant attributes is reduced by a discriminative content model in which attributes are automatically assigned proper weights, depending on their discriminative power. The two models are then combined in a unified framework, with maximum likelihood inference performed by a two-stage EM-based optimisation algorithm.
6.5.2. BAGC and GBAGC
The method BAGC (Bayesian Attributed Graph Clustering) [Xu2012] employs a Bayesian probabilistic model for detecting non-overlapping communities in networks with categorical attributes. BAGC uses a generative process similar to that in CohsMix [Zanghi2010]: community labels for the nodes are first modelled independently via a multinomial distribution, then, based on these labels, attributes are modelled by a multinomial distribution and edges by a Bernoulli one. However, in contrast to CohsMix [Zanghi2010], BAGC works with categorical attributes and does not treat the parameters of the distributions as fixed values. More precisely, BAGC takes a Bayesian treatment of the parameters and thus considers all their possible values, which leads, according to the authors, to better community detection quality. The probabilistic inference is then performed by the variational approach from [Jordan1999] together with a certain approximating procedure. GBAGC (General Bayesian framework to Attributed Graph Clustering) [Xu2014], a generalisation of BAGC to weighted attributed networks, was later proposed by the same authors.
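The generative process described above (labels first, then attributes and edges conditioned on the labels) can be sketched as a sampler. The sketch uses fixed parameter values, so it deliberately omits the Bayesian treatment of the parameters that distinguishes BAGC, and it does not perform any inference; the function name, the argument layout and the undirected, self-loop-free graph are assumptions made for illustration.

```python
import random

def sample_attributed_graph(n, cluster_probs, attr_probs, edge_probs, seed=0):
    """Sample labels, categorical attributes and undirected edges.

    cluster_probs : probabilities of the community labels
    attr_probs    : attr_probs[z][t] is the categorical distribution of
                    attribute t for nodes in community z
    edge_probs    : edge_probs[z1][z2] is the link probability between
                    nodes of communities z1 and z2 (symmetric)
    """
    rng = random.Random(seed)
    k = len(cluster_probs)
    # 1. community label of each node, drawn independently
    labels = rng.choices(range(k), weights=cluster_probs, k=n)
    # 2. attribute values, conditioned on the node's label
    attrs = [
        [rng.choices(range(len(dist)), weights=dist)[0] for dist in attr_probs[z]]
        for z in labels
    ]
    # 3. each pair of nodes is linked by a Bernoulli trial
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < edge_probs[labels[i]][labels[j]]
    ]
    return labels, attrs, edges
```

Model-based community detection runs this process "in reverse": given the observed attributes and edges, it infers the labels (and, in the Bayesian setting, a posterior over the parameters) that most plausibly generated them.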
6.5.3. CESNA
The method CESNA (Communities from Edge Structure and Node Attributes) [Yang2013] simultaneously uses the probabilistic generative model of BIGCLAM [Yang2013bigclam] for generating connections and a logistic model for attributes to infer the distribution of community memberships. The resulting communities are overlapping. Furthermore, a block-coordinate ascent method is used to update all model parameters efficiently, which makes CESNA suitable for large attributed networks.
6.5.4. Circles
The method Circles [Mcauley2014] detects users’ social circles in their attributed ego networks via multi-membership node clustering. Its generative model is based on a hard assignment of a node to multiple circles and learns a circle-specific user profile similarity metric. To maximise the corresponding likelihood, the coordinate ascent by [MacKay2002] is used.
Remark 9.
TUCM (Topic User Community Model) [Sachan2012] is a family of generative Bayesian models for detecting overlapping communities in multilayer attributed networks where different types of interactions between users are possible.
6.6. Dynamical system-based and agent-based methods
Methods from this class treat a network as a dynamical system and assume that its community structure is a consequence of certain interactions among nodes, see Table 15. Some methods assume that the interactions occur in an information propagation process, i.e. while information is sent to or received from every node. Others treat each node as an autonomous agent and develop a multi-agent system to detect communities. These methods are not among the most influential in Figure 3, but this is probably due to their novelty. In any case, these contemporary approaches seem to be very efficient for large attributed social networks as they can be easily parallelised.
Algorithm  Description  Number of clusters as input/Clusters overlap  Network size  Evaluation  Databases  Community detection methods for attributed graphs compared with
CPIP, CPRW [Liu2015]  Content (information) propagation models: a linear approximate model of influence propagation (CPIP) and content propagation with the random walk principle (CPRW)  Yes/Yes  Medium  F-score; Jaccard Similarity; NMI  CiteSeer; Cora; egoFacebook; PubMed Diabetes  Adamic Adar [Adamic2003]; PCLDC [Yang2009]; Circles [Leskovec2012]; CODICIL [Ruan2013]; CESNA [Yang2013]
CAMAS [Bu2017]  Each node with attributes as an autonomous agent with influence in a cluster-aware multi-agent system  No/Yes  Medium; Large  Coverage Rate; Normalised Tightness; Normalised Homogeneity; F1-score; Jaccard; Adjusted Rand Index  Synthetic; egoFacebook; egoTwitter*; egoG+  CESNA [Yang2013]; EDCAR [GunnemannBoden2013]
SLA [Bu2019]  A dynamic cluster formation game played by all nodes and clusters in a discrete-time dynamical system  Yes/No  Medium; Large  Density; Entropy; F1-score  Delicious; LastFM; egoFacebook; egoTwitter*; egoG+  CESNA [Yang2013]; EDCAR [GunnemannBoden2013]
7. Late fusion methods
Late fusion methods fuse topology and semantics after the clusterisation step. Usually, clusterings produced separately for topological information (e.g. by the Louvain method [Blondel2008]) and semantic information (e.g. by $k$-means [Hartigan1979]) are then fused via consensus (ensemble-based) clustering techniques [Lancichinetti2012, Strehl2003, Tagarelli2017, Tandon2019, Gullo2013].
7.1. Consensus-based methods
Given an ensemble of clusterings, the goal of consensus clustering is to find a single, prototypical clustering solution that optimises a certain objective function properly defined over the information available from the clusterings in the ensemble. A recent survey of general-purpose ensemble-based clustering methods can be found in [Boongoen2018]. Besides these general-purpose approaches, which have not actually been compared with the methods discussed in this survey, we could only find the methods in Table 16 that focus specifically on community detection in attributed social networks. Further study in this direction, including comparison with other attributed network clustering methods, is clearly necessary.
According to Figure 3, there are no consensus-based methods among the most influential ones for community detection in attributed social networks, but we nevertheless provide details on two of them, namely Selection [Elhadi2013] and WCMFA [Luo2019].
7.1.1. Selection
The Selection method [Elhadi2013] switches from topology-based to semantics-based clustering when the graph structure is ambiguous. More precisely, the method relies on the topology-based clusters when the so-called estimated mixing parameter of the topology-based clustering is less than a critical value, namely the experimental value of the mixing parameter in the LFR benchmark with ground truth [Lancichinetti2008] at which the NMI of the topology-based method drops significantly (the graph structure is then called ambiguous). Such a critical value is reported, for instance, for the Louvain method in [Lancichinetti2008, Elhadi2013]. If the estimated mixing parameter is not less than the critical value, then the semantics-based clustering (obtained e.g. by $k$-means) is used. The performance of Selection is compared, in particular, with that of HGPA (HyperGraph Partitioning Algorithm) and CSPA (Cluster-based Similarity Partitioning Algorithm), general-purpose ensemble clustering methods from [Strehl2003], in combining the topology-based Louvain clustering and the semantics-based $k$-means clustering of the network. It is observed that Selection is able to outperform (in some sense) the tested methods by switching from the Louvain clustering to the $k$-means one.
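The switching rule can be sketched as follows. Two points are assumptions of ours rather than details taken from [Elhadi2013]: the mixing parameter of a clustering is estimated here as the fraction of links that join nodes from different clusters, and the critical value `mu_limit` is left as a user-supplied parameter.

```python
def estimated_mixing(edges, labels):
    """Fraction of links whose endpoints lie in different clusters."""
    if not edges:
        return 0.0
    external = sum(1 for u, v in edges if labels[u] != labels[v])
    return external / len(edges)

def select_clustering(edges, topo_labels, attr_labels, mu_limit):
    """Keep the topology-based clustering unless the structure is ambiguous."""
    mu_hat = estimated_mixing(edges, topo_labels)
    return topo_labels if mu_hat < mu_limit else attr_labels
```

In this sketch, a topology-based clustering that cuts few links is trusted, whereas one that cuts most of the links signals an ambiguous structure and the attribute-based clustering is returned instead.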
7.1.2. WCMFA
The Weighted Co-association Matrix-based Fusion Algorithm WCMFA [Luo2019] takes as input an ensemble of several clusterings, each based separately on topology or on semantics, with weights depending on the topological and semantic similarity of the initial nodes. A weighted co-association matrix is then constructed so that it accounts both for the co-occurrence of two nodes in the same cluster and, if the pair is indeed in the same cluster, for the degree of their similarity. The matrix is then treated as a similarity matrix for the node set and can be passed to the Single Link, Complete Link or Average Link clustering algorithms to find a consensus community structure.
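The general idea of a weighted co-association matrix followed by linkage clustering can be sketched as follows. This is a simplified illustration, not the WCMFA weighting scheme of [Luo2019]: the per-clustering weights, the `similarity` callback, the normalisation by the sum of weights and the cut threshold are all assumptions, and single-link clustering is emulated by taking connected components of the thresholded matrix.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence

Partition = Dict[Hashable, int]   # node -> cluster label
Matrix = Dict[Hashable, Dict[Hashable, float]]


def weighted_coassociation(nodes: Sequence[Hashable],
                           partitions: List[Partition],
                           weights: List[float],
                           similarity: Callable[[Hashable, Hashable], float]
                           ) -> Matrix:
    """M[u][v] = sum_k w_k * sim(u, v) * [u, v share a cluster in
    partition k], divided by the sum of the weights (assumed form)."""
    total = sum(weights)
    matrix = {u: {v: 0.0 for v in nodes} for u in nodes}
    for u, v in combinations(nodes, 2):
        score = sum(w * similarity(u, v)
                    for p, w in zip(partitions, weights)
                    if p[u] == p[v]) / total
        matrix[u][v] = matrix[v][u] = score
    return matrix


def single_link_consensus(nodes: Sequence[Hashable], matrix: Matrix,
                          threshold: float) -> Partition:
    """Cut a single-link dendrogram at `threshold`: clusters are the
    connected components of the graph linking pairs with M >= threshold."""
    label: Partition = {}
    next_label = 0
    for start in nodes:
        if start in label:
            continue
        label[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            for v in nodes:
                if v not in label and matrix[u][v] >= threshold:
                    label[v] = next_label
                    stack.append(v)
        next_label += 1
    return label


# Toy ensemble (hypothetical): one topology-based and one semantics-based clustering.
nodes = [0, 1, 2, 3]
topo = {0: 0, 1: 0, 2: 1, 3: 1}
sem = {0: 0, 1: 0, 2: 0, 3: 1}
matrix = weighted_coassociation(nodes, [topo, sem], [0.6, 0.4],
                                similarity=lambda u, v: 1.0)
consensus = single_link_consensus(nodes, matrix, threshold=0.5)
```

With these toy weights the pair (0, 1) co-occurs in both clusterings and the pair (2, 3) only in the more heavily weighted one, so the consensus at threshold 0.5 groups nodes 0 and 1 together and nodes 2 and 3 together. Complete-link or average-link clustering would use the same matrix with a different merge criterion.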
Remark 10.
We refer the interested reader to [Tagarelli2017, Tandon2019, Gullo2013] for general-purpose clustering methods for multilayer networks.
| Algorithm | Combining the partitions | Number of clusters as input / Clusters overlap | Network size | Evaluation | Databases | Community detection methods for attributed graphs compared with |
| --- | --- | --- | --- | --- | --- | --- |
| LCru [Cruz2013] | Row-manipulation in the contingency matrix for the clusterings | No/No | Small, Medium | ARI, Density, Entropy | Facebook, DBLP10K | - |
| Selection [Elhadi2013] | Switching between the clusterings | Depends on the partitions | Medium | NMI, Modularity | Synthetic LFR benchmark [Lancichinetti2008], DBLP84K | BAGC [Xu2012], OCru [Cruz2011], SACluster [Zhou2009], HGPA, CSPA [Strehl2003] |
| Multiplex [HuangWangg2016] | Multiplex representation scheme (attributes and structure are clustered separately as layers and then combined via consensus [Tepper2015]) | No/Yes | Medium, Large | F1-score | Synthetic, ego-Twitter, ego-Facebook, ego-G+ | CESNA [Yang2013], 3NCD [Nguyen2015] |
| WCMFA [Luo2019] | Association matrix with weighting based on topology and semantics similarity | Depends on the partitions | Small | Rand index, Adjusted RI, NMI | Consult [Cross2004], London Gang [Grund2015], Montreal Gang [Descormiers2011] | WMen [Meng2018] |
8. Conclusion
The survey shows that there exist a large number of methods for community detection in node-attributed social networks, based on different fusion techniques. In particular, 77 methods are grouped and analysed using the proposed classification criterion, and many more are mentioned as related to the topic under consideration. Moreover, we indicated the most influential methods and gave short descriptions of them.
According to our analysis, several essential problems exist in the area. For example, a comprehensive comparative study is urgently needed, as the existing partial contributions do not reveal the overall picture of the community detection quality of the methods. Figure 2 confirms that the method-method comparison graph is very sparse. What is more, even if some methods are compared with others, this does not generally mean that one clearly outperforms another, as the hyperparameters of the methods under consideration are usually not tuned for a particular task. Moreover, different authors use different datasets and quality metrics in their experiments, which makes the results hard to compare. Another issue is that many authors are unaware of the state-of-the-art methods and continue to compare their approaches with rather inefficient pioneering methods.
We hope that our survey is an important step in resolving the above-mentioned methodological and experimental problems.
9. Competing Interests Statement
There are no competing interests in the publication of this survey paper.
10. Acknowledgements
This research is financially supported by the Russian Science Foundation, Agreement 177130029, with co-financing from Bank Saint Petersburg.