Mesoscale properties of graphs are often used to capture the structure of complex networks. A prevalent mesoscale feature in real-world networks is the core-periphery structure [Borgatti2000Models]. The core-periphery property is ubiquitous in social networks [barbera2015the], trade and transport networks [Verma2016Emergence], citation networks, and communication networks [Alvarez2005K]. It is also useful in analyzing biological networks such as brain networks [bassett2013task, Harlalka2019Atypical], genome-scale metabolic networks of organisms [Da2008Centrality], and networks of protein–protein interactions [Jeong2001Lethality], to name a few. For instance, in brain networks, the core-periphery structure explains cognitive learning processes [bassett2013task], and in social networks [barbera2015the] (contact networks [Kitsak2010Identification]), the most influential spreaders of information (respectively, disease) are observed to be in the core part of the network. Therefore, identifying the core and peripheral vertices helps in analyzing the central processes in complex networks.
A core-periphery structure in graphs refers to the presence of densely connected groups of core vertices and sparsely connected periphery vertices. Core vertices are those vertices that have cohesive connections among them. Peripheral vertices, on the other hand, are not well connected to each other but are relatively well connected to core vertices. An example graph with a core-periphery structure is shown in Fig. 1(a). The dark nodes in the figure with sparse connections are peripheral nodes, while the lighter ones with dense connections are core nodes. Fig. 1(b) shows the adjacency matrix corresponding to the graph with vertices ordered in the descending order of coreness, where the lower-right block of the matrix corresponds to periphery-periphery connections and can be observed to be very sparse compared to the upper-left block corresponding to core-core connections.
Existing algorithms estimate core scores of the nodes of a graph given the network topology [Borgatti2000Models, Rombach2014Core, Holme2005Core, Della2013Profiling, Jia2019Random, Zhang2015Identification], but ignore node attributes that may also carry information about the coreness of nodes. In many applications, we have access only to attributes of entities, and the underlying graph structure may not always be available. For example, in brain network analysis, we may have functional magnetic resonance imaging (fMRI) data of different subjects without information about the underlying structural connectivity. Therefore, in this work, we develop an approach that learns a core-periphery structured graph from node attributes alone so that the coreness of the nodes is revealed implicitly. Conventional approaches to network topology inference [dong19LearnGraphData, Sravanthi2020Learning, product2020Sai] are based on graphical lasso, which assumes a Gaussian Markov random field model for data and estimates a conditional independence graph determined by the estimated sparse inverse covariance matrix [Friedman2008Sparse]. However, the sparsity pattern recovered from graphical lasso does not readily incorporate a core-periphery structure in networks. To incorporate such a core-periphery structure, it is important to model the generative process of the edges in a graph through core scores of the vertices of the graph. See an illustration of networks estimated using the proposed method (described later on) and graphical lasso in Fig. 2(a) and Fig. 2(b), respectively. It can be observed that graphical lasso does not capture the core-periphery structure.
In this work, we propose a generative data model to relate the node attributes to the graph as well as to the nodal core scores. The proposed probabilistic generative model captures the dependence of the node attributes on the core scores through a latent graph structure. In particular, we model core scores as variables that influence the sparsity of the graph. Though often ignored in graph analysis tasks, spatial distances between nodes play a vital role in differentiating the core nodes from the peripheral ones [Jia2019Random]. For example, countries far apart in a world trade network are less likely to be connected. To this end, we also incorporate spatial information into our model. Using the proposed position-aware probabilistic model, which promotes a core-periphery network structure, we jointly estimate a sparse graph and core scores of every node in the graph. Specifically, the proposed estimator jointly learns a sparse graph structure and node core score assignments that induce dense (sparse) connections in core (respectively, peripheral) parts of the network while accounting for the spatial distances between the nodes whenever available. We evaluate the proposed method through a number of numerical experiments on real-world data from various domains like brain, social, and transportation networks. We verify the correctness of the core scores learnt using the proposed method by comparing them with those from existing core score estimation algorithms, which use only the underlying known graph. The results indicate that the proposed method estimates core scores of the vertices from node attributes alone and that these scores are on par with those from existing methods. We also apply our method to fMRI data and report interesting observations about the differences in interactions between the brain regions in healthy subjects and individuals with attention deficit hyperactivity disorder (ADHD).
2 Background: Gaussian graphical model
Consider a weighted and undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the vertex set with $N$ vertices and $\mathcal{E}$ is the edge set. Let us collect the vertex attributes in the feature matrix $X \in \mathbb{R}^{N \times n}$, where the $i$th row of $X$ contains the $n$ features of the entity associated to the $i$th vertex of $\mathcal{G}$.
In a Gaussian graphical model, the columns of $X$ are modeled as independent and identically distributed observations drawn from $\mathcal{N}(0, \Sigma)$, where $\Sigma = \Theta^{-1}$ and $\Theta$ is a positive definite matrix. The sparsity structure of the precision matrix $\Theta$ encodes all the conditional dependencies between the variables associated with the vertices of $\mathcal{G}$. Specifically, the $(i,j)$th entry of $\Theta$ being zero implies conditional independence of the variables associated with vertices $i$ and $j$, given the rest, and that there is no edge between the two vertices. Graphical lasso learns the sparsity pattern in $\Theta$ by solving an $\ell_1$-regularized Gaussian maximum log-likelihood problem [Friedman2008Sparse]:
$$\underset{\Theta \succ 0}{\text{maximize}} \;\; \log\det\Theta - \mathrm{tr}(S\Theta) - \lambda \|\Theta\|_1 \qquad (1)$$
where $S$ is the empirical covariance matrix and $\lambda$ is a regularization parameter that controls the sparsity in $\Theta$. Although graphical lasso recovers sparse graphical models, it does not readily incorporate any specific sparsity structure such as the core-periphery structure of interest, as the $\ell_1$-penalty is uniformly applied on all the edges.
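To make the graphical lasso objective concrete, the following minimal sketch evaluates the $\ell_1$-regularized Gaussian log-likelihood for a candidate precision matrix. The function name `glasso_objective` and the choice to penalize every entry of the matrix, including the diagonal, are assumptions made for illustration; some formulations penalize only the off-diagonal entries.

```python
import numpy as np

def glasso_objective(theta, S, lam):
    """Evaluate log det(theta) - tr(S theta) - lam * ||theta||_1.

    Returns -inf when theta is not positive definite, since the
    log-determinant term is only defined on the positive definite cone.
    """
    try:
        # Cholesky succeeds only for (numerically) positive definite input.
        L = np.linalg.cholesky(theta)
    except np.linalg.LinAlgError:
        return -np.inf
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    # Assumption: the l1 norm sums the absolute values of all entries.
    l1 = np.abs(theta).sum()
    return logdet - np.trace(S @ theta) - lam * l1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 200))      # 5 vertices, 200 observations
    S = X @ X.T / X.shape[1]               # empirical covariance (zero mean assumed)
    print(glasso_objective(np.eye(5), S, lam=0.1))
```

Because the penalty weight `lam` is the same for every entry, this objective cannot favor dense core-core blocks over sparse periphery-periphery blocks, which is the limitation noted above.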
3 Model description
In this section, we propose a prior that induces a sparsity pattern in graphs determined by the core scores of its vertices and the spatial distances between the vertices. Then using this probabilistic model, we propose an estimator for learning sparse Gaussian graphical models with a core-periphery structure.
Let $c$ denote a vector containing the core scores of the $N$ vertices, with $c_i$ denoting the core strength of vertex $i$. In other words, the likelihood of vertex $i$ belonging to the core part of the network increases with the value of $c_i$. Also, let $d_{ij}$ denote the spatial distance between vertices $i$ and $j$. We now propose a probabilistic generative model that relates the node attributes in $X$ to its core scores through the precision matrix $\Theta$.
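We model the node attributes based on a Gaussian graphical model. That is, the conditional probability distribution of $X$ given the precision matrix $\Theta$ is given by (up to constants)
$$p(X \mid \Theta) \propto \det(\Theta)^{n/2} \exp\left(-\frac{n}{2}\,\mathrm{tr}(S\Theta)\right), \qquad (2)$$
where $S$ is the empirical covariance matrix of the $n$ observations in the columns of $X$.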
In networks with a core-periphery structure, we have sparser connections between vertices in the periphery, relatively denser connections between the vertices in the core and periphery, and very dense connections between vertices in the core [cf. Fig. 1]. Further, vertices that are spatially well separated have sparser connections between them. To promote a sparsity pattern determined by the core-periphery structure in the graph, we therefore model the edges of the graph such that the value of is very small when is small, i.e., if vertices and both belong to the periphery or if they are spatially far apart. Here, parameter controls the dependence of on . The parameter is set to if the spatial information is not available or accounted for. To satisfy our above requirements, we model the generative process for each entry of using a Laplace distribution with inverse diversity parameterized by the latent variables and as
Specifically, the prior distribution of parameterized by is
where is the normalization constant and controls the overall impact of and on . For to be a valid probability distribution, the inverse diversity parameter should be nonnegative, i.e., for , . Next, we aim to infer the model parameters $\Theta$ and $c$ based on the observed node attributes $X$.
4 The proposed learning algorithm
In this section, we present an algorithm to jointly learn the core score vector $c$ and a sparse graph represented by the zero pattern in $\Theta$. We estimate the model parameters $\Theta$ and $c$ by maximizing the posterior distribution given data, i.e., by maximizing
$$p(\Theta \mid X, c) \propto p(X \mid \Theta)\, p(\Theta \mid c) \qquad (4)$$
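with respect to the parameters $\Theta$ and $c$. The log-likelihood function is $\log\det\Theta - \mathrm{tr}(S\Theta)$, up to constants and a positive scaling. The logarithm of the prior distribution amounts to a weighted $\ell_1$-penalty on $\Theta$ with the weights determined by the core scores $c$. Thus the proposed optimization problem for learning the model parameters is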
The $\ell_1$-penalty in the above optimization problem is relatively smaller if either of the vertices belongs to the core and is the smallest when they both belong to the core as explained by the data. As the value of is increased (while satisfying the inequality ), the fraction of edges between spatially distant vertices decreases. Further, to prevent the case where all the weights tend to zero, we constrain the sum of the core scores to a real-valued positive number , while fixes the scale of the core scores.
The problem in (5) is a non-convex optimization problem in the variables $\Theta$ and $c$. In what follows, we propose a solver based on block coordinate ascent to solve (5). This decomposes the above non-convex optimization problem into a set of convex subproblems. At each iteration of the algorithm, we update $\Theta$ while fixing $c$ and then update $c$ while fixing $\Theta$.
4.1 Updating the graph
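For a fixed $c$, the problem in (5) simplifies to the following graphical lasso problem with a weighted $\ell_1$-regularization
$$\underset{\Theta \succ 0}{\text{maximize}} \;\; \log\det\Theta - \mathrm{tr}(S\Theta) - \sum_{i,j} W_{ij}\,|\Theta_{ij}| \qquad (6)$$
with known weights $W_{ij}$ that depend on $c$. This is a convex program that can be solved using existing solvers, e.g., QUIC [Hsieh2014QUIC].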
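As an illustration of how a graphical lasso problem with entrywise $\ell_1$ weights can be solved, the sketch below uses a standard ADMM scheme rather than QUIC, which is the solver named in the text. The function name `weighted_glasso_admm`, the weight matrix `W`, the step parameter `rho`, and the fixed iteration count are all assumptions made for this example; it is meant only to show the structure of the subproblem, not to reproduce the authors' implementation.

```python
import numpy as np

def weighted_glasso_admm(S, W, rho=1.0, n_iter=500):
    """Maximize log det(Theta) - tr(S Theta) - sum_ij W_ij |Theta_ij| via ADMM.

    S : (N, N) symmetric empirical covariance matrix.
    W : (N, N) symmetric matrix of nonnegative penalty weights.
    Returns the sparse iterate Z, which matches Theta at convergence.
    """
    N = S.shape[0]
    Z = np.eye(N)
    U = np.zeros((N, N))
    for _ in range(n_iter):
        # Theta-update: solve rho*Theta - inv(Theta) = rho*(Z - U) - S
        # in closed form through an eigendecomposition.
        eigval, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_eig) @ Q.T
        # Z-update: entrywise soft-thresholding with thresholds W_ij / rho.
        A = Theta + U
        Z = np.sign(A) * np.maximum(np.abs(A) - W / rho, 0.0)
        # Scaled dual variable update.
        U = U + Theta - Z
    return Z

if __name__ == "__main__":
    S = np.array([[2.0, 0.5], [0.5, 1.0]])
    W = np.array([[0.0, 10.0], [10.0, 0.0]])   # heavy penalty on the single edge
    print(weighted_glasso_admm(S, W))
```

Small weights on an entry leave the corresponding edge essentially unpenalized, while large weights push it to exactly zero, which is how core-score-dependent weights can shape the sparsity pattern.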
4.2 Updating the vertex core scores
For a fixed $\Theta$, the problem in (5) simplifies to
which is a linear program that can be solved using standard off-the-shelf solvers. We can clearly see from the objective function that the core scores are influenced by the edge weights, which are in turn learnt from data. The subproblem in (7) can also be used to estimate core scores given a graph.
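To see why the core score update is linear in the core scores, consider the following sketch. It assumes, purely for illustration, that each penalty weight is affine and decreasing in $c_i + c_j$, so that the part of the objective that depends on $c$ is $\sum_{i \neq j} (c_i + c_j)\,|\Theta_{ij}|$; the exact weights and constraints of the paper's problem are not reproduced here. Under that assumption, the cost vector of the linear program is the vector of absolute off-diagonal row sums of $\Theta$. The function name `core_score_cost` is hypothetical.

```python
import numpy as np

def core_score_cost(Theta):
    """Return a with a_i = sum_{j != i} |Theta_ij|.

    For symmetric Theta, sum_{i != j} (c_i + c_j) |Theta_ij| = 2 * a @ c,
    so the c-dependent part of the objective is linear in c.
    """
    A = np.abs(Theta)
    np.fill_diagonal(A, 0.0)   # assumption: diagonal entries are not weighted by c
    return A.sum(axis=1)

if __name__ == "__main__":
    Theta = np.array([[1.0, 0.5, 0.0],
                      [0.5, 1.0, -0.2],
                      [0.0, -0.2, 1.0]])
    print(core_score_cost(Theta))
```

Vertices with strong estimated edges receive large cost coefficients, so a linear program that maximizes `a @ c` under the model's constraints tends to assign them high core scores. The same coefficients can be computed from any given graph, which matches the remark that the subproblem can estimate core scores from a known graph.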
The proposed procedure of block coordinate ascent is initialized with an arbitrary $c$ (we use a scaled all-one vector in our experiments) and is repeated until convergence.
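The alternating scheme can be sketched generically as follows. The two update functions and the objective are passed in as callables; their names, the stopping rule based on the objective increase, and the toy problem in the usage example are assumptions for illustration and stand in for the graph update and the core score update described above.

```python
def block_coordinate_ascent(update_theta, update_c, objective, c0,
                            max_iter=100, tol=1e-12):
    """Alternate between two block updates until the objective stops increasing.

    update_theta(c)     -> maximizer over the first block for fixed c
    update_c(theta)     -> maximizer over the second block for fixed theta
    objective(theta, c) -> scalar objective value
    """
    c = c0
    theta = None
    history = []
    for _ in range(max_iter):
        theta = update_theta(c)      # first block, second block fixed
        c = update_c(theta)          # second block, first block fixed
        history.append(objective(theta, c))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return theta, c, history

if __name__ == "__main__":
    # Toy concave problem: f(x, y) = -(x - y)^2 - (x - 1)^2 - (y - 3)^2,
    # whose exact block maximizers are available in closed form.
    f = lambda x, y: -(x - y) ** 2 - (x - 1) ** 2 - (y - 3) ** 2
    x, y, hist = block_coordinate_ascent(
        update_theta=lambda y: (y + 1) / 2,
        update_c=lambda x: (x + 3) / 2,
        objective=f,
        c0=0.0,
    )
    print(x, y, len(hist))
```

Since each step maximizes the objective over one block exactly, the recorded objective values never decrease, which is the behavior one would expect from the convergence plot of the full algorithm.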
5 Numerical experiments
In this section, we evaluate the graph inference and core score learning capabilities of our framework on real-world datasets from biological, social, and transportation networks. We compare the core scores estimated by the proposed method with those from existing algorithms, namely, MINRES [Boyd2010Computing], Rombach [Rombach2014Core], RandomWalk [Della2013Profiling], and k-cores [Alvarez2005K]. MINRES learns the core scores $c$ such that the adjacency matrix is approximated by $c c^\top$. Rombach is an extension of the continuous formulation of [Borgatti2000Models], which proposes an ideal block model for core-periphery structured networks and estimates core scores by comparing the given adjacency matrix with the ideal block model. RandomWalk estimates the core scores in a network by modeling the behavior of a random walker. k-cores is a method for partitioning the nodes in a network recursively from the periphery to the more central ones. The input to these existing methods is the ground truth graph. In contrast, the input to our method is just the node attributes.
5.1 Model evaluation
We first apply our method to the Celegans [Marcus2006Non], Cora [Sen2008Collective], London underground [Jia2019CP], Twitter [Greene2013Producing], and WebKB [Craven1998Learning] datasets.
Cora is a citation network dataset. In addition to the network of citations, it also contains a binary matrix $X$ of size $N \times n$, where $N$ is the number of papers in the dataset and $n$ is the size of the vocabulary. The $(i,j)$th entry of $X$ indicates whether the $j$th word in the vocabulary is present in the $i$th paper. The Twitter dataset consists of data related to 464 Twitter users, covering athletes and organizations involved in the London 2012 Summer Olympics. The node attribute matrix corresponds to the lists of the users, and the ground truth network is formed by the followers' information, with an edge between two users if either of them follows the other. The WebKB [Craven1998Learning] dataset contains webpages collected from computer science departments of four universities. The node attribute matrix is a binary matrix indicating whether each word in the vocabulary is present or absent in the webpage. The vocabulary size is words. The spatial distance information for the Cora, Twitter, and WebKB datasets is not available.
is the number of tube stations. For Cora and Twitter datasets, the hyperparameter is set to and to 0.09 for Celegans and London underground datasets. We fix to for all the datasets. Increasing increases the fraction of nodes with high core scores, whereas can be tuned according to the required percentage of edges in the network.
Once the core scores are learnt, we order the nodes of both the learnt and the ground truth networks in the decreasing order of core scores. We often observe a perfect core-periphery structure in the estimated network. Although the proposed method is agnostic of the ground truth network, we observe that the ground truth network, when ordered according to the core scores learnt using the node attributes of the network alone, reveals the core-periphery structure. As an example, the adjacency matrices of the learnt and the ground truth networks of Twitter dataset are shown in Fig. 3. The adjacency matrices computed using the proposed method and graphical lasso on a subset of WebKB data related to Texas University are shown in Fig. 2(a) and Fig. 2(b), respectively.
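The reordering step can be sketched as below: vertices are sorted by decreasing core score, the adjacency matrix is permuted accordingly, and the edge densities of the core-core, core-periphery, and periphery-periphery blocks are reported. The function names and the number of core vertices `k` are assumptions for this example, and `k` is assumed to satisfy 0 < k < N.

```python
import numpy as np

def reorder_by_core_score(A, c):
    """Permute rows and columns of A so that core scores are decreasing."""
    order = np.argsort(-np.asarray(c), kind="stable")
    return A[np.ix_(order, order)], order

def block_densities(A_sorted, k):
    """Edge densities of the core-core, core-periphery, periphery-periphery blocks.

    Assumes 0 < k < N, with the first k sorted vertices treated as the core.
    """
    def offdiag_density(B):
        m = B.shape[0]
        if m < 2:
            return 0.0
        nonzero = (B != 0).sum() - (np.diag(B) != 0).sum()
        return nonzero / (m * (m - 1))
    core = offdiag_density(A_sorted[:k, :k])
    cross = (A_sorted[:k, k:] != 0).mean()
    periphery = offdiag_density(A_sorted[k:, k:])
    return core, cross, periphery

if __name__ == "__main__":
    A = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [1, 1, 0, 1],
                  [1, 1, 1, 0]])
    c = np.array([0.1, 0.2, 0.9, 0.8])
    A_sorted, order = reorder_by_core_score(A, c)
    print(order, block_densities(A_sorted, k=2))
```

A pronounced core-periphery structure shows up as a high density in the first block and a low density in the last one.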
To compare the proposed method with existing works that estimate core scores given a network, we apply existing core score estimation algorithms on the ground truth networks of the considered datasets, whereas for the proposed method we compute core scores from node attributes. We then order the networks according to the core scores given by the respective algorithms. We compare the ordered networks with the ideal core-periphery block model [Borgatti2000Models], given by
where and are dimensional matrices with all ones and all zeros, respectively. We compute for different algorithms. In (8), fixes the proportion of core nodes in the considered network. We fix to for all the experiments in this section. Comparison of and for different algorithms including the proposed method are shown in Table 1. We observe that the values given by the proposed method are similar to those obtained from the other methods. This indicates that the core-periphery partitioning of the networks by the proposed method is similar to the others, in spite of not knowing the network directly. Furthermore, for all the datasets, indicating a more prominent core-periphery structure in the estimated graph than the ground truth, which is also evident from Fig. 3(a). This suggests that the graph estimated using the proposed method by itself can be used to differentiate the core nodes from the peripheral ones.
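A comparison against an ideal block model can be sketched as follows. The sketch assumes the common discrete core-periphery pattern in which core-core and core-periphery entries are ones and periphery-periphery entries are zeros, with the first `k` ordered vertices forming the core. The mismatch measure used here, the fraction of off-diagonal entries that disagree with the ideal pattern, is a stand-in chosen for illustration and is not necessarily the quantity reported in Table 1.

```python
import numpy as np

def ideal_core_periphery(N, k):
    """Ideal block pattern: ones everywhere except the periphery-periphery block."""
    B = np.ones((N, N))
    B[k:, k:] = 0.0
    return B

def block_model_mismatch(A_sorted, k):
    """Fraction of off-diagonal entries of the binarized A that differ from the ideal pattern."""
    N = A_sorted.shape[0]
    off_diagonal = ~np.eye(N, dtype=bool)
    A_bin = (A_sorted != 0).astype(float)
    diff = np.abs(A_bin - ideal_core_periphery(N, k))
    return diff[off_diagonal].mean()

if __name__ == "__main__":
    A = ideal_core_periphery(6, 2)
    np.fill_diagonal(A, 0.0)
    print(block_model_mismatch(A, k=2))
```

The input `A_sorted` is expected to be an adjacency matrix whose vertices are already ordered by decreasing core score; a smaller mismatch indicates an ordering that is closer to the ideal pattern.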
Finally, Fig. 3(c) shows the convergence plot for the Celegans dataset. We observe that the value of the objective function monotonically increases till convergence. The proposed algorithm converges in less than iterations for this dataset.
5.2 Brain network analysis
We next apply the proposed method to examine differences between the core and the peripheral regions of healthy individuals and subjects with ADHD. For this purpose, we use fMRI data from the OHSU brain institute [Richards2015NITRC]. The dataset consists of fMRI time series for the regions of interest in the cc200 parcellation for a total of 79 individuals, 42 of whom are healthy subjects and the rest are subjects with ADHD.
We independently compute the core scores of different individuals from their fMRI data. We denote the average of the core score vectors of healthy subjects by $\bar{c}_{\mathrm{H}}$ and that of subjects with ADHD by $\bar{c}_{\mathrm{A}}$. The entrywise magnitude of the difference between the normalized average core score vectors of the two groups serves as a measure of the differences in coreness of different brain regions across the two groups. The highest difference in connectivity (denoting difference in interactions) in the estimated networks is observed in the following regions: paracentral lobule, inferior frontal gyrus, anterior cingulate, and insula. Fig. 4 shows the 10 regions with the largest difference in connectivity, as measured by the 10 largest entries of this difference. The darker nodes in the figure denote the regions with a larger difference in the core scores of the two groups. These identified regions coincide with those reported in [Dickstein2014The] as the regions with differences in activation for healthy individuals and patients with ADHD.
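The group comparison can be sketched as follows, assuming the per-subject core score vectors are stacked as rows of two arrays. The function name, the array layout, and the use of the Euclidean norm for the normalization are assumptions made for illustration, since the text does not specify the normalization.

```python
import numpy as np

def top_coreness_differences(C_healthy, C_adhd, top=10):
    """Rank regions by the magnitude of the difference in normalized mean core score.

    C_healthy, C_adhd : arrays of shape (subjects, regions).
    Returns the indices of the `top` regions and the corresponding differences.
    """
    mean_h = C_healthy.mean(axis=0)
    mean_a = C_adhd.mean(axis=0)
    # Assumption: each group average is scaled to unit Euclidean norm.
    mean_h = mean_h / np.linalg.norm(mean_h)
    mean_a = mean_a / np.linalg.norm(mean_a)
    diff = np.abs(mean_h - mean_a)
    idx = np.argsort(-diff, kind="stable")[:top]
    return idx, diff[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    C_h = rng.random((42, 190))   # synthetic placeholder data, not the OHSU dataset
    C_a = rng.random((37, 190))
    regions, values = top_coreness_differences(C_h, C_a, top=10)
    print(regions)
```

The returned indices would then be mapped to region names through the parcellation's label table.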
We developed a generative model to relate node attributes to the core scores of vertices through a latent graph structure. Based on the proposed generative model, we presented a joint estimator to simultaneously infer the vertex core scores and a sparse graph whose sparsity pattern is determined by the core scores. The recovered graphs can be readily used to perform core-periphery detection. We presented a block coordinate ascent algorithm to solve the proposed estimation problem. We demonstrated via numerical experiments that the proposed method learns a core-periphery structured graph from only the node attributes while learning core scores on par with methods that use the ground truth network as input. We also applied our method to fMRI data to infer the regions that are the most affected in subjects with ADHD.