An Efficient Algorithm for Generating Directed Networks with Predetermined Assortativity Measures

01/10/2022
by   Tiandong Wang, et al.

Assortativity coefficients are important metrics for analyzing both directed and undirected networks. In general, it is not guaranteed that a fitted model will agree with the assortativity coefficients of the given network, and the structure of directed networks is more complicated than that of undirected ones. Therefore, we provide a remedy by proposing a degree-preserving rewiring algorithm, called DiDPR, for generating directed networks with given directed assortativity coefficients. We construct the joint edge distribution of the target network by accounting for the four directed assortativity coefficients simultaneously, provided that they are attainable, and obtain the desired network by solving a convex optimization problem. Our algorithm also helps check the attainability of the given assortativity coefficients. We assess the performance of the proposed algorithm by simulation studies with a focus on two different network models, namely Erdös–Rényi and preferential attachment random networks. We then apply the algorithm to a Facebook wall post network as a real data example. The code implementing our algorithm is publicly available in the R package wdnet.



1 Introduction

Assortativity is a metric measuring the tendency that nodes in a network are connected to vertices with similar node-specific characteristics, such as node degrees and strengths (Newman, 2002; Yuan et al., 2021). Originally, Newman (2002) proposed an assortativity measure based on node degrees for unweighted, undirected networks. Analogous to Pearson's correlation coefficient, the assortativity coefficient ranges from −1 to 1, with a positive (negative) value indicating that high-degree nodes are likely to be connected with high- (low-) degree nodes. A network with a positive (negative) assortativity coefficient is said to exhibit assortative (disassortative) mixing.

For directed networks, Newman (2003) defined an assortativity measure based on out-degrees of source nodes and in-degrees of target nodes. In general, there are four types of assortativity measures for directed networks (Foster et al., 2010; Piraveenan et al., 2012), namely out-in, out-out, in-in and in-out assortativity coefficients. For instance, a large positive out-in assortativity coefficient suggests that source nodes with large out-degrees tend to link to target nodes with large in-degrees. Assortativity has been extended to weighted, undirected networks (Leung and Chau, 2007) and weighted, directed networks (Yuan et al., 2021). For other recent developments, see Chang et al. (2007); Holme and Zhao (2007); Litvak and van der Hofstad (2013); Noldus and van Mieghem (2015).
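To make the four directed coefficients concrete, the following Python sketch computes them as Pearson correlations over the edges of a directed (multi)graph. This is an illustrative re-implementation with our own function and variable names; the paper's actual implementation lives in the R package wdnet.

```python
from itertools import product

def directed_assortativity(edges):
    """Compute the four directed assortativity coefficients (out-in, out-out,
    in-in, in-out) of a directed multigraph given as (source, target) pairs.

    Each coefficient is the Pearson correlation, over edges, between the
    chosen degree type of the source node and that of the target node.
    """
    out_deg, in_deg = {}, {}
    for s, t in edges:
        out_deg[s] = out_deg.get(s, 0) + 1
        in_deg[t] = in_deg.get(t, 0) + 1
    deg = {"out": lambda v: out_deg.get(v, 0), "in": lambda v: in_deg.get(v, 0)}

    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        # degenerate (constant) degree sequences raise ZeroDivisionError
        return cov / (vx * vy) ** 0.5

    result = {}
    for a, b in product(("out", "in"), repeat=2):
        xs = [deg[a](s) for s, t in edges]  # type-a degree of each source node
        ys = [deg[b](t) for s, t in edges]  # type-b degree of each target node
        result[(a, b)] = pearson(xs, ys)
    return result
```

For instance, on the 4-edge cycle-with-chord graph used in the test below, the in-out coefficient equals exactly 1 while the out-in coefficient is 0, showing that the four measures can disagree sharply on the same graph.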

Generating random networks with given assortativity is of great theoretical and practical interest. On the one hand, once a hypothesized model has been fitted to a given dataset, checking whether the fitted network achieves the assortativity levels of the original dataset is an important way to assess goodness of fit. On the other hand, if discrepancies in the assortativity coefficients are observed, one can propose necessary improvements, e.g., edge rewiring, for the hypothesized model to better capture the underlying network dynamics. For undirected networks, such an attempt was made in Newman (2003), where a degree-preserving rewiring algorithm was proposed. The crux of the algorithm is to construct a target network with the given assortativity coefficient, and then rewire the initial network towards the target. Newman (2003) achieved this goal by characterizing the distribution of the number of edges connecting two nodes with certain degrees, but this method is unfortunately inapplicable to directed networks, especially when there are four assortativity coefficients to control simultaneously.

Recently, Bertotti and Modanese (2019) developed a rewiring method for obtaining the maximal and minimal assortativity coefficients in undirected networks. However, limited work has been done for directed networks. Kashyap and Ambika (2017) introduced a rewiring algorithm focusing on only one of the four assortativity coefficients in directed networks, but overlooked the other three. Uribe-Leon et al. (2021) proposed a three-swap method to investigate the profile of rank-based assortativity measures in directed networks. To the best of our knowledge, there is no research accounting for four types of assortativity coefficients and their attainability simultaneously in directed networks. Hence, one of the primary goals of the present paper is to fill this gap.

Here we propose a feasible, efficient rewiring algorithm for directed networks towards four given assortativity levels simultaneously, provided that they are attainable. Our algorithm is both a complement and a generalization of the two-swap degree-preserving algorithm in Newman (2003), hereafter referred to as Newman's algorithm. The component of Newman's algorithm that does not carry over to directed networks, i.e., the construction of the joint edge distribution, is handled by formulating and solving a convex optimization problem, from which we further generalize the algorithm to account for the four assortativity coefficients simultaneously. After a sufficient number of rewiring attempts, all assortativity coefficients in the resulting network attain their target values. In addition, since the four types of assortativity measures are dependent on each other, after fixing the value of one of the four measures, not all values in [−1, 1] can be reached by the other three. Our algorithm then provides legitimate bounds for the coefficients, which were not considered in Newman's algorithm. The implementation of the proposed algorithm is available in the open-source R package wdnet (Yan et al., 2021).

The rest of this paper is organized as follows. The proposed algorithm is presented, with a discussion of legitimate bounds for the predetermined assortativity levels, in Section 2. An extensive simulation study is reported in Section 3, where two widely used random network models, i.e., the Erdös–Rényi and preferential attachment models, are considered. In Section 4, we apply the proposed algorithm to a real dataset obtained from the Facebook wall post network. Important discussions, concluding remarks and extensions are provided in Section 5.

2 Rewiring towards Given Assortativity Coefficients

We start with a review on directed assortativity coefficients in Section 2.1, after which we present our rewiring algorithm in Section 2.2. We also look into the bounds of directed assortativity coefficients in Section 2.3.

2.1 Directed Assortativity

Let G = (V, E) be a network with node set V and edge set E. For any u, v ∈ V that are connected, we use (u, v) to represent a directed edge from source node u to target node v. As our goal is to study directed networks, we simplify the notations by calling out- and in-degree type 1 and type 2, respectively. We use d_v^(1) and d_v^(2) to denote the out- and in-degree of node v, respectively. Let 1(·) be the standard indicator function. Given a network G, define the empirical out-degree and in-degree distributions respectively as

p_j^(1) = (1/|V|) Σ_{v ∈ V} 1(d_v^(1) = j),  p_k^(2) = (1/|V|) Σ_{v ∈ V} 1(d_v^(2) = k),  (1)

where |V| denotes the cardinality of the node set V. Let e_{jk}^(a,b) be the proportion of edges from a source node of type-a degree j to a target node of type-b degree k, for a, b ∈ {1, 2}. We use q_j^(a) := Σ_k e_{jk}^(a,b) and q̃_k^(b) := Σ_j e_{jk}^(a,b) to distinguish the marginal distributions for source and target nodes, respectively. For instance, q_j^(1) refers to the probability that an edge emanates from a source node of type-1 degree j, whereas q̃_k^(2) is the probability that an edge points to a target node of type-2 degree k. The four types of directed assortativity coefficients (Yuan et al., 2021) are given by

ρ(a, b) = Σ_{j,k} j k (e_{jk}^(a,b) − q_j^(a) q̃_k^(b)) / (σ^(a) σ̃^(b)),  a, b ∈ {1, 2},  (2)

where σ^(a) and σ̃^(b) are the standard deviations of q^(a) and q̃^(b), respectively.

Before presenting our algorithm, we need to define a few more notations. Let p_{jk} be the proportion of nodes with out-degree j and in-degree k. By (1), we have p_j^(1) = Σ_k p_{jk} and p_k^(2) = Σ_j p_{jk}.

Define also η_{(j1,k1),(j2,k2)} as the proportion of directed edges linking a source node with out-degree j1 and in-degree k1 to a target node with out-degree j2 and in-degree k2. Then the following relations hold, which are the building blocks for the development of our algorithm:

Σ_{(j2,k2)} η_{(j1,k1),(j2,k2)} = j1 p_{j1 k1} / μ,  (3)
Σ_{(j1,k1)} η_{(j1,k1),(j2,k2)} = k2 p_{j2 k2} / μ,  (4)

where μ := |E| / |V| is the common mean of the out- and in-degree distributions.

Additionally, we write e_{jk}^(a,b) and the marginals q^(a), q̃^(b) as functions of η; for instance, e_{jk}^(1,2) = Σ_{k1, j2} η_{(j,k1),(j2,k)}. Hence, by Equation (2), all assortativity coefficients ρ(a, b), a, b ∈ {1, 2}, are functions of η. This is a crucial observation which helps develop our degree-preserving rewiring algorithm in Section 2.2.2.

2.2 Rewiring Algorithm for Directed Networks

We start with a succinct review of Newman’s algorithm (Newman, 2003) for undirected networks, followed by proposing our algorithm for directed networks. The proposed algorithm is a non-trivial extension of Newman’s algorithm.

2.2.1 Newman’s Algorithm for Undirected Networks

Let G_0 be the initial undirected network, and ρ* the target assortativity coefficient. For undirected networks, if there is an edge connecting u and v, we denote it by {u, v}. In addition, we use E_{jk} to represent the proportion of edges connecting two nodes respectively with degree j and k. The pseudo code of Newman's algorithm is given in Algorithm 1, for which a sufficiently large number of rewiring steps T is required to ensure convergence of the algorithm.

Input: Initial network G_0, number of rewiring steps T, target assortativity coefficient ρ*.
Output: G_T.
1 Compute the empirical degree distribution p from G_0;
2 Compute the size-biased distribution q_k = k p_k / Σ_j j p_j;
3 Construct a matrix E = (E_{jk}) such that the assortativity coefficient of the network associated with joint edge distribution E is ρ*;
4 while t ≤ T do
5       Sample two edges {u1, v1} and {u2, v2} at random;
6       Compute the degrees of the nodes at the ends of the two sampled edges:
7                 j1 ← d_{u1}, k1 ← d_{v1}, j2 ← d_{u2}, k2 ← d_{v2};
8       if E_{j1 k1} E_{j2 k2} > 0 then
9            p ← min{1, (E_{j1 j2} E_{k1 k2}) / (E_{j1 k1} E_{j2 k2})};
10      else
11           p ← 1;
12      Draw u ~ Uniform(0, 1);
13      if u ≤ p then
14           Remove {u1, v1} and {u2, v2}, append {u1, u2} and {v1, v2};
15      else
16           Keep {u1, v1} and {u2, v2};
17      t ← t + 1;

Algorithm 1 Pseudo code for Newman's algorithm.

A key step of the algorithm is to construct E with the following constraints: E is symmetric; both its row and column sums equal the size-biased degree distribution q; the assortativity coefficient associated with E is ρ*; and the resulting E_{jk}'s are nonnegative. The method in Newman (2003), however, does not guarantee the existence of E for an arbitrary ρ*, nor is it directly applicable to directed networks.
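The size-biased distribution in step 2 of Algorithm 1, q_k = k p_k / Σ_j j p_j, is the degree distribution seen at a uniformly random edge end. A minimal sketch (illustrative Python; the helper name is ours):

```python
def size_biased(p):
    """Size-biased degree distribution: q_k is proportional to k * p_k,
    i.e., the probability that a uniformly random edge end attaches to a
    degree-k node. `p` maps degree -> proportion of nodes."""
    total = sum(k * pk for k, pk in p.items())
    return {k: k * pk / total for k, pk in p.items()}
```

For example, if half the nodes have degree 1 and half have degree 3, then three quarters of edge ends attach to degree-3 nodes: size_biased({1: 0.5, 3: 0.5}) gives q_3 = 0.75.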

2.2.2 Directed Network Degree-Preserving Rewiring (DiDPR) Algorithm

Given an initial directed network and four target assortativity coefficients (of different types), we propose a rewiring algorithm such that the four directed assortativity measures of the resulting network reach their corresponding targets simultaneously. The main idea of our algorithm is to characterize the network with the target assortativity coefficients through η, which is obtained by solving a convex optimization problem. During the entire procedure, the out- and in-degree distributions of the initial network G_0 are preserved, and we hence refer to the proposed algorithm as a directed network degree-preserving rewiring (DiDPR) algorithm. We reuse T to denote the number of iterative rewiring steps. Similar to Newman's algorithm, T needs to be sufficiently large to ensure convergence of the proposed algorithm.

Recall the definition of η in Section 2.1. Here we consider η as an extension of the joint edge distribution E in Algorithm 1, and we will develop a reliable scheme to determine the four-dimensional array η, which is then used to characterize networks with the given assortativity coefficients. By the properties of η in Equations (3) and (4), these relations are linear constraints in terms of η, allowing us to convert the problem of finding η into a convex programming problem (Boyd and Vandenberghe, 2004).

Suppose that ρ*(a, b), a, b ∈ {1, 2}, are the predetermined values of the four directed assortativity measures, and let f be some convex function. According to the linear constraints on η given in Equations (3) and (4), as well as the natural bounds of η, we set up the following convex optimization problem to solve for an appropriate η, given the initial network G_0:

minimize f(η)
s.t.  ρ(a, b)(η) = ρ*(a, b), a, b ∈ {1, 2};
      Σ_{(j2,k2)} η_{(j1,k1),(j2,k2)} = j1 p_{j1 k1} / μ;
      Σ_{(j1,k1)} η_{(j1,k1),(j2,k2)} = k2 p_{j2 k2} / μ;
      η_{(j1,k1),(j2,k2)} ≥ 0,  (5)

where ρ(a, b)(η), a, b ∈ {1, 2}, are functions of η by Equation (2). Since the proposed rewiring algorithm preserves out- and in-degrees, the structure of p_{jk} remains unchanged, allowing us to calculate all of the constraint values from the initial network G_0. Specifically, we solve the convex optimization problem via the utility functions developed in the R package CVXR (Fu et al., 2020), which is available on CRAN. The CVXR package provides a user-friendly interface that allows users to formulate convex optimization problems in simple mathematical syntax, and utilizes well-developed solvers, such as the embedded conic solver (ECOS) (Domahidi et al., 2013), to solve the problems.

For the convex optimization problem defined above, the four-dimensional array η can be reduced to a matrix whose elements are defined as nonnegative variables, to fit the interface of the CVXR package. Details are presented in Appendix A. Without loss of generality, we choose a simple convex objective function f to save computation power.
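The paper solves (5) with CVXR and ECOS in R. As an illustration of the underlying computation, the numpy sketch below (a toy instance with made-up marginals, not the paper's formulation) minimizes the sum of squared entries of a 2×2 joint distribution subject to prescribed row and column sums, by solving the KKT system of the equality-constrained quadratic program; a full solver such as ECOS would additionally enforce the nonnegativity constraints, which happen to hold automatically here:

```python
import numpy as np

# Toy instance: eta is 2x2 with prescribed row sums (source marginals) and
# column sums (target marginals); minimize sum(eta**2) subject to A @ x = b.
A = np.array([
    [1, 1, 0, 0],   # row 0 sum = 0.6
    [0, 0, 1, 1],   # row 1 sum = 0.4
    [1, 0, 1, 0],   # col 0 sum = 0.5
    [0, 1, 0, 1],   # col 1 sum = 0.5
], dtype=float)
b = np.array([0.6, 0.4, 0.5, 0.5])

# KKT system for min ||x||^2 s.t. A x = b:  [2I  A^T; A  0] [x; nu] = [0; b].
n, m = A.shape[1], A.shape[0]
K = np.block([[2 * np.eye(n), A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([np.zeros(n), b])
# lstsq handles the redundant constraint (row sums and column sums both total 1)
x = np.linalg.lstsq(K, rhs, rcond=None)[0][:n]

eta = x.reshape(2, 2)
print(eta)  # entries satisfy the marginal constraints exactly
```

Because the objective is strictly convex, the optimal x is unique even though the constraint matrix is rank-deficient, so any exact solution of the KKT system recovers it.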

Given the initial network G_0 and the solved η, at each rewiring step we randomly select a pair of edges (u1, v1) and (u2, v2), and measure the out- and in-degrees of the four end nodes: (j1, k1) for u1, (j2, k2) for v1, (j3, k3) for u2, and (j4, k4) for v2. We then replace the selected (u1, v1) and (u2, v2) with (u1, v2) and (u2, v1) with probability

p = min{1, [η_{(j1,k1),(j4,k4)} η_{(j3,k3),(j2,k2)}] / [η_{(j1,k1),(j2,k2)} η_{(j3,k3),(j4,k4)}]}.  (6)

We continue the rewiring in an iterative manner for T steps, and then obtain the resulting network G_T, which is the output of the DiDPR algorithm. The pseudo code of the DiDPR algorithm is given in Algorithm 2.

The proposed algorithm has a few appealing properties. First, during the entire rewiring procedure, the out- and in-degrees of the nodes connected by the sampled edges remain unchanged regardless of the acceptance or rejection of the rewiring attempt, thus preserving the structure of p_{jk}. Next, the DiDPR algorithm is ergodic over the collection of networks with given out- and in-degree sequences (denoted by 𝒢), as any network in 𝒢 can be reached within a finite number of rewiring steps. Lastly, the proposed algorithm satisfies the detailed balance condition, i.e., for any two configurations G, G′ ∈ 𝒢, it follows from Equation (6) that

π(G) P(G → G′) = π(G′) P(G′ → G),

where π(G) denotes the probability of sampling configuration G, and P(G → G′) is the transition probability from G to G′.
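A single rewiring attempt can be sketched in Python as follows (an illustrative re-implementation with our own names; the actual algorithm ships in the R package wdnet). Swapping the targets of two sampled edges leaves every node's out- and in-degree untouched, which is exactly the degree-preserving property noted above:

```python
import random

def didpr_step(edges, eta, out_deg, in_deg, rng):
    """One DiDPR rewiring attempt: sample two directed edges and swap their
    targets with the acceptance probability of Equation (6).

    `eta` maps ((j_src, k_src), (j_tgt, k_tgt)) to a probability mass;
    missing keys are treated as zero, in which case the swap is accepted.
    """
    i, j = rng.sample(range(len(edges)), 2)
    (u1, v1), (u2, v2) = edges[i], edges[j]
    dp = lambda v: (out_deg.get(v, 0), in_deg.get(v, 0))  # (out, in) of node v
    s1, t1, s2, t2 = dp(u1), dp(v1), dp(u2), dp(v2)
    cur = eta.get((s1, t1), 0.0) * eta.get((s2, t2), 0.0)
    new = eta.get((s1, t2), 0.0) * eta.get((s2, t1), 0.0)
    p = 1.0 if cur == 0.0 else min(1.0, new / cur)
    if rng.random() <= p:
        edges[i], edges[j] = (u1, v2), (u2, v1)  # swap targets; degrees unchanged
```

Repeating this step T times yields the output network; the out- and in-degree sequences are invariants of the loop no matter which attempts are accepted.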

Input: Initial network G_0, number of rewiring steps T, target assortativity coefficients ρ*(a, b), a, b ∈ {1, 2}.
Output: G_T.
1 Apply the convex optimization algorithm to get η;
2 while t ≤ T do
3       Sample two directed edges (u1, v1) and (u2, v2) at random;
4       Compute the out- and in-degrees of the four nodes of the two sampled edges:
5                 j1 ← d_{u1}^(1), k1 ← d_{u1}^(2), j2 ← d_{v1}^(1), k2 ← d_{v1}^(2),
6                 j3 ← d_{u2}^(1), k3 ← d_{u2}^(2), j4 ← d_{v2}^(1), k4 ← d_{v2}^(2);
7       if η_{(j1,k1),(j2,k2)} η_{(j3,k3),(j4,k4)} > 0 then
8            p ← min{1, [η_{(j1,k1),(j4,k4)} η_{(j3,k3),(j2,k2)}] / [η_{(j1,k1),(j2,k2)} η_{(j3,k3),(j4,k4)}]};
9       else
10           p ← 1;
11      Draw u ~ Uniform(0, 1);
12      if u ≤ p then
13           Remove (u1, v1) and (u2, v2), append (u1, v2) and (u2, v1);
14      else
15           Keep (u1, v1) and (u2, v2);
16      t ← t + 1;

Algorithm 2 Pseudo code for the DiDPR algorithm.

2.3 Directed Assortativity Coefficient Bounds

The four target assortativity coefficients ρ*(a, b), a, b ∈ {1, 2}, are naturally bounded between −1 and 1. However, based on the structure of η, we cannot arbitrarily set the four targets to any values within [−1, 1] while preserving p_{jk}, since certain combinations of the target values may not exist. In this section, we propose a method to determine the bounds of the assortativity coefficients conditional on the initial configuration G_0.

Note that for a given G_0, quantities like p_{jk}, μ, and the marginal distributions and standard deviations in Equation (2) are uniquely determined, and they remain unchanged throughout the rewiring process. Then by Equation (2), for a, b ∈ {1, 2}, ρ(a, b) is a linear function of η.

Next, we describe the procedure for finding the bounds of the four assortativity coefficients. Without loss of generality, we take the out-in coefficient ρ(1, 2) as our example. The upper and lower bounds of ρ(1, 2) are related to the structure of G_0, which is characterized through its corresponding p_{jk} and μ. We find the lower bound of ρ(1, 2) by solving the following convex optimization problem:

minimize ρ(1, 2)(η)
s.t.  the constraints in Equations (3) and (4) hold, and η ≥ 0.

Analogously, we obtain the upper bound of ρ(1, 2) by solving the optimization problem with objective function −ρ(1, 2)(η), while keeping all of the constraints unchanged. We denote the lower and upper bounds of ρ(1, 2) by ρ_min(1, 2) and ρ_max(1, 2), respectively.

Now suppose that ρ*(1, 2) is a predetermined value satisfying ρ_min(1, 2) ≤ ρ*(1, 2) ≤ ρ_max(1, 2). We then determine the range of a second coefficient, say ρ(1, 1), given the initial configuration G_0 and ρ*(1, 2). Here the extra constraint on ρ(1, 2) further restricts the possible values that ρ(1, 1) can take. The associated convex optimization problem for the lower bound of ρ(1, 1) then becomes

minimize ρ(1, 1)(η)
s.t.  ρ(1, 2)(η) = ρ*(1, 2); the constraints in Equations (3) and (4) hold; and η ≥ 0.

Similarly, the upper bound of ρ(1, 1) is obtained by solving the convex optimization problem with the same constraints but the objective function −ρ(1, 1)(η). We continue in this fashion until the bounds for all four assortativity coefficients are determined.

The proposed bound computation scheme provides a flexible framework, so one may start with any of the four types of assortativity coefficients, depending on the available information about the target network structure and the research problem of interest. Furthermore, the proposed scheme helps determine whether the given target assortativity coefficients are attainable simultaneously, thus providing insights into their dependence structure. Since the DiDPR algorithm outlined in Algorithm 2 can be computationally expensive for large-scale networks, we suggest checking the attainability of the target assortativity coefficients before applying the algorithm.
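Because ρ(a, b) is linear in η once G_0 (hence p_{jk} and μ) is fixed, each bound is a linear program over the polytope defined by (3), (4), and η ≥ 0. The scipy sketch below (a deliberately tiny stand-in problem with made-up numbers, not the paper's formulation) shows the min/max pattern: minimize c·x for the lower bound and −c·x for the upper bound over the same feasible set:

```python
import numpy as np
from scipy.optimize import linprog

# Stand-in for a linear assortativity functional c @ x over the toy feasible
# set {x >= 0 : x1 + x2 = 1}; the real problem uses the eta-constraints (3)-(4).
c = np.array([1.0, -1.0])
A_eq, b_eq = np.array([[1.0, 1.0]]), np.array([1.0])

lower = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 2, method="highs")
upper = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 2, method="highs")

print(lower.fun, -upper.fun)  # attainable range of the functional
```

Solving two LPs per coefficient, with equality constraints added one target at a time, mirrors the sequential bounding procedure described above.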

3 Simulations

We now investigate the performance of the DiDPR algorithm through simulation studies with two widely used random network models: the Erdös–Rényi (ER) model (Erdös and Rényi, 1959; Gilbert, 1959) and the Barabási–Albert model (Barabási and Albert, 1999). The latter is also known as the linear preferential attachment (PA) network model in the literature. We consider directed ER and PA models extended from their classical versions.

3.1 ER Model

A directed ER random network is governed by two parameters: the number of nodes n and the probability p that a directed edge is present from one node to another. We consider an extension of the traditional ER random network model that allows self-loops. In directed ER networks, all of the edges are generated independently; owing to this simplicity, a variety of properties of ER networks have been investigated analytically; see, for instance, van der Hofstad (2017, Chapters 4 and 5). Besides, ER random networks are often used as benchmark models in network analysis (e.g., Bianconi et al., 2008; Palla et al., 2015).
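A directed ER network with self-loops takes only a few lines to generate; this Python sketch (our own helper, with the names n and p matching the text) draws each ordered pair independently:

```python
import random

def directed_er(n, p, seed=None):
    """Directed Erdos-Renyi graph on nodes 0..n-1 allowing self-loops:
    each ordered pair (u, v), including u == v, is an edge independently
    with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(n) if rng.random() < p]
```

The expected number of edges is p * n**2; with p = 1 every ordered pair appears, and with p = 0 the edge list is empty.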

We take a directed ER random network as our initial graph. Since the directed edges in ER networks are generated independently, large-scale ER networks are not expected to present any patterns of assortative or disassortative mixing. Therefore, all of the values in the natural bound [−1, 1] are attainable for each of the assortativity coefficients marginally. Given one of the four assortativity coefficients, however, the values that the rest can take become restricted. Without loss of generality, we investigate the bounds of the out-in, in-out, and in-in coefficients conditional on the value of the out-out coefficient.

We generate independent ER networks as initial graphs. For each initial graph, we solve the corresponding convex optimization problem with given out-out assortativity values to determine the upper and lower bounds of the out-in, in-out, and in-in coefficients. The results are presented via box plots in Figure 1. In the rightmost panel, the lower and upper bounds of the in-in coefficient remain at −1 and 1, respectively, regardless of changes in the value of the out-out coefficient. This suggests that the in-in degree correlation of ER networks is not affected by their out-out degree correlation, which agrees with the independence assumption in the edge creation process. The ranges of the out-in and in-out coefficients, however, are widest when the out-out coefficient is close to 0, and their bounds shrink symmetrically as the out-out value deviates from 0: out-out assortativity coefficients with large magnitudes require a great proportion of edges linking source nodes with large (small) out-degrees to target nodes also with large (small) out-degrees, thus giving narrower bounds for the out-in and in-out assortativity coefficients.

Figure 1: Side-by-side box plots of the upper and lower bounds of the out-in, in-out, and in-in assortativity coefficients with given values of the out-out coefficient.

Next, we conduct a sensitivity analysis to assess the performance of the DiDPR algorithm with respect to changes in n and p. The target assortativity values are selected arbitrarily, and their attainability has been verified through our algorithm (from Section 2.3). For each combination of n and p, we generate independent directed ER random networks, and present the average trace plots (of assortativity via rewiring) in Figure 2, where each iteration contains a fixed number of rewiring steps. All trace plots in each panel start from 0, as ER networks are not expected to show any pattern of assortative mixing. With one of n and p fixed (in the top four panels of Figure 2), ER networks with fewer edges tend to arrive at the targets faster, since only a small number of edges needs rewiring. The trace plots with the other parameter fixed, presented in the bottom four panels of Figure 2, lead to the same conclusion. All of the trace plots confirm the success of the proposed algorithm.

Figure 2: Average trace plots for the assortativity coefficients of directed ER networks; the values of (n, p) are fixed within each set of four panels. The dashed gray lines represent the target assortativity values.

3.2 PA Model

The PA model is a generative probabilistic model in which nodes with large degrees are more likely to attract newcomers than those with small degrees (e.g., Barabási and Albert, 1999; Bollobás et al., 2003; Krapivsky et al., 2001; Krapivsky and Redner, 2001). It is a more realistic model for many real network data than the ER model. We consider the directed PA (DPA) model given in Bollobás et al. (2003), which has five parameters (α, β, γ, δ_in, δ_out) subject to α + β + γ = 1, as explained below. Following Bollobás et al. (2003), we assume there are three edge-creation scenarios:

  1. With probability α, a new edge is added from a new node to an existing node chosen according to the PA rule.

  2. With probability β, a new edge is added between two existing nodes chosen according to the PA rule.

  3. With probability γ, a new edge is added from an existing node chosen according to the PA rule to a new node.

A graphical illustration of this evolving process is given in Figure 3. The two offset parameters δ_in and δ_out control the growth rates of the in- and out-degrees, respectively (Wang and Resnick, 2021b). The specific evolutionary rule of the model is given in Appendix B.

Figure 3: Three edge-addition scenarios respectively corresponding to the α-, β-, and γ-scenarios (from left to right).
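The three scenarios above can be sketched as follows (illustrative Python with our own function names and a single seed edge of our choosing; the offsets δ_in and δ_out enter through the sampling weights, as in Bollobás et al. (2003)):

```python
import random

def dpa_network(n_edges, alpha, beta, gamma, delta_in, delta_out, seed=None):
    """Sketch of the directed preferential attachment model of Bollobas et al.:
    grow a graph one edge at a time under the alpha/beta/gamma scenarios.
    Targets are drawn with weight (in-degree + delta_in); sources with
    weight (out-degree + delta_out). Starts from the single edge (0, 1)."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    out_deg, in_deg = {0: 1, 1: 0}, {0: 0, 1: 1}

    def pick(deg, delta):
        # sample an existing node with probability proportional to deg[v] + delta
        nodes = list(deg)
        return rng.choices(nodes, weights=[deg[v] + delta for v in nodes])[0]

    while len(edges) < n_edges:
        r = rng.random()
        if r < alpha:            # alpha: new source -> existing target
            u = len(out_deg); v = pick(in_deg, delta_in)
            out_deg[u] = 0; in_deg[u] = 0
        elif r < alpha + beta:   # beta: existing source -> existing target
            u = pick(out_deg, delta_out); v = pick(in_deg, delta_in)
        else:                    # gamma: existing source -> new target
            u = pick(out_deg, delta_out); v = len(out_deg)
            out_deg[v] = 0; in_deg[v] = 0
        edges.append((u, v))
        out_deg[u] += 1; in_deg[v] += 1
    return edges
```

Note that only the α- and γ-scenarios create nodes, so the expected number of nodes grows like (α + γ) times the number of edges; this is why large β slows node creation, as discussed in the simulations below.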

We provide a diagram in Figure 4 to explain how the rewiring process works for the DPA model. Suppose that two edges, (u1, v1) and (u2, v2), are sampled, and assume that (u1, v1) and (u2, v2) are created under the α- and γ-scenarios, respectively. According to the PA rule, node v1 tends to have a large in-degree, and u2 tends to have a large out-degree. However, the two nodes u1 and v2 may have small out- and in-degrees, since they are created at later stages of the network evolution. After a successful rewiring, we swap the edges to (u1, v2) and (u2, v1), increasing the assortativity coefficients.

Figure 4: Rewiring between an α-scenario edge and a γ-scenario edge.

Our simulation study starts with an investigation of the lower and upper bounds of the four assortativity coefficients. We generate independent DPA networks under several sets of (α, β, γ) values, with the offset parameters δ_in and δ_out held fixed throughout the simulations. Figure 5 presents the upper and lower bounds for the four directed assortativity coefficients in the DPA networks under three parameter settings; the widths of the attainable ranges, as well as the variations of the bounds, differ noticeably across settings. Figure 6 further shows that, conditional on the value of one coefficient, the upper and lower bounds of the other three increase as the conditioning value increases.

Figure 5: Side-by-side box plots of the upper and lower bounds of the assortativity coefficients of DPA networks under three sets of (α, β, γ); δ_in and δ_out are held fixed for all of the generated PA networks.
Figure 6: Side-by-side box plots of the upper and lower bounds of three assortativity coefficients of DPA networks, given the value of the fourth.

We next assess the performance of the DiDPR algorithm under DPA random networks with different combinations of (α, β, γ), while holding the offset parameters fixed. All four assortativity targets are positive, in keeping with the evolutionary features of DPA networks, and the target values are not too close to the lower or upper bounds, ensuring that they are achievable with high probability for different combinations of (α, β, γ) regardless of the structure of the initial networks.

We also need to choose the parameters in the simulation study carefully so that the message is articulated. Similar to the diagram in Figure 4, we can draw analogous rewiring diagrams for the other scenario combinations of the sampled edges. After inspecting all combinations, we see that the α-γ combination provides the greatest amount of increase in the assortativity coefficients, supported by additional simulation experiments; see Appendix C for details. Hence, our simulation design has two settings: (1) fix α = γ, and vary the value of β; (2) fix β, and vary the values of α and γ. Under the first setting, we maximize the chance of sampling the α-γ combination, and examine the impact of β on the convergence of the assortativity coefficients. In the second setting, by varying the product αγ, we investigate whether a higher chance of sampling the α-γ combination gives faster convergence of the assortativity coefficients.

Fix α = γ.

Consider different values of β, and set the corresponding α = γ such that α + β + γ = 1. We do not allow β to take large values in our simulations, since otherwise the β-scenario would dominate the network evolution, decreasing the number of nodes created during the entire network growth. Similar to the previous study, we generate independent DPA networks, and collect the assortativity coefficient values every several rewiring steps to improve computational efficiency as well as graphical representation. The average trace plots are given in Figure 7.

The DiDPR algorithm shows rapid convergence of all four assortativity measures, and the assortativity coefficients reach their targets faster for smaller values of β. When β is small, a large proportion of the newly generated edges connect existing nodes with new nodes. By the PA rule, existing nodes usually have larger out- and in-degrees than newcomers. Therefore, the initial graph is more likely to contain edges connecting large out-degree (in-degree) nodes with small in-degree (out-degree) nodes. When such edges are sampled, rewiring tends to increase the assortativity values, so the proposed algorithm is effective and efficient. On the other hand, when β is large, we expect many edges connecting existing nodes with large out- and in-degrees, and these edges are sampled with high probability. However, when two β-scenario edges are sampled, the improvement in assortativity is limited. Therefore, the assortativity coefficients in DPA networks with large β require more time to attain the targets.

Figure 7: Average trace plots for the four assortativity coefficients of simulated DPA networks with α = γ and varying β.
Fix β.

We then fix β, and consider different values of α and γ such that α + γ = 1 − β. With the same target settings, we generate independent DPA networks and present the average trace plots in Figure 8. Once again, the DiDPR algorithm gives fast convergence for all four assortativity coefficients. Two of the coefficients reach their corresponding targets faster when the difference between α and γ is small. Since α = γ maximizes the value of the product αγ for a fixed α + γ, the fast convergence in this case coincides with the earlier remark that the α-γ sampling combination gives the largest amount of improvement in the assortativity coefficients after each rewiring attempt. The average trace plots for the other two coefficients, however, do not display huge discrepancies in the convergence rate under different parameter choices.

Figure 8: Average trace plots for the assortativity coefficients of simulated DPA networks with fixed β and varying α and γ.

4 Applications to Social Networks

We now apply the proposed algorithm to a Facebook wall post network with data available at KONECT (http://konect.cc/networks/facebook-wosn-wall/, Kunegis, 2013). We fit a DPA model to the selected network, and estimate the parameters via an extreme value (EV) method. The fitted model captures the features of the out- and in-degree distributions of the network data well, but fails to characterize the assortativity structure accurately. By applying the DiDPR algorithm, we bring the four assortativity coefficients of the fitted model close to their counterparts in the selected network.

Nodes in the Facebook wall post network correspond to Facebook users, and each directed edge (u, v) represents the event that user u writes a post on the wall of user v. We only use the network formed by the data from 2007-07-01 to 2007-11-30, due to the observation in Wang and Resnick (2021a) that the network growth pattern in this period is more stable than in earlier time periods. This selection yields the sub-network analyzed below.

We start by fitting a DPA model to the network. Different from the likelihood-based method in Wan et al. (2017), the EV estimation method given in Wan et al. (2020) only focuses on the distribution of large in- and out-degrees, as opposed to the entire network evolution history. Since our DiDPR algorithm does not change the degree distributions, we use the EV method to fit the DPA model. Overall, the EV method considers a reparametrization of the DPA model in which the marginal tail indices of the out- and in-degree distributions appear as unknown parameters; by Bollobás et al. (2003), these tail indices are explicit functions of (α, β, γ, δ_in, δ_out).

To implement the EV method, we first estimate β. Then, to obtain the marginal out- and in-degree tail estimates, we consult the minimum distance method proposed in Clauset et al. (2009), which is implemented in the R package poweRlaw (Gillespie, 2015). By applying a power transformation to one of the degree sequences, the transformed out- and in-degree pair is made to share a common marginal tail index. Next, we apply a polar transformation to obtain the angular components used for estimation.

Following the methodology in Wan et al. (2020), we estimate the remaining parameters from the empirical distribution of the angular components corresponding to the largest norms, where the threshold is typically chosen as a large order statistic of the norms. The resulting parameter estimates are then used in the simulations below.

We then generate independent DPA networks with the estimated parameters, and overlay the marginal out- and in-degree distributions of the simulated networks with their empirical counterparts from the selected sub-network; see Figure 9. Most of the empirical out- and in-degree distributions of the real data fall within or close to the ranges formed by the simulated networks, except for in-degree 0. This discrepancy is due to the fact that a certain number of users keep posting on others' Facebook walls but have not received any posts during the observation period. Hence, the fitted DPA model is able to capture the degree distributions of the given network, which provides the foundation for the implementation of the DiDPR algorithm.

Figure 9: Empirical out- and in-degree distributions of the selected sub-network and those from the independently generated DPA networks with estimated parameters.

Looking at the averages of the four assortativity values over the simulated networks, all four are lower than their counterparts in the empirical network. Hence, we proceed by first using the DPA network with estimated parameters as the initial configuration, then applying the DiDPR algorithm to correct the assortativity levels of the network while keeping the well-fitted degree distributions unchanged. Figure 10 shows the average trace plots of the assortativity coefficients based on the simulated DPA networks, where the assortativity values of each kind are recorded at regular rewiring intervals. Figure 10 confirms that after rewiring, all of the assortativity coefficients are close to their counterparts observed in the selected sub-network, thus closing the gap left by the plain DPA model.
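For concreteness, the four directed assortativity coefficients monitored in such trace plots can be computed as Pearson correlations of degree pairs across edges. The sketch below is a generic illustration; the toy edge list and all names are ours, not drawn from the paper or the wdnet package.

```python
import numpy as np

def directed_assortativity(edges, out_deg, in_deg):
    """Compute the four directed assortativity coefficients (out-out, out-in,
    in-out, in-in) as Pearson correlations of degree pairs over the edge list."""
    src = np.array([u for u, _ in edges])
    tgt = np.array([v for _, v in edges])
    pairs = {
        "out-out": (out_deg[src], out_deg[tgt]),
        "out-in":  (out_deg[src], in_deg[tgt]),
        "in-out":  (in_deg[src],  out_deg[tgt]),
        "in-in":   (in_deg[src],  in_deg[tgt]),
    }
    return {k: float(np.corrcoef(x, y)[0, 1]) for k, (x, y) in pairs.items()}

# Toy directed graph on 4 nodes (hypothetical example).
edges = [(0, 1), (1, 2), (2, 0), (0, 2), (3, 0)]
n = 4
out_deg = np.bincount([u for u, _ in edges], minlength=n)
in_deg = np.bincount([v for _, v in edges], minlength=n)
rho = directed_assortativity(edges, out_deg, in_deg)
```

Recomputing `rho` every few thousand rewiring steps produces trace plots of the kind shown in Figure 10.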

Figure 10: Average trace plots for the assortativity coefficients of simulated DPA networks.

5 Discussion

The proposed DiDPR algorithm is efficient and effective in generating directed networks with four pre-specified directed assortativity coefficients. The fundamental step of the algorithm is to construct the joint edge distribution of a directed network achieving the given assortativity coefficients, which is done by solving a convex optimization problem. This procedure supplies a crucial component missing from Newman's rewiring algorithm for undirected networks. With minor modifications, our method can also identify the bounds of the assortativity coefficients by capturing the dependence structure among them. The proposed algorithm corrects all of the assortativity coefficients simultaneously through the rewiring process while preserving the original out- and in-degree distributions. The effectiveness of the algorithm is demonstrated through simulation studies as well as an application to Facebook wall post data.

The proposed DiDPR algorithm can also be employed to adjust assortativity coefficients defined analogously to Pearson's correlation. For instance, van der Hoorn and Litvak (2015) proposed a rank-based assortativity coefficient analogous to Spearman's rho for undirected networks. We can define four rank-based (Spearman) assortativity coefficients for directed networks by replacing all the degree terms in Equation (2) with the corresponding ranks; mid-ranks can be used in the presence of ties. The same idea carries over to constructing the joint edge distribution for given assortativity targets. The rewiring procedure in Algorithm 2 remains unchanged, as it preserves the out- and in-degree distributions. Therefore, the DiDPR algorithm adapts straightforwardly to directed assortativity coefficients defined via Spearman's rho, and potentially to other directed assortativity measures defined with nonparametric dependence measures such as Kendall's tau.
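A minimal sketch of the mid-rank replacement, under the standard convention that tied values receive the average of their ranks (the function names are ours):

```python
import numpy as np

def midrank(x):
    """Assign ranks 1..n, giving tied values the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    sx = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(sx):
        j = i
        while j + 1 < len(sx) and sx[j + 1] == sx[i]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average of ranks i+1..j+1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the mid-ranks."""
    return float(np.corrcoef(midrank(x), midrank(y))[0, 1])
```

Replacing the source and target degree sequences with their mid-ranks before computing the correlation yields the rank-based directed assortativity coefficients described above.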

A future direction of interest is to extend the algorithm to weighted, directed networks while preserving the strength distributions throughout rewiring. This generalization is straightforward for integer-valued edge weights, since each weighted edge can be decomposed into multiple unit-weight edges. Preserving node strengths of continuous type, however, remains challenging, especially when swapping two edges with different weights.
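For integer weights, the decomposition just mentioned is immediate; a sketch (the function name is ours):

```python
def expand_integer_weights(weighted_edges):
    """Decompose each integer-weighted edge (u, v, w) into w unit-weight copies,
    so that unweighted degree-preserving rewiring on the resulting multigraph
    automatically preserves node strengths."""
    return [(u, v) for (u, v, w) in weighted_edges for _ in range(int(w))]
```

After rewiring the expanded multigraph, parallel unit-weight edges can be merged back into a single weighted edge.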

References

  • Barabási and Albert (1999) Barabási, A.-L. and Albert, R. (1999), “Emergence of Scaling in Random Networks,” Science, 286, 509–512.
  • Bertotti and Modanese (2019) Bertotti, M. L. and Modanese, G. (2019), “The Configuration Model for Barabási–Albert Networks,” Applied Network Science, 4, 32.
  • Bianconi et al. (2008) Bianconi, G., Gulbahce, N., and Motter, A. E. (2008), “Local Structure of Directed Networks,” Physical Review Letters, 100, 11.
  • Bollobás et al. (2003) Bollobás, B., Borgs, C., Chayes, J., and Riordan, O. (2003), “Directed Scale-Free Graphs,” in SODA ’03: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA: SIAM, pp. 132–139.
  • Boyd and Vandenberghe (2004) Boyd, S. and Vandenberghe, L. (2004), Convex Optimization, Cambridge, U.K.: Cambridge University Press.
  • Chang et al. (2007) Chang, H., Su, B.-B., Zhou, Y.-P., and He, D.-R. (2007), “Assortativity and Act Degree Distribution of Some Collaboration Networks,” Physica A: Statistical Mechanics and Its Applications, 383, 687–702.
  • Clauset et al. (2009) Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009), “Power-law distributions in empirical data,” SIAM Review, 51, 661–703.
  • Domahidi et al. (2013) Domahidi, A., Chu, E., and Boyd, S. (2013), “ECOS: An SOCP Solver for Embedded Systems,” in 2013 European Control Conference (ECC), Piscataway, NJ, USA: IEEE, pp. 3071–3076.
  • Erdös and Rényi (1959) Erdös, P. and Rényi, A. (1959), “On Random Graphs I,” Publicationes Mathematicae Debrecen, 6, 290–297.
  • Foster et al. (2010) Foster, J. G., Foster, D. V., Grassberger, P., and Paczuski, M. (2010), “Edge Direction and the Structure of Networks,” Proceedings of the National Academy of Sciences of the United States of America, 107, 10815–10820.
  • Fu et al. (2020) Fu, A., Narasimhan, B., and Boyd, S. (2020), “CVXR: An R Package for Disciplined Convex Optimization,” Journal of Statistical Software, 94, 1–34.
  • Gilbert (1959) Gilbert, E. N. (1959), “Random Graphs,” Annals of Mathematical Statistics, 30, 1141–1144.
  • Gillespie (2015) Gillespie, C. S. (2015), “Fitting Heavy Tailed Distributions: The poweRlaw Package,” Journal of Statistical Software, 64, 1–16.
  • Holme and Zhao (2007) Holme, P. and Zhao, J. (2007), “Exploring the Assortativity-Clustering Space of a Network’s Degree Sequence,” Physical Review E, 75, 046111.
  • Kashyap and Ambika (2017) Kashyap, G. and Ambika, G. (2017), “Mechanisms for Tuning Clustering and Degree-Correlations In Directed Networks,” Journal of Complex Networks, 6, 767–787.
  • Krapivsky and Redner (2001) Krapivsky, P. L. and Redner, S. (2001), “Organization of Growing Random Networks,” Physical Review E, 63, 066123.
  • Krapivsky et al. (2001) Krapivsky, P. L., Rodgers, G. J., and Redner, S. (2001), “Degree Distributions of Growing Networks,” Physical Review Letters, 86, 5401–5404.
  • Kunegis (2013) Kunegis, J. (2013), “KONECT: The Koblenz Network Collection,” in WWW ’13 Companion: Proceedings of the 22nd International Conference on World Wide Web, eds. Schwabe, D., Almeida, V., and Glaser, H., New York, NY, USA: Association for Computing Machinery, pp. 1343–1350.
  • Leung and Chau (2007) Leung, C. C. and Chau, H. F. (2007), “Weighted Assortative and Disassortative Networks Model,” Physica A: Statistical Mechanics and Its Applications, 378, 591–602.
  • Litvak and van der Hofstad (2013) Litvak, N. and van der Hofstad, R. (2013), “Uncovering Disassortativity in Large Scale-Free Networks,” Physical Review E, 87, 022801.
  • Newman (2002) Newman, M. E. J. (2002), “Assortative Mixing in Networks,” Physical Review Letters, 89, 208701.
  • Newman (2003) — (2003), “Mixing Patterns in Networks,” Physical Review E, 67, 026126.
  • Noldus and van Mieghem (2015) Noldus, R. and van Mieghem, P. (2015), “Assortativity in Complex Networks,” Journal of Complex Networks, 3, 507–542.
  • Palla et al. (2015) Palla, G., Farkas, I. J., Pollner, P., Derényi, I., and Vicsek, T. (2015), “Directed Network Modules,” New Journal of Physics, 9, 186.
  • Piraveenan et al. (2012) Piraveenan, M., Prokopenko, M., and Zomaya, A. (2012), “Assortative Mixing in Directed Biological Networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9, 66–78.
  • Uribe-Leon et al. (2021) Uribe-Leon, C., Vasquez, J. C., Giraldo, M. A., and Ricaurte, G. (2021), “Finding Optimal Assortativity Configurations in Directed Networks,” Journal of Complex Networks, 8, cnab004.
  • van der Hofstad (2017) van der Hofstad, R. (2017), Random Graphs and Complex Networks, Cambridge, UK: Cambridge University Press.
  • van der Hoorn and Litvak (2015) van der Hoorn, P. and Litvak, N. (2015), “Degree-Degree Dependencies in Directed Networks with Heavy-Tailed Degrees,” Internet Mathematics, 11, 155–179.
  • Wan et al. (2017) Wan, P., Wang, T., Davis, R. A., and Resnick, S. I. (2017), “Fitting the Linear Preferential Attachment Model,” Electronic Journal of Statistics, 11, 3738–3780.
  • Wan et al. (2020) — (2020), “Are Extreme Value Estimation Methods Useful For Network Data?” Extremes, 23, 171–195.
  • Wang and Resnick (2021a) Wang, T. and Resnick, S. I. (2021a), “Common Growth Patterns for Regional Social Networks: A Point Process Approach,” Journal of Data Science, https://doi.org/10.6339/21-JDS1021.
  • Wang and Resnick (2021b) — (2021b), “Measuring Reciprocity in a Directed Preferential Attachment Network,” Advances in Applied Probability, To appear.
  • Yan et al. (2021) Yan, J., Yuan, Y., and Zhang, P. (2021), wdnet: Weighted Directed Network, University of Connecticut, R package version 0.0-3, https://gitlab.com/wdnetwork/wdnet.
  • Yuan et al. (2021) Yuan, Y., Yan, J., and Zhang, P. (2021), “Assortativity Measures for Weighted and Directed Networks,” Journal of Complex Networks, 9, cnab017.

Appendix A Interface with the CVXR Package

As mentioned, we use the utility functions from the CVXR package to solve the optimization problem defined in Section 2.2. In CVXR's problem description, linear constraints are represented by vectors and matrices, so we write the constraints of our optimization problem in matrix form as well. Let the two index sets collect the distinct out-degree and in-degree values in the network, respectively. Recall that the joint edge distribution records the probability that an edge emanates from a source node of a given out-degree and points to a target node of a given in-degree. In what follows, one vector denotes the empirical out-degree distribution for source nodes and another the corresponding distribution for target nodes; the in-degree counterparts are defined analogously. Consider two design matrices built from Kronecker products of an identity matrix and a column vector of ones, which extract the row and column sums of the joint distribution matrix. Lastly, two additional matrices are formed by repeating the marginal distribution vectors as columns.
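Because the symbols in this passage were lost in extraction, the following generic sketch shows how such Kronecker-product design matrices recover the marginal (row- and column-sum) constraints from a vectorized joint distribution; the dimensions, seed, and variable names are illustrative, not the paper's.

```python
import numpy as np

s, t = 3, 2                    # numbers of distinct out- and in-degree values
rng = np.random.default_rng(0)
eta = rng.random((s, t))
eta /= eta.sum()               # a joint distribution over (out-degree, in-degree)

v = eta.flatten(order="F")     # column-wise vectorization vec(eta)

# (1_t' ⊗ I_s) extracts row sums of eta; (I_t ⊗ 1_s') extracts column sums.
A = np.kron(np.ones((1, t)), np.eye(s))
B = np.kron(np.eye(t), np.ones((1, s)))

row_marginal = A @ v           # out-degree marginal of the joint distribution
col_marginal = B @ v           # in-degree marginal of the joint distribution
```

Equating `row_marginal` and `col_marginal` to the empirical marginal distributions yields linear constraints of exactly the vector-matrix form a CVXR-style solver expects.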

For the sake of implementation, we arrange all the entries of the joint edge distribution in a matrix whose rows and columns are indexed by the distinct out- and in-degree values:

The constraints of our convex optimization problem can then be rewritten as follows: