Dynamic Network Prediction

12/13/2018 ∙ by Ravi Goyal, et al. ∙ Mathematica Policy Research

We present a statistical framework for generating predicted dynamic networks based on the observed evolution of social relationships in a population. The framework includes a novel and flexible procedure to sample dynamic networks given a probability distribution on evolving network properties; it permits the use of a broad class of approaches to model trends, seasonal variability, uncertainty, and changes in population composition. Current methods do not account for the variability in the observed historical networks when predicting the network structure; the proposed method provides a principled approach to incorporate uncertainty in prediction. This advance aids in the designing of network-based interventions, as development of such interventions often requires prediction of the network structure in the presence and absence of the intervention. Two simulation studies are conducted to demonstrate the usefulness of generating predicted networks when designing network-based interventions. The framework is also illustrated by investigating results of potential interventions on bill passage rates using a dynamic network that represents the sponsor/co-sponsor relationships among senators derived from bills introduced in the US Senate from 2003-2016.

1 Introduction

Complex social systems in which individual-level outcomes of interest are interdependent are increasingly represented as networks. In some systems, such as those through which sexually transmitted diseases spread, the dependencies among people (represented as nodes in a network) are not permanent but form and dissolve over time, leading to changes in the network topology. The evolving structure of the network can influence the efficiency of processes operating within the system (Morris & Kretzschmar, 1997). In this report, we present an approach to predict the topological evolution of the network based on observed historical network data.

The primary advance in our proposed approach is a novel method to sample dynamic networks from a broad class of probability distributions. The method allows investigators to model long-term and seasonal trends in the evolution of the network structure and to use these models to predict networks in ways that incorporate uncertainty in the topology of the predicted networks. To our knowledge, current methods do not allow estimates of the level of uncertainty in the predicted network structure to be based on the variability in the observed networks. Temporal exponential random graph models (TERGMs) and separable temporal exponential random graph models (STERGMs), common approaches to modeling temporal network data, are flexible in the way they incorporate network properties, but their parameterization accommodates only point estimates of network property values and not their variances (Hanneke & Xing, 2007; Hanneke et al., 2010; Krivitsky & Handcock, 2013). Therefore, these models are not able to incorporate the observed historical variability of network properties and hence are of limited use for predicting network structure.

Significant methodological challenges exist in designing interventions that modify network structure. A primary challenge is the lack of general theory that connects network properties to outcomes of network processes. Considerable research has been devoted to investigating this relationship; however, the focus has mostly been on static networks. Pellis et al. (2015) commented on the need for additional research on dynamic networks in the context of epidemiological investigations. In the absence of theory, modeling time trends in outcomes requires modeling the entire evolution of the network. Therefore, assessing the potential impact of an intervention requires prediction of the network structure in the presence and absence of the intervention. Such an approach was used in the design and monitoring of a large randomized community controlled trial, the Botswana Combination Prevention Program (BCPP), which investigates whether implementation of a combination of prevention interventions reduces HIV incidence (Wang et al., 2014).

Our proposed approach allows investigators to predict the dynamic network structure using historical data on the network prior to the implementation of the intervention. It also enables investigators to adjust the probability distribution of network properties targeted for modification by the intervention in order to generate predicted networks in its presence. Comparing results from simulations modeling processes operating on networks in the presence and absence of an intervention allows for the evaluation of its potential impact.

Our proposed method is described in sections 2 to 4 and applied to simulation studies and an analysis of an observed network in sections 5 and 6, respectively. Specifically, section 2 introduces network terminology, section 3 provides a conceptual framework for generating predicted dynamic networks, and section 4 provides details of the method to sample dynamic networks. Section 5 demonstrates how generating predicted dynamic networks is useful in the investigation of interventions designed to modify network properties. Section 6 illustrates the method through an analysis of a dynamic network that represents the sponsor/co-sponsor relationships among senators derived from bills introduced in the US Senate from 2003-2016. The network structures for the Senates of 2003-2014 are used to predict the dynamic network structure for the Senate of 2015-2016. Section 6 also demonstrates the usefulness of the method in predicting hypothetical effects of interventions. Section 7 discusses the limitations of the procedure and suggestions for further research. An R library implementing the proposed methods is available on request. (The currently available R library CCMnet on CRAN will be updated to include the presented methods.)

2 Network Terminology

We represent a population and the connections among its members at time $t$ as a network, denoted as $G_t = (V_t, E_t)$, where the sets $V_t$ and $E_t$ represent individuals and their connections at time $t$. The network $G_t$ can be equivalently represented as a binary adjacency matrix $A_t$ with dimensions equal to the size of the set $V_t$; therefore, $A_t$ has dimensions $|V_t| \times |V_t|$, where $|V_t|$ denotes the size of set $V_t$. Let $A_t[i,j] = 1$ indicate that there is a relationship between individuals $i$ and $j$ at time $t$, i.e., $(i,j) \in E_t$, while $A_t[i,j] = 0$ indicates that there is no relationship, i.e., $(i,j) \notin E_t$. Let $N_t(i)$ be the individuals with connections to individual $i$, i.e., $N_t(i) = \{j : (i,j) \in E_t\}$. Let $\mathcal{G}_{V_t}$ be the entire space of networks with $V_t$ as nodes.

Mixing patterns describe the tendency for individuals in networks to be connected to others who are like (or unlike) them based on their particular characteristics; we consider only discrete characteristics. Let $X_i$ represent the vector of discrete characteristics for individual $i$ in network $G_t$. We consider only a single characteristic, political party affiliation, but the methods, formulas, and code in the CCMnet package permit investigation of multiple characteristics. Let $X = (X_i)_{i \in V_t}$ be a vector containing the characteristics of all individuals. The characteristic distribution, denoted as $D_t$, is a vector representing the number of individuals with each characteristic value; the entry $D_t[c]$ represents the number of individuals having characteristic $c$, i.e., $D_t[c] = \sum_{i \in V_t} \mathbb{1}(X_i = c)$.
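The objects defined above (adjacency matrix, neighbor set of a node, and characteristic distribution) can be made concrete with a short Python sketch. It assumes nodes are labeled 0 to n-1 and edges are stored as unordered pairs; the function names are illustrative only and are not part of the CCMnet package.

```python
from collections import Counter


def adjacency_matrix(n, edges):
    """Binary, symmetric |V_t| x |V_t| adjacency matrix A_t of an undirected network."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1  # A_t[i, j] = 1 iff (i, j) is in E_t
    return A


def neighbors(A, i):
    """N_t(i): the individuals connected to individual i."""
    return {j for j, a in enumerate(A[i]) if a == 1}


def characteristic_distribution(characteristics):
    """D_t: number of individuals having each discrete characteristic value."""
    return Counter(characteristics)


# Hypothetical example: four senators with party labels and three ties.
A = adjacency_matrix(4, [(0, 1), (0, 2), (2, 3)])
party = ["D", "D", "R", "R"]
print(neighbors(A, 0))                     # {1, 2}
print(characteristic_distribution(party))  # Counter({'D': 2, 'R': 2})
```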

We consider a discrete-time dynamic network model in which the network at time $t$ is a single draw from a probability distribution that conditions on the networks at times $0, \dots, t-1$, denoted as $P(\mathbf{G}_t = G_t \mid G_{t-1}, \dots, G_0)$, where $\mathbf{G}_t$ is a random variable with support $\mathcal{G}_{V_t}$. The probability distribution is based on network properties that characterize salient features of the evolving network. We call a collection of network properties of a system essential if the collection cannot be reduced and still adequately characterize the system. Although there have been recent methodological advances to assess whether a network model adequately characterizes a system (Hunter et al., 2008; Hanneke et al., 2010; Schweinberger, 2012), additional research in this area is still needed; in practice, the assessment may require simulation studies and guidance from subject matter experts.

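Define $g_t$ to be the function that maps $G_t$ to the values of the essential network properties conditional on $G_{t-1}, \dots, G_0$. Let $\mathcal{C}_y = g_t^{-1}(y)$, the inverse image of $y$ under $g_t$, i.e., $\mathcal{C}_y = \{G \in \mathcal{G}_{V_t} : g_t(G) = y\}$; we refer to $\mathcal{C}_y$ as a congruence class of $\mathcal{G}_{V_t}$ for the specified essential network properties. Let $P(\mathbf{Y}_t = y \mid G_{t-1}, \dots, G_0)$, abbreviated $P(\mathbf{Y}_t = y)$, be the probability distribution of essential network property values, where $\mathbf{Y}_t$ is a random variable for the vector of real values that are associated with the congruence classes of $\mathcal{G}_{V_t}$. The relationship between $P(\mathbf{G}_t = G \mid G_{t-1}, \dots, G_0)$ and $P(\mathbf{Y}_t = y)$ is shown below:

$$P(\mathbf{Y}_t = y \mid G_{t-1}, \dots, G_0) = \sum_{G \in \mathcal{C}_y} P(\mathbf{G}_t = G \mid G_{t-1}, \dots, G_0) \qquad (1)$$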

3 Dynamic Network Prediction Framework

The proposed network prediction method has three components. The first identifies essential network properties by defining the mapping $g_t: \mathcal{G}_{V_t} \to \mathbb{R}^k$, where $k$ is the number of essential network properties. As discussed in Krivitsky & Handcock (2013), it may be useful to specify two types of essential network properties: one that aids in characterizing the cross-sectional properties of the network and another that characterizes its longitudinal properties. We follow this approach and refer to cross-sectional and longitudinal properties as static and dynamic essential network properties, respectively. The former consists of properties that are calculable based only on $G_t$; the latter are those that require previous networks to be calculated. Let $g^s_t$ and $g^d_t$ denote the mappings from $G_t$ (and, for $g^d_t$, also from $G_{t-1}, \dots, G_{t-m}$) to the values of the static and dynamic essential network properties, respectively, where $m$ denotes the number of previous networks necessary to compute the dynamic essential network properties. Note that $g_t = (g^s_t, g^d_t)$.
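To make the static/dynamic distinction concrete, here is a minimal Python sketch using one illustrative property of each type with m = 1: the number of edges (static) and the number of edges present in both the previous and the current network (dynamic). The networks and function names are hypothetical.

```python
def edge_set(edges):
    """Undirected edges as frozensets, so (i, j) and (j, i) are the same edge."""
    return {frozenset(e) for e in edges}


def static_edges(curr):
    """Static property: number of edges, computable from G_t alone."""
    return len(edge_set(curr))


def persisting_edges(curr, prev):
    """Dynamic property with m = 1: edges present in both G_{t-1} and G_t."""
    return len(edge_set(curr) & edge_set(prev))


def g(curr, prev):
    """g_t = (g_t^s, g_t^d) for these two illustrative properties."""
    return static_edges(curr), persisting_edges(curr, prev)


g_jan = [(0, 1), (1, 2), (2, 3)]  # hypothetical January network
g_feb = [(1, 0), (2, 3), (0, 3)]  # hypothetical February network
print(g(g_feb, g_jan))            # (3, 2)
```

The static value uses only February's edges; the dynamic value needs January as well, which is why $g^d_t$ takes previous networks as input.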

The benefit of specifying two types of essential network properties can be illustrated using the US sponsor/co-sponsor network. Static properties provide information on the number of relationships between US Senators at each time point, for example, during January 2016, whereas dynamic properties provide information on the number of relationships in January 2016 that persisted to February 2016, i.e., the rate of evolution in the dynamic network. Dynamic properties can capture the rate of evolution in the system, not only overall but also with regard to specific types of relationships (e.g., relationships between members of the same political party or between members of different parties).

Recent statistical advances in dynamic networks complement our proposed method as they can be used to guide selection of the essential network properties (Hanneke & Xing, 2007; Hanneke et al., 2010; Krivitsky & Handcock, 2013; Paul & O’Malley, 2013; Snijders, 1996). Also of importance is recent work on assessing goodness-of-fit (GOF) for static networks (Hunter et al., 2008) and for temporal networks (Hanneke et al., 2010; Schweinberger, 2012). These advances can aid in identifying and validating the selected set of essential network properties; as mentioned above, additional research is needed in the area of assessing GOF. We use the method as proposed in Hanneke et al. (2010) to assess GOF for the US Senate Bill data.

The second component is modeling and predicting essential network properties, i.e., specifying $P(\mathbf{Y}_t = y)$. The framework presented in this paper provides the flexibility needed to specify the probability of observing a network with particular values for the selected essential network properties using a range of techniques, including techniques for modeling evolving trends and seasonal variability.

The first component (identifying essential network properties) does not provide a probability of observing a network at time $t$, but only a specification of the properties that are used to compute that probability. Therefore, the method requires two distinct decisions about the use of prior networks for modeling dynamic networks under this framework. The first is the number of prior networks that are used to define the dynamic essential network properties. Using only the previous network $G_{t-1}$ would imply that whether an edge persists in time does not depend on whether the edge was present at time $t-2$ (or any other earlier time). The second is the collection of observed historical networks used to estimate the joint distribution of essential network properties at time $t$, i.e., to estimate $P(\mathbf{Y}_t = y)$. The choice of prior networks for each of these two decisions can differ. For example, we might assume that the persistence of a relationship between two US Senators at time $t$ depends only on whether the relationship was present at time $t-1$ (not on earlier networks); nonetheless, the estimates for the number of relationships at time $t$ and of the subset of relationships that existed at both times $t-1$ and $t$ can be based on historical averages dating back years or decades.

The third component is the generation of networks according to the probability distribution $P(\mathbf{G}_t = G \mid G_{t-1}, \dots, G_0)$, which is based on the predicted distribution of the essential network properties, $P(\mathbf{Y}_t = y)$; this relationship is shown in equation (1). In essence, this component maps back to the network space $\mathcal{G}_{V_t}$ from $\mathbb{R}^k$ by sampling networks from the probability distribution $P(\mathbf{G}_t = G \mid G_{t-1}, \dots, G_0)$. The three components of this framework are illustrated in figure 1.

Figure 1: A conceptual illustration of the prediction framework consisting of three components: identify essential network properties, forecast properties, and generate predicted networks.

4 Dynamic Congruence Class Model

4.1 Congruence Class Model

To maximize the flexibility of the methods used to estimate the predictive distribution for the network properties, we propose a general procedure to generate networks based on a model by Goyal et al. (2014); we refer to it as the Congruence Class Model (CCM). We extend the CCM, a static network generation method, to dynamic networks and refer to this extension as the Dynamic Congruence Class Model (DCCM). The CCM, as well as the DCCM, allows investigators to generate networks consistent with a broad class of probability distributions on essential network properties. Below we review the key concepts of the CCM.

The CCM partitions the space of graphs with $n$ nodes, $\mathcal{G}_n$, such that all graphs in a partition have the same values of essential network properties. As defined in Goyal et al. (2014), the partitions are referred to as congruence classes. A congruence class is defined as $\mathcal{C}_y = \{G \in \mathcal{G}_n : g(G) = y\}$, where $g(G)$ denotes the vector of values for the essential network properties for graph $G$, i.e., $g: \mathcal{G}_n \to \mathbb{R}^k$, and $k$ denotes the number of essential network properties. The probability distribution on $\mathcal{G}_n$ for the CCM requires specification of $P(\mathbf{Y} = y)$, the probability mass function for the congruence classes defined by the essential network properties; as mentioned above, $P(\mathbf{Y} = y)$ is the total probability of all networks that are elements in $\mathcal{C}_y$, i.e.,

$$P(\mathbf{Y} = y) = \sum_{G \in \mathcal{C}_y} P(\mathbf{G} = G) \qquad (2)$$

Since the congruence classes partition the space according to the essential network properties, the CCM assigns the same probability to every network within a congruence class. Therefore, the probability distribution on $\mathcal{G}_n$ for the CCM is the following:

$$P(\mathbf{G} = G) = \frac{P(\mathbf{Y} = g(G))}{|\mathcal{C}_{g(G)}|} \qquad (3)$$

where $|\mathcal{C}_y|$ denotes the number of networks with essential property values equal to $y$.
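Equations (2) and (3) can be checked by brute force on a graph small enough to enumerate. The sketch below assumes 4 nodes (6 dyads, 64 graphs), uses the number of edges as the only essential network property, and picks an arbitrary, hypothetical pmf on the congruence classes; it confirms that spreading each class probability evenly over its members recovers $P(\mathbf{Y} = y)$ and sums to one.

```python
from itertools import combinations
from math import comb, isclose

n = 4
dyads = list(combinations(range(n), 2))  # 6 possible edges
# The whole space G_n, as subsets of dyads: 2**6 = 64 graphs.
graphs = [frozenset(s) for k in range(len(dyads) + 1) for s in combinations(dyads, k)]


def g(G):
    """Essential network property: number of edges."""
    return len(G)


# Congruence classes C_y = {G : g(G) = y}.
classes = {}
for G in graphs:
    classes.setdefault(g(G), []).append(G)

# A hypothetical P(Y = y) on the classes y = 0..6 (any pmf would do).
p_y = {y: w / 28 for y, w in zip(range(7), [1, 2, 4, 7, 7, 4, 3])}


def p_graph(G):
    """Equation (3): spread P(Y = g(G)) evenly over the |C_g(G)| graphs in its class."""
    return p_y[g(G)] / len(classes[g(G)])


# Equation (2): summing over a class recovers P(Y = y).
print(all(isclose(sum(p_graph(G) for G in classes[y]), p_y[y]) for y in classes))  # True
```

Here class sizes are simply binomial coefficients, which is what makes the small case easy; for richer property vectors the class sizes are generally not available in closed form.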

The flexibility of the CCM results from the fact that the investigator can choose the probability mass function on congruence classes, $P(\mathbf{Y} = y)$. The CCM allows a broad range of models, including both parametric and nonparametric, in the assignment of the probability mass function to the defined congruence classes.

4.2 Dynamic Congruence Class Model

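In the DCCM, the congruence classes presented below are defined by both the static and dynamic essential network properties; by contrast, the CCM is based only on the former. We write $y$ as $(y^s, y^d)$ when it is necessary to separate the vector of values for the static and dynamic essential network properties.

$$\mathcal{C}_y = \{G \in \mathcal{G}_{V_t} : g^s_t(G) = y^s,\ g^d_t(G; G_{t-1}, \dots, G_{t-m}) = y^d\} \qquad (4)$$

Adapting equation (3) for the DCCM, the probability mass function on the space $\mathcal{G}_{V_t}$ is the following:

$$P(\mathbf{G}_t = G \mid G_{t-1}, \dots, G_0) = \frac{P(\mathbf{Y}_t = g_t(G))}{|\mathcal{C}_{g_t(G)}|}, \quad g_t(G) = \left(g^s_t(G),\ g^d_t(G; G_{t-1}, \dots, G_{t-m})\right) \qquad (5)$$

In the following sections, the congruence class of a network is restricted so that it depends only on the previous network, $G_{t-1}$ (i.e., $m = 1$). Therefore, the probability mass function in equation (5) simplifies to the following:

$$P(\mathbf{G}_t = G \mid G_{t-1}) = \frac{P(\mathbf{Y}_t = g_t(G))}{|\mathcal{C}_{g_t(G)}|}, \quad g_t(G) = \left(g^s_t(G),\ g^d_t(G; G_{t-1})\right) \qquad (6)$$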
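For the pair of properties used later in the simulation studies (number of edges $e$ and number of edges persisting from $G_{t-1}$, $p$), the congruence class sizes in equation (6) happen to have a closed form: choose the $p$ persisting edges among the $e_{t-1}$ edges of $G_{t-1}$ and the $e - p$ new edges among the dyads that were empty at $t-1$. The Python sketch below assumes exactly these two properties; the function names and the pmf passed in are hypothetical.

```python
from math import comb


def dccm_class_size(e, p, e_prev, n_dyads):
    """|C_(e,p)|: graphs with e edges, p of which were also edges of G_{t-1}."""
    if not (0 <= p <= e_prev and 0 <= e - p <= n_dyads - e_prev):
        return 0  # no graph has these property values
    return comb(e_prev, p) * comb(n_dyads - e_prev, e - p)


def dccm_prob(e, p, e_prev, n_dyads, p_y):
    """Equation (6): P(G_t = G | G_{t-1}) for any G in C_(e,p), given a pmf p_y(e, p)."""
    size = dccm_class_size(e, p, e_prev, n_dyads)
    return p_y(e, p) / size if size else 0.0


# 4 nodes (6 dyads) and a previous network with 2 edges: the classes partition all 2**6 graphs.
print(sum(dccm_class_size(e, p, 2, 6) for e in range(7) for p in range(e + 1)))  # 64
```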

The decision to assume a Markov process in defining the dynamic essential network properties does not restrict the collection of networks used to estimate the probability mass function on congruence classes, $P(\mathbf{Y}_t = y)$. This flexibility allows the model to incorporate long-term and seasonal trends as well as degrees of uncertainty that vary over time based on the historical data.

As closed-form expressions for $|\mathcal{C}_y|$ are not available in general, sampling from $\mathcal{G}_{V_t}$ according to the probability mass function in equation (6) is performed using a Metropolis-Hastings (MH) algorithm, a type of Markov chain Monte Carlo (MCMC) procedure. To generate the network at the $t$-th step, $G_t$, the algorithm proposes a network, $G^*$, by toggling the existence of an edge (on or off) in the current state of the MCMC algorithm, denoted as $G^{(c)}$. If the proposed network is accepted, based on equation (7) below, the algorithm sets the current state $G^{(c)} = G^*$ and uses $G^*$ as the basis of the next proposal; otherwise it remains at $G^{(c)}$ and uses it as the basis of the next proposal. The algorithm continues for a set number of proposals, and the final element of the chain is assigned to $G_t$. Because any network can be reached from any other by a sequence of edge toggles, the algorithm produces an irreducible Markov chain on $\mathcal{G}_{V_t}$. Writing $y^{(c)} = g_t(G^{(c)})$ and $y^* = g_t(G^*)$, the acceptance probability for the MH algorithm is the following:

$$\alpha(G^{(c)}, G^*) = \min\left(1,\ \frac{P(\mathbf{Y}_t = y^*)\, \bar{N}(y^* \to y^{(c)})}{P(\mathbf{Y}_t = y^{(c)})\, \bar{N}(y^{(c)} \to y^*)}\right) \qquad (7)$$

where $\bar{N}(y \to y')$ is the average number of elements in $\mathcal{C}_{y'}$ that are valid proposals from an element of $\mathcal{C}_y$. The non-standard acceptance probability arises because of the DCCM's focus on congruence classes: counting single-toggle pairs between two classes in both directions gives $|\mathcal{C}_{y}|\,\bar{N}(y \to y') = |\mathcal{C}_{y'}|\,\bar{N}(y' \to y)$, so the ratio of average proposal counts replaces the unknown ratio of class sizes $|\mathcal{C}_{y^{(c)}}| / |\mathcal{C}_{y^*}|$. Equation (7) is identical to the acceptance probability derived in Goyal et al. (2014) except for the modification to the definition of the congruence classes that permits inclusion of dynamic essential network properties.
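A minimal Python sketch of the toggle-proposal MH sampler, under stated assumptions: the essential network properties are the number of edges $e$ and the number of persisting edges $p$ (as in Simulation Study 1), for which the number of valid moves between two classes is the same for every graph in the class, so $\bar{N}$ in equation (7) is computed exactly. The target $P(\mathbf{Y}_t = y)$ is a hypothetical product of binomials, and all names are illustrative; this is not the CCMnet implementation.

```python
import math
import random
from itertools import combinations


def n_moves(e, p, e2, p2, e_prev, n_dyads):
    """Single-edge toggles taking a graph in C_(e,p) into C_(e2,p2).

    For (edges, persisting edges) this count is the same for every graph in
    the class, so it equals the average N-bar used in equation (7).
    """
    if (e2, p2) == (e + 1, p + 1):  # add back an edge of G_{t-1}
        return e_prev - p
    if (e2, p2) == (e + 1, p):      # add an edge absent from G_{t-1}
        return (n_dyads - e_prev) - (e - p)
    if (e2, p2) == (e - 1, p - 1):  # drop a persisting edge
        return p
    if (e2, p2) == (e - 1, p):      # drop a newly formed edge
        return e - p
    return 0


def dccm_mh(prev_edges, n_nodes, log_py, n_steps, rng):
    """Draw G_t given G_{t-1} with toggle proposals accepted via equation (7)."""
    dyads = list(combinations(range(n_nodes), 2))
    n_dyads, e_prev = len(dyads), len(prev_edges)
    current = set(prev_edges)       # start the chain at G_{t-1}
    e = p = e_prev                  # every edge of G_{t-1} persists initially
    for _ in range(n_steps):
        d = rng.choice(dyads)       # propose toggling one dyad
        dp = 1 if d in prev_edges else 0
        e2, p2 = (e - 1, p - dp) if d in current else (e + 1, p + dp)
        log_ratio = (log_py(e2, p2) - log_py(e, p)
                     + math.log(n_moves(e2, p2, e, p, e_prev, n_dyads))
                     - math.log(n_moves(e, p, e2, p2, e_prev, n_dyads)))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current ^= {d}          # accept: toggle the dyad
            e, p = e2, p2
    return current


def log_binom_pmf(k, n, q):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1 - q))


# Hypothetical example: 20 nodes (190 dyads), a random G_{t-1} with 30 edges,
# edges ~ Binomial(190, 30/190) and persisting edges ~ Binomial(30, 0.9).
rng = random.Random(1)
dyads = list(combinations(range(20), 2))
prev = set(rng.sample(dyads, 30))


def log_py(e, p):
    return log_binom_pmf(e, len(dyads), 30 / len(dyads)) + log_binom_pmf(p, len(prev), 0.9)


G_t = dccm_mh(prev, 20, log_py, 20000, rng)
```

Because only ratios of $P(\mathbf{Y}_t = y)$ enter the acceptance step, the target does not need to be normalized over the feasible property values.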

5 Simulation Studies

The usefulness of the proposed approach lies in its ability to predict networks–not simply collections of network properties. In this section, we demonstrate the value of generating predicted dynamic networks for evaluation of interventions intended to modify network topology. The two simulation studies we discuss show that the association between dynamic network properties and outcomes can be complex even in simple settings. Both studies present interventions that are focused on mitigating the spread of an infectious disease. However, the examples are general enough to represent complex systems across many settings.

5.1 Simulation Study 1

The simulations in this section mimic interventions intended to decrease the number of contacts during an epidemic of a communicable disease; the simulations use a simple susceptible-infected (SI) epidemic model for disease spread. Each simulation models a population of 1000, where initially five individuals are infected with the disease. At each time step individuals form new contacts, dissolve existing contacts, and may spread the disease to uninfected contacts. The DCCM is used to control the formation and dissolution of contacts. The model used a single static essential network property, number of edges, and a single dynamic essential network property, number of edges persisting between two time points.

Six interventions are investigated in six simulations; the only difference among them is the rate at which the number of contacts in the population decreases. A seventh simulation, in which the mean number of contacts does not decrease, represents the absence of an intervention. At the start of each simulation, the contact networks have a mean of 1500 edges. At the end, the mean number of edges for the six intervention simulations is 0, 250, 500, 750, 1000, and 1250, respectively; for the seventh, it remains at 1500 edges. For all simulations, an average of 90% of the edges persisted between consecutive networks. The variance for the number of edges was based on the assumption that each edge had an equal probability of forming; the variance for the number of persisting edges was based on the assumption that all edges had an equal probability of dissolving.
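The simulation setup can be sketched in Python under explicit assumptions. The source states only the start and end means, so the linear decline in the mean number of edges below is an assumption; the binomial variances follow the equal-probability assumptions stated above. The SI step assumes a per-contact, per-step transmission probability `beta`, which the source does not specify. All names are hypothetical.

```python
import random
from math import comb


def target_moments(t, T, start=1500, end=1000, n_nodes=1000, persist=0.9, e_prev=None):
    """Hypothetical means and variances of (edges, persisting edges) at step t of T.

    Assumes a linear decline of the mean number of edges from `start` to `end`.
    Variances: every dyad equally likely to hold an edge (binomial over all dyads)
    and every edge of G_{t-1} equally likely to dissolve (binomial over e_prev).
    """
    n_dyads = comb(n_nodes, 2)
    mean_e = start + (end - start) * t / T
    q = mean_e / n_dyads
    var_e = n_dyads * q * (1 - q)
    e_prev = mean_e if e_prev is None else e_prev
    return mean_e, var_e, persist * e_prev, e_prev * persist * (1 - persist)


def si_step(edges, infected, beta, rng):
    """One SI step: each infected-susceptible contact transmits with probability beta."""
    new = set(infected)
    for i, j in edges:
        if (i in infected) != (j in infected) and rng.random() < beta:
            new.add(j if i in infected else i)
    return new
```

Infected individuals never recover in an SI model, so `si_step` only ever adds to the infected set.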

The thin lines in the left panel of figure 2 show the number of edges over time for all of the simulations; each of the seven settings was simulated 20 times. The seven thick lines show the average number of edges over time for each of the settings. For each setting, the average line and the variability around it indicate that the proposed method is performing as expected in modeling the static essential network property; diagnostic plots (not shown) demonstrate that the proposed method also is correctly modeling the dynamic essential network property. The thin lines in the right panel of figure 2 show the number infected over time for all of the simulations; similarly, the seven thick lines show the average number infected over time for each of the settings.

The association between the essential network properties and the number infected does not lend itself to the identification of a precise mathematical relationship. The curves shown in the right panel all have a slightly different shape, leading to difficulty in specifying a precise mathematical relationship between the network topology and the cumulative infected over time prior to conducting this simulation study. However, after conducting these simulations, an investigator would be able to assess the potential impact of the six interventions by comparing the results of each of the interventions (settings 1-6) to the simulation modeling the absence of the intervention (setting 7). This comparison is possible due to the ability to generate whole networks.

Figure 2: Example 1. The thin lines in the left panel of figure 2 show the number of edges over time for all of the simulations; each of the seven interventions was simulated 20 times; the seven thick lines show the average number of edges over time for each of the interventions. The thin lines in the right panel of figure 2 show the number infected over time for all of the simulations; similarly, the seven thick lines show the average number infected over time for each of the interventions.

5.2 Simulation Study 2

The simulations in this section represent interventions to control a communicable disease epidemic by decreasing the cumulative number of contacts while keeping constant the total length of time in relationships. This study setup is similar to that of the previous study as it models a population of 1000, of whom five individuals are initially infected, assumes the same essential network properties, and uses an SI epidemic model to simulate the disease spread.

This study considers a set of interventions that affect the probability that an edge persists between two time points; this probability ranges from 0 to 1. Without loss of generality, we assume that, absent an intervention, the probability is zero. Throughout each simulation, the contact networks have a mean of 800 edges; the variance for the essential network properties was based on the same assumptions as in the previous simulation study. Figure 3 depicts the total number infected for each simulation for varying values of the probability that an edge persists between two time points. As in the first study, it would be difficult to derive a precise mathematical description of the relationship between these two quantities. However, from such a simulation study, an investigator would be able to assess the potential impact of an intervention designed to decrease the cumulative number of contacts by comparing the results for each of the interventions to the one representing the absence of the intervention, i.e., the simulation where the probability that an edge persists between two time points is set to zero. As in the first simulation study, this comparison is possible due to the ability to generate whole networks.
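Why these interventions reduce cumulative contacts while holding total relationship time fixed can be seen with a back-of-the-envelope Python sketch. It assumes a constant number of edges per step, that each edge persists to the next step with probability `persist`, and that every replacement edge joins a new pair (a reasonable approximation when the population is large relative to the number of edges). The function name is hypothetical.

```python
def expected_cumulative_partnerships(n_edges, persist, n_steps):
    """Approximate expected number of distinct partnerships over n_steps steps.

    Total edge-time is n_edges * n_steps regardless of `persist`; only the number
    of distinct partnerships changes.
    """
    new_per_step = n_edges * (1 - persist)  # expected dissolved edges, replaced by new pairs
    return n_edges + (n_steps - 1) * new_per_step


for persist in (0.0, 0.5, 0.9, 1.0):
    print(persist, expected_cumulative_partnerships(800, persist, 24))
```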

Figure 3: Example 2. The total number infected for each simulation for varying values of the probability that an edge persists between two time points.

6 Senate Bills 2003-2016

The longitudinal network data represent relationships between US Senators as derived from bills introduced in the United States Senate. Each bill introduced in the US Senate has a single senator who serves as the sponsor of the bill; other senators may be associated with the bill as co-sponsors. A network $G_t$ for month $t$ is generated by forming an undirected edge between the sponsoring senator and each of the co-sponsoring senators for bills introduced in month $t$.
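The construction of the monthly networks can be sketched in Python. The record layout `(month, sponsor, cosponsors)` is an assumption for illustration, not the format of the underlying Senate data. Note that, following the rule above, co-sponsors of the same bill are linked only to the sponsor, not to one another.

```python
from collections import defaultdict


def monthly_networks(bills):
    """Map each month to the undirected edge set of its sponsor/co-sponsor network."""
    networks = defaultdict(set)
    for month, sponsor, cosponsors in bills:
        edges = networks[month]  # ensures months with unco-sponsored bills still appear
        for c in cosponsors:
            if c != sponsor:
                edges.add(frozenset((sponsor, c)))
    return dict(networks)


# Hypothetical records: repeated sponsor/co-sponsor pairs collapse to one edge.
bills = [("2015-01", "A", ["B", "C"]), ("2015-01", "B", ["A"]), ("2015-02", "C", [])]
nets = monthly_networks(bills)
```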

We use the bills introduced in the 2003-2014 Senates to predict the networks for the 2015-2016 Senate. The following subsections follow the conceptual framework outlined in section 3 and figure 1. Sections 6.1 and 6.2 identify and predict the essential network properties; section 6.3 generates networks based on predicted estimates from the model developed in section 6.2. Section 6.4 investigates the GOF of the model to assess whether the modeled properties are sufficient to characterize the data. Section 6.5 demonstrates the usefulness of the method in predicting hypothetical effects of an intervention.

6.1 Identifying essential network properties

A salient feature in the formation of collaborations among US Senators is party affiliation (Hanneke et al., 2010). We model three static essential network properties that capture mixing patterns between the two major political parties, Democratic and Republican; senators designated as independent or socialist were assigned as Democrats. As the total number of senators fluctuated over the time intervals (e.g., Illinois had only one senator during December 2008), as did the number affiliated with each political party, we model the properties in a way that accommodates these fluctuations. The static essential network properties we model are the average numbers of edges that link senators according to party affiliation, as defined below:

(8)
(9)
(10)

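where the vector $X$ represents the political affiliation of each node in $V_t$, i.e., $X_i$ is the political affiliation of node $i$ (D for Democrat and R for Republican), and $n_D(G_t)$ and $n_R(G_t)$ are the numbers of Democratic and Republican senators, respectively, in network $G_t$. Let $g^s_t(G_t)$ denote the vector of these three static essential network properties.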

The vector of dynamic essential network properties, $g^d_t(G_t; G_{t-1})$, describes the number of shared edges between $G_{t-1}$ and $G_t$ for each pair of political affiliations, (D,D), (D,R), and (R,R); the reason for specifying three dynamic essential network properties as opposed to one is to avoid "churn," as described in Krivitsky & Handcock (2013). Its three components are the average numbers of common edges in $G_{t-1}$ and $G_t$ that link 1) a Democrat to another Democrat, 2) a Democrat to a Republican, and 3) a Republican to another Republican. The formulas for the dynamic essential network properties are presented below:

(11)
(12)
(13)
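The counts underlying both sets of properties can be sketched with one Python function: applied to the edges of $G_t$ it gives the static mixing counts, and applied to the edges shared by $G_{t-1}$ and $G_t$ it gives the dynamic ones. This sketch returns raw counts only; the averaging by the numbers of Democratic and Republican senators in equations (8)-(13) is not reproduced here. The senators and networks are hypothetical.

```python
from collections import Counter


def mixing_counts(edges, party):
    """Count edges by the party pair of their endpoints: (D,D), (D,R), (R,R)."""
    counts = Counter({("D", "D"): 0, ("D", "R"): 0, ("R", "R"): 0})
    for edge in edges:
        i, j = edge  # each edge is a frozenset {i, j}
        counts[tuple(sorted((party[i], party[j])))] += 1
    return counts


party = {"a": "D", "b": "D", "c": "R", "d": "R"}  # hypothetical senators
E_prev = {frozenset(("a", "b")), frozenset(("a", "c")), frozenset(("c", "d"))}
E_curr = {frozenset(("a", "b")), frozenset(("b", "c")), frozenset(("c", "d"))}

static = mixing_counts(E_curr, party)            # counts behind equations (8)-(10)
dynamic = mixing_counts(E_curr & E_prev, party)  # counts behind equations (11)-(13)
```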

Let $g_t(G_t) = (g^s_t(G_t), g^d_t(G_t; G_{t-1}))$. The black lines in the top three plots of figure 4 depict the values of $g^s_t$, while the bottom three plots depict $g^d_t$, for the 2003-2014 Senates. We excluded the dynamic essential network property values where the months $t-1$ and $t$ are associated with different Senate terms, and set those values to zero in figure 4 to retain the same time scale as the static essential network properties.

Figure 4: US Senate Network Statistics. The black lines depict the values of $g_t$ for the 2003-2014 Senates. Shaded and non-shaded sections distinguish successive Senates.

6.2 Predicting Network Statistics

We develop a model to predict $g_t$ for the 2015-2016 Senate using data from the 2003-2014 Senates. The prediction model is used to specify $P(\mathbf{Y}_t = y)$ for $t$ = January 2015, ..., December 2016. Let $\mathbf{Y}_t$ be the vector of random variables associated with the static and dynamic essential network properties; $\mathbf{Y}_t$ comprises three random variables for the static properties, denoted as $Y^s_{t,1}$, $Y^s_{t,2}$, and $Y^s_{t,3}$, and three random variables for the dynamic properties, denoted as $Y^d_{t,1}$, $Y^d_{t,2}$, and $Y^d_{t,3}$.

An advantage of the DCCM is that the development of the prediction model for $\mathbf{Y}_t$ does not require the Markov assumption used in defining the dynamic essential network properties; we use all of the historical networks from the 2003-2014 Senates, and denote this collection as $\mathcal{H}$. We base our predictions of each component of $\mathbf{Y}_t$ on an autoregressive moving average (ARMA) model with a seasonal component in order to capture the periodic fluctuations of the network statistics associated with the congressional election cycle. The seasonal model for a component $Y_t$ of $\mathbf{Y}_t$ has the following form:

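$$\Phi(B^s)\,\phi(B)\,(Y_t - \mu) = \Theta(B^s)\,\theta(B)\,\varepsilon_t \qquad (14)$$

where $\mu$ is the mean of $Y_t$ and

$$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p \qquad (15)$$
$$\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q \qquad (16)$$
$$\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps} \qquad (17)$$
$$\Theta(B^s) = 1 + \Theta_1 B^s + \cdots + \Theta_Q B^{Qs} \qquad (18)$$
$$B^k Y_t = Y_{t-k} \qquad (19)$$
$$\varepsilon_t \overset{iid}{\sim} \mathcal{N}(0, \sigma^2) \qquad (20)$$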
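The multiplicative form of the seasonal model means the non-seasonal and seasonal lag polynomials are multiplied together. The Python sketch below, with hypothetical coefficients, expands them for the orders used in the next paragraph (p = 3, q = 1, P = 2, Q = 1, s = 24); the implied autoregressive polynomial reaches back 51 months, which is one reason a long history of networks is useful. Fitting the coefficients (done in the paper with the R package forecast) is not shown.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials stored as coefficient lists [c0, c1, ...] in B."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lag_poly(coefs, sign, lag=1):
    """1 + sign*(c1 B^lag + c2 B^(2*lag) + ...); sign=-1 gives the AR form, +1 the MA form."""
    out = [0.0] * (len(coefs) * lag + 1)
    out[0] = 1.0
    for k, c in enumerate(coefs, start=1):
        out[k * lag] = sign * c
    return out


# Hypothetical coefficients for an ARMA(3,1)(2,1) model with period s = 24.
s = 24
ar = poly_mul(lag_poly([0.5, 0.1, -0.2], -1), lag_poly([0.3, 0.1], -1, s))  # phi(B) Phi(B^s)
ma = poly_mul(lag_poly([0.4], +1), lag_poly([0.2], +1, s))                  # theta(B) Theta(B^s)
print(len(ar) - 1, len(ma) - 1)  # 51 25
```

The cross terms, such as the coefficient $\phi_1 \Phi_1$ at lag 25, are what distinguish the multiplicative seasonal model from simply adding seasonal lags.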

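A separate model, with p = 3 autoregressive terms, q = 1 moving-average term, and a seasonal component with P = 2, Q = 1, and a period of s = 24 months, was used to model each of the three static essential network properties. For the three dynamic essential network properties, we fit the same model except that the seasonal component had a period of s = 23 months, because we excluded the values where the months $t-1$ and $t$ are associated with different Senate terms. To combine these separate models, we use the product distribution. Therefore,

$$P(\mathbf{Y}_t = y) = \prod_{j=1}^{3} P(Y^s_{t,j} = y^s_j) \prod_{j=1}^{3} P(Y^d_{t,j} = y^d_j) \qquad (21)$$

where $P(Y^s_{t,j} = y^s_j)$ and $P(Y^d_{t,j} = y^d_j)$ are based on ARMA$(3,1)(2,1)_{24}$ and ARMA$(3,1)(2,1)_{23}$ models, respectively, and $y = (y^s, y^d)$. The models were fit using the R package forecast (Hyndman, 2013). Based on the ARMA models, the predicted distribution for each static and dynamic essential network property is normal. Therefore, $P(\mathbf{Y}_t = y)$ can be represented as the following multivariate normal distribution:

$$\mathbf{Y}_t \sim \mathcal{N}_6\left(\hat{\boldsymbol{\mu}}_t,\ \operatorname{diag}(\hat{\sigma}^2_{t,1}, \dots, \hat{\sigma}^2_{t,6})\right) \qquad (22)$$

where $\hat{\boldsymbol{\mu}}_t$ and $\hat{\sigma}^2_{t,j}$ are the forecast means and variances from the six ARMA models; the covariance matrix is diagonal because the components are combined as a product distribution.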
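A minimal Python sketch of how the forecast distribution could be evaluated: the log of the product in equation (21) of independent normal forecasts equals the log-density of a multivariate normal with diagonal covariance, as in equation (22). The function names are hypothetical; any such log-density can play the role of $\log P(\mathbf{Y}_t = y)$ in the acceptance probability of equation (7), where only differences of log values matter.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def log_p_y(y, means, sds):
    """Log of the product of independent normal forecasts, one per property.

    Equivalent to a multivariate normal with covariance diag(sd_1^2, ..., sd_k^2).
    """
    return sum(normal_logpdf(v, m, s) for v, m, s in zip(y, means, sds))
```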

For purposes of illustration, we set the standard deviations in equation (22) such that two standard deviations cover the 50% prediction interval; using standard deviations for which two standard deviations cover the 95% prediction interval would produce large uncertainty and therefore reduce the clarity of the figures (no modification of the method is required to use other standard deviations, such as ones that would cover the 90% or 95% prediction interval). In figure 5, the blue regions represent the revised prediction intervals.

As our framework places only minimal restrictions on the selection of the model for $P(\mathbf{Y}_t = y)$, an investigator could select the most appropriate model, such as a vector autoregressive (VAR) model or a non-Gaussian model, without modification to the method. We chose simple models for clarity.

Figure 5: US Senate Network Statistics. The black lines depict the values of $g_t$. Shaded and non-shaded sections distinguish successive Senates. The blue regions represent the prediction intervals.

6.3 Generation of Predicted Networks

6.3.1 Overview

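This section describes the generation of networks that represent the predicted sponsor/co-sponsor relationships between senators for the 2015-2016 Senate, for the months January 2015 to December 2016. Using equation (6), the probability distribution for the predicted networks, $P(\mathbf{G}_t = G \mid G_{t-1})$, is the following:

$$P(\mathbf{G}_t = G \mid G_{t-1}) = \frac{P(\mathbf{Y}_t = g_t(G))}{|\mathcal{C}_{g_t(G)}|}, \quad t = \text{January 2015}, \dots, \text{December 2016} \qquad (23)$$

where $P(\mathbf{Y}_t = y)$ is estimated in the previous section.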

6.3.2 Results

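The procedure described in the Appendix was used to generate dynamic networks predicting the evolution of the co-sponsor relationships for bills introduced from January 2015 to December 2016, i.e., the 2015-2016 Senate. Each dynamic network comprises 24 static networks, one for each month. The procedure was repeated 500 times. Let $\tilde{G}^{(r)} = (\tilde{G}^{(r)}_t)_t$, $r = 1, \dots, 500$, denote the $r$-th generated predicted dynamic network, where $\tilde{G}^{(r)}_t$ represents the network at time $t$ = January 2015, ..., December 2016. To evaluate the procedure, the predicted dynamic networks generated by our model are compared to the estimated probability distribution of essential network properties, i.e., $P(\mathbf{Y}_t = y)$, shown in equation (22). To conduct this evaluation, we calculate $g^s_t$ and $g^d_t$ for all of the generated predicted dynamic networks. Let $\mathbf{w}_{t,1}$ denote the vector of values of the first static essential network property at time $t$ for all predicted dynamic networks, i.e.,

$$\mathbf{w}_{t,1} = \left(g^s_{t,1}(\tilde{G}^{(1)}_t), \dots, g^s_{t,1}(\tilde{G}^{(500)}_t)\right) \qquad (24)$$

Similarly, define the corresponding vectors for the remaining two static and three dynamic essential network properties.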

The red region in the top plot of figure 6 represents the 2.5% and 97.5% quantiles of the first static essential network property across the generated networks at each time point January 2015, ..., December 2016. The five subsequent plots represent the 2.5% and 97.5% quantiles of the remaining five essential network properties. The blue regions in figure 5 display the 2.5% and 97.5% quantiles of the network property values based on the estimated probability distribution of essential network properties, i.e., $P(\mathbf{Y}_t = y)$. The red regions in figure 6 and the blue regions in figure 5 are nearly identical. Therefore, figure 6 provides evidence that the networks generated by the proposed method follow the specified distribution of essential network properties; this result is expected because the required quantities could be calculated exactly for these properties. See Appendix for details.
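The quantile bands compared in figures 5 and 6 can be computed with the Python standard library. In this sketch (names hypothetical), each element of `values_by_time` holds one property's values across the generated networks at one time point; `statistics.quantiles` with `n=40` returns cut points at multiples of 2.5%, so the first and last are the 2.5% and 97.5% quantiles. A coverage helper compares an observed series with the band, in the spirit of the goodness-of-fit comparison in section 6.4.

```python
import statistics


def quantile_band(values_by_time):
    """(2.5%, 97.5%) quantiles of a property across generated networks, per time point."""
    band = []
    for values in values_by_time:
        cuts = statistics.quantiles(values, n=40, method="inclusive")
        band.append((cuts[0], cuts[-1]))  # cut points k/40: k = 1 is 2.5%, k = 39 is 97.5%
    return band


def coverage(observed, band):
    """Fraction of time points at which the observed value lies inside the band."""
    inside = [lo <= y <= hi for y, (lo, hi) in zip(observed, band)]
    return sum(inside) / len(inside)
```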

Figure 6: US Senate Network Statistics. The black lines depict the values of for the Senate. The shaded sections represent the , , and Senates, while the non-shaded sections represent the and Senates. The red regions represent the 2.5%-97.5% quantiles of the network properties computed from the predicted dynamic networks.
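The red bands can be obtained from the simulated networks with a per-month quantile taken across replicates. The sketch below is a minimal illustration, assuming one essential network property has already been computed for each of the 500 predicted dynamic networks and each of the 24 months and stored as a 500 × 24 array. `pointwise_band` is a hypothetical helper, the input values are synthetic placeholders, and NumPy's default linear-interpolation quantile may differ from the quantile rule used to draw the figures.

```python
import numpy as np


def pointwise_band(values, lower=0.025, upper=0.975):
    """Per-month lower/upper quantiles of one network property.

    values: array of shape (n_networks, n_months); row k holds the property
    computed on the k-th predicted dynamic network, column t holds month t.
    """
    values = np.asarray(values, dtype=float)
    # Quantiles are taken across the simulated networks (axis 0), month by month.
    lo = np.quantile(values, lower, axis=0)
    hi = np.quantile(values, upper, axis=0)
    return lo, hi


# Synthetic placeholder data (NOT Senate results): 500 networks x 24 months.
rng = np.random.default_rng(0)
simulated = rng.poisson(lam=300, size=(500, 24))
lo, hi = pointwise_band(simulated)
print(lo.shape, hi.shape)
```

The same helper applies to each of the six essential properties in turn; overlaying the model-based quantiles from equation (22) on these bands gives the comparison between figures 5 and 6.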

6.4 Goodness-of-Fit

Hanneke et al. (2010) proposed an extension of the heuristic goodness-of-fit (GOF) approach of Hunter et al. (2008); we use their method to assess the fit of the predicted networks.

Hunter et al. (2008) and Hanneke et al. (2010) used the same networks both to build a model and to assess its fit. Because our focus is on forecasting networks, whose target networks are usually not yet observed, assessing goodness-of-fit is more challenging. In our analysis, however, the predicted Senate sponsorship networks are fully observed but were excluded from model fitting, so we can base our GOF assessment on a comparison of the observed networks with those we predicted. We consider this approach particularly useful because poor fit can arise from either 1) important essential network properties that are missing from the model or 2) a network structure of the 114th Senate that is fundamentally different from that of previous Senates.

Figure 7 shows the values of four additional network properties that were not explicitly modeled: the number of triangles, the number of 2-stars, the number of 3-stars, and alternating k-stars. The expressions for these four network properties are:

(25)
(26)
(27)

The black line in each plot of figure 7 depicts the observed values of a network property for the 114th Senate. Each shaded region represents the 2.5%-97.5% quantiles of the network statistic calculated from the simulated predicted networks. The first plot shows the number of triangles; the bottom three plots relate to the degree distribution. The numbers of triangles, 2-stars, and 3-stars from the simulated predicted networks appear to fit the observed statistics closely, except in the early months of 2015. The number of relationships in the network during the early months of 2015 was higher than historical averages, which suggests that the lack of fit may be due to a change in the network structure of the 114th Senate compared with prior terms. We note that in 2015, control of the Senate shifted from Democrats to Republicans, who had last held the majority in 2006. The alternating k-star statistic appears to fit well.

Figure 7: Goodness-of-fit Plots. The black lines depict the observed values for the 114th Senate of the following network properties: number of triangles, number of 2-stars, number of 3-stars, and alternating k-stars. The shaded regions represent the 2.5%-97.5% quantiles of network statistics calculated from the simulated predicted networks.
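The four GOF statistics can be computed directly from a monthly adjacency matrix. The sketch below assumes the standard definitions: triangles as trace(A^3)/6, k-stars S_k as the sum over nodes of C(d_i, k), and the alternating k-star statistic as the sum over k = 2, ..., n-1 of (-1)^k S_k / λ^(k-2), with decay parameter λ = 2 assumed. These may differ in detail (for example in the choice of λ) from equations (25)-(27); `gof_statistics` is a hypothetical helper name.

```python
from math import comb

import numpy as np


def gof_statistics(adj, lam=2.0):
    """Triangles, 2-stars, 3-stars, and alternating k-stars of an undirected graph.

    adj: symmetric 0/1 adjacency matrix with a zero diagonal.
    lam: decay parameter of the alternating k-star statistic (assumed value).
    """
    a = np.asarray(adj, dtype=np.int64)
    n = a.shape[0]
    degrees = a.sum(axis=1)
    # Each triangle contributes 6 closed walks of length 3 to trace(A^3).
    triangles = int(np.trace(a @ a @ a)) // 6

    def k_stars(k):
        # A node of degree d is the centre of C(d, k) k-stars.
        return sum(comb(int(d), k) for d in degrees)

    # Alternating k-stars: sum_{k=2}^{n-1} (-1)^k S_k / lam^(k-2).
    alt = sum((-1) ** k * k_stars(k) / lam ** (k - 2) for k in range(2, n))
    return {"triangles": triangles, "two_stars": k_stars(2),
            "three_stars": k_stars(3), "alt_k_stars": alt}


# Complete graph on 4 nodes as a small check.
k4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
print(gof_statistics(k4))  # triangles 4, two_stars 12, three_stars 4, alt_k_stars 10.0
```

Applying this function to every month of every predicted network, and then taking per-month quantiles, produces the shaded regions; the black lines come from the observed 114th Senate networks.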

6.5 Simulation of Proposed Intervention

Several political analysts have proposed that the increasing use of gerrymandering at the state level, to create congressional districts that favor the party in power, has decreased bipartisanship (Enten, 2018). Enten states that “Gerrymandering contributes to issues like the drop in competitive elections, extremism and gridlock, but it’s far from their sole cause.” He goes on to state: “What’s behind the disappearance of so many competitive districts? Gerrymandering is part of the story…It’s clear that most redistricting schemes that ignore politics and race would yield more competitive U.S. House districts–i.e., those with a partisan lean of 10 percentage points or less–than we currently have.” He also quotes John Kasich’s 2016 State of the State address: “Ideas and merits should be what wins elections, not gerrymandering. When pure politics is what drives these kinds of decisions, the result is polarization and division. I think we’ve had enough of that. Gerrymandering needs to be [in] the dust bin of history.” While some states, such as California, have anti-gerrymandering laws, there is no such federal law in the United States.

Political polarization, resulting in part from gerrymandering, has been proposed as a cause of the congressional gridlock that has become the norm over the past several terms (Jacobson, 2016). The degree of bipartisanship can be measured by the level of bipartisan support for bills, specifically the number of ties between senators of different parties in the sponsor/co-sponsor networks. Figure 8 shows a direct (positive) association between this number of across-party ties and the proportion of bills introduced in the Senate that passed the Senate (a univariable regression yields a p-value of 0.0225).

Figure 8: Passage of Bills. The x-axis shows the number of across-party ties, color coded by congressional term. The y-axis shows the proportion of bills introduced in the Senate that passed the Senate, by month. The blue line shows the loess curve.
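As a hedged illustration of the univariable regression mentioned above, the sketch below fits ordinary least squares of the monthly pass proportion on the monthly number of across-party ties and returns the slope's t statistic. The use of ordinary least squares, the helper name `univariable_ols`, and the synthetic inputs are assumptions, not the authors' code or data. The two-sided p-value would come from a t distribution with n - 2 degrees of freedom; `scipy.stats.linregress` reports it directly as its `pvalue` field.

```python
import numpy as np


def univariable_ols(x, y):
    """Least-squares fit y = a + b*x; returns (b, a, t statistic for b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    sigma2 = np.sum(resid ** 2) / (n - 2)  # residual variance with n - 2 df
    t_stat = slope / np.sqrt(sigma2 / sxx)
    return slope, intercept, t_stat


# Synthetic placeholder values, not the Senate data behind figure 8.
ties = [120, 135, 150, 160, 175, 190]
pass_rate = [0.020, 0.024, 0.023, 0.030, 0.031, 0.036]
slope, intercept, t_stat = univariable_ols(ties, pass_rate)
print(slope, intercept, t_stat)
```

A positive slope with a large t statistic corresponds to the direct association shown in figure 8; the loess curve in the figure is a separate, nonparametric smoother.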

Because passage of bills in the Senate is required for new laws and for the operation of the US government, it is of interest to investigate the consequences of changes in bipartisan co-sponsoring of bills for their probability of passage. We consider three hypothetical scenarios starting at the beginning of the 114th Senate. The first assumes that the essential network properties for the 114th Senate follow the prediction model shown in equation (22). The second and third make the same assumption, except that the number of across-party relationships is reduced by 95% (perhaps as a result of factors like social media) or increased by 100%, respectively. Comparing these three scenarios provides an estimate of the impact on bill passage rates of increases or decreases in bipartisan support relative to historically observed trends. Alternative scenarios with different parameter choices would be easy to investigate.
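One simple way to encode the three scenarios, sketched under the assumption that each scenario rescales the model-predicted monthly number of across-party ties before networks are generated, is a multiplier per scenario. Note that a 95% reduction corresponds to a multiplier of 0.05, not 0.95. The scenario names, `scenario_targets`, and the optional cap at the number of possible across-party dyads are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical encoding of the three scenarios as multipliers on the
# model-predicted monthly number of across-party ties.
SCENARIO_MULTIPLIERS = {
    "scenario_1": 1.00,  # follow the prediction model (equation (22))
    "scenario_2": 0.05,  # across-party ties reduced by 95%
    "scenario_3": 2.00,  # across-party ties increased by 100%
}


def scenario_targets(predicted_across, multiplier, max_pairs=None):
    """Scale predicted across-party tie counts and round to whole edges.

    max_pairs: optional cap, e.g. the number of possible across-party dyads,
    so that a large increase cannot exceed what the network can hold.
    """
    scaled = np.rint(np.asarray(predicted_across, dtype=float) * multiplier)
    if max_pairs is not None:
        scaled = np.minimum(scaled, max_pairs)
    return scaled.astype(int)


predicted = [200, 240, 160]  # placeholder monthly predictions, not real data
for name, mult in SCENARIO_MULTIPLIERS.items():
    print(name, scenario_targets(predicted, mult))
```

The remaining essential properties are left at their predicted values, so only the across-party count differs between scenarios.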

Predicting bill passage rates under these three scenarios requires two statistical models. The first links covariates, including network properties, to bill passage rates. These rates may depend on lower-order network properties, such as the number of across-party relationships, as well as on higher-order properties, such as centrality measures. To capture these relationships month by month, we developed a basic random forest model using the network properties modeled in the previous section as well as variables for the number of components, the size of the largest component, eigenvalue centrality, the maximum values of closeness and betweenness, and the number of individuals from each party. We use this model to illustrate the proposed framework and acknowledge that additional research and modeling are necessary to improve the accuracy of prediction. The node purity metric, which indicates the importance of a variable in a random forest model, shows that eigenvalue centrality has an importance similar to that of the number of across-party relationships, which supports the notion that higher-order network properties affect bill passage rates in the US Senate. This finding is in line with previous research demonstrating that centrality measures in the sponsor/co-sponsor network are correlated with the number of amendments and bills that a senator passes (Fowler, 2006).

The second model predicts the values of the covariates included in the first model. We therefore need to predict values for the network properties described in equations (8) to (13) as well as the number of components, the size of the largest component, eigenvalue centrality, and the maximum values of closeness and betweenness; the number of individuals from each party is based on counts at the start of the 114th Senate. For all three scenarios, the predicted values of the network properties described in equations (8) to (13) are specified either by the prediction model developed in section 6.2 or by the assumptions of the scenarios. The remaining properties, however, are difficult to estimate directly, so we use our framework to generate networks and compute the remaining properties from them. Figures 9, 10, and 11 show that the proposed method can generate networks under the three scenarios. Note that for the first scenario, which follows historically observed trends, it would also be possible to predict the remaining properties by developing a time series model for each property.
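The remaining covariates have to be computed from each generated network. As a minimal sketch, assuming each generated monthly network is stored as an edge list over senators indexed 0 to n-1 and that isolated senators count as components of size one (an assumption), the number of components and the size of the largest component can be found by breadth-first search. `component_summary` is a hypothetical helper name.

```python
from collections import defaultdict, deque


def component_summary(n_nodes, edges):
    """Return (number of components, size of the largest component).

    Isolated nodes are counted as components of size 1 (an assumption).
    """
    adj = defaultdict(list)
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = [False] * n_nodes
    sizes = []
    for start in range(n_nodes):
        if seen[start]:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        sizes.append(size)
    return len(sizes), max(sizes, default=0)


generated_month = [(0, 1), (1, 2), (3, 4)]  # toy network with 6 senators
print(component_summary(6, generated_month))  # (3, 3)
```

Averaging these values over the generated networks for each month gives covariate predictions for the first model. Centrality-based covariates are available in graph libraries, for example `networkx.closeness_centrality` and `networkx.betweenness_centrality`.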

Figure 9: Scenario 1. The black lines depict the values of for the Senate. The red shaded sections represent the , and Senates, while the non-shaded sections represent the and Senates. The red regions on the green shaded section represent the predicted number of edges for the unobserved 114th Senate under scenario 1.

Figure 10: Scenario 2. The black lines depict the values of for the Senate. The red shaded sections represent those quantities for , and Senates, while the non-shaded sections represent them for and Senates. The red regions on the green shaded section represent the predicted values for the unobserved 114th Senate under scenario 2.

Figure 11: Scenario 3. The black lines depict the values of for the Senate. The red shaded sections represent those quantities for , and Senates, while the non-shaded sections represent them for and Senates. The red regions on the green shaded section represent the predicted values for the unobserved 114th Senate under scenario 3.

Applying the first model (the random forest) to the estimated covariates, we predict that the average monthly pass rate would decline by 3.9% under scenario 2 and increase by 3.6% under scenario 3, relative to scenario 1. These results imply a modest change in the number of bills passing the Senate if bipartisan support erodes or grows faster than historically observed trends would predict.

7 Discussion

The proposed framework for predicting dynamic networks allows investigators to flexibly model the joint distribution of essential network properties at each time point based on previously observed networks. This flexibility permits the use of a broad class of approaches to model trends, seasonal variability, uncertainty, and changes in population composition. It also makes the method particularly well suited to serve as a basis for designing potential interventions that modify network topology, because investigators can model changes in network properties that result from interventions and compare these changes with those based on historical network trends, as in our illustration.

Beyond the Senate application, there is a range of research areas where the proposed method is applicable. In particular, we see potential for the method to investigate the impact of interventions, such as treatment and behavior changes, that aim to mitigate the spread of diseases, for example, the impact of reducing sexual partner concurrency on the spread of HIV. Reducing concurrency is tantamount to reducing the degree of individuals in a sexual contact network below 2 at a point in time. Assessing the impact of reducing concurrency is challenging because modifying one network property (degree in this example) also modifies others, including higher-order network properties (Goyal & De Gruttola, 2015), and these properties have been shown to affect disease spread (Pastor-Satorras et al., 2015). Therefore, as with the Senate example, it is necessary to generate entire networks and not just summary statistics, because estimates of higher-order properties cannot easily be computed from summary statistics alone.
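A small sketch of the point above, using hypothetical helper names: concurrency at a time point is counted as the number of individuals with degree 2 or more, and a crude greedy rule that drops partnerships until no one has more than one partner also removes every 2-star and triangle. This illustrates how changing degree changes other network properties; the greedy rule is an assumption for illustration only and is not a model of how a real intervention would act.

```python
from collections import Counter, defaultdict
from math import comb


def degrees(edges):
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def n_concurrent(edges):
    """Individuals with two or more partners at this time point."""
    return sum(1 for d in degrees(edges).values() if d >= 2)


def two_stars(edges):
    return sum(comb(d, 2) for d in degrees(edges).values())


def triangles(edges):
    nbrs = defaultdict(set)
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    # Each triangle is counted once from each of its three edges.
    return sum(len(nbrs[i] & nbrs[j]) for i, j in edges) // 3


def remove_concurrency(edges):
    """Hypothetical greedy rule: keep an edge only if both ends are still free."""
    used, kept = set(), []
    for i, j in edges:
        if i not in used and j not in used:
            kept.append((i, j))
            used.update((i, j))
    return kept


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
after = remove_concurrency(edges)
print(n_concurrent(edges), two_stars(edges), triangles(edges))  # 3 5 1
print(n_concurrent(after), two_stars(after), triangles(after))  # 0 0 0
```

Because the triangle disappears together with the concurrency, a model that only tracked degree would miss this change in clustering, which is why whole networks are generated.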

The impact of uncertainty in network property estimates associated with dynamic networks has received little attention compared with other areas of network science. However, the existence of sharp thresholds in relationships among properties of static networks has been well documented (Erdős & Rényi, 1960; Watts & Strogatz, 1998; Newman, 2010). Therefore, it is possible that a small change in a given dynamic network property could have a significant impact on processes operating on the network. Further research is necessary to understand how variability in network properties affects predictions of intervention impacts on social systems.
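The classic example of such a threshold is the emergence of a giant component in the Erdős–Rényi random graph G(n, p) once the mean degree passes 1. The toy simulation below, a static-network illustration rather than anything from the Senate analysis, draws one G(n, p) graph per mean degree and reports the fraction of nodes in the largest component using union-find. `largest_component_fraction` is a hypothetical helper.

```python
import random
from collections import Counter


def largest_component_fraction(n, mean_degree, seed=0):
    """Fraction of nodes in the largest component of one G(n, p) draw."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    sizes = Counter(find(v) for v in range(n))
    return max(sizes.values()) / n


for c in (0.5, 0.9, 1.1, 2.0):
    print(c, round(largest_component_fraction(1000, c), 3))
```

Moving the mean degree from just below 1 to 2 takes the largest component from a few percent of the nodes to most of the network, which is the kind of sensitivity that makes uncertainty in predicted network properties consequential.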

Additional methodological work is needed to incorporate additional network statistics. The CCM has been extended to bipartite networks (Goyal & De Gruttola, 2017); it may be possible to apply similar approaches to extend the DCCM to bipartite networks. Further work is also required to develop dynamic essential network properties whose functions do not depend only on features of the previously observed network, as making a Markov assumption can have a significant impact on epidemic models (Goyal et al., 2012). Nonetheless, as shown in section 6, the proposed method provides greater flexibility than many existing network models in that it does not require the probability distribution of the dynamic essential network properties to conform to the Markov assumption.

Acknowledgments

This research is supported by grants from the National Institutes of Health (R37 AI-51164). Conflict of Interest: None declared.

8 Appendix: Technical Details for DCCM

The predicted networks were generated using a Metropolis-Hastings algorithm whose target distribution is given by equation (23). Using the Metropolis-Hastings algorithm requires evaluating the acceptance probability described in equation (7). Since equation (22) provides the probability mass function for the essential network properties, only the remaining factor in the acceptance probability needs to be calculated.
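A stripped-down sketch of the Metropolis-Hastings step, under two simplifying assumptions: the only essential property is the number of edges m, so a congruence-class-style target spreads P(m) evenly over all networks with m edges, P(G) = P(m)/C(N, m) with N the number of dyads; and proposals toggle one uniformly chosen dyad, which is symmetric, so the proposal probabilities cancel. The paper's target, equation (23), involves the full set of dynamic essential properties and the four toggle cases that follow; `log_target`, `mh_sample`, and the target probabilities are hypothetical.

```python
import math
import random
from itertools import combinations


def log_target(m, n_dyads, log_p_edges):
    """log P(G) when the only essential property is the edge count m.

    P(G) = P(m) / C(n_dyads, m): the probability of property value m is shared
    equally by every network with m edges.
    """
    if m not in log_p_edges:
        return -math.inf
    return log_p_edges[m] - math.log(math.comb(n_dyads, m))


def mh_sample(n_nodes, p_edges, init_edges, n_steps, seed=0):
    """Single-dyad-toggle Metropolis-Hastings sampler; returns final edges and trace."""
    rng = random.Random(seed)
    dyads = list(combinations(range(n_nodes), 2))  # (i, j) with i < j
    n_dyads = len(dyads)
    log_p = {m: math.log(p) for m, p in p_edges.items() if p > 0 and 0 <= m <= n_dyads}
    edges = set(init_edges)
    current = log_target(len(edges), n_dyads, log_p)
    if current == -math.inf:
        raise ValueError("initial network has zero target probability")
    trace = []
    for _ in range(n_steps):
        dyad = rng.choice(dyads)  # uniform toggle: q(G'|G) = q(G|G'), so q cancels
        m_new = len(edges) - 1 if dyad in edges else len(edges) + 1
        proposed = log_target(m_new, n_dyads, log_p)
        # Accept with probability min(1, P(G') / P(G)).
        if proposed > -math.inf and rng.random() < math.exp(min(0.0, proposed - current)):
            edges ^= {dyad}
            current = proposed
        trace.append(len(edges))
    return edges, trace


target = {4: 0.2, 5: 0.5, 6: 0.3}  # hypothetical P(m), not estimated from data
start = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
final_edges, trace = mh_sample(6, target, start, n_steps=40000, seed=1)
kept = trace[1000:]  # discard burn-in
print({m: round(kept.count(m) / len(kept), 3) for m in sorted(set(kept))})
```

The sampled edge-count frequencies should approach the target P(m), mirroring how the red regions in figure 6 reproduce the model-based intervals in figure 5.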

Although our analysis considers mixing based only on political party membership, the equations below are generalized to allow for mixing between individuals based on an arbitrary number of covariate patterns. We present the quantities for the four cases that must be evaluated in order to calculate . Let edge be the required edge toggle to move from to and let . The four cases are associated with whether exists in or or both or neither.
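The bookkeeping behind a single toggle can be sketched as follows. This assumes (an interpretation, not stated explicitly above) that the two networks defining the four cases are the current network and the network at the previous time step, and that mixing is summarized by edge counts between unordered pairs of covariate patterns. `mixing_counts`, `toggle_change`, and the party labels are hypothetical.

```python
from collections import Counter


def mixing_counts(edges, group):
    """Number of edges between each unordered pair of covariate patterns."""
    return Counter(tuple(sorted((group[i], group[j]))) for i, j in edges)


def toggle_change(i, j, current_edges, previous_edges, group):
    """Effect of toggling dyad (i, j) on the mixing counts, plus case flags.

    Returns (cell, delta, in_current, in_previous): the mixing cell whose count
    changes by delta (+1 add, -1 remove), and whether the dyad is an edge in the
    current network and in the previous time step's network (assumed case split).
    """
    dyad = (min(i, j), max(i, j))
    in_current = dyad in current_edges
    in_previous = dyad in previous_edges
    cell = tuple(sorted((group[i], group[j])))
    delta = -1 if in_current else 1
    return cell, delta, in_current, in_previous


party = {0: "D", 1: "R", 2: "D", 3: "I"}  # hypothetical covariate patterns
prev_net = {(0, 1), (0, 2)}
curr_net = {(0, 1), (1, 3)}
print(toggle_change(1, 0, curr_net, prev_net, party))  # removes a D-R edge
print(toggle_change(2, 0, curr_net, prev_net, party))  # adds a D-D edge
```

The pair of flags selects one of the four cases, and the mixing cell identifies which covariate-pattern count must be updated, which is why the same logic extends to any number of patterns.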

Case 1: and . Therefore,

(28)

and

(29)

Toggling any edge in would satisfy equations (28) and (29); since this logic holds for any and is constant across ,

(30)

Case 2: and . Therefore,

(31)

and