Investigation of Patient-sharing Networks Using a Bayesian Network Model Selection Approach for Congruence Class Models

01/20/2020 ∙ by Ravi Goyal, et al. ∙ Mathematica Policy Research, Harvard University

A Bayesian approach to network model selection is presented for a general class of network models referred to as congruence class models (CCMs). CCMs form a broad class that includes as special cases several common network models, such as the Erdős-Rényi-Gilbert model, the stochastic block model, and many exponential random graph models. Because such a wide range of models can be specified as CCMs, investigators are better able than with current approaches to select a model consistent with the generative mechanisms associated with the observed network. In addition, the approach allows for the incorporation of prior information. We use the proposed Bayesian network model selection approach for CCMs to investigate several mechanisms that may be responsible for the structure of patient-sharing networks, which are associated with the cost and quality of medical care. We found evidence in support of heterogeneity in sociality but not of selective mixing by provider type or by degree.




1 Introduction

There is a growing body of research that leverages administrative claims data to identify connections among medical providers; two providers are deemed to have a connection when both treat the same patient as indicated by medical claims. Such connections have been shown to imply clinical relationships among providers.barnett2011mapping; dugoff2018geographic The collection of such connections has been referred to as patient-sharing networks.landon2012variation; dugoff2018scoping Investigating the relationship between such networks and patient health outcomes is an emerging area of research; see DuGoff et al. (2018) for a systematic review of the literature.dugoff2018scoping The most common theoretical explanation for this association is that the networks represent aspects of collaboration, continuity, and care coordination;barnett2011mapping; pollack2015patient these aspects may be especially important for patients with multiple chronic conditions, who account for a high percentage of health care costs.bodenheimer2009care Health outcomes that have been studied in this regard include cost,barnett2012physician; landon2018patient; uddin2013mapping; uddin2016exploring; uddin2013study; uddin2011effect; uddin2015impact; hussain2015collaboration; pollack2013patient; agha2018team utilization,pollack2013patient; pollack2014s patient-reported outcomes,carson2016characterizing; carson2016outcome quality of care,dugoff2018geographic; hollingsworth2016association and mortality.hussain2015collaboration; hollingsworth2016association Even as such evidence grows, there remain gaps regarding how to make use of knowledge about the patient-sharing networks to improve health care. 
In particular, neither how patient-sharing networks evolve in response to different incentives nor how to use them to develop or evaluate interventions is well understood.dugoff2018scoping Addressing these gaps requires knowledge of the generative mechanisms governing the evolution of the patient-sharing network.

In this manuscript, we present an approach that facilitates evaluation of the evidence supporting hypotheses that observed networks were generated by specified mechanisms. We apply this approach to evaluate the presence of heterogeneity in sociality and selective mixing in patient-sharing networks; details about these mechanisms are provided in Section 3. The approach requires three steps: 1) development of network models consistent with potential generative mechanisms, 2) evaluation of the evidence supporting the proposition that a given network model generated the observed network, and 3) selection of the model with the greatest evidence as measured by its posterior probability. The posterior probability estimation is based on a novel approach to Bayesian model selection among a class of models denoted as congruence class models (CCMs) for networks.goyal2014sampling CCMs form a broad class that includes as special cases several common network models, such as the Erdős-Rényi-Gilbert (ER) model and the stochastic block (SB) model, as well as many exponential random graph models (ERGMs), one of the most commonly used classes of network models in social network analysis. See Goldenberg et al. for a review of these models.goldenberg2010survey

The ability of CCMs to generalize such a broad set of network models arises from the flexibility in specifying the probability distribution associated with network properties included in a model. This flexibility enables investigators to develop network models that more closely correspond to potential network generation mechanisms than do current classes of network models. In this regard, it allows investigators to make better use of the totality of their knowledge of the generative mechanism.

Bayesian network model selection has been shown to be challenging because of the computational burden of estimating the likelihood, and therefore the posterior probability, for each candidate network model.caimo2013bayesian The likelihood of the CCM requires the computation of the number of graphs associated with the observed values of network properties; this problem is referred to as graph enumeration. In this manuscript, we make use of a recently developed general recursive formula to estimate the number of labeled graphs for given values of graph properties, which makes the computational burden feasible for mid-size networks, i.e., networks with several thousand nodes.goyalgraphenumeration

The next section provides background details on CCMs and their relationship to other network models (Section 2.1), Bayesian model selection (Section 2.2), the general graph enumeration recursive formula (Section 2.3), and relevant research on network model selection (Section 2.4). Section 3 introduces patient-sharing networks and describes several competing network models representing distinct generative mechanisms. The evaluation of the models and the results of the model selection approach are presented in Section 4. The paper concludes with a discussion and directions for further research.

2 Background

2.1 Congruence Class Models for Networks

We denote a network as $g$, where $V(g)$ and $E(g)$ are the vertex and edge sets of $g$. Let $n$ represent the number of vertices in $g$ and $|\cdot|$ denote the size of a set; therefore, $n = |V(g)|$. We represent a network as an adjacency matrix. Let $g_{ij} = 1$ indicate that there is an edge between $i$ and $j$, where $i, j \in V(g)$, whereas $g_{ij} = 0$ indicates that there is no edge.

Let $\mathcal{G}_n$ denote the space of all potential networks with $n$ vertices. Let $\phi$ denote an algebraic map from $\mathcal{G}_n$ to network summary statistics (e.g., degree distribution or degree mixing) and let $c_\phi(x) = \{g \in \mathcal{G}_n : \phi(g) = x\}$ denote the inverse image associated with $x$. We refer to these inverse images of singleton sets as congruence classes;goyal2014sampling they also have been referred to as fibers in the algebraic statistics literature.petrovic2017survey We use $\phi$ to represent the mapping as well as the associated network property being calculated. Let $|c_\phi(x)|$ denote the number of graphs for which network property $\phi$ equals $x$; this quantity has been referred to as a volume factor.Shalizi13

The probability distribution on $\mathcal{G}_n$ for the CCM is specified by the probability mass function (PMF) on the congruence classes defined by $\phi$; we denote this PMF as $P_\phi$. $P_\phi(x \mid \theta)$ denotes the total probability of all networks that are elements in $c_\phi(x)$ given parameters $\theta$, i.e.,

$$P_\phi(x \mid \theta) = \sum_{g \in c_\phi(x)} P(g \mid \theta), \tag{1}$$

where $P(g \mid \theta)$ is the probability of graph $g$. CCMs assume that two networks within a congruence class have the same probability of being observed; this assumption is also present in common network models including the ER model, SB model, and ERGM. Therefore, the probability distribution on $\mathcal{G}_n$ for a CCM is the following:

$$P(g \mid \theta) = \frac{1}{|c_\phi(\phi(g))|} \, P_\phi(\phi(g) \mid \theta). \tag{2}$$

CCMs are able to represent several common network models because of the flexibility in the specification of $P_\phi$. To illustrate this flexibility, we consider as an example the specification of a probability distribution identical to that of the ER model with edge probability $p$. To do so, we let $\phi$ be the mapping from a network to its number of edges, i.e., $\phi(g) = |E(g)|$, and let $P_\phi$ equal the following:

$$P_\phi(x \mid p) = \binom{\binom{n}{2}}{x} \, p^{x} (1 - p)^{\binom{n}{2} - x}. \tag{3}$$

For ERGMs, networks in the same class have the same probability;petrovic2017survey hence there exists a $P_\phi$ such that a CCM and an ERGM assign the same probability distribution on $\mathcal{G}_n$. However, CCMs provide additional flexibility in modeling compared to ERGMs. ERGMs require $P_\phi$ to have a specific functional form, while the CCM does not place restrictions on this form.
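
As an illustrative check (not part of the original analysis), the following Python sketch evaluates the log-probability of a single labeled network under the ER model in two ways: directly, and through the CCM form, in which the binomial PMF on the number of edges is divided by the size of the congruence class. The function names and the example values (10 vertices, 12 edges, edge probability 0.3) are assumptions chosen for illustration.

```python
from math import comb, log

def er_ccm_log_prob(n_vertices: int, n_edges: int, p: float) -> float:
    """Log-probability of one labeled graph under the ER model written as a CCM."""
    n_pairs = comb(n_vertices, 2)             # number of vertex pairs (possible edges)
    log_volume = log(comb(n_pairs, n_edges))  # log size of the class of graphs with n_edges edges
    # Binomial PMF on the number of edges (the PMF on congruence classes)
    log_pmf = log_volume + n_edges * log(p) + (n_pairs - n_edges) * log(1 - p)
    # Every graph in the class is equally likely, so divide by the class size
    return log_pmf - log_volume

def er_direct_log_prob(n_vertices: int, n_edges: int, p: float) -> float:
    """Log-probability of the same graph from independent edge indicators."""
    n_pairs = comb(n_vertices, 2)
    return n_edges * log(p) + (n_pairs - n_edges) * log(1 - p)

ccm_value = er_ccm_log_prob(10, 12, 0.3)
direct_value = er_direct_log_prob(10, 12, 0.3)
print(ccm_value, direct_value)
```

The two values agree because the binomial coefficient in the PMF is exactly the number of labeled graphs with the given number of edges, so it cancels against the volume factor.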

2.2 Bayesian Model Selection

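Bayesian model selection identifies the model with the highest posterior probability among a set of candidate models, which we denote as $\mathcal{M}$. This section derives the posterior probabilities for a set of potential CCMs.

Let $M_k$ be an indicator variable that model $k$ is the correct model for the observed network. Equation 4 shows the posterior probability for model $M_k$ given an observed network $g$:

$$P(M_k \mid g) = \frac{P(g \mid M_k) \, P(M_k)}{\sum_{M_j \in \mathcal{M}} P(g \mid M_j) \, P(M_j)}, \tag{4}$$

where $P(g \mid M_k)$ and $P(M_k)$ are the model evidence (shown below) and prior probability, respectively, for model $M_k$. Equation 5 shows the model evidence for model $M_k$:

$$P(g \mid M_k) = \int P(g \mid \theta_k, M_k) \, P(\theta_k \mid M_k) \, d\theta_k, \tag{5}$$

where $\theta_k$ are the parameters for model $M_k$, $P(g \mid \theta_k, M_k)$ is the likelihood, and $P(\theta_k \mid M_k)$ is the prior distribution of the parameters of model $M_k$.

The model evidence for a CCM can be derived by substituting the CCM likelihood (Equation 2) into the general equation for model evidence (Equation 5) as shown in Equation 6, where $\phi$ and $P_\phi$ denote the network property and the PMF that specify model $M_k$:

$$P(g \mid M_k) = \int \frac{P_\phi(\phi(g) \mid \theta_k)}{|c_\phi(\phi(g))|} \, P(\theta_k \mid M_k) \, d\theta_k. \tag{6}$$

The volume factor $|c_\phi(\phi(g))|$ is not dependent on $\theta_k$ and, therefore, can be brought outside of the integral as shown below in Equation 7:

$$P(g \mid M_k) = \frac{1}{|c_\phi(\phi(g))|} \int P_\phi(\phi(g) \mid \theta_k) \, P(\theta_k \mid M_k) \, d\theta_k. \tag{7}$$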

This integral can be computed using standard approaches;davis2007methods however, our computation makes use of recent research to estimate the volume factor.goyalgraphenumeration In the next section, we present a summary of this work.
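
Because model evidence for networks of realistic size is an extremely small number, the normalization in the posterior probability is usually carried out on the log scale. The sketch below shows one way to do this, assuming the log model evidence of each candidate model is already available; the function name, the model labels, and the numeric values are hypothetical and are not results from this study.

```python
import math

def posterior_probabilities(log_evidence, prior=None):
    """Normalize log model evidences into posterior model probabilities.

    log_evidence: dict mapping a model label to its log model evidence.
    prior: optional dict of prior model probabilities (uniform if omitted).
    """
    labels = list(log_evidence)
    if prior is None:
        prior = {label: 1.0 / len(labels) for label in labels}
    # Unnormalized log posterior: log evidence + log prior
    log_joint = {label: log_evidence[label] + math.log(prior[label]) for label in labels}
    # Subtract the maximum before exponentiating to avoid underflow
    shift = max(log_joint.values())
    weights = {label: math.exp(value - shift) for label, value in log_joint.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}

# Hypothetical log evidences for two candidate models
example = posterior_probabilities({"model_a": -70000.0, "model_b": -70002.0})
print(example)
```

Only differences between log evidences matter after normalization, which is why subtracting the maximum leaves the posterior probabilities unchanged.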

2.3 Graph Enumeration

Equation 8 provides a recursive formula to estimate the number of graphs, $|c_\phi(x)|$, with specific value(s), $x$, for particular network properties, $\phi$:


where is the ratio between the sizes of congruence class and , i.e.,


Methods for evaluation of the recursive formula, Equation 9, have been developed for a range of network properties of interest to social network analysis, including number of edges, mixing by nodal covariates, degree distribution, and degree mixing.goyalgraphenumeration
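
For very small networks, the size of every congruence class can be obtained by exhaustive enumeration, which offers a way to sanity-check any faster method. The sketch below is such a brute-force count; it is not the recursive formula of Equation 8, it scales only to a handful of vertices, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

def congruence_class_sizes(n_vertices, network_property):
    """Count labeled graphs on n_vertices by the value of a network property."""
    pairs = list(combinations(range(n_vertices), 2))
    sizes = Counter()
    # Every 0/1 assignment to the vertex pairs is one labeled graph
    for present in product((0, 1), repeat=len(pairs)):
        edges = [pair for pair, on in zip(pairs, present) if on]
        sizes[network_property(n_vertices, edges)] += 1
    return sizes

def edge_count(n_vertices, edges):
    """Network property: number of edges."""
    return len(edges)

def degree_distribution(n_vertices, edges):
    """Network property: tuple whose k-th entry is the number of vertices of degree k."""
    degree = [0] * n_vertices
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    counts = Counter(degree)
    return tuple(counts.get(k, 0) for k in range(n_vertices))

by_edges = congruence_class_sizes(4, edge_count)
by_degrees = congruence_class_sizes(4, degree_distribution)
print(by_edges[2], by_degrees[(0, 0, 4, 0)])
```

With four vertices there are 64 labeled graphs; the classes defined by the number of edges have binomial sizes, while the classes defined by the degree distribution are finer (for example, the 2-regular graphs are the three labeled 4-cycles).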

2.4 Previous Research on Network Model Selection

Compared to other areas of network analysis, there has been relatively little published research on model selection. For ERGMs, Caimo and Friel (2013) developed a Bayesian model selection method based on an extension of their reversible jump Markov chain Monte Carlo algorithm that estimates the posterior probability for each model.Caimo11; caimo2013bayesian Thiemichen et al. (2016) presented a method that applies a Laplace approximation to estimate the Bayes factor, which allows for model selection in the Bayesian paradigm. However, these approaches to model selection for ERGMs suffer from a high computational cost. To reduce the computational burden, Bouranis et al. (2017) and Bouranis et al. (2018) proposed alternative approaches to Bayesian inference for ERGMs based on adjusting the pseudo-posterior distribution or pseudo-likelihood, respectively.bouranis2017efficient; bouranis2018bayesian However, they note that procedures for approximating a solution to the likelihood equation are more challenging for larger datasets.bouranis2018bayesian

Even if the computational issues of Bayesian model selection for ERGMs are overcome, ERGMs still have severe limitations in the types of generative mechanisms that they can model. The functional form of ERGMs only allows investigators to specify the mean of the probability distribution for each network property. This restriction leaves ERGMs unable to model some common generative mechanisms.goyalMPMC

Yan et al. (2014) present a frequentist approach to select between the stochastic block model and the degree-corrected block model using a likelihood ratio test.yan2014model Their approach is limited to comparing these two nested models. Nonetheless, they do consider a setting with additional complexity in that they assume that the mapping from vertices to blocks is unknown and must be estimated. There has also been research in the closely related area of assessing model fit for networks.hunter2008goodness; schweinberger2012statistical; gross2017goodness

3 Patient-Sharing Networks

For each US state, we analyze the patient-sharing network in order to investigate the generative mechanisms underlying this network. State networks include all resident providers who share Medicare patients. The next section provides a description of the data used to generate the patient-sharing network for each state. The subsequent section provides details of generative mechanisms we investigate and their associated CCM.

3.1 Data

Our analysis uses two publicly available data sets from the Centers for Medicare & Medicaid Services (CMS).CMS2015referral; CMS2015med The first identifies edges between providers; we define two providers as connected if a Medicare patient encounters both within a 30-day interval. CMS provides variations of this data set for which the time interval between encounters is set at 30, 60, 90, or 180 days. We use the 30-day interval, as it is the most conservative for implying that two providers are coordinating care for a patient; this choice is intended to reduce the number of "spurious" edges that arise from providers who treat distinct ailments and do not need to coordinate care.barnett2012physician; an2018analysis Our analysis assumes that providers are connected if they share at least one patient; previous studies have used this threshold as well as other thresholds.moen2016analysis; lee2011social Our analysis investigates networks that arose in the first three quarters of 2015, the most recent publicly available data set.

The second data set, called Medicare Provider Utilization and Payment Data for 2015, lists the geographic location and medical specialty of each provider. This information was used to filter our patient-sharing network based on the state in which the provider resides, as well as to label providers as either primary or specialty care.

3.2 Network Mechanisms and Models

We investigate several CCMs, denoted as $M_1$-$M_5$, that model mechanisms that may be responsible for the resultant patient-sharing network. In this section, we introduce the models, and in Section 4, we present the results of selecting among the models. The first two models ($M_1$ and $M_2$) are associated with mechanisms in which edges form at random; these are presented as null models and used to investigate the importance of prior information. Model $M_3$ is associated with heterogeneity in sociality, whereas models $M_4$ and $M_5$ are associated with two different types of selective mixing.

3.2.1 Null Model without and with Prior Information ($M_1$ and $M_2$)

The simplest mechanism we consider assumes that each pair of providers forms a connection with a fixed probability, $p$, that is independent of all other edges; networks are generated based on this assumption. Hence, the mechanism corresponds to the ER model, which is commonly used as a null network model. We investigate two CCMs ($M_1$ and $M_2$) that are both based on this mechanism but vary in their prior information; $M_1$ represents an ER model with no prior information, whereas $M_2$ represents the same mechanism as $M_1$, except that we include an informative prior based on patient-sharing networks from states other than the one of focus (we consider these analyses by state). Therefore, for both $M_1$ and $M_2$, the network property is the number of edges, $\phi(g) = |E(g)|$, and $P_\phi$ is equal to Equation 3. Assuming no prior information on $p$, the model evidence for $M_1$ is shown below in Equation 10:


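For $M_2$, we assume that the prior information for $p$ follows a beta distribution, $p \sim \mathrm{Beta}(\alpha, \beta)$. Therefore, the model evidence for $M_2$ is shown below in Equation 11:

$$P(g \mid M_2) = \frac{1}{|c_\phi(\phi(g))|} \int_0^1 P_\phi(\phi(g) \mid p) \, \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)} \, dp, \tag{11}$$

where $|c_\phi(\phi(g))|$ is the volume factor for the observed number of edges, $P_\phi$ is the PMF in Equation 3, and $B(\alpha, \beta)$ is the beta function.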
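
With a beta prior on the edge probability, the integral in the model evidence has a closed form: the binomial coefficient in the ER PMF cancels against the volume factor, leaving a ratio of beta functions. The sketch below computes this log evidence. It rests on two assumptions that are not taken from the text: a flat Beta(1, 1) prior is used as one possible reading of "no prior information", and the informative prior parameters (2 and 100) are hypothetical placeholders rather than values estimated from other states. The vertex and edge counts are those reported for Wyoming in Section 4.1.

```python
from math import comb, lgamma

def log_beta(a: float, b: float) -> float:
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def er_log_evidence(n_vertices: int, n_edges: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Log model evidence of an ER-type CCM with a Beta(alpha, beta) prior on p.

    The binomial coefficient in the PMF on the number of edges equals the
    volume factor, so the evidence reduces to
    B(x + alpha, N - x + beta) / B(alpha, beta), with N the number of vertex pairs.
    """
    n_pairs = comb(n_vertices, 2)
    return log_beta(n_edges + alpha, n_pairs - n_edges + beta) - log_beta(alpha, beta)

n_providers, n_connections = 1283, 12749            # Wyoming counts from Section 4.1
density = n_connections / comb(n_providers, 2)      # observed share of provider pairs that are connected
flat = er_log_evidence(n_providers, n_connections)  # assumed flat prior
informed = er_log_evidence(n_providers, n_connections, alpha=2.0, beta=100.0)  # hypothetical prior
print(density, flat, informed)
```

The two log evidences can be passed to any normalization routine to obtain posterior model probabilities; how far they differ depends on how well the prior mean matches the observed density.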


3.2.2 Sociality ($M_3$)

Sociality is defined as the propensity for an individual to create connections;goodreau2009birds our goal is to evaluate whether there is heterogeneity among individuals in sociality. As in developing any generative model, there are two steps: The first is identifying the important covariates to include in the model. The second is modeling the relationship between the covariates and the outcome of interest by choosing the appropriate probability distribution for this relationship. In our setting, these steps consist of identifying network properties, e.g., degree distribution, and then associating these properties with a probability distribution that is consistent with beliefs regarding the mechanism underlying the generation of the network.

In any realization (whether observed or simulated) of a network, heterogeneity in sociality would be reflected in the degree distribution. To investigate this issue in the patient-sharing network, we consider a model ($M_3$) that includes degree distribution as the sole network property; this selection corresponds to the first step. As Goodreau et al. (2009) note, sociality is not synonymous with degree; the former is a feature of the network generating mechanisms, whereas the latter is a feature of any given realization of the mechanism.goodreau2009birds

For the second step, the choice of appropriate probability distribution associated with degree distribution depends on the specific mechanisms hypothesized to generate the network. For example, a common feature in social systems is the concentration of influence to a few individuals through mechanisms that encourage preferential attachment–the mechanism wherein providers form connections with others based on a probability proportional to the number of connections the providers already have (i.e., degree).price1976general; BA99

This type of mechanism generates networks with degrees following a fat-tailed distribution, specifically a power-law distribution. Many real-world networks can be modeled using this mechanism, which leads to a network wherein many nodes have a moderate number of edges and fewer nodes have a large number. Other mechanisms may result in different distributions. For example, mechanisms based on the non-equilibrium theory can result in an exponential distribution for the degrees.deng2011exponential CCMs provide the flexibility that enables investigators to select the most appropriate probability distribution.
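
To make the preferential attachment mechanism concrete, the sketch below grows a network one vertex at a time, connecting each new vertex to existing vertices with probability proportional to their current degree. It is a simplified simulation for illustration only, not a model fitted in this paper; the function name, the number of edges added per new vertex, and the seed are assumptions.

```python
import random

def preferential_attachment_degrees(n_vertices, edges_per_new_vertex=2, seed=0):
    """Simulate degree-proportional attachment and return the degree of each vertex."""
    rng = random.Random(seed)
    m = edges_per_new_vertex
    # Start from a complete graph on m + 1 vertices, so every vertex has degree m
    degree = [m] * (m + 1)
    # A vertex appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` selects a vertex with probability proportional to its degree
    stubs = [v for v in range(m + 1) for _ in range(m)]
    for new_vertex in range(m + 1, n_vertices):
        targets = set()
        while len(targets) < m:       # draw m distinct existing vertices
            targets.add(rng.choice(stubs))
        degree.append(m)              # the new vertex gains m edges
        for target in targets:
            degree[target] += 1
            stubs.extend((target, new_vertex))
    return degree

degrees = preferential_attachment_degrees(2000)
print(max(degrees), sum(degrees) / len(degrees))
```

Comparing the largest simulated degree with the average degree gives a quick sense of how skewed the resulting degree distribution is relative to a network in which edges form at random.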

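In order to specify $P_\phi$, the PMF on the congruence classes, we introduce some notation. Let the degree of vertex $i$, denoted as $d_i$, be the number of edges between that vertex and others; hence, $d_i = \sum_{j} g_{ij}$, where $g_{ij}$ is the $(i, j)$ entry of the adjacency matrix of network $g$. Let $d_S$ represent the vector of degrees of nodes in set $S$, commonly referred to as a degree sequence. The degree distribution, denoted as $D(g)$, is a vector such that the entry $D_k(g)$ represents the number of vertices having degree $k$, i.e., $D_k(g) = |\{i \in V(g) : d_i = k\}|$. Let $\phi$ be the mapping from a network to its degree distribution, i.e., $\phi(g) = D(g)$.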
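
The definitions above translate directly into code. The following sketch computes the degree sequence and the degree distribution from a symmetric 0/1 adjacency matrix stored as a list of lists; the function names and the four-vertex star used as input are illustrative assumptions.

```python
def degree_sequence(adjacency):
    """Degree of each vertex: the row sums of a symmetric 0/1 adjacency matrix."""
    return [sum(row) for row in adjacency]

def degree_distribution(adjacency):
    """Vector whose k-th entry is the number of vertices with degree k (k = 0, ..., n - 1)."""
    n_vertices = len(adjacency)
    degrees = degree_sequence(adjacency)
    return [sum(1 for d in degrees if d == k) for k in range(n_vertices)]

# A star on four vertices: vertex 0 is connected to vertices 1, 2, and 3
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(degree_sequence(star))      # [3, 1, 1, 1]
print(degree_distribution(star))  # [0, 3, 0, 1]
```

The degree sequence keeps track of which vertex has which degree, whereas the degree distribution only records how many vertices have each degree; the latter is the network property used for the sociality model.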

Model $M_3$ represents networks generated such that the degrees follow an exponential distribution with scale parameter $\lambda$; one could also investigate alternative models, such as one in which the degrees follow a power-law distribution. Again, we include prior information based on the patient-sharing networks from the states other than the one of focus. We assume the prior information for $\lambda$ follows a normal distribution. The model evidence for $M_3$ is shown below in Equation 12:


3.2.3 Selective Mixing ($M_4$ and $M_5$)

The resultant patient-sharing network may also be influenced by the presence of mechanisms by which providers form connections based on one or more of their individual characteristics. Often mixing is assortative, but it can also be disassortative; the former implies preferential formation of connections between individuals with similar characteristics, and the latter, between individuals with contrasting characteristics.goodreau2009birds Model $M_4$ investigates selective mixing by specialty, whereas model $M_5$ investigates mixing by the number of connections, i.e., the degree of a provider.

For model $M_4$, we consider mixing between primary and specialty care providers. Let the mixing matrix of a network be a matrix representing its mixing pattern, where the entry $(a, b)$ is the total number of edges between a vertex with characteristic $a$ and a vertex with characteristic $b$; in our application, we are interested in the characteristic indicating the provider type (primary or specialty care). To investigate mixing by provider type in the patient-sharing network, we consider a model ($M_4$) that includes the mixing matrix as the sole network property; this selection corresponds to the first step in developing a generative model. For the second step, we have model $M_4$ include three independent variables representing the proportion of edges that link: 1) a primary care provider to another primary care provider, 2) a primary care provider to a specialty provider, and 3) a specialty provider to another specialty provider. Each of these variables follows a binomial distribution with its own probability parameter. We assume the prior information for these parameters, based on the patient-sharing networks from the states other than the one of focus, follows beta distributions. Therefore, the model evidence for $M_4$ is as follows:




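Here, the observed count for a pair of provider types $(a, b)$ is the entry $(a, b)$ of the mixing matrix, that is, the number of edges in the network between providers of types $a$ and $b$; the corresponding number of possible edges is the number of provider pairs of types $a$ and $b$.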
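
The two quantities that enter the binomial terms, the observed number of edges for each pair of provider types and the number of possible edges for that pair, can be tabulated as in the sketch below. The function names, the type labels, and the five-provider example are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter
from math import comb

def mixing_counts(edges, provider_type):
    """Number of observed edges for each unordered pair of provider types."""
    counts = Counter()
    for i, j in edges:
        # Sort the two labels so (a, b) and (b, a) are counted together
        pair = tuple(sorted((provider_type[i], provider_type[j])))
        counts[pair] += 1
    return counts

def possible_edges(provider_type):
    """Number of provider pairs (possible edges) for each unordered pair of types."""
    sizes = Counter(provider_type.values())
    labels = sorted(sizes)
    possible = {}
    for index, a in enumerate(labels):
        for b in labels[index:]:
            # Within a type: pairs of distinct providers; across types: product of group sizes
            possible[(a, b)] = comb(sizes[a], 2) if a == b else sizes[a] * sizes[b]
    return possible

types = {0: "primary", 1: "primary", 2: "specialty", 3: "specialty", 4: "specialty"}
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
observed = mixing_counts(edges, types)
capacity = possible_edges(types)
print(observed)
print(capacity)
```

The possible-edge counts across the three type pairs always sum to the total number of provider pairs, which is a convenient consistency check when the same tabulation is applied to a full state network.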

Model represents that networks are generated based on the mechanism of selective mixing by degree. We evaluate the presence of this mechanism by modeling the degree mixing matrix, denoted as ; the entry is the total number of edges between vertices of degrees and . Let . As with the previous models, we need to select a probability distribution, , for the selected network property based on our hypothesized network generative mechanisms. For , we assume that the proportion of edges between nodes of degrees and () is based on the following logistic model:


We assume the prior information for , and follows a multivariate normal distribution; to minimize the effects of noise, our estimation of the prior only includes degrees up to . Therefore, the model evidence for , assuming that the entries of the degree mixing matrix follow a multinomial distribution, is as follows:


4 Investigation of Patient-Sharing Networks

In the sections below, we present findings on the value of prior information as well as on whether there is evidence of sociality and selective mixing. We investigate these questions for each of the 50 states. However, we first present our findings for the state of Wyoming, chosen because a small number of providers reside in the state, which makes its network easier to visualize than those of other states.

4.1 Investigation of Wyoming

In 2015, the state of Wyoming had 1283 medical providers that shared Medicare patients; we designated 412 as primary care providers and 871 as specialty providers, based on their provider type in the Medicare Provider Utilization and Payment Data. In 2015, there were 12,749 connections among these providers based on shared patients. Figure 1 presents a visualization of the patient-sharing network for these providers. The nodes represent providers and are colored based on whether they are designated as primary (blue) or specialty (red) care. The edges between the nodes indicate that the providers have at least one shared patient; we denote the patient-sharing network for Wyoming as .

Figure 1: A visualization of the patient-sharing network for providers that share Medicare patients and reside in Wyoming. The nodes represent providers and are colored based on whether they are designated as primary (blue) or specialty (red) care. The edges between the nodes indicate that the providers have at least one shared patient.

4.1.1 Value of Prior Information: Comparing $M_1$ and $M_2$

The only difference between $M_1$ and $M_2$ is the inclusion of prior information based on data from the other 49 states. By comparing such models, one can gain insight into the importance of prior information on model selection. In order to evaluate the model evidence of $M_1$ and $M_2$, we calculated the log size of congruence class , i.e., , as . The estimated model evidence for models $M_1$ and $M_2$ are and , respectively. This results in posterior probability estimates of and . Therefore, the inclusion of prior information alters the model evidence and posterior probabilities; for the remaining comparisons, we use $M_2$.

4.1.2 Evidence of Sociality

To investigate evidence for heterogeneity among individuals in their propensity to create connections, we assess whether the observed degree distribution, shown in Figure 2, differs from the null model with prior information. This assessment is based on comparing models $M_2$ and $M_3$, as models $M_4$ and $M_5$ are not relevant for investigating heterogeneity of sociality. In order to evaluate the model evidence of $M_3$, we estimated the log size of congruence class , i.e., , as based on the recursive algorithm described in Section 2.3. The estimated model evidence for $M_3$ is . This results in estimated posterior probabilities of and when comparing only models $M_2$ and $M_3$. Therefore, there is evidence of heterogeneity in sociality contributing to the patient-sharing network in Wyoming.

Figure 2: A visualization of the degree distribution for the patient-sharing network for providers that share Medicare patients and reside in Wyoming.

4.1.3 Evidence of Selective Mixing by Provider Type

To investigate whether there is evidence of selective mixing by provider type, we compare models $M_2$ and $M_4$, as the other models are not relevant for this assessment. In order to evaluate the model evidence for $M_4$, we estimated the log size of congruence class , i.e., , as based on the recursive algorithm. The estimated model evidence for $M_4$ is . This results in posterior probability estimates of and . Therefore, there is little evidence for selective mixing by provider type for Wyoming. It is important to note that this conclusion does not establish that mixing by provider type is not a significant influence on the network structure; it only means that our model $M_4$, as specified by our choice of probability distribution for the mixing matrix, did not lead to a better model compared to our choice of null model.

4.1.4 Evidence of Selective Mixing by Degree

To assess evidence for selective mixing by degree, we must control for the distribution of degrees. Therefore, we compare models $M_3$ and $M_5$. In order to evaluate the model evidence of $M_5$, we estimated the log size of congruence class , i.e., , as . The estimated model evidence for $M_5$ is . This results in posterior probabilities of and . The posterior probability estimates for $M_3$ and $M_5$ suggest that there is strong evidence in favor of $M_3$, the model which includes only degree distribution. Again, this conclusion does not imply that degree mixing is not a significant influence on the network structure; it only means that $M_5$, as specified by our choice of probability distribution for the degree mixing matrix, did not lead to a better model compared to our choice of model for degree distribution.

4.2 Findings Across all 50 Patient-sharing Networks

For 47 of the 50 states, the posterior probabilities for $M_2$ when compared to $M_1$ were above ; the only exceptions were California (), Delaware (), and Wyoming (). California has the lowest network density among all 50 states, whereas Delaware and Wyoming have the highest. Across all states, we see strong evidence of heterogeneity in sociality. However, we see little evidence of selective mixing by provider type or by degree; for computational reasons, we only investigated states for selective mixing by degree.

5 Discussion

Two factors that we describe above allow investigators to make best use of their prior knowledge about networks and of observed data: 1) The ability to select the probability distribution for network properties, which enables investigators to evaluate the appropriateness of different network models that correspond to mechanisms that they hypothesize lead to generation of an observed network, and 2) the ability to incorporate prior information into network analyses. We demonstrate these capabilities in our investigation of generative mechanisms associated with patient-sharing networks. In particular, we investigated heterogeneity in sociality as well as selective mixing by provider type and degree. To do so, we develop CCMs corresponding to these network properties. Using our Bayesian model selection approach, we found evidence in support of heterogeneity in sociality but not selective mixing.

There are several limitations to our analysis. First, our analysis of whether selective mixing by provider type or degree is a significant influence on the network structure was inconclusive. This result stems from the fact that there are potentially many probability distributions (beyond the ones we selected) that can be used to model mixing patterns. Therefore, more work is required to develop probability distributions for network properties that correspond to distinct generative network mechanisms; this need is particularly great for degree mixing, as it is a high-dimensional property, i.e., it contains a large number of entries. Second, our analysis did not account for heterogeneity in the number of patients treated by providers; developing methods to incorporate this information into analyses is another promising area of research. Third, we investigated patient-sharing networks based on the representation of connections between providers as binary (present or absent); future research is needed to expand CCMs to account for weighted edges.

Data Availability Statement


This research is supported by a grant from the National Institutes of Health (R37 AI-51164). Conflict of Interest: None declared.