1 Introduction and background
Exponential random graph models (ERGMs) form a flexible and powerful family of statistical models for network data, used in a variety of fields and especially in the social sciences; see Robins et al. (2007) and other papers within the same special issue. These models are of exponential family form with proposed sufficient statistics that can range from the number of edges of the network (Erdös and Rényi, 1959) to the number of $k$-stars, or other graphical features of networks; see, for example, Holland and Leinhardt (1981) and Frank and Strauss (1986).
While ERGMs can be specified using any network statistics, it has long been known that node degrees and statistics thereof have great expressive power in representing and modeling networks, perhaps more than most network statistics; see, e.g., Newman (2003) and Handcock and Morris (2007). Some of the recent literature on ERGMs has focused on the properties of the beta model, for which the degree sequence of a network is the sufficient statistic and which postulates independent edges; see Blitzstein and Diaconis (2010), Chatterjee et al. (2011), and Rinaldo et al. (2013). In this paper we consider instead the less studied class of ERGMs whose sufficient statistics are derived from the joint degree distribution of the nodes, for which the assumption of dyadic independence no longer holds.
This work is motivated in great part by the desire to describe the statistical properties of a class of network statistics known as $d$K-graphs, originally proposed by Mahadevan et al. (2006). $d$K-graphs were formulated as a means to capture increasingly refined properties of networks in a hierarchical manner based on higher order interactions among node degrees (see, e.g., Dimitropoulos et al. (2009)). Despite the significant appeal of $d$K-graphs as summary statistics of networks and their ability to capture stochastic dependencies among nodes (see, in particular, Shioda et al. (2011)), to the best of our knowledge, the statistical properties of such statistics have not been investigated. The purpose of this paper is to initiate such a study and to offer some preliminary but non-trivial results that highlight the complexity and modeling power of these statistics. As is often the case with ERGMs relying on network statistics that are not based on dyadic independence, the theoretical analysis is particularly challenging.
This paper contains the following contributions. We formally define ERGMs based on $d$K sequences of different dimensions. For the K and K models, we derive conditions for the existence of the MLE of the model parameter and, therefore, for their estimability. We consider the general case of iid observations, which in particular includes the more interesting and common case of a single observed network. We also study the asymptotic behavior of the above model and compare it specifically to the behavior of the dense Erdös-Rényi model. We show that the model is in fact radically different from the Erdös-Rényi model, an appealing feature that many ERGMs do not always possess, as demonstrated by Chatterjee and Diaconis (2013). Finally, we define the ERGM with bi-degree distribution and provide ideas and conjectures for further work on such models.
Basic definitions and concepts.
We denote the set of simple, undirected, labeled graphs with nodes by . If in an iid sample there is at least one observation of each member of , a natural estimate of the probability of observing is the ratio of the number of observations of to the total number of observations. However, in most practical cases, there are considerably fewer observations than this, and in the most common case, there is only one observation available.
For this purpose, we propose a family of ERGMs in order to extract as much information as possible on the (joint) degrees of nodes of the few available observations. More precisely, the sufficient statistics of a model in this family are scaled forms of the number of induced connected subgraphs with a specific number of nodes with the same degree sequence. We call such subgraphs configurations. For example, for one-node configurations of the graph with nodes, that is, nodes of , the sufficient statistics are the numbers of nodes with degrees , denoted by ; and for two-node configurations of , that is, edges of , the sufficient statistics are the numbers of edges with degrees , denoted by , whose components are indexed lexicographically. The three-node configurations of are triangles (i.e., complete subgraphs with three nodes) and 2-stars (i.e., subgraphs with three nodes, in which a node is connected to two non-adjacent nodes) of , and so on. For example, in the graph in Figure 1, and .
In general, the vector of length can be defined in the same fashion for connected subgraphs with nodes (i.e., configurations), and it can be represented in a normalized form, scaled by the number of all -node configurations in the graph. For example, since , instead of , one can use , which is generally called the degree distribution of . In higher orders, is called the joint degree distribution of , and in particular the normalized is called the bi-degree distribution of .
Notice also that is another way of expressing, and hence contains no more information than, the degree partition of the unlabelled graph , that is, , where each represents the degree of a node in , usually ordered in a way such that . The degree sequence can also be generalized for higher orders to give the joint degree partition of a graph.
The $d$K-graph models.
The set of graphs constrained by the joint degree distribution is called the $d$K-graphs, where $d$ is the same as above, standing for the number of nodes of the configuration. The joint distribution itself is sometimes called the $d$K-distribution. As a convention, the 0K-distribution is the average degree, that is, the ratio of the number of edges to the number of nodes in the graph. Thus can be defined as , the number of edges of . The corresponding 0K model is equivalent to the Erdös-Rényi model after reparameterizing the edge probability in terms of its odds.
For the above reason, we refer to the proposed statistical models in exponential family form, with sufficient statistics , as the $d$K models. These models assign the same probability of occurrence to all graphs with the same joint degree distributions of corresponding order.
These models are also hierarchical in the sense that from , all , , are uniquely determined, whereas the converse is not true. For example, the following observation provides a method for obtaining from :
and the following observations provide a method for obtaining from :
Hence, by moving from a higher order model to a lower one, one loses some information that can be inferred from the data. This can be seen from the fact that, by moving towards a lower order, there is a larger set of graphs with the same sufficient statistics. Lower order models, on the other hand, enable us to infer the structure of a larger set of possible graphs based on the more local characteristics of the observed graphs. In particular, for lower order models, more parameters can be estimated when there are no observations of some joint degree distributions at all.
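The hierarchy described above can be illustrated with a minimal Python sketch. It assumes a graph given as an edge list on nodes 0, ..., n-1, and the function names are hypothetical, chosen only for this illustration. The sketch uses two elementary counting facts: every node of degree k at least 1 is an endpoint of exactly k edges, and the sum of all degrees is twice the number of edges. It is not a transcription of the displayed identities in the text.

```python
from collections import Counter


def degree_statistics(n, edges):
    """Return (number of edges, node counts by degree, edge counts by degree pair)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    one_k = Counter(deg)  # one_k[k] = number of nodes with degree k
    # two_k[(k, l)] with k <= l = number of edges whose endpoints have degrees k and l
    two_k = Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)
    return len(edges), one_k, two_k


def one_k_from_two_k(n, two_k):
    """Recover the node counts by degree from the edge counts by degree pair."""
    endpoints = Counter()  # endpoints[k] = number of edge endpoints at degree-k nodes
    for (k, l), count in two_k.items():
        endpoints[k] += count
        endpoints[l] += count  # an edge of type (k, k) contributes two endpoints
    one_k = Counter({k: total // k for k, total in endpoints.items()})
    isolated = n - sum(one_k.values())  # degree-0 nodes touch no edge
    if isolated:
        one_k[0] = isolated
    return one_k


def num_edges_from_one_k(one_k):
    """Recover the number of edges: the degree sum equals twice the edge count."""
    return sum(k * count for k, count in one_k.items()) // 2


# Usage: a path 0-1-2-3 plus an isolated node 4.
m, one_k, two_k = degree_statistics(5, [(0, 1), (1, 2), (2, 3)])
```

The converse direction fails in general, since several different edge counts by degree pair can share the same node counts by degree.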
However, in practice, there are usually very few observations, and in many cases only one observation is available; hence it is normally more practical to work with the 1K or 2K models. This is the direction we take in this paper, and hence we focus only on the 1K and 2K models.
Structure of the paper.
In the next section, we study the 1K model, discussing the calculation of the normalizing constant, the existence and calculation of the maximum likelihood estimator, as well as the asymptotic behavior of the MLE. In Section 3 we apply a parallel theory in order to define the 2K model, and we present some ideas and conjectures regarding maximum likelihood estimation for further work.
2 The 1K model
The exponential family form.
As mentioned before, our goal is to model the probability of observing a network with nodes. Denote the probability of a node having degree by . Considering the ordered vector of possible degrees in , the probability of observing can be written as
where is the number of nodes with degree in , is a normalizing constant, and .
This in turn can be parametrized in exponential family form as
where and .
We reduced the dimension of the sufficient statistics by arbitrarily removing the element , to obtain as sufficient statistics. We see that this model assigns the same probability to graphs with the same degree distribution, which is the only information the model collects from the data.
Calculating the normalizing constant.
We know that , where is the set of all non-isomorphic graphs with nodes. We use this to calculate the normalizing constant for a fixed . In this case the normalizing constant can be calculated directly from the set of all graphs with nodes. Denote all these graphs by , where .
The normalizing constant for the 1K model can be written as
where there are terms corresponding to each , and the first and the last terms correspond to the complete and the null graphs.
Notice that in (6) there are repeated terms for different labelings of isomorphic graphs as well as for non-isomorphic graphs with the same degree distribution. We illustrate the proposition with an example; see also McKay (1985) for asymptotic estimates of the number of graphs in with a prescribed degree distribution.
There are non-isomorphic graphs with nodes, where and are repeated three times for different labelings:
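For small graphs the repeated terms can be counted by brute force. The following Python sketch enumerates all labeled graphs on n nodes and groups them by their vector of node counts by degree. The function names are hypothetical, and the parameter vector `theta` of full length n (one entry per degree 0, ..., n-1, with no coordinate removed) is an assumption made only for this illustration.

```python
from collections import Counter
from itertools import combinations
from math import exp


def degree_distribution_counts(n):
    """Count the labeled graphs on n nodes for each vector (n_0, ..., n_{n-1}),
    where n_k is the number of nodes of degree k."""
    pairs = list(combinations(range(n), 2))
    counts = Counter()
    for mask in range(2 ** len(pairs)):  # each bit pattern selects an edge set
        deg = [0] * n
        for bit, (u, v) in enumerate(pairs):
            if mask >> bit & 1:
                deg[u] += 1
                deg[v] += 1
        counts[tuple(deg.count(k) for k in range(n))] += 1
    return counts


def normalizing_constant(n, theta):
    """Sum of exp(<theta, (n_0, ..., n_{n-1})>) over all labeled graphs on n nodes."""
    return sum(
        multiplicity * exp(sum(t * s for t, s in zip(theta, stat)))
        for stat, multiplicity in degree_distribution_counts(n).items()
    )


# Usage: the four degree-count vectors for three nodes and their multiplicities.
counts_three = degree_distribution_counts(3)
```

The enumeration visits 2^(n(n-1)/2) graphs, so it is only practical for very small n.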
Heuristically, the model should be regarded as relatively unaffected by degeneracy issues, for at least two reasons. First, the change statistic (see Snijders et al., 2006) corresponding to adding an edge between two nodes of degrees and is , suggesting in general a lack of significant correlation with the number of edges in the graph. Secondly, the growing (in ) number of parameters and the fact that the degree distributions are negatively correlated prevent the distribution from concentrating on very few configurations as it pushes the probability mass to spread out.
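The change statistic mentioned in the first reason can be made concrete with a short Python sketch. It assumes the statistic is the vector of node counts by degree and that the two nodes are not yet adjacent; the function name is hypothetical. Adding the edge moves each endpoint up by exactly one degree, so the entries of the change always sum to zero, whatever the current number of edges.

```python
from collections import Counter


def change_statistic(k, l):
    """Change in the node counts by degree when an edge is added between two
    non-adjacent nodes whose current degrees are k and l.

    Returns a dict {degree: change}, omitting degrees whose count is unchanged.
    """
    delta = Counter()
    for d in (k, l):
        delta[d] -= 1      # the node leaves degree class d ...
        delta[d + 1] += 1  # ... and enters degree class d + 1
    return {d: c for d, c in delta.items() if c != 0}


# Usage: joining a degree-1 node to a degree-2 node.
example_change = change_statistic(1, 2)
```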
Existence of the maximum likelihood estimator.
Suppose that , are iid observations of the networks with nodes. The log-likelihood function can be written as
We define the average observed sufficient statistics for the 1K model to be for iid observations . It is known that, for distributions in exponential families, the MLE exists if and only if the average observed sufficient statistics lie in the interior of the model polytope; see Barndorff-Nielsen (1978) and Brown (1986).
We first explore the corresponding model polytope to derive a necessary and sufficient condition for the existence of the maximum likelihood estimator (MLE) of , that is, . We then exploit the well-studied theory of exponential families to calculate the MLE when it exists.
In order to study , we need the following notation and two lemmas: Denote the -regular graphs with nodes by . In addition, by denote the graphs with nodes such that and .
For , there exists an if and only if is even.
Suppose that and are odd numbers and . There exists an if and only if is even.
Suppose that is even. We know that there is a -regular graph with nodes. For this graph, consider a matching . Now and an isolated node provides . By removing the edge between and and connecting to both and , we obtain . By this method, we can inductively generate all .
If is odd, the sum of degrees of nodes would be an odd number, which is impossible. ∎
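The existence of regular graphs under the parity condition can be checked constructively. The Python sketch below builds a regular graph as a circulant graph, which is a standard construction and not the argument used in the proof above. It assumes the usual statement that a k-regular graph on n nodes, with k between 0 and n-1, exists exactly when the product nk is even; the function name is hypothetical.

```python
def regular_graph(n, k):
    """Edge set of a k-regular circulant graph on nodes 0..n-1.

    Returns None when k is out of range or n * k is odd, in which case
    no k-regular graph on n nodes exists (the degree sum would be odd).
    """
    if not 0 <= k < n or (n * k) % 2 == 1:
        return None
    edges = set()
    for i in range(n):
        # Connect i to its k // 2 nearest neighbours on each side of a cycle.
        for step in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + step) % n)))
        if k % 2 == 1:
            # k odd forces n even here, so the antipodal node is well defined.
            edges.add(frozenset((i, (i + n // 2) % n)))
    return edges


# Usage: a 3-regular graph on 6 nodes.
cubic_six = regular_graph(6, 3)
```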
Let . In addition, for , let , where is the st element of the vector of length , and for and , let , where is the st element of the vector and is the st element of the vector of length . We observe that and . The following lemma characterizes ; see also
The model polytope is
- (if is even)
, that is, the -simplex scaled by ; and
- (if is odd)
the convex hull of the set of extreme points .
implies that . We now need to deal with the two cases separately:
1) even: By Lemma 1, we know that for each , there exists an . The corresponding vectors are all the vertices of ; therefore, .
2) odd: For and even, by Lemma 1, there exist , and the corresponding are the extreme points. For , odd and even , by Lemma 2, there exist , and it is easy to see that the corresponding cannot be generated by a convex combination of other . Therefore, these are extreme points too. Now the only vectors with integer entries smaller than that lie on but not in the convex hull of the extreme points contain an element as an entry, which, again by Lemma 1, is not possible. ∎
Let be the vector of size consisting only of zero elements. We can now characterize :
The model polytope is
- (if is even)
the convex hull of the set of extreme points ; and
- (if is odd)
the convex hull of the set of extreme points .
By projecting the polytope , given in Lemma 3, onto the -dimensional Euclidean space with coordinates , we obtain the result. ∎
In Figure 3, the polytopes and , for graphs with three nodes, are depicted.
Therefore, we have the following theorem:
For the 1K model and iid observations , the MLE exists if and only if
- (if is even)
for all , , and ; and
- (if is odd)
(i) for all , , and ; and (ii) for every odd , .
The MLE exists if and only if .
In the case that is even, a point in is written as , where for , , and . Therefore, if and only if and , which implies the result.
In the case that is odd, a point in is written as , where for even , , for odd and even , , , and .
If , for every odd , , which implies . In addition, we know that , which implies that . For every even , , which implies . Now by using , we obtain .
Conversely, if conditions (i) and (ii) hold, we let and . We conclude that , , and , which imply the result. ∎
For a single observation in the 1K model, the MLE does not exist.
For an observed , if , then and vice versa. implies . Therefore, in either case the MLE does not exist. ∎
By partially differentiating the log-likelihood function in (8), we conclude that the MLE should satisfy the following system of equations:
Therefore, from (7), the MLE for the 1K model with observations must satisfy the following system of equations:
Notice that if for a fixed , then . We observe that is the only parameter that appears in all terms and does not appear in all terms of the denominator of the left hand side of (10). Therefore, . This corresponds to , or equivalently since it is assumed that .
Thus, based on the MLE, the model implies that the probability of a node having a specific degree is zero when no node with that degree has been observed. Hence, it is plausible to remove such parameters from the model and simply focus on the submodel of exponential family form. The corresponding model polytope would then be simply the same, embedded in the Euclidean space with the remaining coordinates. By this method we can still estimate the subset of parameters whose corresponding sufficient statistics are nonzero.
Notice that even with a single observation, there are sometimes a considerable number of parameters that can be estimated. The following proposition deals with the extreme case of such graphs.
There exists a graph of every size such that and for .
We prove the result by induction on the number of nodes. For the base, where , the result holds for . Suppose that there exists such a graph . We prove it for : Since there is a such that and if . We construct the graph as follows. We add a node to and start connecting it to the nodes starting from the node with degree and stop when we reach . Now regardless of what degree the added node has, there are nodes of all degrees except zero in the graph. ∎
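A graph with nodes of every degree except zero can also be written down explicitly. The Python sketch below uses a closed-form rule, joining nodes i and j (labeled 1, ..., n) whenever i + j is at least n + 1. This is a swapped-in construction chosen for brevity, not the inductive construction of the proof, and the function names are hypothetical. Under this rule node i has degree i when 2i < n + 1 and degree i - 1 otherwise, so the degrees cover 1, ..., n-1.

```python
def all_positive_degrees_graph(n):
    """Edges of the graph on nodes 1..n in which i and j are adjacent
    exactly when i + j >= n + 1."""
    return [
        (i, j)
        for i in range(1, n + 1)
        for j in range(i + 1, n + 1)
        if i + j >= n + 1
    ]


def degree_set(n, edges):
    """Set of degrees that occur among nodes 1..n."""
    deg = {v: 0 for v in range(1, n + 1)}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return set(deg.values())


# Usage: on five nodes the degrees that occur are 1, 2, 3 and 4.
degrees_five = degree_set(5, all_positive_degrees_graph(5))
```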
Notice, however, that the large sum in (10) entails the common problem in ERGM modeling that computing the MLE for this model ultimately requires MCMC methods, as with most other ERGMs.
We have shown that we can estimate the subset of parameters whose corresponding sufficient statistics are nonzero. Here we calculate the probability that a sufficient statistic is nonzero, and discuss its asymptotic behaviour. In addition, we provide the asymptotic expected value of the number of non-zero sufficient statistics.
Under the 1K model, for each , we have the following:
We will now consider the very special case where all are equal, which implies that they are all equal to . In this case we see that the model is the same as the Erdös-Rényi model with .
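The equal-parameter case can be verified numerically for small n. The Python sketch below assumes the model assigns each labeled graph a probability proportional to exp of the sum over k of theta[k] times the number of nodes of degree k, with a hypothetical full-length parameter vector `theta`; the function name is also hypothetical. When all entries of `theta` are equal, the exponent equals that common value times n for every graph, so the distribution is uniform over labeled graphs, which is the Erdös-Rényi model with edge probability 1/2.

```python
from itertools import combinations
from math import exp, isclose


def one_k_probabilities(n, theta):
    """Probabilities of all labeled graphs on n nodes under
    P(G) proportional to exp(sum_k theta[k] * n_k(G))."""
    pairs = list(combinations(range(n), 2))
    weights = []
    for mask in range(2 ** len(pairs)):  # each bit pattern selects an edge set
        deg = [0] * n
        for bit, (u, v) in enumerate(pairs):
            if mask >> bit & 1:
                deg[u] += 1
                deg[v] += 1
        # Summing theta over the node degrees equals sum_k theta[k] * n_k(G).
        weights.append(exp(sum(theta[d] for d in deg)))
    total = sum(weights)
    return [w / total for w in weights]


n = 4
probs = one_k_probabilities(n, [0.7] * n)
er_half = 0.5 ** (n * (n - 1) // 2)  # probability of any fixed graph when p = 1/2
uniform = all(isclose(p, er_half) for p in probs)
```

With unequal parameters the same function gives non-uniform probabilities, which is where the 1K model departs from the Erdös-Rényi model.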
Suppose that for all and let , where is a sequence such that for all . It then holds that
, if .
By using (11), we have that , which is the same as the corresponding probability in the Erdös-Rényi model with . Let . From Theorem 3.1 in Bollobás (2001), we know that if , then , and if , then . We show that the latter holds for , and the former if . By using Stirling’s approximation, .
If the previous display is and hence tends to infinity. If the term will vanish instead. The claim is proved. ∎
We now consider the number of non-zero entries in the degree distribution and show that it is of order , which corresponds to the number of estimable parameters in the model.
Suppose that for all and let denote the number of entries in the degree distribution that are non-zero. Then,
Let for some which will be set below. Assume that and let denote the event that there exists a node with degree less than or more than . Let denote the degree of node and notice that , for all . Then .
By the union bound and Hoeffding's inequality we get that, for any , , provided that . It is now easy to see that the expected number of non-zero entries in the degree distribution is of order . In detail, ∎
We have seen that the case for all and some corresponds to the Erdös-Rényi model with . Using (5), this is equivalent to having , for all . We now consider the slightly more general case in which the vector belongs to a subset of such that , where and , for all . Then it is easy to see that, for any ,
where is given in the proof of Proposition 4 and denotes a probability of the random graph when sampled from the 1K model with parameter . Next, for a sequence such that , let . Then, by Proposition 4, (12) yields that, for , is bounded by a term that is asymptotically of order .
Now let . If , then from the proof of Proposition 4, which yields a trivial bound. Thus assume that since for all . Then, provided that , the probability vanishes. (Notice that this implies that .)
Relation with the Erdös-Rényi model.
Recently and somewhat surprisingly, Chatterjee and Diaconis (2013) have shown that some specifications of the ERGM family lead to models which asymptotically behave like Erdös-Rényi (ER) models for appropriate choices of .
Here we provide some results illustrating the relationships between the ER model and the 1K model. Below we parametrize the 1K model using the natural parameter vector . Notice that we have expunged from the vector of sufficient statistics, a choice that entails no loss of generality.
First, we make the easy observation that the ER model with probability can be represented by setting , where . Next, we show that the 1K model is dramatically different from the Erdös-Rényi model in the sense of being almost singular to it whenever the natural parameters are uniformly bounded in absolute value.
Below, we denote by the probability distribution of the Erdös-Rényi model with parameter and by the probability distribution of the 1K model with natural parameter vector .
Let such that , for a fixed, arbitrarily small . Then, there exists a sequence of subsets of such that, for any sequence such that for all and ,
Let be the set of graphs such that , for all , where denotes the degree of the th node of graph and is a positive constant to be specified.
By Hoeffding’s inequality, for any , there exists a such that , for large enough that