Generating Similar Graphs From Spherical Features

by   Dalton Lunga, et al.
Purdue University

We propose a novel model for generating graphs similar to a given example graph. Unlike standard approaches that compute features of graphs in Euclidean space, our approach obtains features on a surface of a hypersphere. We then utilize a von Mises-Fisher distribution, an exponential family distribution on the surface of a hypersphere, to define a model over possible feature values. While our approach bears similarity to a popular exponential random graph model (ERGM), unlike ERGMs, it does not suffer from degeneracy, a situation when a significant probability mass is placed on unrealistic graphs. We propose a parameter estimation approach for our model, and a procedure for drawing samples from the distribution. We evaluate the performance of our approach both on the small domain of all 8-node graphs as well as larger real-world social networks.




1 Introduction

Increasingly, many domains produce data sets containing relationships that are conveniently represented by networks, e.g., systems sciences (the Internet), bioinformatics (protein interactions), social domains (social networks). As researchers in these areas are developing models and tools to analyze the properties of networks, they are hampered by few samples available to evaluate their approaches. This gives rise to a problem of generating more network samples that can be viewed as drawn from the same population as the given network.

While there are a number of possible approaches to this problem, perhaps the most well-studied is the family of exponential random graph models (ERGMs, or $p^*$ in the social network literature), an exponential family class of models that matches the statistics over the set of possible networks to the statistics of the network in question (e.g., WassermanPattison). This and similar approaches have a long history, as they generalize the $p_1$ (HollandLeinhardt1981) and Markov random graph (FrankStrauss) models first developed in the social network literature. While such approaches are intuitive and have nice properties, they also suffer from the issue of degeneracy (Handcock2003b; Rinaldo2009), which is manifested in the instability of parameter estimation, and in placing most of the probability mass of the resulting distributions on unrealistic graphs (e.g., empty or complete graphs). As a result, these approaches are not suitable for the purpose of generating graphs similar to the given one.

This paper contains two contributions. First, we zero in on the issue of degeneracy and find that its cause is related to the geometry of the set of feature vectors and the number of graphs mapped to each feature vector (thought of as feature vector weights). By augmenting feature vectors with logarithms of their corresponding weights, we show that only graphs with such augmented feature vectors on the relative boundary of the resulting extended convex hull can become modes of any exponential random graph model, explaining why unrealistic graphs (which are on the relative boundary) often get large probability masses. Second, using this insight, we propose a novel random graph model which is based on embedding the features of graphs onto the surface of a hypersphere. Since a spherical surface is the relative boundary of the sphere's convex hull, all of the feature vectors then belong to the relative boundary of the convex hull and could potentially serve as modes of the corresponding distributions. This in turn helps to avoid the degeneracy issues which plague ERGMs.

Our proposed approach makes use of spherical features obtained by embedding possible graphs onto a surface of a sphere (Wilson2010, ), and then approximating the distribution of the resulting spherical feature space with a von Mises-Fisher distribution. Since the space of all possible graphs is too large to consider for embedding, we consider determining the embedding function based only on the neighborhood around the given graph thus resulting in a locally spherical embedding of the set of graphs. The main benefit of our approach is that it fixes the issue of degeneracy, with the mode of the distribution over the spherical feature vector coinciding with the features of the given graph. An additional advantage of our approach is that its parameter estimation procedure does not require cumbersome maximum entropy approaches used with ERGMs.

We start by revisiting the ERGM model and presenting insights on why this model often fails to generate realistic graphs (Section 2). We then propose our alternative approach, exponential locally spherical random graph model (ELSRGM, Section 3) and evaluate it on both synthetic and realistic graphs while comparing it to ERGMs (Section 4). We conclude the paper (Section 5) with ideas for future work.


Overall, we are interested in probabilistic approaches for generating graphs similar to the given one. We consider the case of simple (unweighted, no self-loops) undirected graphs $G = (V, E)$ (our approach extends to directed graphs as well), where $n = |V|$ is the number of vertices; $E$ is a symmetric binary $n \times n$ adjacency matrix with zeros on its diagonal, with $e_{ij} = 1$ iff there is an edge from $v_i$ to $v_j$, and $e_{ij} = 0$ otherwise, where $v_i, v_j \in V$ and $i, j = 1, \dots, n$. There are $2^{\binom{n}{2}}$ possible labeled graphs with $n$ vertices, a finite but often prohibitively large number even for fairly small $n$. We denote the set of these graphs by $\mathcal{G}_n$.

We first consider the well-studied exponential random graph model (ERGM, also known as $p^*$) as a starting point for our approach.

2.1 ERGM Definition

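In the area of social network analysis, scientists are often interested in specific features of networks, and some of the state-of-the-art models explicitly use them to define functions of network sub-structures. We will denote the vector of these functions by $f(G)$. Among the examples of such features used by social scientists, we have the number of edges, the number of triangles, the number of $k$-stars, etc.:

$$ f_{\text{edge}}(G) = \sum_{i<j} e_{ij}, \qquad f_{\triangle}(G) = \sum_{i<j<k} e_{ij}\, e_{jk}\, e_{ik}, \qquad f_{k\text{-star}}(G) = \sum_{i} \binom{d_i}{k}, \quad d_i = \sum_{j} e_{ij}. \tag{1} $$
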
The subgraph patterns corresponding to features in equation (1) are shown in Figure 1.

Figure 1: Simple typical subgraph configurations for undirected graphs. From left to right: edge, triangle, two-star and three-star configurations.
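As a small illustration of the subgraph counts described above, the following sketch computes edge, triangle and $k$-star counts from a binary adjacency matrix. The function names and the 4-node example graph are ours and are assumptions made for illustration only; the $k$-star count uses the standard degree-based formula.

```python
from itertools import combinations
from math import comb


def edge_count(adj):
    """Number of edges: sum of the upper triangle of the adjacency matrix."""
    n = len(adj)
    return sum(adj[i][j] for i, j in combinations(range(n), 2))


def triangle_count(adj):
    """Number of node triples whose three connecting edges are all present."""
    n = len(adj)
    return sum(adj[i][j] * adj[j][k] * adj[i][k]
               for i, j, k in combinations(range(n), 3))


def k_star_count(adj, k):
    """Number of k-stars: a node together with k of its neighbours."""
    return sum(comb(sum(row), k) for row in adj)


# Hypothetical example: a triangle 0-1-2 with a pendant edge 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
features = (edge_count(adj), triangle_count(adj),
            k_star_count(adj, 2), k_star_count(adj, 3))
print(features)
```

For this example graph the degrees are 2, 2, 3 and 1, so the two-star count is 1 + 1 + 3 + 0 = 5 and there is a single three-star centred at node 2.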

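A commonly used probabilistic model over the set $\mathcal{G}_n$ of graphs on $n$ vertices is an exponential family model that uses the expectation of $f(G)$ as a vector of sufficient statistics. The distribution over $\mathcal{G}_n$ can be parameterized in the form

$$ P(G \mid \theta) = \frac{\exp\{\langle \theta, f(G) \rangle\}}{Z(\theta)}, \tag{2} $$

where $\theta \in \mathbb{R}^d$ is the vector of natural model parameters, $f(G) \in \mathbb{R}^d$ is a vector of features of $G$, and

$$ Z(\theta) = \sum_{G' \in \mathcal{G}_n} \exp\{\langle \theta, f(G') \rangle\} $$

is the partition function.

Let $\mathcal{F} = \{f(G) : G \in \mathcal{G}_n\}$ denote the set of possible feature-vector values. Under ERGMs, the space of possible graphs is coarsened into a much smaller set of possible features. The distribution in (2) can also be viewed as a distribution over $\mathcal{G}_n$ with all of the graphs mapped to the same value of $f$ assigned the same probability mass. Let $w(f) = |\{G \in \mathcal{G}_n : f(G) = f\}|$ be the number of graphs corresponding to a feature value $f \in \mathcal{F}$. (We will refer to the $w(f)$'s as weights.) The distribution can also be extended to the set $\mathcal{F}$,

$$ P(f \mid \theta) = \frac{w(f) \exp\{\langle \theta, f \rangle\}}{Z(\theta)}, \qquad f \in \mathcal{F}, \tag{3} $$

also an exponential family distribution. We let $\mathcal{C}$ denote the convex hull of $\mathcal{F}$. Since $\mathcal{G}_n$ is finite, so is $\mathcal{F}$, and $\mathcal{C}$ is a polytope in $\mathbb{R}^d$. Using a well-known result from the theory of exponential families (Barndorff-Nielsen1978), if $\bar{f}$ is a vector of the sufficient statistics (e.g., for a single example graph $G^*$, $\bar{f} = f(G^*)$), a maximum likelihood estimate (MLE) $\hat{\theta}$ satisfies

$$ \mathbb{E}_{\hat{\theta}}[f(G)] = \bar{f}, \tag{4} $$

and it exists, and the corresponding distribution is unique, as long as $\bar{f} \in \text{ri}(\mathcal{C})$, the relative interior of the convex hull of the set of possible feature vectors $\mathcal{F}$. In essence, ERGMs are designed to preserve the mean features of the observed graph, a very intuitive and often desirable property. However, it is also well known that the estimation of $\hat{\theta}$ is an extremely cumbersome task, complicated by the fact that the exact calculation of $Z(\theta)$ is intractable, so approximate approaches (e.g., pseudo-likelihood, MCMC) are employed instead (Hunter2008a).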
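For very small $n$ the weighted feature distribution described above can be computed exactly by brute force. The sketch below enumerates all labeled 4-node graphs, tabulates the number of graphs per (edge, triangle) pair, and evaluates the resulting pmf for one parameter vector. The function names and the choice $\theta = (-1, 2)$ are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
from collections import Counter
from itertools import combinations, product
from math import exp


def edge_triangle_weights(n):
    """Count labeled n-node graphs per (edges, triangles) feature pair."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    weights = Counter()
    for bits in product((0, 1), repeat=len(pairs)):
        present = {p for p, b in zip(pairs, bits) if b}
        triangles = sum(
            (i, j) in present and (j, k) in present and (i, k) in present
            for i, j, k in triples
        )
        weights[(len(present), triangles)] += 1
    return weights


def ergm_feature_pmf(weights, theta):
    """Exact pmf over feature values: weight * exp(<theta, f>) / Z(theta)."""
    scores = {f: w * exp(theta[0] * f[0] + theta[1] * f[1])
              for f, w in weights.items()}
    z = sum(scores.values())  # partition function, tractable only for tiny n
    return {f: s / z for f, s in scores.items()}


weights = edge_triangle_weights(4)            # 2**6 = 64 labeled graphs
pmf = ergm_feature_pmf(weights, theta=(-1.0, 2.0))
mean = tuple(sum(p * f[d] for f, p in pmf.items()) for d in range(2))
mode = max(pmf, key=pmf.get)
print("mean features:", mean, "mode:", mode)
```

With this particular parameter choice the mode lands on the complete graph, features (6, 4), while the mean lies elsewhere, which is the kind of mean/mode mismatch discussed in the next subsection.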

2.2 Degeneracy in ERGMs

ERGMs often suffer from the problem of degeneracy (Handcock2003b). There are two types of degeneracy usually considered. The first type occurs when the MLE either does not exist, or the estimation procedure does not converge due to numerical instabilities (Handcock2003b; Rinaldo2009).

The second type of degeneracy happens when $\hat{\theta}$ can be reliably estimated, but the resulting probability distribution places significant probability mass (or virtually all of the probability mass) on unrealistic graphs, e.g., empty or complete graphs. This type of degeneracy can be considered from another viewpoint: the mode of the ERGM corresponding to $\hat{\theta}$ may be placed on a feature vector very different from the observed $\bar{f}$. This is an undesirable property, as there is little justification for placing large probability mass over a region away from the observed feature vectors while placing little mass over the observed example.

For an illustration, consider the set $\mathcal{G}_8$ (the set of all possible simple undirected graphs with 8 nodes) with feature vectors $f(G) = (f_{\text{edge}}(G), f_{\triangle}(G))$. In this case, the left plot of Figure 2 displays the support, which consists of the (edge, triangle) pair statistics for all possible 8-node undirected graphs. The right plot of Figure 2 displays a diameter distribution, highlighting that graphs with different topologies map to the same edge-triangle feature pairs. The diameter was computed by viewing each feature pair as a cell of graphs with the same feature counts. This is accomplished by first computing a perturbation graph whose nodes are all non-isomorphic 8-node graphs (enumerated using nauty, McKay1981), with a graph edit distance applied on this set. To compute the diameter of a cell given the perturbation graph, one only has to identify the graphs mapping to that cell and extract the maximum number of perturbation-graph edges separating any pair of them. Table 1 shows the complexity of small-sized graph spaces.

n     max. number of edges    all labeled graphs               non-isomorphic graphs
7     21                      2,097,152                        1,044
8     28                      268,435,456                      12,346
9     36                      68,719,476,736                   274,668
10    45                      35,184,372,088,832               12,005,168
11    55                      36,028,797,018,963,968           1,018,997,864
12    66                      73,786,976,294,838,206,464       165,091,172,592
Table 1: Complexity of graph spaces for a fixed number of nodes.
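The labeled-graph column of Table 1 follows directly from the count of $2^{\binom{n}{2}}$ labeled graphs on $n$ nodes, and can be recomputed with exact integer arithmetic. The helper name below is ours; the non-isomorphic counts are not recomputed here, since they require an enumeration tool such as nauty.

```python
from math import comb


def labeled_graph_count(n):
    """Number of labeled simple undirected graphs on n nodes."""
    return 2 ** comb(n, 2)


for n in range(7, 13):
    print(f"n={n:2d}  max edges={comb(n, 2):2d}  "
          f"labeled graphs={labeled_graph_count(n):,}")
```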
Figure 2: Left: distribution of 8-node graphs over (edge, triangle) feature pairs. Right: distribution of cell diameter, where we consider each feature pair as a cell and compute the graph edit distance between each pair of graphs with feature counts mapping to that cell.

Figure 4 shows the support for the distribution over feature vectors (all circles), and its convex hull (boundary in green). Only a small subset of the example feature pairs result in distributions over $\mathcal{F}$ with modes coinciding with the example feature vector (red discs with black borders). ERGMs estimated from other feature vectors will thus generate graphs with features different from those of the example graph, as shown in Figure 3.

Figure 3: A degenerate ERGM specified by the edge-triangle pair for an 8-node graph (top figure). Color-coded is the pmf over the edge-triangle space for an estimated MLE $\hat{\theta}$. The "+" sign indicates the observed feature and its mean, while a separate marker indicates the ERGM mode.

It is this behavior that we are most concerned with and are trying to address in this paper, and from now on when we mention degeneracy, unless otherwise noted, we mean the second type of degeneracy.

Figure 4: Solid points show mode placement on the edge-triangle feature pairs that form the vertices of the extended hull as a result of Theorem 1. Overlaid on the solid points are the modes obtained using the estimated $\hat{\theta}$ in equation (3).
Figure 5: View of mode placement on facets of the extended 3D convex hull. Each mode forms a vertex of a given triangle (denoted by black lines) for a facet of the convex hull.

2.2.1 Approaches to Fixing Degeneracy in ERGMs

Recent approaches in the literature have focused on a more flexible specification of the model in equation (3). This has been achieved with a mixed set of feature statistics which includes the node attributes of a given graph (Handcock2008). Instead of using only the structural properties of the graph (edges, triangles, etc.), other attributes (e.g., gender, race, age) are included as covariates for the model. This approach, however, does not address degeneracy in general, as one has to know which set of features and attributes to choose in attempting to minimize degeneracy (Handcock2008). Such a task demands domain knowledge to accurately specify a reasonable model. Another approach makes use of the geometrically weighted edgewise shared partner, the geometrically weighted dyadic shared partner, and the geometrically weighted degree network statistics as new statistics for the model in (2). These specifications, when parameterized, have been shown to lead to curved exponential families (Hunter2008a). However, the difficulty with this approach lies not only in the parameter estimation of the resulting curved exponential model, but also in having to avoid other graph features that may be dependent on these specifications, as that will result in degeneracy (Hunter2006).

2.3 Why Are Unrealistic Graphs Likely under ERGMs?

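It may not be entirely surprising that the probability mass over the support $\mathcal{F}$ does not concentrate around the expected value of $f$, as the mode and the mean of exponential family distributions do not always match. This does not, however, imply that the mode is placed on features corresponding to unrealistic graphs. To investigate the degeneracy further, we first introduce the following result from convex analysis.

Theorem 1.

Suppose $\mathcal{C} \subset \mathbb{R}^d$ is a full-dimensional bounded convex polytope with a finite set of vertices $\mathcal{V}$. Then for any $\theta \in \mathbb{R}^d \setminus \{0\}$,

$$ \arg\max_{x \in \mathcal{C}} \langle \theta, x \rangle \subseteq \text{rb}(\mathcal{C}), \tag{5} $$

where by $\text{rb}(\mathcal{C})$ we denote the relative boundary of the convex hull $\mathcal{C}$.

Proof. Since $\mathcal{C}$ is the convex hull of $\mathcal{V}$, for all $\theta$

$$ \max_{x \in \mathcal{C}} \langle \theta, x \rangle = \max_{v \in \mathcal{V}} \langle \theta, v \rangle. $$

For any $\theta \neq 0$, let $m = \max_{v \in \mathcal{V}} \langle \theta, v \rangle$, and let $H = \{x \in \mathbb{R}^d : \langle \theta, x \rangle = m\}$. Then $\forall x \in \mathcal{C}$, $\langle \theta, x \rangle \le m$. Then $\{x : \langle \theta, x \rangle \le m\}$ is a supporting halfspace, and $H$ is a supporting $(d-1)$-dimensional hyperplane for $\mathcal{C}$. Since $\mathcal{C}$ is a full-dimensional convex polytope, $\mathcal{C} \not\subseteq H$, and therefore $H$ is a proper supporting hyperplane. Thus $H \cap \mathcal{C} \subseteq \text{rb}(\mathcal{C})$ (e.g., Brondsted83, Thm. 4.1). Observing that $\arg\max_{x \in \mathcal{C}} \langle \theta, x \rangle = H \cap \mathcal{C}$ completes the proof. ∎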

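We now consider the formulation in (3), from which, after taking the logarithm, we observe that

$$ \arg\max_{f \in \mathcal{F}} P(f \mid \theta) = \arg\max_{f \in \mathcal{F}} \left[ \langle \theta, f \rangle + \log w(f) \right]. \tag{6} $$

We extend $f$ to include the weight $w(f)$. Let $\tilde{\mathcal{F}} = \{\tilde{f} = (f, \log w(f)) : f \in \mathcal{F}\}$ be the extended set of features, with $\tilde{\mathcal{C}}$ being the resulting extended convex hull for $\tilde{\mathcal{F}}$ and $\tilde{\mathcal{V}}$ the set of vertices for $\tilde{\mathcal{C}}$. Then

$$ \langle \theta, f \rangle + \log w(f) = \langle (\theta, 1), \tilde{f} \rangle. $$

By Theorem 1, only the points on the boundary $\text{rb}(\tilde{\mathcal{C}})$ can maximize (6). Thus, to find the set of possible modes of (3), we can restrict our attention only to $\tilde{\mathcal{F}} \cap \text{rb}(\tilde{\mathcal{C}})$.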

Consider an illustration in Figure 5 for the case of all 8-node graphs. Only points on the boundary of the extended polytope $\tilde{\mathcal{C}}$ (the convex hull of the feature vectors augmented with the logarithms of their weights) can potentially be modes of any ERGM specified on the feature set in Figure 4 (points on the boundary are denoted by solid black discs and solid red discs with black borders on both plots). Figure 4 further reveals that only a small number of points on the relative boundary of $\tilde{\mathcal{C}}$ correspond to actual observed modes for equation (3) (ERGMs with parameters $\hat{\theta}$ from (4)). (Footnote: the solid red discs in Figure 4 handle the case where modes of the maximum likelihood distribution are not unique.) There are two possible reasons for this occurrence. One, some of the points in $\text{rb}(\tilde{\mathcal{C}})$ correspond to extended parameter vectors $\tilde{\theta} = (\theta, \theta_w)$ with $\theta_w \neq 1$ (where $\theta_w$ is a scalar augmenting the parameter vector), so not all of the points in $\text{rb}(\tilde{\mathcal{C}})$ may be modes of (3). Two, the MLE solution $\hat{\theta}$ may lie outside of the cone (Rinaldo2009) of parameters for which $\bar{f}$ is the mode. (Footnote: often, feature vectors corresponding to the empty and to the complete graphs have particularly large cones, with many values of $\theta$ falling within these cones, i.e., many parameter values for which such a feature vector maximizes (6).) This explains why many ERGMs place large probability masses on these degenerate graphs.
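The role of the weights in mode placement can be explored numerically on a tiny graph space. The sketch below enumerates all labeled 5-node graphs, and for each parameter vector on a coarse grid finds the feature pair maximizing $\langle \theta, f \rangle + \log w(f)$, i.e., the mode of the weighted feature distribution. The grid range, the function names and the choice of $n = 5$ are our assumptions for illustration; the paper's own figures use 8-node graphs.

```python
from collections import Counter
from itertools import combinations, product
from math import log


def edge_triangle_weights(n):
    """Count labeled n-node graphs per (edges, triangles) feature pair."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    weights = Counter()
    for bits in product((0, 1), repeat=len(pairs)):
        present = {p for p, b in zip(pairs, bits) if b}
        triangles = sum(
            (i, j) in present and (j, k) in present and (i, k) in present
            for i, j, k in triples
        )
        weights[(len(present), triangles)] += 1
    return weights


def mode_of_ergm(weights, theta):
    """Feature pair maximising <theta, f> + log w(f) (ties: first found)."""
    return max(
        weights,
        key=lambda f: theta[0] * f[0] + theta[1] * f[1] + log(weights[f]),
    )


weights = edge_triangle_weights(5)                  # 2**10 = 1024 graphs
grid = [x / 4 for x in range(-20, 21)]              # theta entries in [-5, 5]
modes = {mode_of_ergm(weights, (a, b)) for a in grid for b in grid}
never_modes = set(weights) - modes                  # never selected on grid
print(len(weights), "feature pairs;", len(modes), "appear as modes")
```

Comparing `modes` with `set(weights)` shows which feature pairs were never selected as a mode on this grid; the empty graph (0, 0) and the complete graph (10, 10) are selected for strongly negative and strongly positive parameters respectively.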

The above observations also explain why considering a curved exponential model (Hunter2006) does not rectify the issue of degeneracy, as curving the space of parameters does not change the geometry of the feature space. (Footnote: we observed the same behavior in separate experiments on other types of feature spaces, e.g., Edge-vs-2Star, Edge-vs-GWED, Triangle-vs-GWESP, where GWED is the geometrically weighted degree distribution and GWESP is the geometrically weighted edgewise shared partners statistic, Hunter2008a.)

Finally, we note that this type of degeneracy is not restricted to distributions over statistics of finite graphs. Similar issues can in general arise with exponential family distributions on a bounded support, owing to the fact that exponential family distributions are designed to match the mean statistics and not to concentrate the probability mass around the mode. Consider the illustration in Figure 6 for a two-dimensional grid of uniformly spaced points, to which an exponential family model is fitted. As evident from the plots, the estimated models exhibit the degeneracy issue described above.

Figure 6: An illustration of degeneracy of exponential models on non-graph data. Color-coded is the pmf over 2D grid spaces for the estimated MLEs (left and right plots). The "+" sign indicates the observed feature and its mean, while a separate marker indicates the mode of the fitted model.

3 Exponential Locally Spherical Random Graph Model

The main result of Section 2 provides us with a very important insight: type II degeneracy in ERGMs is due to the bounded nature of discrete exponential models. The formulation of such models is sensitive to the geometry of the support space $\mathcal{F}$. The geometry of the extended convex hull $\tilde{\mathcal{C}}$ (the convex hull of the feature vectors augmented with the logarithms of their weights) defines which points are likely to have most or all of the probability mass placed on them. If the extended feature vector of the observed $\bar{f}$ does not lie on the relative boundary $\text{rb}(\tilde{\mathcal{C}})$, then a model computed for $\bar{f}$ will place very little probability mass on $\bar{f}$, while most of the mass is placed on some other point (i.e., the mode) on $\text{rb}(\tilde{\mathcal{C}})$. This result suggests that mapping all points onto a surface that belongs to its own relative boundary, and defining a model in the new space, would solve the degeneracy issue. In the following sections, we propose mapping all observed features onto a spherical surface, since every point on it belongs to the relative boundary of the sphere's convex hull. We then define a distribution and sampling techniques for graphs over the resulting feature space.

3.1 Algorithm

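0:  Given an example graph $G^*$ and a feature vector function $f$:
1:  Generate a neighborhood $\mathcal{N}$ by performing a random walk in $\mathcal{G}_n$ starting at $G^*$.
2:  Compute the set of statistics $\mathcal{F}_{\mathcal{N}} = \{f(G) : G \in \mathcal{N}\}$.
3:  Compute a mapping $\phi$ to embed $\mathcal{F}_{\mathcal{N}}$ onto $S^{p-1}$, the surface of a sphere in $\mathbb{R}^p$.
4:  Estimate the parameters $(\mu, \kappa)$ for the von Mises-Fisher density $g$ over the space of hyper-spherical features $\mathcal{X} = \{\phi(f(G)) : G \in \mathcal{N}\}$.
5:  Approximate $g$ with a density $h$, a mixture of kernels centered around the spherical features corresponding to graphs in $\mathcal{N}$, and recompute $\phi$ when new sample graphs are discovered.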
Algorithm 1 Outline of ELSRGM Procedure

In this section we describe the new model for graph sampling, one that can be used to generate non-degenerate graphs similar to the given one. Since the approach we are proposing is based on an exponential family model over the locally spherical embeddings of graph features, we will call it an Exponential Locally Spherical Random Graph Model, or ELSRGM for short. Our approach is executed in several steps as outlined in Algorithm 1.

First, we sample the neighborhood $\mathcal{N}$ around the given example graph $G^*$, and then compute the set of feature vectors $\mathcal{F}_{\mathcal{N}}$ corresponding to the graphs in the neighborhood. If the space of graphs ($\mathcal{G}_n$ or its subset) is small, then it may be possible to consider all graphs in the set. Otherwise, the neighborhood of $G^*$ is sampled by a random walk which at each step considers a graph one edge deletion/insertion away from the current graph.
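The random-walk neighborhood sampling described above can be sketched as follows, with graphs represented as frozen sets of node pairs. The representation, the function name and the seed handling are our assumptions for illustration; each step toggles one uniformly chosen node pair, so consecutive graphs are exactly one edge insertion/deletion apart.

```python
import random


def random_walk_neighborhood(start_edges, n, steps, seed=0):
    """Walk over n-node graphs, toggling one node pair per step.

    Returns the visited path (a list of frozensets of (i, j) pairs with
    i < j) and the set of distinct graphs encountered along the way.
    """
    rng = random.Random(seed)
    current = frozenset(start_edges)
    path = [current]
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)          # two distinct nodes
        pair = (min(i, j), max(i, j))
        current = current ^ {pair}              # insert if absent, else delete
        path.append(current)
    return path, set(path)


# Hypothetical example: start from a 5-node path graph 0-1-2-3-4.
start = {(0, 1), (1, 2), (2, 3), (3, 4)}
path, neighborhood = random_walk_neighborhood(start, n=5, steps=200)
print(len(neighborhood), "distinct graphs visited")
```

Feature vectors for the neighborhood would then be computed by applying the chosen feature function to each graph in `neighborhood`.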

The resulting feature set $\mathcal{F}_{\mathcal{N}}$ is then embedded in a $(p-1)$-dimensional unit hypersphere $S^{p-1}$ by a linear mapping $\phi$. We use the spherical embedding approach of Wilson2010 (Algorithm 2), in which the mapping is chosen to minimize the Frobenius matrix distance between the normalized dissimilarity matrix (in our case, the matrix of Euclidean distances between feature vectors in $\mathcal{F}_{\mathcal{N}}$) and the matrix of the Euclidean outer product (in $\mathbb{R}^p$) for the vectors in $\mathcal{X}$. We denote the set of resulting spherical features by $\mathcal{X}$. (Footnote: we refer to the spherically embedded graph feature vectors as coordinates of the graph.) One of the beneficial properties of such an embedding is that it preserves neighborhood properties, i.e., feature vectors close to each other are mapped to vectors which are also close to each other. In our case, the embedding is locally spherical, as we determine the mapping based only on a small subset of possible observed graphs, and determine the spherical coordinates for the rest by recomputing the embedding iteratively as new candidate graphs for the neighborhood $\mathcal{N}$ are uncovered. As a result, the distance-preserving property may hold only for the graph neighborhood on which the transformation was estimated.

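0:  Given an $N \times N$ dissimilarity matrix $D = [d_{ij}]$, with $N$ the number of graphs.
1:  If the spherical point positions are given by $X$, then $X X^\top = Z$ with $Z_{ij} = r^2 \cos(d_{ij} / r)$.
2:  If $r$ is unknown, compute $Z(r)$ over candidate radii $r$ and find the radius of the sphere as $r^* = \arg\min_r |\lambda_1(Z(r))|$, where $\lambda_1$ is the smallest eigenvalue of $Z(r)$.
3:  Set $r = r^*$ and $\hat{Z} = Z(r^*)$.
4:  Decompose $\hat{Z} = U \Lambda U^\top$. Set the embedding positional matrix to be $X = U_p \Lambda_p^{1/2}$, where $U_p$ is chosen such that the elements of $\Lambda_p$ correspond to the $p$ largest eigenvalues of $\hat{Z}$.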
Algorithm 2 Outline of Spherical Embedding

The next step is to estimate a distribution over all possible values in $S^{p-1}$ of the spherical features. We propose to use the von Mises-Fisher directional distribution (denoted in the rest of the paper as VMFD) with a pdf

$$ g(x \mid \mu, \kappa) = C_p(\kappa) \exp\{\kappa \langle \mu, x \rangle\}, \qquad C_p(\kappa) = \frac{\kappa^{p/2 - 1}}{(2\pi)^{p/2}\, I_{p/2 - 1}(\kappa)}, \tag{7} $$

where $\mu$, $\|\mu\| = 1$, is the location parameter, $\kappa \ge 0$ is the concentration parameter, and $I_\nu$ denotes the modified Bessel function of the first kind and order $\nu$. VMFD is a member of the exponential family; unlike a general exponential family distribution, it is symmetric, with $\mu$ serving as both its mean direction and its mode. The parameter $\kappa$ determines how concentrated the density is around the mode. When $\kappa = 0$, VMFD corresponds to the uniform distribution over the hypersphere $S^{p-1}$, while as $\kappa \to \infty$, VMFD is concentrated at the point $\mu$. See Mardia2000 for more details on VMFD. As we wish the example graph to be the mode of the distribution over possible graphs, we set the mode of VMFD to $\mu = x^*$, the spherical feature vector of the example graph $G^*$. Using MLE, one can compute $\hat{\kappa}$ for $g$ from $\mathcal{X}$ with the location parameter already set, i.e., $\mu = x^*$. Given only a single example graph, however, the concentration parameter is undefined, and $\hat{\kappa}$ can also be observed to be a function of the random-walk-based sampling procedure, which is arbitrary. An alternative is to treat $\kappa$ as a user-defined parameter controlling how concentrated the distribution is around the example graph. As a result, one obtains a distribution with density

$$ g(x) = C_p(\kappa) \exp\{\kappa \langle x^*, x \rangle\} \tag{8} $$

over possible spherical feature vectors $x \in S^{p-1}$.
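For intuition about the von Mises-Fisher density, the special case of the ordinary sphere in three dimensions has a closed-form normalizer, $\kappa / (4\pi \sinh \kappa)$, so no Bessel-function routine is needed. The sketch below is restricted to that case; the function name and the example parameters are ours and are assumptions for illustration, not values from the paper.

```python
from math import exp, pi, sinh


def vmf_pdf_s2(x, mu, kappa):
    """von Mises-Fisher density on the unit sphere in R^3.

    x and mu are unit vectors given as 3-tuples; kappa >= 0 is the
    concentration. For kappa == 0 the density is uniform, 1 / (4 pi).
    """
    if kappa == 0.0:
        return 1.0 / (4.0 * pi)
    dot = sum(a * b for a, b in zip(x, mu))
    normalizer = kappa / (4.0 * pi * sinh(kappa))
    return normalizer * exp(kappa * dot)


mu = (0.0, 0.0, 1.0)                              # mode direction
at_mode = vmf_pdf_s2(mu, mu, kappa=5.0)
opposite = vmf_pdf_s2((0.0, 0.0, -1.0), mu, kappa=5.0)
nearly_uniform = vmf_pdf_s2((1.0, 0.0, 0.0), mu, kappa=1e-6)
print(at_mode, opposite, nearly_uniform, 1.0 / (4.0 * pi))
```

The density is largest at the location parameter, smallest at the antipodal point, and approaches the uniform value as the concentration goes to zero. For large concentrations, `sinh` overflows in floating point, so a log-domain implementation would be preferable in practice.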

We note that one can employ a family of distributions other than VMFD for the task of modeling spherical data. Other candidates provide more degrees of freedom, but also require more difficult parameter estimation methods (Mardia2000). We leave the investigation of this aspect of our algorithm for future work.

3.2 Sampling under ELSRGM

Using the ideas and steps from the previous subsection, we estimate a probability density function $g$ over a unit sphere on the domain of possible spherical feature values for graphs. Before proceeding further, however, we need to resolve several issues. First, we are interested in a discrete probability distribution over a very large but finite set of possible features $\mathcal{X}$. How would one obtain such a distribution from the VMFD density? Second, what portion of the mass of $g$ is to be associated with each of the coordinates for possible graphs (or feature vectors)? Third, it is infeasible to enumerate all of the possible graphs in $\mathcal{G}_n$, and it may be infeasible to consider all possible spherical feature vectors. How can one perform sampling in $\mathcal{G}_n$ so that the resulting spherical features are distributed according to $g$? The following subsections outline our approach to resolving these issues.

3.2.1 Density as an Approximation to the Smoothed Distribution

As a first step, consider a setting where all possible spherical feature vectors can be enumerated for a fixed number of nodes $n$. We propose to approximate VMFD using a mixture of kernels with one mixture component for each graph's spherical coordinates. (Alternatively, one can consider one mixture component for each possible spherical feature vector in $\mathcal{X}$.) Intuitively, we consider the density $g$ to be a baseline approximation to the distribution obtained by smoothing a discrete distribution over spherical feature values $\mathcal{X}$. More formally, for a set of $N$ graphs, we index its elements with $i = 1, \dots, N$, and denote the spherical feature vector of graph $G_i$ by $x_i$.

Assume $g$ is the estimated density in Equation (8). Let

$$ h(x) = \sum_{i=1}^{N} \pi_i\, K(x \mid x_i, \kappa_h), \tag{9} $$

with $\pi_i \ge 0$ and $\sum_{i=1}^{N} \pi_i = 1$, where $x_i$ is the location for kernel $i$, and all of the kernels share a user-defined concentration (bandwidth) $\kappa_h$ chosen to assign more weight to points closer to $x_i$. The parameters $\pi_i$, $i = 1, \dots, N$, are probabilities associated with each $x_i$.

The estimation of the $\pi_i$'s is carried out by optimizing with respect to the Kullback-Leibler (KL) information between $g$ and $h$, with the constraint $\sum_i \pi_i = 1$ enforced through a Lagrange multiplier $\lambda$; we refer to the resulting objective as (10). It can be shown that the objective function in (10) is convex, and it can be minimized efficiently using standard convex optimization techniques.

3.2.2 Metropolis-Hastings Algorithm

The goal is to draw samples from $\mathcal{G}_n$ according to the probability masses $\pi_i$, where $\pi_i$ is the mass associated with graph $G_i$. If the number of nodes $n$ is small, graphs can be enumerated using nauty (McKay1981) and it is possible to identify all possible feature values and compute the probability masses associated with each graph. Graphs can then be sampled directly according to the resulting multinomial distribution. When all possible graphs on $n$ nodes are observed, an alternative is to make use of the Metropolis-Hastings approach as outlined in Algorithm 3. However, in practice, the number of nodes is usually too large to explicitly consider all possible graphs, and the initial neighborhood $\mathcal{N}$ would include only a small portion of all graphs. We propose an approach that allows us to draw samples from the distribution over graphs, including graphs outside of $\mathcal{N}$.

For better understanding, we first propose a graph generation approach assuming $n$ is small and all graphs can be enumerated. We draw a point $x$ in the embedding space, and then employ a Markov Chain Monte Carlo (MCMC) approach to draw a graph "corresponding" to this point by constructing a Markov chain in the space of graphs. The pseudocode is presented in Algorithm 3. If we could draw samples directly from $h$, this procedure would be equivalent to drawing a vector $x \sim h$, and then trying to identify which of the components was used to generate $x$ by using MCMC in the posterior over the component index $i$. The only approximation employed in this case is that instead of drawing $x \sim h$, we are drawing $x \sim g$, but according to the KL-divergence measure (Figure 8), $g$ and $h$ are very close.

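0:  $h$ is given.
1:  Draw a sample $x \sim g$. Set $t = 0$, and perturb the example graph to generate a random graph $G^{(0)}$ to initialize the chain.
2:  repeat
3:     $t \leftarrow t + 1$.
4:     Sample a proposal graph $G'$ from the proposal distribution at time $t$, $q(\cdot \mid G^{(t-1)})$.
5:     Compute the ratio $a = \frac{P(G' \mid x)\, q(G^{(t-1)} \mid G')}{P(G^{(t-1)} \mid x)\, q(G' \mid G^{(t-1)})}$, where $P(G_i \mid x) \propto \pi_i\, K(x \mid x_i, \kappa_h)$.
6:     Set $G^{(t)} = G'$ with probability $\min(1, a)$; otherwise set $G^{(t)} = G^{(t-1)}$.
7:  until convergence of the chain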
Algorithm 3 MCMC Algorithm for Sampling Graphs

To sample from the von Mises-Fisher distribution we follow the approach outlined in Wood1994. For the proposal distribution $q$ we consider a uniform distribution over graphs one edge insertion/deletion away from the current graph. In our experiments on small- and large-sized graphs, with a fixed number of iterations, the algorithm is observed to converge and does produce graphs with topologies that resemble the observed graph.
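The structure of such a chain, with a uniform single-edge insertion/deletion proposal, can be sketched generically. In the code below the target is an arbitrary user-supplied log-score over graphs; the toy score that prefers three-edge graphs is a stand-in we chose for illustration and is not the paper's posterior over mixture components. Because the proposal toggles one uniformly chosen node pair, it is symmetric, and the acceptance probability reduces to the ratio of target values.

```python
import random
from math import exp


def metropolis_hastings_graphs(log_target, start_edges, n, iterations, seed=0):
    """Metropolis-Hastings over n-node graphs with edge-toggle proposals.

    log_target maps a frozenset of (i, j) pairs (i < j) to an unnormalised
    log-probability. Returns the full chain of visited graphs.
    """
    rng = random.Random(seed)
    current = frozenset(start_edges)
    current_score = log_target(current)
    chain = [current]
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)
        proposal = current ^ {(min(i, j), max(i, j))}
        proposal_score = log_target(proposal)
        log_ratio = proposal_score - current_score   # symmetric proposal
        if log_ratio >= 0 or rng.random() < exp(log_ratio):
            current, current_score = proposal, proposal_score
        chain.append(current)
    return chain


def toy_log_target(edges):
    """Hypothetical score: favours graphs with exactly three edges."""
    return -3.0 * abs(len(edges) - 3)


chain = metropolis_hastings_graphs(toy_log_target, set(), n=5, iterations=2000)
late = chain[1000:]
share_three = sum(len(g) == 3 for g in late) / len(late)
print("share of 3-edge graphs after burn-in:", share_three)
```

To adapt the sketch to the setting of Algorithm 3, `log_target` would be replaced by the log of the posterior weight of a graph given the drawn spherical point, which requires the embedding and the mixture weights.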

For larger graphs, however, we cannot compute $\pi_i$ explicitly for all graphs. This situation can be thought of as similar to the case of a countably infinite number of objects, in which a Dirichlet process can be employed to assign some weight to graphs that are not yet observed. It is as if $g$ is approximated by a countably infinite mixture, i.e., assuming that the number of graphs is countably infinite instead of just very large. In this case, we employ a Dirichlet process mixture model $DP(\alpha, H_0)$, where $\alpha$ is a concentration parameter and $H_0$ is a uniform distribution for the kernel location, since we assume that each yet unobserved set of features is equally likely.

Assuming that $N$ different graphs have been observed, equation (9) then becomes

Algorithm 4 details the pseudocode for sampling of small to large graphs on a spherical space. For basic information on Dirichlet process mixtures and their inference see Neal00 .

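0:  $g$ is given. $N$ graphs are initially observed, forming the neighborhood $\mathcal{N}$. Estimate $\pi_1, \dots, \pi_N$ by minimizing (10).
1:  Draw a sample $x \sim g$. Set $t = 0$, and perturb the example graph (by performing a random edge insertion/deletion at most twice) to generate a random graph $G^{(0)}$ to initialize the chain.
2:  repeat
3:     $t \leftarrow t + 1$.
4:     Sample a proposal graph $G'$ from the proposal distribution at time $t$, $q(\cdot \mid G^{(t-1)})$.
5:     Compute the acceptance ratio $a$ for $G'$ against $G^{(t-1)}$,
6:     where the weight of a not-yet-observed graph is supplied by the Dirichlet process prior.
7:     Set $G^{(t)} = G'$ with probability $\min(1, a)$; otherwise set $G^{(t)} = G^{(t-1)}$.
8:     If $G'$ is new, then set $N \leftarrow N + 1$ and add $G'$ to $\mathcal{N}$.
  • Compute $f(G')$, set $x_N = \phi(f(G'))$.

  • Re-compute $\phi$ (to generalize to new samples) and re-estimate $\pi$.

9:  until convergence of the chain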
Algorithm 4 MCMC-DP Algorithm for Sampling Graphs

4 Experimental Evaluation

We adopt the common goodness-of-fit measures in network generation studies to investigate how well our model fits the observed data (i.e., by comparing the observed statistics with a range of the same statistics obtained from simulating many networks using the fitted model) (Hunter2008a). If the observed network is not typical of the simulated networks for a particular measure, then the model is either degenerate or simply a misfit. We first consider the degree distribution, which is defined as the statistics $D_0, \dots, D_{n-1}$, with each $D_i$ representing the number of nodes with $i$ edges connected to them, divided by $n$. Secondly, we compute the edgewise shared partner distribution, which is defined as the statistics $EP_0, \dots, EP_{n-2}$, with each $EP_i$ representing the number of edges in the graph between two nodes that share exactly $i$ neighbors in common, divided by the total number of edges. Thirdly, we compute the triad census distribution, which is the proportion of 3-node sets having 0, 1, 2, or 3 edges among them. Lastly, we compute the minimum geodesic distance distribution, which is the proportion of pairs of nodes whose shortest connecting path is of length $k$, for $k = 1, 2, \dots$. Also, pairs of nodes that are not connected are classified as $k = \infty$.
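Two of the goodness-of-fit summaries above, the degree distribution and the edgewise shared partner distribution, are simple to compute from an adjacency matrix. The sketch below follows the verbal definitions given in the text; the function names and the 4-node example graph are ours and are assumptions for illustration.

```python
from itertools import combinations


def degree_distribution(adj):
    """Fraction of nodes with degree 0, 1, ..., n-1."""
    n = len(adj)
    counts = [0] * n
    for row in adj:
        counts[sum(row)] += 1
    return [c / n for c in counts]


def edgewise_shared_partner_distribution(adj):
    """Fraction of edges whose endpoints share 0, 1, ..., n-2 neighbours."""
    n = len(adj)
    counts = [0] * (n - 1)
    total_edges = 0
    for i, j in combinations(range(n), 2):
        if adj[i][j]:
            shared = sum(adj[i][k] and adj[j][k] for k in range(n))
            counts[shared] += 1
            total_edges += 1
    if total_edges == 0:
        return [0.0] * (n - 1)       # no edges: distribution is undefined
    return [c / total_edges for c in counts]


# Hypothetical example: a triangle 0-1-2 with a pendant edge 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(degree_distribution(adj))
print(edgewise_shared_partner_distribution(adj))
```

In a goodness-of-fit check, these vectors would be computed for the observed graph and for each simulated graph, and the observed values compared against the spread of the simulated ones.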


In all our experiments, we define each feature vector to be,

These features have been observed to be among the set of subgraph patterns that capture social interactions and network formation in real processes FrankStrauss .

4.1 Small graphs

We first consider the experiments for with nodes. Figure 8 (left) shows a two component -node synthetic graph, , while Figure 8 displays the -divergence computed by first observing -node graphs (graph-edit distance of neighborhood around ), and allowing the MCMC-DP (Algorithm 4) to discover and fill-in the neighborhood set , with the baseline centered at the coordinates of . was chosen to test our hypothesis for the extended feature space, that is if then the corresponding will be degenerate in relation to the analysis of Theorem 1. is used to test our second hypothesis, i.e. if then the ERGM specified by is most likely to generate realistic graphs. In Figure 8 (right) we show an -node synthetic graph whose extended feature vector ().

For the experimental set-up, the bandwidth for each kernel is treated as a user defined parameter and is set to , while the DP prior parameter is set to . Parameters for the are set as discussed in section 3, with fixed at to control the concentration of . We compute the parameters for the ERGM model and simulations using the statnet package Handcock2008 .

Figure 9 depicts simulated summary results for . The result obtained from our ELSRGM (unshaded blue box plots) displays no sign of degeneracy, while the ERGM result shows summary statistics that appear to relate to those of empty and complete graphs, a sign of degeneracy. Figure 10 shows summary statistics of simulated networks obtained for the ERGM and the ELSRGM models when . The results show no sign of degeneracy for either the specified ERGM or the proposed ELSRGM. This confirms our second hypothesis, namely that having extended feature vectors lie on the relative boundary of the convex hull enables the generation of realistic graphs from exponential family models.
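The comparison behind these box plots can be sketched as follows. This is an illustrative sketch only: `gof_summary` and `extreme_density_fraction` are hypothetical helper names, the tolerance `tol` is an assumed value, and treating an observed statistic outside the simulated range as atypical is a simplification of visual inspection of the plots.

```python
import statistics


def gof_summary(observed, simulated):
    """Compare each observed statistic with its simulated distribution.

    observed:  list of floats, one per statistic.
    simulated: list of lists, the same statistics for each simulated graph
               (at least two simulated graphs are required).
    """
    rows = []
    for j, obs in enumerate(observed):
        draws = sorted(sim[j] for sim in simulated)
        q1, med, q3 = statistics.quantiles(draws, n=4, method="inclusive")
        rows.append({
            "observed": obs,
            "median": med,
            "iqr": (q1, q3),
            "range": (draws[0], draws[-1]),
            # An observed value outside the simulated range is not typical
            # of the fitted model for this statistic.
            "inside_range": draws[0] <= obs <= draws[-1],
        })
    return rows


def extreme_density_fraction(densities, tol=0.05):
    """Fraction of simulated graphs that are nearly empty or nearly complete."""
    extreme = sum(1 for d in densities if d <= tol or d >= 1.0 - tol)
    return extreme / len(densities)


# Hypothetical numbers: two statistics, three simulated graphs.
rows = gof_summary([0.4, 0.9], [[0.3, 0.1], [0.5, 0.2], [0.45, 0.15]])
for row in rows:
    print(row)
print(extreme_density_fraction([0.0, 0.01, 0.99, 0.3]))
```

A large value of `extreme_density_fraction` corresponds to the situation in which most simulated networks resemble empty or complete graphs.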

Figure 7: Two 8-node synthetic test graphs. Left: 2-component graph falling inside the relative interior of the convex hull of extended features. Right: falling on the relative boundary of the convex hull for the extended feature space.
Figure 8: Sample -divergence for ELSRGM as more graphs get uncovered. The initial neighborhood is built around .
Figure 9: (a) Synthetic graph ; (b) Corresponding feature-pair: ; (c) Simulations from models given . The observed statistics are indicated by the solid lines; the box plots include the median and interquartile range of simulated networks. The ERGM box plots show low variance, and all samples seem to be placed on the same network, a sign of degeneracy.
Figure 10: (a) Synthetic graph ; (b) Corresponding feature-pair: ; (c) Simulations from models given . The observed statistics are indicated by the solid lines; the box plots include the median and interquartile range of simulated networks. Both models are non-degenerate.

4.2 Larger Graphs

For larger data sets, we first consider an undirected Dolphin social network with nodes DuBois2008 ; Lusseau2003 . We apply the MCMC-DP approach as outlined in Algorithm 4, sampling graphs via a random walk for steps starting from a random network . Figure 11 summarizes the results of simulations for the Dolphin network from the two approaches. The proposed ELSRGM captures the distribution of statistics of the Dolphin network comparatively better. It appears that the ERGM specified by simple statistics is incapable of generating distributions of statistics that resemble those of the observed network. The lack of fit of the ERGM in the degree distribution, geodesic distance, and shared partners distribution indicates the presence of degeneracy.
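As an illustration of sampling graphs via a random walk from a random starting network, the sketch below toggles one uniformly chosen dyad per step, so consecutive graphs are at graph-edit distance one. This proposal mechanism, the edge probability `p`, and the function names are assumptions made for illustration; the sketch does not reproduce Algorithm 4, and in particular omits the Dirichlet process step.

```python
import random
from itertools import combinations


def random_graph(n, p, rng):
    """Random starting network: each dyad is an edge with probability p."""
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}


def random_walk(start_edges, n, steps, rng):
    """Yield one graph (frozenset of edges) per step; each step toggles a single dyad."""
    dyads = list(combinations(range(n), 2))
    edges = set(start_edges)
    for _ in range(steps):
        dyad = rng.choice(dyads)
        edges ^= {dyad}  # add the edge if absent, remove it if present
        yield frozenset(edges)


rng = random.Random(0)  # fixed seed for reproducibility
start = random_graph(6, 0.3, rng)
walk = list(random_walk(start, 6, 50, rng))
print(len(walk), "graphs visited;", len(set(walk)), "distinct")
```

Each visited graph would then be mapped to its feature vector before any model-specific processing.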

Figure 11: Dolphin -node network summaries. The observed statistics are indicated by the solid lines; the box plots include the median and interquartile range of simulated networks. The ERGM plots show signs of degeneracy.

We next evaluate ELSRGM on the -node Faux-Mesa-High social network Handcock2008 ; Resnick1997 . We again apply the MCMC-DP algorithm, sampling graphs via a random walk for steps starting from a random network . Figure 12 depicts the results of simulated networks from the specified ELSRGM and ERGM models. Again, we observe that ELSRGM generates distributions of statistics that resemble those of the observed network, while the ERGM shows signs of misspecification.

Figure 12: Faux-Mesa-High -node network summaries. The observed statistics are indicated by the solid lines; box plots summarize the statistics for the simulated networks. The ERGM plots show signs of degeneracy.

Finally, we assess ELSRGM’s performance on a benchmark data set from social network analysis that is used to test whether a model can overcome the degeneracy phenomenon. We consider the first matrix ( nodes) of Kapferer’s tailor shop data Kapferer1972 . The simulated network summaries shown in Figure 13 suggest that the proposed ELSRGM does not suffer from degeneracy. ERGM results are not included for this data set because MLE estimation was severely unstable under the simple specifications considered in our analysis (corresponding to the first type of degeneracy).

Figure 13: Kapferer’s -node network summaries. The observed statistics are indicated by the solid lines; box plots summarize the statistics for the simulated networks using ELSRGM.
Figure 14: left: Faux-Mesa-High -node original network. right: ELSRGM generated network.

5 Conclusion and Future Work

In this paper, we investigated the cause of degeneracy in ERGMs and explained it as related to the feature vectors belonging to the relative interior of the convex polytope of feature values. To correct this issue, we proposed an algorithm that uses spherical features, since points on a sphere belong to the relative boundary of the corresponding convex set rather than its relative interior. Based on this mapping, we outlined a novel model for graphs (ELSRGM) and an approach to sampling graphs from this model. On several synthetic and real-world social networks, our approach generated graphs with statistics similar to those of the example graphs while not suffering from degeneracy (unlike ERGMs).

ELSRGM opens up a new class of graph sampling models: those based on spherical features. In this paper, we made several modeling choices, e.g., von Mises-Fisher distribution for spherical density, kernel approach assigning mass to individual feature vectors; other modeling choices could also lead to models preserving properties of the example graphs, and they need to be investigated. The insight from the geometric interpretation of the feature vectors realizable as modes may also lead to other types of features (non-spherical) which can lead to models generating realistic graphs.

6 Acknowledgements

The authors thank Okan Ersoy, S.V.N. Vishwanathan, and Richard C. Wilson for helpful discussions and suggestions. This research was supported by the NSF Award IIS-0916686.


  • [1] O. Barndorff-Nielsen. Information and exponential families in statistical theory. Wiley, New York, 1978.
  • [2] A. Brøndsted. An Introduction to Convex Polytopes. Springer-Verlag, 1983.
  • [3] C. L. DuBois and P. Smyth. UCI network data repository, 2008.
  • [4] O. Frank and D. Strauss. Markov graphs. Journal of the American Statistical Association, 81(395), September 1986.
  • [5] M. S. Handcock. Assessing degeneracy in statistical models of social networks. Technical Report 39, Center for Statistics and the Social Sciences, University of Washington, 2003.
  • [6] M. S. Handcock, D. R. Hunter, C. T. Butts, S. M. Goodreau, and M. Morris. statnet: Software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software, 24(3), 2008.
  • [7] P. W. Holland and S. Leinhardt. An exponential family of probability distributions for directed graphs (with discussion). Journal of the American Statistical Association, 76(373), 1981.
  • [8] D. R. Hunter and M. Handcock. Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics, 15(2), 2006.
  • [9] D. R. Hunter, M. S. Handcock, C. T. Butts, S. M. Goodreau, and M. Morris. ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software, 24(3), 2008.
  • [10] B. Kapferer. Strategy and transaction in an African factory. Manchester: Manchester University Press., 1972. available from
  • [11] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54, 2003.
  • [12] K. V. Mardia and P. E. Jupp. Directional Statistics. Wiley, 2000.
  • [13] B. D. McKay. Practical graph isomorphism. Congressus Numerantium, 30, 1981.
  • [14] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265, June 2000.
  • [15] M. D. Resnick and coauthors. Protecting adolescents from harm: Findings from the National Longitudinal Study on Adolescent Health. Journal of the American Medical Association, 278(8), 1997.
  • [16] A. Rinaldo, S. E. Fienberg, and Y. Zhou. On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics, 3:446–484, 2009.
  • [17] S. Wasserman and P. Pattison. Logit models and logistic regression for social networks: An introduction to Markov graphs and p*. Psychometrika, 61(3), September 1996.
  • [18] R. C. Wilson, E. R. Hancock, E. Pekalska, and R. P. W. Duin. Spherical embeddings for non-Euclidean dissimilarities. In CVPR-10, pages 1903–1910, June 2010.
  • [19] A. T. Wood. Simulation of the von Mises-Fisher distribution. Communications in Statistics - Simulation and Computation, 23(1), 1994.