1 Introduction
In recent years there has been increasing interest in the study of probabilistic models defined on graphs. One of the main reasons for this interest is the flexibility of these models, which makes them suitable for the description of real datasets, for instance in social networks (Newman et al., 2002) and neural networks (Cerqueira et al., 2017). The interactions in a real network can be represented through a graph, where the interacting objects are represented by the vertices of the graph and the interactions between these objects are identified with the edges of the graph. In this context, the Exponential Random Graph Model (ERGM) has been extensively used to model real networks (Robins et al., 2007). Inspired by the Gibbs distributions of statistical physics, the ERGM owes much of its popularity to the specific form of its probability distribution, which takes into account appealing features of a graph such as its number of edges, triangles, stars, etc.
In terms of statistical inference, learning the parameters of an ERGM is not a simple task. Classical inferential methods such as Monte Carlo maximum likelihood (Geyer & Thompson, 1992) or pseudolikelihood (Strauss & Ikeda, 1990) have been proposed, but exact calculation of the estimators is computationally infeasible except for small graphs. In the general case, estimation is carried out by means of a Markov Chain Monte Carlo (MCMC) procedure (Snijders, 2002), which consists of building a Markov chain whose stationary distribution is the target law. The classical approach is to simulate a "long enough" random path, so as to obtain an approximate sample. But this procedure suffers from a "burn-in" time that is difficult to estimate in practice and can be exponentially long, as shown in Bhamidi et al. (2011). To overcome this difficulty, Butts (2015) proposed a novel algorithm called the bound sampler. The simulation time of the bound sampler is fixed and depends only on the number of vertices of the graph. Although it is an approximate sampling method for the ERGM, the bound sampler is recommended when the mixing time of the chain is long (Butts, 2015). An alternative approach, introduced by Propp & Wilson (1996) and called Coupling From The Past (CFTP), produces a sample of the stationary distribution of the chain. This method yields a value distributed under the exact target distribution, and is thus called perfect (also known as exact) simulation. Like MCMC, it does not require computing the normalising constant explicitly; but it requires neither a burn-in phase nor any detection of convergence to the target distribution.
In this work we present a perfect simulation algorithm for the ERGM. In our approach, we adapt a version of the CFTP algorithm originally developed for spin systems (Propp & Wilson, 1996) to construct a perfect simulation algorithm based on a Markov chain with the Glauber dynamics. For a specific family of distributions based on simple statistics we prove the monotonicity property, which makes the CFTP algorithm computationally efficient for generating a sample from the target distribution. Finally, and most importantly, we prove that the mean running time of the perfect simulation algorithm is at most as large (up to logarithmic terms) as the mixing time of the Markov chain. As a consequence, we argue that CFTP is superior to MCMC for the simulation of the ERGM.
2 Definitions
2.1 Exponential random graph model
A graph is a pair $(V, E)$, where $V$ is a finite set of vertices and $E \subseteq V \times V$ is a set of edges that connect pairs of vertices. In this work we consider undirected graphs; i.e., graphs for which the set $E$ satisfies the following properties:

For all $v \in V$, $(v, v) \notin E$;

If $(u, v) \in E$, then $(v, u) \in E$.
Let $\mathcal{G}$ be the collection of all undirected graphs with vertex set $V$, where we write $n = |V|$. A graph can be identified with a symmetric binary matrix $x \in \{0,1\}^{V \times V}$ such that $x_{ij} = 1$ if and only if $(i,j) \in E$.
We will thus denote by $x$ a graph, which is completely determined by the values $x_{ij}$, for $i, j \in V$. The empty graph is given by $x_{ij} = 0$, for all $i, j \in V$, and the complete graph is given by $x_{ij} = 1$, for all $i \neq j$.
We write $x_{-ij}$ to represent the set of values $\{x_{kl} \colon (k,l) \neq (i,j)\}$. For a given graph $x$ and a set of vertices $W \subseteq V$, we denote by $x_W$ the subgraph induced by $W$; that is, $x_W$ is a graph with vertex set $W$ and such that $(x_W)_{kl} = x_{kl}$ for all $k, l \in W$.
In the ERGM, the probability of selecting a graph in $\mathcal{G}$ is defined through its structure: its number of edges, of triangles, etc.
To define subgraph counts in a graph, let $P_k$ be the set of all possible permutations of $k$ distinct elements of $V$. For $\vec{v} = (v_1, \dots, v_k) \in P_k$, define $x_{\vec{v}}$ as the subgraph of $x$ induced by $\{v_1, \dots, v_k\}$. For a graph $H$ with $k$ vertices, we say $x_{\vec{v}}$ contains $H$ (and we write $H \subseteq x_{\vec{v}}$) if $H_{lm} = 1$ implies $(x_{\vec{v}})_{v_l v_m} = 1$. Then, for $x \in \mathcal{G}$ and a graph $H$ with $k$ vertices, $k \le n$, we define the number of subgraphs $H$ in $x$ by the counter

(1)  $N_H(x) \;=\; \sum_{\vec{v} \in P_k} \mathbb{1}\{\, H \subseteq x_{\vec{v}} \,\}.$
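For small graphs, the counter in (1) can be evaluated by brute force. The sketch below is illustrative (the function name is ours, not from the text): it represents a graph as a symmetric 0/1 adjacency matrix and sums, over all ordered tuples of distinct vertices, the indicator that every edge of $H$ is present in the induced subgraph.

```python
from itertools import permutations

def count_subgraphs(H, x):
    """Brute-force evaluation of the counter N_H(x) in (1): the number of
    ordered tuples of distinct vertices of x whose induced subgraph
    contains a copy of H (every edge of H must be present in x)."""
    k, n = len(H), len(x)
    total = 0
    for tup in permutations(range(n), k):
        if all(x[tup[a]][tup[b]] >= H[a][b]
               for a in range(k) for b in range(a + 1, k)):
            total += 1
    return total

# H_1: the single edge; x: the triangle graph on 3 vertices.
edge = [[0, 1],
        [1, 0]]
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
# Each of the 3 edges is counted once per ordered pair of endpoints.
print(count_subgraphs(edge, triangle))      # 6
print(count_subgraphs(triangle, triangle))  # 6
```

Note that, because the sum runs over ordered tuples, each unordered copy of $H$ is counted once per automorphism of its vertex labels (an edge twice, a triangle six times).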
Let $H_1, \dots, H_s$ be fixed graphs, where $H_i$ has $v_i$ vertices, and let $\beta = (\beta_1, \dots, \beta_s) \in \mathbb{R}^s$ be a vector of parameters. By convention we set $H_1$ as the graph with only two vertices and one edge. Following Chatterjee et al. (2013) and Bhamidi et al. (2011), we define the probability of a graph $x \in \mathcal{G}$ by

(2)  $\pi(x) \;=\; \frac{1}{Z(\beta)} \exp\Big( \sum_{i=1}^{s} \beta_i \, \frac{N_{H_i}(x)}{n^{v_i - 2}} \Big),$

where $Z(\beta)$ is the normalizing constant.
Observe that the set $\mathcal{G}$ is equipped with a partial order given by

(3)  $x \le y$ if and only if $x_{ij} \le y_{ij}$ for all $i, j \in V$.

Moreover, this partial ordering has a maximal element (the complete graph) and a minimal element (the empty graph). Considering this partial order, we say $\pi$ is monotone if the conditional probability of an edge given the rest of the graph is a monotone increasing function. That is, $\pi$ is monotone if, and only if, $x \le y$ implies

(4)  $\pi(x_{ij} = 1 \mid x_{-ij}) \;\le\; \pi(y_{ij} = 1 \mid y_{-ij}).$
Let $I = \{(i,j) \colon i, j \in V, \, i < j\}$ be the set of possible edges in a graph with vertex set $V$. For any graph $x$, a pair of vertices $(i,j) \in I$ and $a \in \{0,1\}$, we define the modified graph $x^{(i,j),a}$ given by

$x^{(i,j),a}_{kl} = x_{kl}$ for all $(k,l) \neq (i,j)$, and $x^{(i,j),a}_{ij} = a.$

Then, Inequality (4) is equivalent to

(5)  $\frac{\pi\big(x^{(i,j),1}\big)}{\pi\big(x^{(i,j),0}\big)} \;\le\; \frac{\pi\big(y^{(i,j),1}\big)}{\pi\big(y^{(i,j),0}\big)}.$

Moreover, if this inequality holds for all $(i,j) \in I$ and all $x \le y$, then the distribution $\pi$ is monotone.
Monotonicity plays a very important role in the development of efficient perfect simulation algorithms. The following proposition gives a sufficient condition on the parameter vector $\beta$ under which the corresponding ERGM distribution is monotone.
Proposition 1.
Consider the ERGM given by (2). If $\beta_i \ge 0$ for all $i \in \{2, \dots, s\}$, then the distribution $\pi$ is monotone.
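The condition of Proposition 1 can be checked numerically on tiny examples. The following sketch (our own illustrative code, not part of the proof) enumerates all 8 graphs on 3 vertices under a model with an edge statistic and a triangle statistic, and tests the monotonicity condition in its conditional-probability form (4); the statistics, parameter values and function names are assumptions for the demo.

```python
import math
from itertools import product

EDGES = [(0, 1), (0, 2), (1, 2)]  # possible edges on 3 vertices

def log_weight(x, beta_edge, beta_tri):
    """Unnormalized log-density: beta_edge * (#edges) + beta_tri * (#triangles)."""
    n_edges = sum(x)
    n_tri = 1 if all(x) else 0  # on 3 vertices there is one possible triangle
    return beta_edge * n_edges + beta_tri * n_tri

def p_cond(x, k, beta_edge, beta_tri):
    """Conditional probability that edge k is present given the other edges."""
    x1, x0 = list(x), list(x)
    x1[k], x0[k] = 1, 0
    w1 = math.exp(log_weight(x1, beta_edge, beta_tri))
    w0 = math.exp(log_weight(x0, beta_edge, beta_tri))
    return w1 / (w1 + w0)

def is_monotone(beta_edge, beta_tri):
    """Check that x <= y implies p_k(x) <= p_k(y) for every edge k."""
    graphs = list(product([0, 1], repeat=len(EDGES)))
    for x in graphs:
        for y in graphs:
            if all(a <= b for a, b in zip(x, y)):
                for k in range(len(EDGES)):
                    if p_cond(x, k, beta_edge, beta_tri) > \
                       p_cond(y, k, beta_edge, beta_tri) + 1e-12:
                        return False
    return True

print(is_monotone(-0.5, 1.0))  # True: nonnegative triangle weight, any edge weight
print(is_monotone(0.0, -2.0))  # False: a negative triangle weight breaks monotonicity
```

Consistently with Proposition 1, the edge parameter may have any sign, while a negative weight on the higher-order statistic destroys monotonicity.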
3 Coupling From The Past
In this section we recall the construction of a Markov chain with stationary distribution $\pi$ using a local update algorithm called the Glauber dynamics. This construction and the monotonicity property given by Proposition 1 are the keys to developing a perfect simulation algorithm for the ERGM.
3.1 Glauber dynamics
In the Glauber dynamics, the probability of a transition from a graph $x$ to a graph $x^{(i,j),a}$, with $(i,j) \in I$ and $a \in \{0,1\}$, is given by

(6)  $P\big(x, x^{(i,j),a}\big) \;=\; \frac{1}{|I|} \cdot \frac{\pi\big(x^{(i,j),a}\big)}{\pi\big(x^{(i,j),0}\big) + \pi\big(x^{(i,j),1}\big)},$

and any other (non-local) transition is forbidden.
For the ERGM, this can be written

(7)  $P\big(x, x^{(i,j),1}\big) \;=\; \frac{1}{|I|}\, p_{ij}(x) \quad\text{and}\quad P\big(x, x^{(i,j),0}\big) \;=\; \frac{1}{|I|}\,\big(1 - p_{ij}(x)\big),$

where

$p_{ij}(x) \;=\; \frac{\exp\Big( \sum_{r=1}^{s} \beta_r \, \frac{N_{H_r}(x^{(i,j),1}) - N_{H_r}(x^{(i,j),0})}{n^{v_r - 2}} \Big)}{1 + \exp\Big( \sum_{r=1}^{s} \beta_r \, \frac{N_{H_r}(x^{(i,j),1}) - N_{H_r}(x^{(i,j),0})}{n^{v_r - 2}} \Big)},$

so that the normalizing constant $Z(\beta)$ cancels.
It can easily be seen that the Markov chain with the Glauber dynamics (7) has an invariant distribution equal to $\pi$. The dynamics can be implemented with the help of the local update function $\phi \colon \mathcal{G} \times I \times [0,1] \to \mathcal{G}$ defined as

(8)  $\phi\big(x, (i,j), u\big) \;=\; \begin{cases} x^{(i,j),1} & \text{if } u \le p_{ij}(x),\\ x^{(i,j),0} & \text{otherwise.} \end{cases}$
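A minimal sketch of the local update function in (8). It assumes only that the target distribution is available through an unnormalized log-weight, so the normalizing constant cancels in the ratio defining $p_{ij}(x)$; the function names are illustrative.

```python
import math

def glauber_update(x, edge, u, log_weight):
    """phi(x, (i,j), u) of (8): include edge (i,j) exactly when u <= p_ij(x),
    where p_ij(x) = pi(x^{(i,j),1}) / (pi(x^{(i,j),0}) + pi(x^{(i,j),1})).
    x is a symmetric 0/1 adjacency matrix (list of lists); log_weight(x)
    returns the unnormalized log-density of x."""
    i, j = edge
    x1 = [row[:] for row in x]
    x0 = [row[:] for row in x]
    x1[i][j] = x1[j][i] = 1
    x0[i][j] = x0[j][i] = 0
    w1, w0 = math.exp(log_weight(x1)), math.exp(log_weight(x0))
    p_ij = w1 / (w0 + w1)
    return x1 if u <= p_ij else x0

# Under the uniform distribution (constant log-weight), p_ij = 1/2.
empty = [[0, 0], [0, 0]]
print(glauber_update(empty, (0, 1), 0.4, lambda g: 0.0)[0][1])  # 1
print(glauber_update(empty, (0, 1), 0.6, lambda g: 0.0)[0][1])  # 0
```

One full Glauber step then draws the edge uniformly from $I$ and $u$ uniformly from $[0,1]$ before calling the update.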
A transition in the Markov chain from a graph $x$ to a graph $y$ is obtained by the random function $y = \phi(x, Z, U)$, where $Z$ is uniformly distributed on $I$ and $U$ is uniformly distributed on $[0,1]$, independently of $Z$. For random vectors $Z_{-t+1}^{0} = (Z_{-t+1}, \dots, Z_0)$ and $U_{-t+1}^{0} = (U_{-t+1}, \dots, U_0)$, with i.i.d. coordinates distributed as above, we define the random map $F_{-t}^{0} \colon \mathcal{G} \to \mathcal{G}$ given by the following induction:

$F_{-t}^{-t}(x) = x, \qquad F_{-t}^{s}(x) = \phi\big( F_{-t}^{s-1}(x),\, Z_s,\, U_s \big), \quad s = -t+1, \dots, 0.$
The CFTP protocol of Propp & Wilson (1996) relies on the following elementary observation. If, for some time $t > 0$ and for some uniformly distributed $Z_{-t+1}^{0}$ and $U_{-t+1}^{0}$, the mapping $F_{-t}^{0}$ is constant, that is, if it takes the same value at time $0$ for all possible initial graphs $x$, then this constant value is readily seen to be a sample from the invariant measure of the Markov chain. Moreover, if $F_{-t}^{0}$ is constant for some positive $t$, then it is also constant for all larger values of $t$. Thus, it suffices to find a value of $t$ large enough to obtain a constant map, and to return the constant value of this map. For example, one may try some arbitrary value of $t$, check whether the map is constant and, if not, increase $t$ and check again, and so on. Since the state space has the huge size $2^{\binom{n}{2}}$, it would be computationally intractable to compute the value of $F_{-t}^{0}(x)$ for all $x$ in order to determine whether they coincide.
This is where the monotonicity property helps: it is sufficient to inspect the maximal and minimal elements of $\mathcal{G}$. If they both yield the same value, then all other initial states will also coincide with them, and the mapping will be constant.
To apply the monotone CFTP algorithm, we use the partial order defined by (3) and its extremal elements, the empty graph and the complete graph. The resulting procedure, which simulates the ERGM using the CFTP protocol, is described in Algorithm 1.
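The monotone CFTP procedure of Algorithm 1 can be sketched as follows. This is an illustrative implementation of the general scheme, not the authors' code: it reuses the same random pairs $(Z_k, U_k)$ on every attempt, doubles the look-back time until the chains started from the two extremal graphs coalesce at time 0, and returns the common value. The edge-only log-weight in the demo is an arbitrary (monotone) choice.

```python
import math
import random

def glauber_update(x, edge, u, log_weight):
    """Local update (8): include the edge exactly when u <= p_ij(x)."""
    i, j = edge
    x1 = [row[:] for row in x]
    x0 = [row[:] for row in x]
    x1[i][j] = x1[j][i] = 1
    x0[i][j] = x0[j][i] = 0
    w1, w0 = math.exp(log_weight(x1)), math.exp(log_weight(x0))
    return x1 if u <= w1 / (w0 + w1) else x0

def monotone_cftp(n, log_weight, rng):
    """Monotone CFTP: exact sample from the (monotone) distribution with the
    given unnormalized log-weight, sandwiching between the extremal graphs."""
    all_edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    moves = []  # (Z_k, U_k) pairs; moves[k] is used at time -k, reused across attempts
    t = 1
    while True:
        while len(moves) < t:
            moves.append((rng.choice(all_edges), rng.random()))
        low = [[0] * n for _ in range(n)]                              # empty graph
        high = [[1 if i != j else 0 for j in range(n)] for i in range(n)]  # complete
        for edge, u in reversed(moves[:t]):  # run from time -t up to time 0
            low = glauber_update(low, edge, u, log_weight)
            high = glauber_update(high, edge, u, log_weight)
        if low == high:  # coalescence: the constant value is an exact draw
            return low
        t *= 2           # go further into the past and retry

# Demo with an arbitrary edge-only statistic (a monotone model).
edge_weight = lambda g: 0.3 * sum(map(sum, g))
sample = monotone_cftp(4, edge_weight, random.Random(1))
```

Reusing the stored randomness when extending the past is essential: redrawing it at each attempt would bias the output.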
Let

$T \;=\; \min\big\{\, t > 0 \;\colon\; F_{-t}^{0} \text{ is constant} \,\big\}$

be the stopping time of Algorithm 1. Proposition 2 below guarantees that the law of the output of Algorithm 1 is the target distribution $\pi$. The proof of Proposition 2 follows directly from Theorem 2 in Propp & Wilson (1996) and is omitted here.
Proposition 2.
Suppose that $\mathbb{P}(T < \infty) = 1$. Then the graph returned by Algorithm 1 has law $\pi$.
4 Convergence Speed: Why CFTP is Better
By the construction of the perfect simulation algorithm, it is expected that the stopping time $T$ is related to the mixing time of the chain. In Theorem 1 below we provide an upper bound on this quantity.
Given vectors $Z_1^t = (Z_1, \dots, Z_t)$ and $U_1^t = (U_1, \dots, U_t)$, define the forward random map $F_0^t$ as

(9)  $F_0^{0}(x) = x, \qquad F_0^{s}(x) = \phi\big( F_0^{s-1}(x),\, Z_s,\, U_s \big), \quad s = 1, \dots, t.$
For $x \in \mathcal{G}$, let the Markov chain $(X_t^x)_{t \ge 0}$ be given by

$X_t^x = F_0^t(x),$

and denote by $P^t(x, \cdot)$ the distribution of this Markov chain at time $t$. Now define

$d(t) \;=\; \max_{x \in \mathcal{G}} \big\| P^t(x, \cdot) - \pi \big\|_{TV},$

where $\|\cdot\|_{TV}$ stands for the total variation distance.
Let $t_{\mathrm{mix}}$ be given by

(10)  $t_{\mathrm{mix}} \;=\; \min\big\{\, t \;\colon\; d(t) \le 1/4 \,\big\}.$
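For very small $n$, the quantities $d(t)$ and $t_{\mathrm{mix}}$ in (10) can be computed exactly by writing out the Glauber transition matrix on all $2^{\binom{n}{2}}$ states. The sketch below does this for $n = 3$; the statistics and parameter values are illustrative choices of ours.

```python
import math
from itertools import product

EDGES = [(0, 1), (0, 2), (1, 2)]
STATES = list(product([0, 1], repeat=3))  # all 8 graphs on 3 vertices

def log_weight(x):
    """Illustrative unnormalized log-density: edge term plus triangle bonus."""
    return 0.2 * sum(x) + 0.5 * (1 if all(x) else 0)

# Stationary distribution pi, normalized explicitly (feasible at this size).
w = [math.exp(log_weight(x)) for x in STATES]
Z = sum(w)
pi = [wi / Z for wi in w]

def p_cond(x, k):
    x1, x0 = list(x), list(x)
    x1[k], x0[k] = 1, 0
    w1, w0 = math.exp(log_weight(x1)), math.exp(log_weight(x0))
    return w1 / (w0 + w1)

# Glauber transition matrix: pick an edge uniformly, then resample it.
m = len(STATES)
P = [[0.0] * m for _ in range(m)]
for a, x in enumerate(STATES):
    for k in range(len(EDGES)):
        p = p_cond(x, k)
        for val, prob in ((1, p), (0, 1 - p)):
            y = list(x)
            y[k] = val
            P[a][STATES.index(tuple(y))] += prob / len(EDGES)

def step(dist):
    return [sum(dist[a] * P[a][b] for a in range(m)) for b in range(m)]

def d(t):
    """d(t) = max_x || P^t(x, .) - pi ||_TV."""
    worst = 0.0
    for a in range(m):
        dist = [1.0 if b == a else 0.0 for b in range(m)]
        for _ in range(t):
            dist = step(dist)
        tv = 0.5 * sum(abs(dist[b] - pi[b]) for b in range(m))
        worst = max(worst, tv)
    return worst

t_mix = next(t for t in range(1, 100) if d(t) <= 0.25)
```

Since $d(t)$ is non-increasing in $t$, the first $t$ with $d(t) \le 1/4$ is exactly the minimum in (10).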
The following theorem shows that the expected value of $T$ is upper bounded by $t_{\mathrm{mix}}$, up to an explicit multiplicative constant that depends on the number of vertices of the graph.
The proof of this relation is inspired by an argument given in Propp & Wilson (1996), and is presented in Section 6. The upper bound in Theorem 1 can also be combined with other results on the mixing time of the chain, when available, to obtain an estimate of the expected value of $T$. As an example, we cite the results obtained by Bhamidi et al. (2011), who study the mixing time of the Markov chain constructed from the Glauber dynamics for the ERGM, as considered here. They show that for models where $\beta$ belongs to the high temperature regime the mixing time of the chain is $\Theta(n^2 \log n)$. On the other hand, for models in the low temperature regime, the mixing time is exponentially large in $n$.
Observe that the mixing time is directly related to the "burn-in" time in MCMC, that is, the time the non-stationary forward Markov chain defined by (9) takes to approach the invariant distribution. This can be observed in practice in Figure 1: the convergence times for MCMC and CFTP are of the same order of magnitude, as suggested by Theorem 1. In this figure we compare some statistics of the graphs obtained by MCMC at different sample sizes of the forward Markov chain and by CFTP at its convergence time. The model is defined by (2), where $H_2$ is chosen as the graph with 3 vertices and 2 edges, with parameters for which the mean proportion of edges in the graphs is approximately . We observe in Figure 1(a) that MCMC remains strongly dependent on the initial value of the chain for a long time when this value is chosen in a region of low probability. On the other hand, when the initial value is chosen appropriately, MCMC and CFTP have comparable performances, as shown in Figure 1(b).
5 Discussion
In this paper we proposed a perfect simulation algorithm for the Exponential Random Graph Model, based on the Coupling From The Past method of Propp & Wilson (1996). The justification of the correctness of a CFTP algorithm is generally restricted to the monotonicity property of the dynamics, given here by Proposition 1, and to proving that $T$ is almost surely finite, with no clue about the expected time to convergence. Here, in contrast, we prove a much stronger result: not only do we upper-bound the expectation of the waiting time, but we show that CFTP compares favorably with standard forward MCMC simulation: the waiting time is (up to a logarithmic factor) not larger than the time one has to wait before a run of the chain approaches the stationary distribution. Moreover, CFTP has the advantage of providing a draw from the exact stationary distribution, in contrast to MCMC, which may keep a strong dependence on the initial state of the chain for a long time, as shown in the simulation example of Figure 1. We thus argue that for these models the CFTP algorithm is a better tool than forward MCMC simulation to obtain a sample from the Exponential Random Graph Model.
6 Proofs
Proof of Proposition 1.
Let $x, y \in \mathcal{G}$ be such that $x \le y$. For a pair of vertices $(i, j)$, we define a new count that considers only those subgraphs that contain both vertices $i$ and $j$; that is,
(12) 
In the same way, we have that . So,
(13) 
Since we have that . For , , we have that
(14) 
Finally,
and this concludes the proof. ∎
Proof of Theorem 1.
Define the stopping time of the forward algorithm by

$T^{*} \;=\; \min\big\{\, t > 0 \;\colon\; F_0^{t} \text{ is constant} \,\big\}.$

To prove the claim of the theorem we use the random variable $T^{*}$, since $T$ and $T^{*}$ have the same probability distribution. In fact,

(15) 
Let $X_t$ and $Y_t$ be the Markov chains obtained by (9) with initial states given by the empty graph and the complete graph, respectively. Define the length of the longest increasing chain such that the top element is . In the case we have . Otherwise, if we have that has at least one edge different from , since our algorithm uses a local update. Then, . So, we have that
(16) 
Since the update function (8) of the transitions of the Markov chain updates only one edge at each step, we have that and is the length of the longest increasing chain started at with top element , that is, . Thus
Since is submultiplicative (Theorem 6 in Propp & Wilson (1996)) we have that
(17) 
Then, (16) and (17) imply that
(18) 
Set . Using the submultiplicative property of (Lemma 4.12 in Levin et al. (2009)) we have that
(19) 
and by (18) and (19) we conclude that
Acknowledgments
A.C. is supported by a FAPESP scholarship (2015/12595-4). F.L. is partially supported by a CNPq fellowship (309964/2016-4). This work was produced as part of the activities of the FAPESP Research, Innovation and Dissemination Center for Neuromathematics, grant 2013/07699-0, and FAPESP's project "Structure selection for stochastic processes in high dimensions", grant 2016/17394-0, São Paulo Research Foundation.
References
 Bhamidi et al. (2011) Bhamidi, Shankar, Bresler, Guy, & Sly, Allan. 2011. Mixing time of exponential random graphs. The Annals of Applied Probability, 21(6), 2146–2170.
 Butts (2015) Butts, Carter T. 2015. A Novel Simulation Method for Binary Discrete Exponential Families, With Application to Social Networks. The Journal of mathematical sociology, 39(3), 174–202.
 Cerqueira et al. (2017) Cerqueira, Andressa, Fraiman, Daniel, Vargas, Claudia D, & Leonardi, Florencia. 2017. A test of hypotheses for random graph distributions built from EEG data. IEEE Transactions on Network Science and Engineering.
 Chatterjee et al. (2013) Chatterjee, Sourav, & Diaconis, Persi. 2013. Estimating and understanding exponential random graph models. The Annals of Statistics, 41(5), 2428–2461.
 Geyer & Thompson (1992) Geyer, Charles J, & Thompson, Elizabeth A. 1992. Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society. Series B (Methodological), 657–699.
 Levin et al. (2009) Levin, David A., Peres, Yuval, & Wilmer, Elizabeth L. 2009. Markov Chains and Mixing Times. American Mathematical Society.
 Newman et al. (2002) Newman, Mark EJ, Watts, Duncan J, & Strogatz, Steven H. 2002. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(suppl 1), 2566–2572.
 Propp & Wilson (1996) Propp, James Gary, & Wilson, David Bruce. 1996. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9(1-2), 223–252.
 Robins et al. (2007) Robins, Garry, Pattison, Pip, Kalish, Yuval, & Lusher, Dean. 2007. An introduction to exponential random graph (p*) models for social networks. Social networks, 29(2), 173–191.
 Snijders (2002) Snijders, Tom AB. 2002. Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3(2), 1–40.
 Strauss & Ikeda (1990) Strauss, David, & Ikeda, Michael. 1990. Pseudolikelihood estimation for social networks. Journal of the American Statistical Association, 85(409), 204–212.