The counting complexity class #P was introduced by Valiant [Val79] to capture the apparent intractability of the permanent. Although exactly evaluating #P-complete problems is a task even harder than solving NP-complete problems [Tod91], efficient approximation algorithms may still exist. This possibility was first exhibited through the relationship between approximate counting and sampling [JVV86]
, and in particular, using the Markov chain Monte Carlo (MCMC) method[Jer03]. Early successes along this line include efficient algorithms to approximate the volume of a convex body [DFK91], to approximate the partition function of ferromagnetic Ising models [JS93], and to approximate the permanent of a non-negative matrix [JSV04]. More recently, a number of exciting new approaches were developed, utilising a variety of tools such as correlation decay [Wei06, BG08], zeros of graph polynomials [Bar16, PR17], and Lovász local lemma [Moi17].
Partial rejection sampling [GJL17] is yet another alternative approach to approximate counting and exact sampling. It takes an algorithmic Lovász local lemma [MT10] perspective for Wilson’s cycle-popping algorithm [PW98] to sample rooted spanning trees. The algorithm is extremely simple. In order to sample from a product distribution conditional on avoiding a number of undesirable “bad” events, we randomly initialise all variables, and re-sample a subset of variables selected by a certain subroutine, until no bad event is present. In the so-called extremal instances, the resampled variables are just those involved in occurring bad events. Despite its simplicity, it can be applied to a number of situations that are seemingly unrelated to the local lemma. Unlike Markov chains, partial rejection sampling yields exact samplers. The most notable application is the first polynomial-time approximation algorithm for all-terminal network reliability [GJ18c].
In this paper we sharpen the run-time analysis for a number of algorithms under the partial rejection sampling framework for extremal instances. We apply our method to analyse cluster-, cycle-, and sink-popping algorithms. Denote by the number of vertices in a graph, and the number of edges (or arcs) in a undirected (or directed) graph. We summarise some background and our results below.
Cluster-popping is first proposed by Gorodezky and Pak [GP14] to approximate a network reliability measure, called reachability, in directed graphs. They conjecture that the algorithm runs in polynomial-time in expectation on bi-directed graphs, which is confirmed by Guo and Jerrum [GJ18c]. It has also been shown in [GJ18c] that a polynomial-time approximation algorithm for all-terminal network reliability can be obtained using cluster-popping.
We show a upper bound for the expected number of resampled variables on bi-directed graphs, where
is the maximum failure probability on edges. This improves the previousbound [GJ18c]. We also provide an efficient implementation based on Tarjan’s algorithm [Tar72] to obtain a faster algorithm to sample spanning connected subgraphs in time (assuming that is a constant). Furthermore, we obtain a faster approximation algorithm for the all-terminal network reliability, which takes time, improving upon from [GJ18c]. For precise statements, see Theorem 6 and Theorem 13.
Sampling sink-free orientations is introduced by Bubley and Dyer [BD97a] to show that the number of solutions to a special class of CNF formulas is #P-hard to count exactly, but can be counted in polynomial-time approximately. Bubley and Dyer showed that the natural Markov chain takes time to generate a distribution
-close (in total variation distance) to the uniform distribution over all sink-free orientations. Using the coupling from the past technique, Huber[Hub98] obtained a exact sampler. Cohn, Pemantle, and Propp [CPP02] introduced an alternative exact sampler, called sink-popping. They show that sink-popping resamples random variables in expectation.
We improve the expected number of resampled variables for sink-popping to at most . See Theorem 19.
In all three applications, we also construct examples to show that none of the constants (if explicitly stated) can be further improved. Our results yield best known running time for all problems mentioned above except sampling uniform spanning trees, for which the current best algorithm by Schild [Sch18] is in almost-linear time in . We refer interested readers to the references therein for the vast literature on this problem. One should note that the corresponding counting problem for spanning trees is tractable via Kirchhoff’s matrix-tree theorem, whereas for the other two exact counting is #P-hard [Jer81, PB83, BD97a]. It implies that their solution spaces are less structured, and renders sampling much more difficult than for spanning trees.
The starting point of our method is an exact formula [GJL17, Theorem 13] for the expected number of events resampled, for partial rejection sampling algorithm on extremal instances. Informally, it states that the expected number of resampled events equals to the ratio between the probability (under the product distribution) that exactly one bad event happens, and the probability that none of the bad events happens. This characterisation has played an important role in the confirmation of the conjecture by Gorodezky and Pak [GP14], which leads to the aforementioned all-terminal network reliability algorithm [GJ18c].
When bad events involve only constant number of variables, bounding this ratio is sufficient to obtain tight run-time bound. However, in all three popping algorithms mentioned above, bad events can involve as many variables as or , and simply applying a worst case bound (such as or ) yields loose run-time upper bound. Our improvement comes from a refined expression of the number of variables, rather than events, resampled in expectation. (See (2) and Theorem 3.) We then apply a combinatorial encoding idea, and design injective mappings to bound the refined expression. Similar mappings have been obtained before in [GJL17, GJ18c], but our new mappings are more complicated and carefully designed in order to achieve tight bounds. We note that our analysis is completely different and significantly simpler than the original analysis for cycle-popping [PW98] and sink-popping [CPP02].
Since our bounds are tight, one has to go beyond partial rejection sampling to further accelerate cluster- and sink-popping. It remains an interesting open question whether is a barrier to uniformly sample spanning connected subgraphs.
2. Partial rejection sampling
Let be a set of mutually independent random variables. Each can have its own distribution and range. Let be a set of “bad” events that depend on ’s. For example, for a constraint satisfaction problem (CSP) with variables () and constraints (), each is the set of unsatisfying assignments of for . Let be the set of variables that depends on.
Our goal is to sample from the product distribution of , conditional on none of the bad events occurring. Denote by the product distribution and by the conditional one (which is our target distribution).
A breakthrough result in algorithmic Lovász local lemma is the Moser and Tardos algorithm [MT10], which simply iteratively eliminates occurring bad events. The form we will use is described in Algorithm 1.
The resampling table is a very useful concept of analysing Algorithm 1, introduced by [MT10], and similar ideas were also used in the analysis of cycle-popping [PW98] and sink-popping [CPP02]. Instead of drawing random samples, we associate an infinite stack to each random variable. Construct an infinite table such that each row represents random variables, and each entry is a sample of the random variable. The execution of Algorithm 1 can be thought of (but not really implemented this way) as first drawing the whole resampling table, and whenever we need to sample a variable, we simply use the next value of the table. Once the resampling table is fixed, Algorithm 1 becomes completely deterministic.
We say a collection of bad events are extremal, if for any , either or .
In other words, if the collection is extremal, then any two bad events are either disjoint or depend on distinct sets of variables (and therefore independent). It was shown [GJL17, Theorem 8] that if Condition 1 holds, then Algorithm 1 indeed draws from (conditioned on halting). In particular, Condition 1 guarantees that the set of occurring bad events have disjoint sets of variables.
Condition 1 may seem rather restricted, but it turns out that many natural problems have an extremal formulation. Examples include the cycle-popping algorithm for uniform spanning trees [Wil96, PW98], the sink-popping algorithm for sink-free orientations [CPP02], and the cluster-popping algorithm for root-connected subgraphs [GP14, GJ18c]. In particular, the cluster-popping algorithm yields the first polynomial-time approximation algorithm for the all-terminal network reliability problem. More recently, another application is found to approximately count the number of bases in a bicircular matroid [GJ18a], which is known to be #P-hard to count exactly [GN06]. We refer interested readers to [GJL17, GJ18b] for further applications of the partial rejection sampling framework beyond extremal instances.
A particularly nice feature of partial rejection sampling algorithms for extremal instances is that we have an exact formula [GJL17, Theorem 13] for the expected number of events resampled. This formula makes the analysis of these algorithms much more tractable. However, it counts the number of resampled events. In the most interesting applications, bad events typically involve more than constantly many variables. Using the worst case bound of the number of variables involved in bad events yields loose bounds.
We give a sharper formula for the expected number of variables resampled next. Let be the number of resamplings of event . Let be the probability such that exactly occurs, and be the probability such that none of occurs. Suppose as otherwise the support of is empty. For extremal instances, [GJL17, Lemma 12] and the first part of the proof of [GJL17, Theorem 13] yield
The proof of (1
) is via manipulating the moment-generating function of. Let be the number of resampled variables.111This notation is different from that in [GJL17]. By linearity of expectation and (1),
We note that an upper bound similar to the right hand side of (2) was first shown by Kolipaka and Szegedy [KS11], in a much more general setting but counting the number of resampled events. We will use (2) to derive both upper and lower bounds.
In order to upper bound the ratio in (2), we will use a combinatorial encoding idea, namely to design an injective mapping. For an assignment , let be its weight so that . Let be the set of assignments so that exactly occurs, and . Note that are mutually exclusive and are in fact a partition of . Also, let
Moreover, let be the set of “perfect” assignments such that none of occurs.
For a constant and an auxiliary set , a mapping is -preserving if for any , any and ,
where is the restriction of on the first coordinate.
A straightforward consequence of (2) is the following theorem, which will be our main technical tool.
If there exists a -preserving injective mapping for some auxiliary set , then the expected number of resampled variables of Algorithm 1 is at most .
Note that and . As is -preserving and injective,
The theorem then follows from (2). ∎
Let be a directed graph with root . The graph is called root-connected if there is a directed path in from every non-root vertex to . Let be the failure probability for arc , and define the weight of a subgraph to be . Thus the target distribution is over all root-connected subgraphs, and .
The extremal formulation by Gorodezky and Pak [GP14] is the following. Each arc is associated with a (distinct) random variable which is present with probability . A cluster is a subset of vertices not containing so that no arc is going out. We want all vertices to be able to reach . Thus clusters are undesirable. However, Condition 1 is not satisfied if we simply let the bad events be clusters. Instead, we choose minimal clusters to be bad events.
More precisely, for each , we associate it with a bad event to indicate that is a cluster and no proper subset is one. Each depends on arcs going out of for being a cluster, as well as arcs inside for being minimal. In other words, . There are clearly exponentially many bad events. A description of the algorithm is given in Algorithm 2.
Any minimal cluster is strongly connected.
Since Condition 1 holds, Algorithm 2 draws from the desired distribution over root-connected subgraphs in a directed graph. It was further shown in [GJ18c] that the cluster-popping algorithm can be used to sample spanning connected subgraphs in an undirected graph, and to approximate all-terminal network reliability in expected polynomial time. We will first give an improved running time bound for the basic cluster-popping algorithm.
3.1. Expected running time
A graph is bi-directed if each arc has an anti-parallel twin in as well. For an arc , let be its reversal. Let and . It was shown that cluster-popping can take exponential time in general [GP14] while in bi-directed graphs, the expected number of resampled variables is at most [GJ18c], where . We will now design a -preserving injective mapping from to for a connected and bi-directed graph , where is the set of all root-connected subgraphs, and is the set of subgraphs with a unique minimal cluster. Applying Theorem 3 with , the expected number of resampled variables is . Thus a factor is shaved.
Our -preserving injective mapping is modified from the one in [GJ18c]. The main difference is that the domain is instead of , and thus we need to be able to recover the random variable in the encoding. The encoding is also more efficient. For example, we only record instead of in [GJ18c] as explained below.
For completeness and clarity we include full details. We assume that the bi-directed graph is connected so that , and the root is arbitrarily chosen. (Apparently in a bi-directed graph, weak connectivity is equivalent to strong connectivity.)
For each subgraph , the rough idea is to “repair” so that no minimal cluster is present. We fix in advance an arbitrary ordering of vertices and arcs. Let be the unique minimal cluster in and be an arc so that , namely . Let denote the set of all vertices which can reach the root in the subgraph . Since , . Let . Since is root-connected, there is at least one arc in from to . Let be the first such arc, where and . Let
Consider the subgraph , where
We consider the directed acyclic graph (DAG) of strongly connected components of , and call it . (We use the decoration to denote arcs, vertices, etc. in .) To be more precise, we replace each strongly connected component by a single vertex. For a vertex , let denote the strongly connected component containing . For example, is the same as the minimal cluster by Claim 4. We may also view as a vertex in and we do not distinguish the two views. The arcs in are naturally induced by . Namely, for , an arc is present in if there exists , such that .
We claim that is root-connected with root . This is because must be the unique sink in and is acyclic. If there is another sink where , then is a minimal cluster in . This contradicts to .
Since is root-connected, there is at least one path from to . Let denote the set of vertices of that can be reached from in (including ), and . Then is a cluster and is the unique source in . As is root-connected, . To define , we reverse all arcs in and add the arc to eliminate the cut. Formally, let
Let be the graph obtained from by reversing all arcs induced by . Observe that becomes the unique sink in (and becomes the unique source).
We verify that . For any , can still reach in since the path from to in is not changed. Since , can reach and hence . For any , can reach as is the unique sink in . For any , can reach since the path from to in is not changed.
The mapping defined in (3) is -preserving and injective.
It is easy to verify -preservingness, since flipping arcs leaves its weight unchanged. The only move changing the weight is to add the arc in , which results in at most change.
Next we verify that is injective. To do so, we show that we can recover and given , , and . Clearly, it suffices to recover just .
First observe that in , is the first vertex on any path from to . Thus we can recover . Remove it from . The set of vertices which can reach in is exactly in . Namely we can recover and . As a consequence, we can recover all arcs in that are incident with , as these arcs are unchanged.
What is left to do is to recover arcs in . To do so, we need to find out which arcs have been flipped. We claim that is acyclic. Suppose there is a cycle in . Since is acyclic, the cycle must involve flipped arcs and thus vertices in . Let be the lowest one under the topological ordering of . Since is a cluster, the outgoing arc along the cycle in must have been flipped, implying that and is in . This contradicts to the minimality of .
Since is acyclic, the strongly connected components of are identical to those of . (Note that flipping all arcs in leaves strongly connected components inside unchanged.) Hence contracting all strongly connected components of results in exactly . All we need to recover now is the set . Let be the set of vertices reachable from in . It is easy to see that since we were flipping arcs. We claim that actually . For any , there is a path from to in . Suppose . Since , we may assume that is the first vertex along the path such that where . Thus has not been flipped and is present in . However, this contradicts the fact that is a cluster in .
To summarize, given , , and , we may uniquely recover and thus . Hence the mapping is injective. ∎
To sample edge-weighted root-connected subgraphs, the expected number of random variables drawn in Algorithm 2 on a connected bi-directed graph is at most , where and .
where is the probability that under the product distribution, there is a unique minimal cluster , and is the probability that under the product distribution, there is no cluster. Through the -preserving injective mapping , namely (3), if we fix , it is easy to see that
As the above holds for any , we have that the expected number of resampling of any variable is at most . This is an upper bound for the expected depth of the resampling table.
3.2. An efficient implementation
Theorem 6 bounds the number of random variables drawn. However, regarding the running time of Algorithm 2, in addition to drawing random variables, a naive implementation of Algorithm 2 may need to find clusters in every iteration of the while loop, which may take as much as time by, for example, Tarjan’s algorithm [Tar72]. In Algorithm 3 we give an efficient implementation, which can be viewed as a dynamic version of Tarjan’s algorithm.
Algorithm 3 is sequential, and its correctness relies on the fact that the order of resampling events for extremal instances does not affect the final output. See [GJ18a, Section 4] for a similar sequential (but efficient) implementation of “bicycle-popping”.
Two key modifications in Algorithm 3 comparing to Tarjan’s algorithm are:
in the DFS, once the root is reached, all vertices along the path are “set” and will not be resampled any more;
the first output of Tarjan’s algorithm is always a strongly connected component with no outgoing arcs. Such a component (if it does not contain the root ) is a minimal cluster and will immediately be resampled in Algorithm 3.
Let be the input graph. For , let be the set of arcs going out from . Recall that for , .
We first observe that in Algorithm 3, is always the number of variables that have already been indexed. Furthermore, inside the function “Dynamic-DFS” of Algorithm 3, for any , if is defined, then can reach the vertex indexed by . This can be shown by a straightforward induction on the recursion depth, and observing that resampling in any Dynamic-DFS will not affect arcs whose heads are indexed before .
At the beginning of each iteration of the while loop in Algorithm 3, all vertices whose indices are defined can reach the root , and they will not be resampled any more.
We do an induction on the number of while loops executed. The base case is trivial as only is indexed before the first iteration. Suppose we are to execute Dynamic-DFS for some . By the induction hypothesis, for any such that , can reach the root . As is non-increasing for any , the only possibility to exit any (recursive) call of Dynamic-DFS is that . Suppose after finishing Dynamic-DFS, is the lowest vertex that is newly indexed but cannot reach , and for some . Then can reach and . However, the latter implies that can reach by the choice of , which is a contradiction. ∎
In particular, Lemma 7 implies that once the algorithm halts, all vertices can reach the root , and the output is a root-connected subgraph.
Inside the function “Dynamic-DFS” of Algorithm 3, if happens, then a minimal cluster is resampled.
Suppose . We want to show that all vertices indexed after form a minimal cluster. Let .
Clearly can reach all vertices in . For any , if , then , contradicting to our assumption. Thus for all . Let be the lowest indexed vertex that cannot reach . Since the recursive call of has exited, . Thus is a vertex in and can reach due to the choice of . It implies that can reach and thus , a contradiction. Hence, all of can reach , and thus is strongly connected.
If there is any arc in the current going out from , say for some , then it must be that . However, this implies that , contradicting to for all . This implies that there is no arc going out from . Thus, is a cluster and is strongly connected, and it must be a minimal cluster. ∎
Now we are ready to show the correctness and the efficiency of Algorithm 3.
By Lemma 7 and Lemma 8, in Algorithm 3, only minimal clusters are resampled and the halting rule is when no minimal cluster is present. In other words, the resampling rules are the same for Algorithm 3 and Algorithm 2, except the difference in orderings. Given a resampling table, our claim is that the resampled variables are exactly the same for Algorithm 3 and Algorithm 2, which leads to an identical final state. The claim can be verified straightforwardly, or we can use a result of Eriksson [Eri96]. If any two minimal clusters are present, then they must be disjoint (Condition 1), and thus different orders of resampling them lead to the same state for a fixed resampling table. This is the polygon property in [Eri96], which by [Eri96, Theorem 2.1] implies the strong convergence property. The latter is exactly our claim.
Regarding the running time of Algorithm 3, we observe that its running time is linear in the final output and the number of resampled variables. The expected number of resampled variables of Algorithm 3 is the same as that of Algorithm 2 due to the claim above. Thus, Theorem 6 implies the claimed run-time bound. ∎
We can further combine Algorithm 3 with the coupling procedure [GJ18c, Section 5] to yield a sampler for edge-weighted spanning connected subgraphs in an undirected graph, which is key to the FPRAS of all-terminal network reliability. The coupling performs one scan over all edges. Thus we have the following corollary of Theorem 9.
There is an algorithm to sample edge-weighted spanning connected subgraphs in an undirected graph with expected running time , where and .
3.3. A tight example
Our example is the bi-directed version of the “lollipop” graph, where a simple path of length is attached to a clique of size . A picture is drawn in Figure 1.
The main tool is still the formula (2). We have constructed an injective mapping . Thus, to derive a lower bound, we just need to lower bound the weighted ratio between and . The main observation is that for most , the tuple as long as is not the right endpoint of , and is an arc in . We may choose and so that and the number of arcs in is . The bound in Algorithm 2 is tight with this choice.
For concreteness, let for some and consider . Then the number of vertices is and the number of arcs is . Let the root be the leftmost vertex of , and be the vertex where and intersect. A subgraph must contain a directed path from to , as well as a root-connected subgraph in with root . For any constant , since is a clique, with high probability, a random subgraph (by drawing each edge independently with probability ) in is strongly connected.222This fact is easy to prove. Recall that the analogue connectivity threshold for Erdős-Rényi random graph is . Let be the set of subgraphs which contain a directed path from to and strongly connected inside . Clearly the total weight of is of that of .
For each , , and an arc in , it is straightforward to verify that the “repairing” procedure in Lemma 5 goes through. We first remove the arc , where is the next vertex on , and call the vertex set that cannot reach the root . Since , if we contract strongly connected components in , it must be a directed path. Flip all arcs in to get . The subgraph must have the same collection of strongly connected components as , and the contracted graph is a directed path in the reverse direction. Clearly there is a unique sink, namely a minimal cluster in , which is the clique with possibly a few vertices along the path . Hence is in the unique minimal cluster of .
where the term accounts for the initialisation.
If we set , then the running time becomes . However, the optimal constant (measured in ) is not clear.
An interesting observation is that the running time of Algorithm 1 depends on the choice of in the example above, although in the reliability approximation algorithm, can be chosen arbitrarily. (See [GJ18c, Section 5].) However, choosing the best does not help reducing the order of the running time. In a “barbell” graph, where two cliques are joined by a path, no matter where we choose , there is a rooted induced subgraph of the same structure as the example above, leading to the same running time when .
3.4. Faster reliability approximation
The main application of Algorithm 2 is to approximate the network reliability of a undirected graph , which is the probability that, assuming each edge fails with probability independently, the remaining graph is still connected. Let
be the vector of the failure probabilities. Then the reliability is the following quantity:
The approximate counting algorithm in [GP14, GJ18c] takes samples of spanning connected subgraphs to produce a approximation of . However, we can rewrite (4) as a partition function of the Gibbs distribution. Thus we can take advantage of faster approximation algorithms, such as the one by Kolmogorov [Kol18].
Let be a finite set, and the Gibbs distribution over is one taking the following form:
where is the temperature, is the Hamiltonian, and is the normalising factor (namely the partition function) of the Gibbs distribution. We would like to turn the sampling algorithm into an approximation algorithm to . Typically, this involves calling the sampling oracle in a range of temperatures, which we denote . (This process is usually called simulated annealing.) Let , , and . The following result is due to Kolmogorov [Kol18, Theorem 8 and Theorem 9].
Suppose we have a sampling oracle from the distribution for any . There is an algorithm to approximate within multiplicative errors using oracle calls in average.
Moreover, the sampling oracle can be approximate as long as where is the variation distance.
A straightforward application of Proposition 11 to our problem requires samples. This is because annealing will be done on all edges. Instead, we will choose a spanning tree, and perform annealing only on its edges, whose cardinality is . This approach uses only samples, but it requires the following slight generalisation of Proposition 11.
Let be the following distribution over a finite set :
where is a non-negative function and, with a little abuse of notation, is the normalising factor. Still, let , , and .
Suppose we have a sampling oracle from the distribution defined in (5) for any . There is an algorithm to approximate within multiplicative errors using oracle calls.
We claim that we can straightforwardly apply the algorithm in Proposition 11 to get an approximation to for .
To see this, let be an integer. Let be the multi-set of where each is duplicated times. To avoid multi-set, we can simply give each duplicated an index to make an ordinary set. Consider the following Gibbs distribution over :
where . Let
We have that
Clearly, as , . We choose large enough so that is within the threshold in Proposition 11 for approximate sampling oracle. Thus, the output of directly applying the algorithm in Proposition 11 with sampling oracle also yields an -approximation to . (Note that we do not really run the algorithm on .) Since as , the output of directly applying Proposition 11 is in fact an -approximation to . ∎
Suppose we want to evaluate for a connected undirected graph . Since is connected, , where and . Fix an arbitrary spanning tree of in advance. Let
Let be the set of all spanning connected subgraphs of , and be the following distribution over :
where is the temperature, is the Hamiltonian, and the normalising factor For any , let be the probability such that . We have that for any ,
To draw a sample from , we use Corollary 10, for a vector such that every fails with probability , and every other fails with probability . To see that this recovers the distribution , notice that the weight assigned to each subgraph is
Let and . Indeed, corresponds to . Hence, if and only if . This condition implies that , and
On the other hand, . Then