1 Introduction
We present an algorithm for bounding the probability of -core formation in -uniform hypergraphs. Understanding the probability of core formation is useful in numerous applications including bounds on the failure rate of Invertible Bloom Lookup Tables (IBLTs) [2] and the probability that a boolean formula is satisfiable [3].
1.1 Problem Statement
Let be a -uniform hypergraph on vertices and edge set where each of the edges in occurs with probability . An -core over vertices is an induced subgraph in which every vertex has degree at least . Define to be the probability that at least one -core forms in . In this paper, we seek upper and lower bounds on .
1.2 Establishing -core Existence
The succinct description of an -core belies the complexity associated with identifying them. The standard approach is by means of a peeling process [3]. Proceeding in rounds, all vertices with degree less than are removed from along with any incident edges. The process is repeated in the subsequent rounds, terminating after the round where no vertices are removed. The remaining vertices form an -core in .
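The peeling process described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in our experiments: the function name `peel_to_core`, the labeling of vertices as integers `0..num_vertices-1`, and the representation of edges as tuples of vertex labels are all assumptions made for the example.

```python
def peel_to_core(num_vertices, edges, r):
    """Repeatedly remove vertices of degree < r (and their incident edges).

    Returns (vertices, edges) that survive peeling. An empty vertex set
    means no r-core exists. Names and data layout are illustrative only.
    """
    alive_vertices = set(range(num_vertices))
    alive_edges = {frozenset(e) for e in edges}
    while True:
        # Recompute degrees within the surviving sub-hypergraph.
        degree = {v: 0 for v in alive_vertices}
        for e in alive_edges:
            for v in e:
                degree[v] += 1
        low = {v for v in alive_vertices if degree[v] < r}
        if not low:
            # No vertex was removed in this round: peeling terminates.
            return alive_vertices, alive_edges
        alive_vertices -= low
        # Drop every edge incident to a removed vertex.
        alive_edges = {e for e in alive_edges if not (e & low)}


# Example: four vertices carrying all four 3-edges, plus a pendant edge.
edges = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3), (3, 4, 5)]
core_vertices, core_edges = peel_to_core(6, edges, r=2)
```

In the example, vertices 4 and 5 have degree 1 and are peeled in the first round together with the edge (3, 4, 5); the remaining four vertices each keep degree 3, so peeling stops there.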
2 General Algorithm
Calculating , the probability that exactly one -core forms anywhere in is relatively straightforward given knowledge of , the probability that an -core forms on all vertices in some subset of vertices. Algorithm 1 (appearing in Appendix B) describes how to calculate from , which we prove correct in the remainder of this section. Given that is known for all , our approach is to recursively calculate — the probability that a single -core of size , and no other, forms somewhere in — and use the partial results to construct . Notice that the probability that -cores form in is bounded above by . This is true because the latter quantity is equivalent to allowing edges to be reused between -cores. From this it follows that
where the second step is the closed-form for a geometric series.
We next show how is related to , the most critical step in the algorithm. Because we are working with random hypergraphs, the probability that an -core forms on vertex set is equal to the probability that an -core forms on any other set where . Therefore , where the latter quantity is the probability that an -core forms on all vertices in . Because is -uniform, it is clear that no -core can form when . Hence, . The case where is covered by the following theorem.
THEOREM 1: For every ,
where
and
PROOF: We proceed by showing that is the probability that exactly one -core of size forms somewhere in and is not contained in any larger -core, while is the probability that no other -core forms in over a subset of vertices distinct from the set of vertices containing the -core of size . If both of these hold, then it is clear that , and the theorem follows.
For , we begin by summing (over all possible ) the probabilities qualified by the restriction that not be contained in a larger -core over vertices . Let . For any , there exist supersets of size containing the vertices. This implies that the probability that the -core on vertices is not contained in any -core of size , is given by , which can be extended by conjunction to all . That is to say, the probability that an -core forms over all vertices and that this -core is a subset of no other -core is given by
Summing over the ways to choose a subset of vertices we arrive at the desired result, .
Turning to , we know that any -core of size greater than must intersect every -core of size , which implies that it could not be a distinct -core. Thus, we consider only distinct -cores that form over vertices in the range . Assuming that one -core of size exists, a distinct -core could only form in an induced subgraph of size , which occurs with probability . It follows that the probability that no distinct -core of size forms, for any possible value of , is given by .
3 Bounding Local -core Formation
With Theorem 1 in hand, we next seek to measure , the probability that an -core forms over a specific subset of vertices in . This paper describes two different approaches: one gives upper and lower bounds based on hypergraph connectivity and another provides a close approximation based on vertex covering.
3.1 Connectivity Bound
We begin with the observation that a connected component on vertices is equivalent to a 1-core on . Thus, is the probability that the induced subgraph on is connected. Expanding on this idea, consider an interleaved graph construction / peeling process yielding a hypergraph wherein an -core is revealed by peeling in rounds, one 1-core at a time, and edges are regenerated at random (removing the remaining old ones and adding new ones) with probability after each round. Let denote the probability that any -core forms over all vertices in using this interleaving process. In this alternative construction, an -core on vertex set exists iff a 1-core on exists during each round. Thus, . Moreover, as the next theorem shows, the probability that any -core forms in can be used to bound both above and below.
THEOREM 2: In expectation, .
PROOF: To show , let be the expected number of edges formed in . For , we divide the number of edges uniformly between rounds in parcels of . Since edges are cleared between rounds and their total number is , it follows that the probability that an -core forms in with edge probability cannot exceed the probability that an -core forms in where all edges contribute simultaneously to the formation of an -core. The inequality can be argued similarly by noting that generates and clears edges in expectation per round, thus the probability that an -core develops in it cannot be less than the probability that one forms in .
3.1.1 A recursive formula for connectivity probability
Gilbert [1] introduced the following classical result, which gives the exact probability that a specific set of vertices in , an Erdős-Rényi random graph, is connected.
with . Function can equivalently be interpreted as the probability that the entire graph is connected.
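For reference, Gilbert's recursion is commonly written in the form below. The symbols $P_s$ (probability that a random graph on $s$ labeled vertices is connected), $p$ (edge probability), and $q$ are notation assumed for this illustration and may differ from the notation used elsewhere in this paper.

```latex
P_1 = 1, \qquad
P_s = 1 - \sum_{i=1}^{s-1} \binom{s-1}{i-1} \, P_i \, q^{\,i(s-i)},
\qquad q = 1 - p .
```

Each term of the sum is the probability that the component containing a fixed vertex has exactly $i$ vertices: there are $\binom{s-1}{i-1}$ ways to choose its other members, they are connected with probability $P_i$, and none of the $i(s-i)$ edges leaving the component may be present. For example, $P_2 = 1 - q = p$.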
We next prove a more general result for -uniform hypergraphs, beginning with the following definitions. For all ,
where , when ,
for and , and
THEOREM 3: The probability that a certain set of vertices form a connected component in , , is given by . Furthermore, , where and , gives the probability that there exists at least one set of vertices in that connect to each other and to no others.
PROOF:
We proceed by induction on as follows. Clearly is correct since every vertex forms a connected component with itself. And for , it is also clear that should have value 0, because no edge forms on fewer than vertices. Now suppose that gives the correct probability that a connected component of size forms in when . The probability that a set of vertices is connected is equivalent to the complement of the probability that there exists some connected component that forms exclusively on a proper subset of those vertices, which is equal to provided that gives the indicated probability. Thus, it remains only to prove the correctness of our expression for , given that holds for , and the validity of will follow.
Suppose that the induction on holds for components up to size , and consider adding a new vertex to the component. We claim that gives the probability that there exists a component on exactly vertices where . There are ways to choose a candidate subset of vertices from the original set of vertices. Adding vertex to , means the candidate subset has vertices. Since , we know by induction that all vertices in are connected with probability . So taken together, the probability of forming a connected component with size at least is given by . Now in order for candidate component to have exactly vertices, it must be the case that none of its vertices are connected to any of the remaining vertices. In a -uniform hypergraph, there are possible edges between vertices and the remaining vertices. And the probability that none of those edges exist is equal to , which completes the proof.
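The counting argument in the proof can be illustrated with a short numerical sketch. It uses the standard "component of a fixed vertex" recursion, generalized from graphs to uniform hypergraphs by counting the possible edges that cross between a candidate component and the remaining vertices. The function names and the exact arrangement of terms are assumptions made for illustration and are not guaranteed to match the expressions in Theorem 3 term by term.

```python
from math import comb


def crossing_edges(a, b, k):
    """Possible k-vertex edges touching both of two disjoint vertex sets
    of sizes a and b: all edges on the union minus those inside one set."""
    return comb(a + b, k) - comb(a, k) - comb(b, k)


def connectivity_probability(s, k, p):
    """Probability that a random k-uniform hypergraph on s vertices, with
    each possible edge present independently with probability p, is connected.

    Illustrative sketch only; plain floating point is used, so cancellation
    in the subtraction may become severe as s grows.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    q = 1.0 - p
    conn = [0.0] * (s + 1)
    conn[1] = 1.0  # a single vertex is trivially connected
    for m in range(2, s + 1):
        total = 0.0
        # i = size of the component containing one fixed vertex.
        for i in range(1, m):
            total += comb(m - 1, i - 1) * conn[i] * q ** crossing_edges(i, m - i, k)
        conn[m] = 1.0 - total
    return conn[s]
```

With `k=2` the recursion reduces to Gilbert's formula for graphs. With `k=3` and `s=4`, the result agrees with the direct calculation: the four vertices are connected exactly when at least two of the four possible edges are present.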
3.2 Covering Heuristic
Another approach to measuring , is to determine the probability that every vertex in is covered by at least edges, which is almost identical to , except that it admits the additional possibility that multiple -cores create a disjoint covering of . In this section, we develop a function , that closely approximates the coverage probability, and accordingly.
For , we can imagine that edges in are formed as follows. Each edge has slots, every slot can accommodate exactly one vertex from , and no vertex can occupy more than one slot in a single edge. Define and to be the probability mass and cumulative distribution functions of the distribution , respectively. Suppose that there are edges in with . Placing vertices into slots independently at random defines a Poisson process. In particular, the probability that a given vertex is assigned to at least slots, independent of the other vertices, is given by . The actual number of edges in varies according to a separate Poisson process. There are possible edges in , and each is present with probability . Thus, the probability that exactly edges form within is equal to . With probabilities for edge and -core formation in hand, we can now derive our approximation to , labeled Equation 1.
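The following sketch shows one way the steps above could be combined numerically. It is one reading of the construction and should not be taken as the definitive form of Equation 1. The assumptions are: the number of edges is Poisson with mean equal to the number of possible edges times the edge probability; given `j` edges on `s` vertices with `k` slots each, the number of slots a vertex occupies is Poisson with mean `j*k/s`; vertices are treated as independent; and the sum over `j` is truncated at a hypothetical cutoff `max_edges`. All function and parameter names are illustrative.

```python
from math import comb, exp, lgamma, log, sqrt


def poisson_pmf(j, lam):
    """Poisson probability mass at j for mean lam (log-space for stability)."""
    if lam == 0:
        return 1.0 if j == 0 else 0.0
    return exp(j * log(lam) - lam - lgamma(j + 1))


def poisson_cdf(x, lam):
    """Poisson cumulative probability P[X <= x] for mean lam."""
    if x < 0:
        return 0.0
    return min(1.0, sum(poisson_pmf(i, lam) for i in range(x + 1)))


def covering_estimate(s, k, r, p, max_edges=None):
    """Estimate the probability that each of s vertices lies in >= r edges.

    ASSUMPTION: this combines the two Poisson models described in the text
    in the most direct way; the paper's Equation 1 may differ in detail.
    """
    mean_edges = comb(s, k) * p
    if max_edges is None:
        # Hypothetical truncation point well into the Poisson tail.
        max_edges = int(mean_edges + 10 * sqrt(mean_edges) + 20)
    total = 0.0
    for j in range(max_edges + 1):
        slot_rate = j * k / s                       # mean slots per vertex
        per_vertex = 1.0 - poisson_cdf(r - 1, slot_rate)
        total += poisson_pmf(j, mean_edges) * per_vertex ** s
    return total
```

Because vertices are treated independently, the estimate counts any covering of the vertex set, including coverings by several disjoint cores, which is the source of the discrepancy noted at the start of this section.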
4 Evaluation
In this section we empirically investigate the accuracy of the local and global bounds provided in previous sections. Although the bounds we have presented apply to global -core formation for any , our experiments focus exclusively on the case where . Overall, the connectivity bound on local 1-core formation is demonstrated to closely match actual probabilities generated via Monte Carlo (MC) trials. As a result, compared to MC, it provides good upper and lower bounds on global 2-core formation by way of Algorithm 1 and Theorem 3.1. Unfortunately, this bound also becomes numerically unstable as , the number of vertices in , grows. For larger values of , we show empirically that the covering heuristic can be used to obtain a very good approximation to the probability of 2-core formation, even though it does not always provide a strict upper bound.
4.1 Local -core probabilities
We first evaluate various techniques for computing , the probability that an -core forms on a specific set of vertices in .
Figure 1 (left) shows the probability of local 1-core formation in 3-uniform hypergraphs with core size varying from 1 to 50 and having expected edges in any set of vertices. The blue curve shows the result of 1M MC trials per point, which we generated by creating random hypergraphs and testing for the presence of a 1-core. The green curve shows the same probability as determined by Theorem 3.1.1, which we call the connectivity approach. Finally, the orange curve gives an estimate of 1-core probability using the covering heuristic defined by Equation 1.
For all values , the connectivity calculation of 1-core probability closely matches that of the MC trials. In contrast, the covering heuristic provides a much less accurate value for 1-core probability when is small. However, as approaches 50, the heuristic becomes much tighter. Not shown in the plots is the numerical breakdown of the connectivity approach. For example, when , MC trials indicate that the probability that a 1-core forms is 1.9e-4. The connectivity approach predicts -1.84e+23, and the covering heuristic predicts a probability of 2.2e-4. Thus for large values of , the connectivity approach breaks down entirely, while the covering heuristic maintains a reasonable estimate. As we will see in Section 4.2, the numerical instability of the connectivity calculation propagates to the global -core bounds provided by Theorem 3.1.
Figure 1 (right) gives probabilities of local 2-core formation in 3-uniform hypergraphs, where again, core size varies from 1 to 50 and there are edges expected in each subset of vertices. Here we see upper and lower bounds provided by Theorem 3.1 shown in the blue and orange curves, respectively. The green curve shows an estimate of 2-core probability using the covering heuristic defined by Equation 1. Overall, the upper and lower bounds are initially close, but diverge significantly for close to 20, and then gradually begin to converge again for larger . The covering heuristic initially provides an upper bound on 2-core probability, but for , it settles somewhat below the upper bound.
4.2 Bounding global -core probabilities
We next evaluate our bound on , the probability that at least one -core forms somewhere in .
Figure 2 shows the probability of 2-core formation anywhere in a 3-uniform (left) or 4-uniform (right) hypergraph. Here we vary the total number of vertices in the hypergraph and the expected number of edges (both along the independent axis) as well as overhead, which is the ratio of vertices to edges (a distinct value for each facet). In particular, if the overhead is in a given facet, then varies from up to 60 along the independent axis and . The blue curve shows the mean of 100 MC trials per point, while the red and green curves show upper and lower bounds, respectively, using Theorem 3.1.1 along with Theorem 3.1. Breaks in the bounds occur after the value where numerical failure is detected; because probability is typically (though perhaps not necessarily) non-increasing with vertex count, and is a function of , we assume numerical breakdown when the probability begins to increase with .
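A Monte Carlo estimate of this kind can be sketched as follows. This is an illustrative reconstruction of the general procedure (sample a random hypergraph, peel it, record whether a core survives), not the code used to produce the figures; the function names, the fixed seed, and the trial count in the usage line are assumptions.

```python
import random
from itertools import combinations
from math import comb


def has_core(num_vertices, edges, r):
    """Return True if peeling vertices of degree < r leaves a nonempty core."""
    vertices = set(range(num_vertices))
    edges = list(edges)
    while vertices:
        degree = dict.fromkeys(vertices, 0)
        for e in edges:
            for v in e:
                degree[v] += 1
        low = {v for v in vertices if degree[v] < r}
        if not low:
            return True  # every surviving vertex has degree >= r
        vertices -= low
        edges = [e for e in edges if low.isdisjoint(e)]
    return False


def estimate_core_probability(n, k, r, p, trials, seed=0):
    """Fraction of sampled k-uniform hypergraphs on n vertices with an r-core.

    Each of the C(n, k) possible edges is included independently with
    probability p. The seed is fixed only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        edges = [e for e in combinations(range(n), k) if rng.random() < p]
        if has_core(n, edges, r):
            hits += 1
    return hits / trials


# Example: n = 12 vertices, overhead 2.0 (so 6 expected edges), 3-uniform.
n, k, overhead = 12, 3, 2.0
p = (n / overhead) / comb(n, k)
estimate = estimate_core_probability(n, k, r=2, p=p, trials=200)
```

Enumerating all possible edges per trial keeps the sketch simple but costs time proportional to the number of possible edges, so it is practical only for small vertex counts such as those considered here.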
There are several notable trends in the plots. First, the bounds become tighter as both and the overhead increase (i.e. as grows large relative to ). When and overhead is 1.6, we see that the upper bound drops very close to the MC curve as increases. Similarly, for and overhead equal to 2.0, the MC curve falls closely in-line with the lower bound as grows. Second, numerical instability also appears to increase with overhead. Although some instability is apparent in nearly all plots, it occurs for lower and lower values of as the overhead increases.
4.2.1 Approximate solution
Due to the numerical breakdown of the connectivity approach, we also explore an approximation to that uses the covering heuristic along with Theorem 3.1. Figure 3 shows the probability of 2-core formation anywhere in a 3-uniform (left) or 4-uniform (right) hypergraph. Again, we vary the total number of vertices in the hypergraph and the expected number of edges (both along the independent axis) as well as overhead (a distinct value for each facet). The blue curve shows the mean of 800 MC trials per point. The red curve shows the approximation.
From the plots we see two major trends. First, the approximation appears to remain above the actual probability when is small, at some point crosses below the probability, and then remains below as continues to increase. Second, the tendency for the approximation to remain above the actual probability appears to increase with . Not shown in the plot are the values for MC and approximate probability when , , overhead was , which were 1.67e-3 and 6.28e-9, respectively. Therefore, it appears that the relative difference between actual and approximate probabilities widens considerably as increases.
5 Conclusion
We have presented an algorithm for computing the exact probability that a single -core forms in a -uniform hypergraph. It can be easily extended to provide an upper bound on the probability that at least one -core forms in the hypergraph. The algorithm requires a subroutine that calculates the probability that a 1-core forms in the induced hypergraph on any given subset of vertices. To that end, we also presented two methods for calculating local 1-core probability. The first method is an exact solution that uses hypergraph connectivity. We prove that this connectivity approach can be used to produce upper and lower bounds on the probability of global -core formation. The second method is an approximation that uses a covering heuristic. The exact solution is shown experimentally to break down numerically for modestly large numbers of vertices (30-50, depending on other parameters). The approximation remains numerically stable and is reasonably accurate for hypergraphs with fewer than 100 vertices, but it is not reliable as either an upper or lower bound.
References
- [1] Gilbert, E. N. Random Graphs. In The Annals of Mathematical Statistics (1959), vol. 30, pp. 1141–1144.
- [2] Goodrich, M., and Mitzenmacher, M. Invertible Bloom lookup tables. In 49th Annual Allerton Conference on Communication, Control, and Computing (Sept 2011), pp. 792–799.
- [3] Molloy, M. The pure literal rule threshold and cores in random hypergraphs. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Philadelphia, PA, USA, 2004), SODA ’04, Society for Industrial and Applied Mathematics, pp. 672–681.
Appendix A Notation
order of hypergraph core | |
number of hypergraph vertices | |
expected number of hypergraph edges | |
number of vertices per edge | |
the probability that any given edge forms in the hypergraph | |
probability mass function of the distribution | |
CDF of the distribution | |
a -uniform hypergraph with vertices and edge probability | |
the induced hypergraph on vertices | |
a -uniform interleaved hypergraph with vertices and edge probability | |
probability that one or more -cores form anywhere in | |
probability that exactly one -core forms anywhere in | |
probability that an -core of size forms anywhere in | |
probability that an -core forms on a specific set of vertices in | |
probability that an -core forms in on vertices | |
probability that an -core forms on vertices in hypergraph | |
Poisson formula approximating | |
probability that certain vertices are connected in | |
probability there exists at least one component on vertices in | |
number of possible edges connecting vertex sets of size and in |