The hard disks model is one of the simplest gas models in statistical physics. Its configurations are non-overlapping disks of uniform radius in a bounded region of . For convenience, in this paper, we take this region to be the unit square . This model was precisely the one studied by Metropolis et al. 
, in their pioneering work on the Markov chain Monte Carlo (MCMC) method. They used Los Alamos’ MANIAC computer to simulate a system withdisks.
There are two variants of this model. To obtain the canonical ensemble, we fix the number (or equivalently, density) of disks and decree that all configurations are “equally likely”, subject only to the disks not overlapping. In the grand canonical ensemble, we fix the “average” number of disks. To be more specific, centers of the disks are distributed according to a Poisson point process of intensity , conditioned on the disks being non-overlapping. The hard disks model, and its higher dimensional generalization (called the hard spheres model) are also related to the optimal sphere packing density [6, 20, 2]. See [8, 1] and references therein for more details. See also  for the physics perspective.
Our main aim in this work is to describe and analyse a very simple algorithm for exactly sampling from the grand canonical ensemble, based on the partial rejection sampling paradigm introduced by Guo, Jerrum and Liu .
More precisely, the challenge is the following: produce a realisation of a Poisson point process of intensity in the unit square, conditioned on the event that no pair of points in are closer than in Euclidean distance. We refer to this target measure as the hard disks distribution. It describes an arrangement of open disks of radius with centres in that are not allowed to overlap, but which otherwise do not interact. It is a special case of the Strauss process . Note that, although the disks do not overlap each other, they may extend beyond the boundary of the unit square. Also, the intensity of the underlying Poisson process is normalised so that the expected number of points of lying in a disk of radius is . This normalisation gives us sensible asymptotics as the radius of the disks tends to zero (equivalently, the number of disks tends to infinity).
Classical rejection sampling applied to this problem yields the following algorithm: repeatedly sample a realisation of the Poisson process of intensity in the unit square until satisfies the condition that no two points are closer than , and return . Unfortunately, for every , however small, the expected number of unsuccessful trials using this approach increases exponentially in , as . Partial rejection sampling  requires only a subset of to be resampled at each iteration. Algorithm 1 below arises from a routine application of the paradigm to the problem at hand.
The original partial rejection method  and its analysis are tailored for the discrete case. In this paper we provide an alternative view on the correctness of the method, which is also valid in the continuous setting. In other words, as with classical rejection sampling, Algorithm 1
terminates with probability 1, producing a realisation of the exact hard disks distribution.
The proof of this result forms the content of Section 3.
In contrast to classical rejection sampling, the expected number of iterations (resampling steps) is now asymptotically as , provided is not too large. Furthermore, with a suitable implementation, the total runtime is , i.e., linear in the number of disks. We prove that rapid termination occurs when . This analysis is not tight, and experiments suggests that the actual threshold for rapid termination is around . Figure 1 is a realisation of with . The resulting density is .
Fix . Then the expected number of iterations of the while-loop in Algorithm 1 is . Moreover, with a suitable implementation, the overall runtime of the algorithm is .
The proof of this result forms the content of Section 4.
The method extends naturally to the hard spheres model in dimensions. Here, the desired distribution is a Poisson point process in conditioned on no pair of points being closer than . The natural normalisation for the intensity of the Poisson process in dimensions is , where is the volume of a ball of radius in . With this convention, we prove that rapid termination occurs in dimensions provided .
The expected packing density or simply for this model is the expected total volume of spheres. (Note that, neglecting boundary effects, is the proportion of the unit cube occupied by spheres.) The quantity grows monotonically with , but intuitively we expect its rate of growth to slow down dramatically as the spheres pack more tightly. The connection between expected packing density and intensity has recently been thoroughly explored by Jenssen, Joos and Perkins . Using their results, we show that partial rejection sampling can achieve expected packing density while retaining rapid termination in iterations. Although sphere packings of density have been proved to exist, there is no polynomial-time sampler that provably achieves packing density beyond , as far as we are aware.
Other approaches to exact sampling include Coupling From The Past (CFTP), which was adapted to point processes by Kendall  and Kendall and Møller . Recently, Moka, Juneja and Mandjes  proposed an algorithm based on rejection and importance sampling. Although their algorithm, like ours, is based on rejection sampling, it does not share our asymptotic performance guarantee. Indeed, its runtime appears to grow exponentially as the number of disks goes to infinity, with the density of disks held constant.
The most widely used approach to sampling configurations from the hard disks model is Markov chain Monte Carlo (MCMC). Here the desired distribution is approached in the limit as the runtime of the sampler tends to infinity. In this sense, MCMC produces an approximate sampler, though the error (in total variation distance) decays exponentially with . The problem lies in deciding how large should be in order to ensure that the samples obtained are close enough to the desired distribution. There are two possibilities. The runtime
may be chosen heuristically, in which case the quality of the output from the sampler is not guaranteed. Otherwise, an analytical upper bound on mixing time may be used to determine a suitable, but then the tendency is for this bound to be very conservative. The experimental advantage of partial rejection sampling is its simple termination rule, combined with the property of generating perfect samples from the desired distribution.
Approximate sampling via Markov chain simulation has been studied by Kannan, Mahoney and Montenegro  and Hayes and Moore  in the context of the canonical ensemble, where the number of spheres in a configuration is fixed. (So in the MCMC approach, it is the density that is chosen in advance, while in the approach taken here we choose in advance and then follows.) Kannan et al.  show that rapid mixing (convergence to near-stationarity in time polynomial in the number of disks) occurs for densities , in dimension . The best rigorously derived density bound guaranteeing rapid mixing in two dimensions is given by Hayes and Moore  and is . (Note that this improves on the obtained by Kannan et al.) It is should be noted that these results are not directly comparable with our owing to the difference in models. To obtain canonical ensembles, we could use Algorithm 1 and further condition on the number of desired disks. However, the only rigorous guarantee for this approach, via , is .
It is believed that the hard disks model in two dimensions undergoes a phase transition at a certain critical density: at lower densities configurations are disordered, while at higher densities, long range correlations can be observed. Unfortunately, is well beyond the densities that can be achieved by samplers (either based on MCMC or rejection sampling) with rigorous performance guarantees. However, heuristic approaches using sophisticated MCMC samplers  suggest that the critical density is around .
Finally, determining the maximum achievable density is the classical sphere packing problem. Despite extensive study, the exact solution is known only for dimensions and 24. See [8, 16] for rigorous bounds on packing densities in general dimension .
A preliminary version of this paper was presented at ICALP 2018 .
2. The sampling algorithm
The following notation will be used throughout. If is a finite subset of then
where denotes Euclidean norm, and
The open disk of radius with centre is denoted by . The finite set always denotes a realisation of the Poisson point process of intensity on
. For a random variableand event we use to denote the distribution (law) of , and the distribution of conditioned on occurring. Thus, is the hard disks distribution that we are interested in.
Our goal is to analyse the correctness and runtime of a sampling algorithm for the hard disks model (see Algorithm 1 below), specifically to determine the largest value of for which it terminates quickly, i.e., in iterations. The algorithm is an example application of “Partial Rejection Sampling” , adapted to the continuous state space setting.
3. Correctness (Proof of Theorem 1)
Let be any finite subset of . We say that is a feasible set of bad points if ; this is equivalent to saying that there is a finite subset with . The key to establishing correctness of Algorithm 1 is the following loop invariant:
for every feasible set , where is any intermediate set of points during the execution of the algorithm, and is a realisation of the Poisson point process. Let us consider what the right hand side means operationally. Let . (This is the resampling set used by the algorithm.) Let be a sample from the distribution . The only points in that lie inside are the points in . (Any extra points would create more bad pairs than there actually are.) Thus . Outside of there are no bad pairs; thus is a sample from the hard disks distribution on . Note that, setting , we see that is just the hard disks distribution on .
Let (a random variable) be the number of iterations of the while-loop. On each iteration, the while loop terminates with probability bounded away from 0; thus is finite with probability 1. (Indeed, has finite expectation.) Let , for , be the point set after iterations of the loop, and be the initial value of (which is just a realisation of the Poisson point process on ). We say that is a feasible sequence of (finite) point sets if there exists a run of Algorithm 1 with . Theorem 1 will follow easily from the following lemma.
Let be a feasible sequence. Then
We prove the result by induction on . The base case, , holds by construction: is just a realisation of the Poisson point process on . Our induction hypothesis is
for every feasible sequence . Extend the feasible sequence to . For the inductive step, we assume (1) and aim to derive
The resampling set on iteration is . As a first step we argue below that
where denotes the set of unordered pairs of elements from . We have noted that (1) implies that, outside of the resampling set , the point set is a realisation of the hard disks distribution. Also, the algorithm does not resample points outside of . Thus
is Poisson distributed, conditioned on there being no bad pairs. Inside, resampling has left behind a fresh Poisson point process . These considerations give (3).
Next, we condition on :
Since contains no bad pairs with both endpoints in , the event entails the event . Thus, we have
The right hand side of this equation does not involve , and so
This completes the induction step (2) and the proof. ∎
4. Run-time analysis (Proof of Theorem 2)
We consider how the number of “bad events”, i.e., the cardinality of the set , evolves with time. As usual denotes a realisation of the Poisson point process of intensity . Also denote by a realisation of the hard disks process of the same intensity. We need the following stochastic domination result.
The hard disks distribution is stochastically dominated by the Poisson point process with the same intensity. That is, we can construct a joint sample space for and such that .
Holley’s criterion is a useful test for stochastic domination, but it is not of direct use to us in the proof of Lemma 4, because it applies only to finite state spaces. Fortunately, Preston [17, Theorem 9.1], has derived a version of Holley’s criterion that fits our situation. We will mostly follow Preston’s notation, except that, to save confusion, we will use and , rather than and , to denote finite sets of points. In order to state his result, we need some notation. In our application, is the distribution on obtained by sampling points independently and uniformly at random from , and regarding the points as indistinguishable; furthermore, . (For consistency with Preston, we have left
unnormalised. If we had made it into a probability distribution by division by, then could be thought of as follows: sample an integer from the Poisson distribution with mean 1, and then pick (unlabelled) points uniformly and independently.) Denote by the set of all finite subsets of , and by the set of non-negative measurable functions satisfying
|(5)||and implies .|
(See Preston [17, Section 9] for detailed formal definitions of the concepts here.)
Lemma 5 (Theorem 9.1 of ).
Let and suppose that
(where, by convention, ). Then, for all bounded, measurable, non-decreasing functions ,
i.e., if is the probability measure having density with respect to , then stochastically dominates .
Proof of Lemma 4.
The normalising constants and are chosen so that both and satisfy (4). (There is an explicit expression for , namely , but not for .) Notice that both and also satisfy (5). Notice also that the probability measures and of the Poisson point process and the hard disks process have densities and with respect to . The premise (6) of Lemma 5 holds, since the left-hand side is always and the right-hand side is either or 0. The conclusion is that dominates . Strassen’s Theorem [12, 18], allows us to conclude the existence of a coupling of and as advertised in the statement of the lemma (except, possibly, on a set of measure zero). ∎
There is a bound such that the expected number of iterations of the while-loop in Algorithm 1 is when .
First observe that determines and vice versa. So conditioning on the set is equivalent to conditioning on .
Introduce random variables , for . Our strategy is to show that
for some . Then is a supermartingale (with the convention that for all ). Therefore, . Here, we have used the fact that is a Poisson random variable with expectation , and
Setting , we obtain , and hence . It follows that the expected number of iterations of the while-loop of Algorithm 1 is . Note that the probability of non-termination decreases exponentially with , so the probability of large deviations above the expected value of is low.
Crude estimates give. The calculation goes as follows. Suppose, in (7), we condition on the random variables , rather than simply on . This is more stringent conditioning, since the former random variables determine the latter. It is enough to establish (7) under the more stringent conditioning. So fix , and note that this choice also fixes the resampling sets . Suppose . Inside the resampling set we have a Poisson point process of intensity . Outside, by Lemma 3, there is a realisation of the hard disks process. Since we are seeking an upper bound on we may, by Lemma 4, replace by a Poisson point process of intensity .
Let . From the above considerations we have
This is an overestimate, as we are double-counting overlapping disks whose centres both lie within . Now, is a union of at most disks of radius . Thus
There are further sources of slack here: there may be fewer than disks, the disks comprising certainly overlap, and, for points near the boundary, some of disks will lie partly outside the unit square. (The last of these presumably has no effect asymptotically, as .) Setting , we see that for any , and is a supermartingale, as required. ∎
The constant may seem quite small. Note, however, that classical rejection sampling cannot achieve any . The argument goes as follows. Divide into squares. If there are two points in the same square then they will certainly be less than distance apart. The number of points in each square is Poisson distributed with mean . Thus for any the probability that a particular square has at least two points is bounded away from zero. The number of points in each square is independent of all the others. It follows that the runtime of classical rejection sampling is exponential in .
The above derivation for is quite crude and can be improved.
The constant in Lemma 6 can be taken to be .
For each of the disks, the right-hand side of inequality (9) counts pairs of points with in the disk, and anywhere within distance of . Since a bad event is determined by an unordered pair of points, this gives rise to significant double counting. In particular, pairs with and lying in the same ball are double counted. We can subtract off these pairs to obtain a better estimate.
For a single ball, the correction is
(The initial factor of one half arises because we want to count unordered pairs.) With the change of variables and this expression simplifies to
where is the area of the “lens” . Letting denote the offset of the centres of the two disks, the area of the lens is given by
(This is by elementary geometry: the lens is the intersection of two sectors, one from each of the disks, and its area can be computed by inclusion-exclusion.) An illustration (before shifting to ) is given in Figure 2. The shaded area is .
Translating to polar coordinates ,
(The integral was evaluated using the Maple computer algebra system.) Our revised upper bound on is thus
yields the improved bound of . ∎
There are other factors that could in principle be used to increase further — each disk necessarily overlaps with at least one other disk, some bad events are triple or quadruple counted — but the computational difficulties rapidly increase when attempting to account for these.
We have just seen that the number of iterations of the while-loop in Algorithm 1 is small when is below some threshold. It just remains to check that the body of the loop can be implemented efficiently.
There is an implementation of Algorithm 1 such that, for any fixed , the expected runtime of the algorithm is .
No sophisticated data structures are required, but the runtime analysis requires some work.
Divide the unit square into a grid of squares. Index the grid squares by , where . Let the grid squares be . Note that if two points of a point set lie in the same grid square then they will necessarily form a bad pair and lie in . Moreover, if and and then necessarily . So it seems, intuitively, that we should be able to implement the body of loop to run in time , since we only have to examine pairs of grid squares in order to complete the computation. Given Lemma 6, this would give an upper bound on total runtime of .
However, we can do better than this by noting that the amount of work to be done in each iteration decays exponentially. Recall the notation employed in Algorithm 1. Assume that, in addition to and , we maintain a list of grid squares containing points in . In order to perform the computations within the while-loop, it is only necessary to consider grid squares that are within constant distance of a grid square in . As the length of the list decays exponentially, according to Lemma 6, we expect the overall runtime to be rather than just . This is indeed the case. However, there is a technical issue; the time taken for the th iteration of the loop is a random variable, say . Recall the notation of Lemma 6, in particular that denotes the number of bad pairs of points (ones within distance of each other) after iterations of the loop. As may be expected, we can bound the expectation of by a linear function of . Unfortunately, we can’t just sum these estimates over to get an upper bound on expected total runtime, as the random variable is presumably correlated with . Instead, we shall “charge” the operations within the th iteration of the loop either to or . and hence bound the expected runtime by .
Consider the work done during the th iteration of the while-loop. Assume that the point sets and are available either from the initialisation phase or from the previous iteration of the loop. Also assume that we have the list containing grid squares containing points in . Note that , since each bad pair contributes at most two bad points. To compute , we could sample a realisation of the Poisson point process in the unit square and reject all those points that are not within distance of some point in . However, this would be inefficient. Instead, we just produce realisations of a Poisson point process within grid squares that are within distance two of a grid square in . Here, we take the distance between grid squares and to be . We then reject each new point unless it is within distance of a point in ; the result is the required point set .
Thus, restricted to grid square , we have that is the number of bad points carried forward from the previous iteration, is the number of fresh points added during the current iteration, and is the total number of points at the end of the current iteration. Also, let be the number of points generated during the current iteration that did not survive because they were not within distance of some bad point.
Let . Note that, in order to compute the points that need to be added to during the current iteration, only pairs of grid squares need to be examined. (Each new point must be within distance of an existing bad point.) The work done during iteration in computing is then proportional to
Here, represents the fixed cost of cycling through all the relevant pairs of grid squares, and is the additional time required to examine each fresh point and see whether it needs to be retained (i.e., added to ).
Now , and so . Also, is stochastically dominated by a Poisson random variable with mean ; thus
Note that the new points in grid square are freshly generated during the current iteration, and are discarded before the end of the iteration, and so have no effect on the subsequent evolution of the algorithm; the sequence remains a supermartingale, and the analysis in Lemma 6 is unaffected. (All other estimates will be deterministic, so this is the only point where we need to consider the potential for conditioning.)
The remaining term may be bounded as follows:
Here we use the fact that any pair of distinct points in a single grid square is certainly a bad pair.
The final task is to compute the new set of bad points. Since each of the new bad pairs must involve at least one point in , the time to complete this task is proportional to
Now any grid square with must be within distance 2 of a grid square in the list . Therefore, by similar reasoning to that used earlier,
So again, the time for this phase of the loop is bounded by
It remains to analyse the initialisation phase. Generating the realisation of a Poisson point process of intensity in the unit square clearly takes time . To identify the bad pairs, we cycle through pairs of grid squares with ; there are of these. An argument identical to those above shows that the time taken to identify the bad pairs is . We are almost done, but we do need a better estimate for that the one used in Lemma 6, which was . (This crude estimate was adequate at the time, as we were only interested in the logarithm of this quantity.) But we can now see that bad pairs can only come from pairs of grid squares separated by distance at most two. There are of these, and each of them generates bad pairs in expectation, so that . We saw in Lemma 6 that , is a supermartingale with (with the convention that for ). Thus the overall runtime of Algorithm 1 is
in expectation, assuming . ∎
5. Three or more dimensions
In higher dimensions, the hard disk model is known as the hard spheres model. Everything in Sections 3 and 4 carries across to dimensions with little change. For general , the appropriate scaling for the intensity is , where is the volume of a ball of unit radius in dimensions. Note that in a realisation of a Poisson point process with intensity , the expected number of points in a ball of radius is .
The analogue of equation (8) is
which leads to
So setting we find that for any . It follows that the runtime of partial rejection sampling is for any .
By a result of Jenssen, Joos and Perkins , we lose just a constant factor when translating from intensity to packing density . (It is partly to connect with their work, we measure intensity in terms of the expected number of points in a ball of radius .) In the proof of [8, Thm 2], the following inequality is derived:
Assuming , which holds in the range of validity of our algorithm, we have and hence
Note that is monotonically increasing, with , and . It follows that we can reach expected packing density with expected iterations. This is currently the best that can be achieved by any provably correct sampling algorithm with polynomial (in ) runtime . The asymptotically best packing density currently rigorously known is , but achieving this would require to grow exponentially fast in . This is clearly beyond the capability of partial rejection sampling, but also beyond the capability of any known efficient sampling algorithm.
We thank Mark Huber and Will Perkins for inspiring conversations and bringing the hard disks model to our attention.
-  Henry Cohn. A conceptual breakthrough in sphere packing. Notices Amer. Math. Soc., 64(2):102–115, 2017.
-  Henry Cohn, Abhinav Kumar, Stephen D. Miller, Danylo Radchenko, and Maryna Viazovska. The sphere packing problem in dimension 24. Ann. of Math. (2), 185(3):1017–1033, 2017.
-  Michael Engel, Joshua A. Anderson, Sharon C. Glotzer, Masaharu Isobe, Etienne P. Bernard, and Werner Krauth. Hard-disk equation of state: First-order liquid-hexatic transition in two dimensions with three simulation methods. Phys. Rev. E, 87:042134, Apr 2013.
-  Heng Guo and Mark Jerrum. Perfect simulation of the hard disks model by partial rejection sampling. In ICALP, volume 107 of LIPIcs, pages 69:1–69:10. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.
-  Heng Guo, Mark Jerrum, and Jingcheng Liu. Uniform sampling through the Lovasz local lemma. In STOC, pages 342–355, 2017.
-  Thomas C. Hales. A proof of the Kepler conjecture. Ann. of Math. (2), 162(3):1065–1185, 2005.
-  Thomas P. Hayes and Cristopher Moore. Lower bounds on the critical density in the hard disk model via optimized metrics. CoRR, abs/1407.1930, 2014.
-  Matthew Jenssen, Felix Joos, and Will Perkins. On the hard sphere model and sphere packings in high dimensions. ArXiv, abs/1707.00476, 2017.
-  Ravi Kannan, Michael W. Mahoney, and Ravi Montenegro. Rapid mixing of several Markov chains for a hard-core model. In ISAAC, pages 663–675, 2003.
-  Wilfrid S. Kendall. Perfect simulation for the area-interaction point process. In Probability towards 2000 (New York, 1995), volume 128 of Lect. Notes Stat., pages 218–234. Springer, New York, 1998.
-  Wilfrid S. Kendall and Jesper Møller. Perfect simulation using dominating processes on ordered spaces, with application to locally stable point processes. Adv. Appl. Probab., 32(3):844––865, 2000.
-  Torgny Lindvall. On Strassen’s theorem on stochastic domination. Electron. Comm. Probab., 4:51–59, 1999.
-  Hartmut Löwen. Fun with hard spheres. In Klaus R. Mecke and Dietrich Stoyan, editors, Statistical Physics and Spatial Statistics, pages 295–331, 2000.
-  Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21(6):1087–1092, 1953.
-  S. B. Moka, S. Juneja, and M. R. H. Mandjes. Perfect sampling for Gibbs processes with a focus on hard-sphere models. ArXiv, abs/1705.00142, 2017.
-  Will Perkins. Birthday inequalities, repulsion, and hard spheres. Proc. Amer. Math. Soc., 144(6):2635–2649, 2016.
-  Chris Preston. Spatial birth-and-death processes (with discussion). Bull. Inst. Internat. Statist., 46(2):371–391, 405–408 (1975), 1975.
-  V. Strassen. The existence of probability measures with given marginals. Ann. Math. Statist., 36:423–439, 1965.
-  David J. Strauss. A model for clustering. Biometrika, 62(2):467–475, 1975.
-  Maryna S. Viazovska. The sphere packing problem in dimension 8. Ann. of Math. (2), 185(3):991–1015, 2017.