1 Introduction
Nested sampling (NS) (Skilling, 2004b, 2006)
is a popular algorithm for Bayesian inference in cosmology, astrophysics and particle physics. The algorithm handles multimodal and degenerate problems, and returns weighted samples for parameter inference as well as an estimate of the Bayesian evidence for model comparison. It was recently emphasised independently by
Riley (2019, appendix B.5.4) and Schittenhelm & Wacker (2020), however, that assumptions in NS are violated by plateaus in the likelihood, that is, regions of parameter space that share the same likelihood. We should not be surprised by subtleties caused by ties in the likelihood, as NS is based on order statistics, and this problem and possible solutions were in fact discussed by Skilling (2004b, 2006). We believe, however, that it was underappreciated prior to Riley (2019); Schittenhelm & Wacker (2020).

Plateaus most often occur in discrete parameter spaces, in which every possible likelihood value is associated with a finite prior mass, or in physics problems when regions of a model's parameter space make predictions in such severe disagreement with observations that they are assigned a likelihood of zero. For example, in particle physics, parameter points that fail to predict electroweak symmetry breaking would be vetoed, and in cosmology portions of parameter space may be excluded if they result in unphysically large power spectra for the purposes of applying lensing, or if it is impossible to trace a consistent cosmic history (see section XIII.C of Hergt et al. 2020 for more detail). We generically refer to such points as unphysical and the observation that renders them unphysical as $B$. In fact, within popular implementations of NS, such as MultiNest (Feroz & Hobson, 2008; Feroz et al., 2009, 2013) and PolyChord (Handley et al., 2015a, b), unphysical points can be assigned a likelihood of zero or a prior of zero through the logZero setting. Any log-likelihoods below it are treated as if the prior were in fact zero. Statistically, this makes a difference to the evidences and Bayes factors, as it changes whether we consider $B$ to be prior knowledge or information with which we are updating, i.e. whether we wish to compute $p(D \mid B, M)$ or $p(D, B \mid M)$, where $D$ represents experimental data and $M$ represents a model. The latter is problematic within NS. We consider both cases interesting: on the one hand, taking a basic observation $B$, e.g., electroweak symmetry breaking, as prior knowledge is reasonable, but, on the other, so is judging models by their ability to predict a basic observation. Although the problem with plateaus can in general lead to faulty posterior distributions as well, when plateaus occur only at zero likelihood they do not impact posterior inferences about the parameters of the model. There are, furthermore, realistic situations in which plateaus could occur at nonzero likelihood, e.g., if in some regions of parameter space the likelihood function or the physical observables on which it depends are approximated by a constant.

In Riley (2019); Schittenhelm & Wacker (2020), the problem caused by plateaus was formally demonstrated. After reviewing the relevant aspects of NS in section 2, we instead give an informal explanation of the problem in section 3. We continue in section 4 by proposing a modified NS algorithm that deals with plateaus and reduces to ordinary NS in their absence. We show examples in section 5 before concluding in section 6. An implementation of the modified NS algorithm that can be used to correct evidences and posteriors found from MultiNest and PolyChord runs is available in anesthetic starting from version 2.0.0beta.2 (Handley, 2019).
2 Nested sampling
To establish our notation and the assumptions in ordinary NS, let us briefly review the NS algorithm. NS works by computing evidence integrals,
$Z = \int \mathcal{L}(\theta)\,\pi(\theta)\,\mathrm{d}\theta,$  (1)
where $\mathcal{L}(\theta)$ is the so-called likelihood function for the relevant experimental data and $\pi(\theta)$ is the prior density for the model's parameters, as Riemann sums of a one-dimensional integral,
$Z = \int_0^1 \mathcal{L}(X)\,\mathrm{d}X,$  (2)
where
$X(\lambda) = \int_{\mathcal{L}(\theta) > \lambda} \pi(\theta)\,\mathrm{d}\theta$  (3)
is the prior volume contained within the iso-likelihood contour at $\lambda$, and $\mathcal{L}(X)$ is the inverse of $X(\lambda)$, i.e. $\mathcal{L}(X(\lambda)) = \lambda$. This evidently requires that such an inverse exists over the range of integration.
To tackle the one-dimensional integral, we first sample $n_\mathrm{live}$ points from the prior — the live points. Then, at each iteration of NS, we remove the live point with the worst likelihood, $\mathcal{L}^\star$, and replace it with one drawn from the prior subject to the constraint that $\mathcal{L}(\theta) > \mathcal{L}^\star$. Thus we remove a sequence of samples of increasing likelihood $\mathcal{L}_i$. In NS we estimate $X(\mathcal{L}_i)$ from the properties of how that sequence was generated. Indeed, at each iteration the volume contracts by a factor $t$, where the arithmetic and geometric expectations of $t$ are
$\langle t \rangle = \frac{n_\mathrm{live}}{n_\mathrm{live} + 1},$
$\langle \ln t \rangle = -\frac{1}{n_\mathrm{live}}.$
We can then make a statistical estimate of the prior volume at the $i$-th iteration, $X_i \approx e^{-i/n_\mathrm{live}}$. This enables us to compute
$Z \approx \sum_i \mathcal{L}_i\,\Delta X_i, \qquad \Delta X_i = X_{i-1} - X_i.$  (4)
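As a concrete illustration of eqs. 1 to 4 (a sketch for exposition only, not drawn from any implementation discussed in this paper), ordinary NS can be run on a toy one-dimensional problem with a uniform prior on [0, 1] and likelihood L(x) = e^(-x), for which Z = 1 - e^(-1) exactly; the live-point count and iteration count here are arbitrary choices:

```python
import math
import random

random.seed(1)

# Toy problem: uniform prior on [0, 1], L(x) = exp(-x), so Z = 1 - exp(-1).
def log_L(x):
    return -x

nlive = 200
live = [random.random() for _ in range(nlive)]  # draw live points from the prior

Z, X_prev = 0.0, 1.0
for i in range(1, 10 * nlive + 1):
    worst = max(live)                        # largest x has the smallest likelihood
    X_i = math.exp(-i / nlive)               # geometric volume estimate X_i = e^(-i/nlive)
    Z += math.exp(log_L(worst)) * (X_prev - X_i)  # Riemann sum, eq. (4)
    X_prev = X_i
    live.remove(worst)
    live.append(random.uniform(0.0, worst))  # constrained prior: L(x) > L(worst)

Z += X_prev * sum(math.exp(log_L(x)) for x in live) / nlive  # remaining live points
print(Z)  # close to 1 - exp(-1) ≈ 0.632
```

For this simple monotonic likelihood, sampling from the constrained prior is exact (uniform on [0, worst)); real implementations replace this step with rejection, ellipsoidal or slice sampling.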
The estimates in section 2 assume that the live points are uniformly distributed in $X$. In Fowlie et al. (2020) we proposed a technique for testing the veracity of this assumption within the context of a numerical implementation such as MultiNest or PolyChord. In the following section we discuss why plateaus violate that assumption.

3 Plateaus
In Skilling (2006), plateaus were recognised as a problem, since they offer no guidance about the existence or location of points of greater likelihood, and since they cause ambiguity in the ranking of samples by likelihood. The latter problem was addressed by breaking ties: each live point is assigned a rankable second property that is expected to be unique, such as a cryptographic hash of the parameter coordinates or just a random variate from some prespecified distribution. This was implemented by extending the likelihood to include that tie-breaking second property, $\ell(\theta)$, suppressed by a tiny factor, $\epsilon$, so that it doesn't impact the evidence estimate,
$\mathcal{L}(\theta) \to \mathcal{L}(\theta) + \epsilon\,\ell(\theta).$  (5)
It was not stated explicitly that plateaus violate an assumption in NS, as formally shown by Riley (2019); Schittenhelm & Wacker (2020), though it was known and it was mentioned in Murray (2007, section 4.4.6). Murray (2007), furthermore, noted that hashes of the parameter coordinates in discrete parameter spaces would not be unique and thus would fail to break ties.
In the presence of plateaus, the outermost contour could contain more than one point with the same likelihood. When we replace one of the points in the plateau by a point with a greater likelihood, the volume cannot contract at all, since the outermost contour still contains other points at the same likelihood. Once we’ve replaced all the points in the plateau, the volume finally contracts. Crucially, however, it was not possible for any of the replacement points to affect the contraction, as the replacement points could never be the worst point whilst points in the plateau remain in the live points. The latter subtlety changes statistical estimates of the volume. Without plateaus, it is possible that a replacement point is or soon becomes the worst point, slowing the volume contraction. Without that possibility, the volume contracts faster and thus ordinary NS underestimates volume contraction and overestimates the evidence when there are plateaus.
The problem is illustrated by fig. 1. We show a Gaussian likelihood (orange) and a modified Gaussian likelihood with a plateau in its tails (blue); in each case we consider a flat prior for the parameter. The upper right panel shows the prior distribution of the likelihood: the plateau manifests as an atom in that distribution. The lower left panel shows the resulting enclosed prior volume $X$ as a function of the likelihood $\lambda$: the plateau causes a jump discontinuity. Finally, the lower right panel shows the distribution of $X$. The plateau leads to an inaccessible region of $X$ and an atom of probability mass, and so $X$ is not uniformly distributed.

Indeed, in ordinary NS, if $m$ of the outermost live points were in a plateau, we would compress by
$\left(\frac{n_\mathrm{live}}{n_\mathrm{live} + 1}\right)^m.$  (6)
We should, however, compress by about
$1 - \frac{m}{n_\mathrm{live}},$  (7)
which is an unbiased estimate of the volume outside the plateau based on binomial statistics (see section 4.1 for further discussion). The compressions are only similar for $m \ll n_\mathrm{live}$, since
$\left(\frac{n_\mathrm{live}}{n_\mathrm{live} + 1}\right)^m \approx e^{-m/n_\mathrm{live}} \approx 1 - \frac{m}{n_\mathrm{live}}.$  (8)
The breakdown in the NS compression in eq. 6 is shown in fig. 2. Note that this problem doesn’t impact importance nested sampling (Feroz et al., 2013), since it does not use the estimated volumes in eq. 6.
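The disagreement between eqs. 6 and 7 is easily tabulated; a minimal numerical sketch (the values of nlive and m are illustrative):

```python
nlive = 100
for m in (1, 5, 25, 50, 90):
    ordinary = (nlive / (nlive + 1)) ** m  # ordinary NS compression, eq. (6)
    unbiased = 1.0 - m / nlive             # unbiased binomial estimate, eq. (7)
    print(m, round(ordinary, 3), round(unbiased, 3))
```

For m = 1 the two agree to order 1/nlive, but for m comparable to nlive ordinary NS retains far too much volume, which is the overestimate of the evidence described above.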
In fact, the arguments above show that in the presence of plateaus the inverse of $X(\lambda)$, denoted $\mathcal{L}(X)$ in overloaded notation, does not exist for all $X$ (see the lower right panel in fig. 1). As shown in Schittenhelm & Wacker (2020), in this case we should instead consider the generalised inverse
$\mathcal{L}(X) = \sup\{\lambda : X(\lambda) > X\},$  (9)
with which the evidence may be written as
$Z = \int_0^1 \mathcal{L}(X)\,\mathrm{d}X.$  (10)
We now introduce a modified NS algorithm that correctly computes the evidence even in the presence of plateaus via eq. 10. We summarise the treatment of plateaus in existing popular NS software in table 1.
Code  Handles plateaus  Definition of constrained prior
nestle 0.2.0 (Barbary, 2016)  No
dynesty 1.0.0 (Speagle, 2020)  No
DIAMONDS #76409b2 (Corsaro & De Ridder, 2014)  No
MultiNest 3.1.2 (Feroz & Hobson, 2008; Feroz et al., 2009, 2013)  No, but compatible with anesthetic
PolyChord 1.18.2 (Handley et al., 2015a, b)  No, but compatible with anesthetic
DNest4 0.2.4 (Brewer & Foreman-Mackey, 2016)  Yes, by tie-breaking labels
Skilling's implementation (Skilling, 2004a)  Yes, by user-specified tie-breaking labels
4 Modified NS algorithm
In our modification to NS, we remove all live points at the outermost contour $\mathcal{L}^\star$ one by one without replacement, contracting the volume after each removal. If there is a plateau, there may be more than one such point; if not, our algorithm reduces to ordinary NS. After removing all such points, we finally replenish the live points by adding points sampled from the prior subject to $\mathcal{L}(\theta) > \mathcal{L}^\star$, as usual.
After removing a point, the number of live points drops by one, such that if we were to remove $q$ such points (i.e., if there were $q$ points in the plateau) we would compress by
$\prod_{i=1}^{q} \langle t_i \rangle = \prod_{i=1}^{q} \frac{n_\mathrm{live} - (i - 1)}{n_\mathrm{live} - (i - 1) + 1} = \frac{n_\mathrm{live} - q + 1}{n_\mathrm{live} + 1}$  (11)
if using an arithmetic estimate of the compression, and by
$\sum_{i=1}^{q} \langle \ln t_i \rangle = -\sum_{i=1}^{q} \frac{1}{n_\mathrm{live} - (i - 1)} \approx \ln\left(1 - \frac{q}{n_\mathrm{live}}\right)$
for $q < n_\mathrm{live}$ if using a geometric one.
Thus we find in both cases that the compression from removing $q$ points would be about $1 - q/n_\mathrm{live}$, in agreement with eq. 7, the difference being of order $1/n_\mathrm{live}$. The difference is noteworthy when $q = n_\mathrm{live}$, in which case the unbiased estimate of the contraction would be zero, but the above estimates are of order $1/n_\mathrm{live}$.
In algorithm 1 and algorithm 2 we show the original and our modified NS, respectively. For concreteness we show the geometric estimator of the compression. We highlight the parts of the algorithm that are changed in red. The simple difference is that whereas in the original NS we replace a single live point, we instead replace all the points sharing the minimum likelihood and contract the volume appropriately. If there are no plateaus, the algorithms are identical.
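As a concrete, hypothetical illustration of the modified algorithm, the following Python sketch runs it on a one-dimensional toy likelihood with a plateau in its tail, L(x) = e^(-x) for x < 1/2 and L(x) = e^(-1) otherwise, under a uniform prior on [0, 1]; the true evidence is (1 - e^(-1/2)) + e^(-1)/2. The likelihood and the shortcut used for constrained sampling are choices made for this example, not part of the algorithm itself:

```python
import math
import random

random.seed(2)

def L(x):
    # plateau of prior volume 1/2 at likelihood exp(-1) in the tail
    return math.exp(-x) if x < 0.5 else math.exp(-1.0)

def sample_above(Lmin):
    # for this L, the region L(x) > Lmin is exactly x < min(-ln(Lmin), 1/2)
    return random.uniform(0.0, min(-math.log(Lmin), 0.5))

nlive = 200
live = [random.random() for _ in range(nlive)]
logX, Z = 0.0, 0.0
for _ in range(15 * nlive):
    Lmin = min(L(x) for x in live)
    for x in [p for p in live if L(p) == Lmin]:  # all points sharing the minimum
        X_prev = math.exp(logX)
        logX -= 1.0 / len(live)                  # contract with the current nlive
        Z += Lmin * (X_prev - math.exp(logX))
        live.remove(x)
    live += [sample_above(Lmin) for _ in range(nlive - len(live))]  # replenish
Z += math.exp(logX) * sum(L(x) for x in live) / nlive  # remaining live points
print(Z)  # close to (1 - exp(-0.5)) + 0.5 * exp(-1) ≈ 0.577
```

Note that the inner loop removes every point at the minimum likelihood one by one, contracting with the current (shrinking) number of live points, before the live points are topped up; with no plateau the inner loop removes a single point and the sketch reduces to ordinary NS.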
This modified NS algorithm automatically deals with plateaus whenever they occur and can be applied retrospectively to NS runs. It automatically implements the robust NS algorithm of Schittenhelm & Wacker (2020), but with the advantage that it avoids the requirement that we explicitly decompose the evidence integral into a sum of contributions from plateaus and nontrivial contributions to be computed by NS. Other advantages are that our modification fits elegantly into ordinary NS, since the modifications are small, and that it is in the spirit of dynamic NS (Higson et al., 2019), since taking into account the changing number of live points is crucial.
We could, alternatively, evict all live points in the outermost contour, compress by $(n_\mathrm{live} - q)/n_\mathrm{live}$, and finally top up the live points. We don't favour this implementation of our idea, as it breaks the equivalent treatment of plateaus and non-plateaus in algorithm 2 and our use of ordinary NS compression factors, but it remains a valid possibility. This would use the same estimates of the plateau size as Schittenhelm & Wacker (2020). Lastly, we could alternatively remove plateaus entirely from the likelihood function by, e.g., using the extended likelihood function in eq. 5. If the tie-breaking terms, $\ell(\theta)$, are random variates, they must be independent and identically distributed (Riley, 2019, appendix B.5.4.3). This, however, cannot be applied retrospectively to NS runs, and, as discussed in Riley (2019), might lead to problems in specific NS implementations.
Indeed, when using a random variate to break ties in a plateau, a new live point could lie anywhere in the plateau, as for any point in the plateau we are equally likely to draw a tie-breaking random variate that leads to acceptance. Sampling from the constrained prior thus requires sampling from the whole plateau and contours above the plateau. This becomes unavoidably inefficient for large plateaus, as almost all tie-breaking random variates would lead to rejection. NS implementations that attempt to increase the sampling efficiency thus easily lead to faulty estimates of the evidence.
For example, in ellipsoidal rejection sampling, as used in MultiNest, one samples from ellipsoids that are constructed to enclose the current live points with a volume similar to the current estimated volume, $X_i$, so that the ellipsoids typically shrink during an NS run. In the case of plateaus, this could easily lead to sampling from only a subregion of the whole plateau, as the ellipsoids could shrink as we compress through a plateau. In slice sampling, as used in PolyChord, we step out from a current live point to find the likelihood contour. To sample from the whole plateau, we should step out until at least the edge of the plateau. This, however, could fail, as the tie-breaking random variate when stepping out that far could by chance lead to rejection and thus we wouldn't step out far enough. These problems could possibly be avoided by elevating the tie-breaking random variate to a model parameter with a prior. For ellipsoidal sampling in the presence of plateaus, this would allow the ellipsoids to contract only in that dimension and without encroaching on the plateau.
4.1 Distribution of compression factor
In our modified NS, we apply ordinary NS compression factors taking into account the dynamic number of live points. This assumes beta distributions for the compression factor. When dealing with plateaus, Schittenhelm & Wacker (2020) consider estimating the compression using the fact that the number of points inside the plateau should follow a binomial distribution parameterized by the size of the plateau. Let us consider more carefully the differences in estimates of the compression factor from these two approaches. In the latter, the number of live points, $m$, that fall in the outermost contour plateau follows a binomial distribution,
$m \sim \mathrm{Binomial}(n_\mathrm{live},\, 1 - t),$  (12)
where $1 - t$ is the size of the plateau and thus $t$ is the compression factor. The probability mass function is thus
$P(m \mid t) = \binom{n_\mathrm{live}}{m}\, t^{\,n_\mathrm{live} - m}\, (1 - t)^m$  (13)
for $m$ points in the plateau and $n_\mathrm{live} - m$ points above it. We could make inferences about $t$ by computing a posterior distribution,
$p(t \mid m) \propto t^{\,n_\mathrm{live} - m}\, (1 - t)^m\, \pi(t),$  (14)
where $\pi(t)$ is our prior for the size of the non-plateau region. For a flat prior, $\pi(t) = 1$, this is in fact the probability density for a $\mathcal{B}(n_\mathrm{live} - m + 1,\, m + 1)$ distribution.
On the other hand, taking the approach in modified NS, the number of points that lie in the plateau, $q$, is no longer treated as a random variable. Instead, the compression factor $t$ is assumed to follow a beta distribution when $q$ points are removed (Henderson & Goggans, 2014),
$t \sim \mathcal{B}(n_\mathrm{live} - q + 1,\, q).$  (15)
The density for $t$ is
$p(t) \propto t^{\,n_\mathrm{live} - q}\, (1 - t)^{q - 1},$  (16)
corresponding to $n_\mathrm{live} - q$ points below $t$, $q - 1$ points above it, and one point at $t$.
The fact that we must consider $q - 1$ points above $t$ and one point at $t$, rather than $q$ points inside a plateau, results in a factor of $(1 - t)$ difference between eqs. 16 and 14. A further factor of $\pi(t)$ originates from any prior knowledge about the size of the plateau. If these factors may be neglected, inferences based on ordinary NS compression may be reliable.
With a flat prior for the unknown size of the plateau, the difference is that between a $\mathcal{B}(n_\mathrm{live} - q + 1,\, q)$ and a $\mathcal{B}(n_\mathrm{live} - q + 1,\, q + 1)$ distribution. As shown in fig. 4, the first two moments are similar, with only moderate differences of order $1/n_\mathrm{live}$.^1 The factor vanishes entirely for a logarithmic prior for the unknown size of the plateau,
$\pi(t) \propto \frac{1}{1 - t}.$  (17)

^1 The fact that the distributions as functions of $t$ are approximately identical is enough to ensure that our inferences are approximately identical (Berger & Wolpert, 1988). This holds despite the fact that in the binomial case the number of live points in the outermost plateau is a random variable and the size of the plateau is fixed, while in the beta case the compression factor is a random variable and $q$ is fixed. This needn't be true for frequentist estimates of the factor $t$.
Thus we see that our modified NS treatment is approximately the same as the binomial treatment, the full details of which would depend on a prior for the size of the plateau. If the differences are important, though, we could instead use the binomial statistics, as discussed at the end of section 4, and a more careful treatment of the prior.
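These moments can be compared directly. The sketch below assumes the flat-prior identification of the two treatments as $\mathcal{B}(n_\mathrm{live} - q + 1, q)$ and $\mathcal{B}(n_\mathrm{live} - q + 1, q + 1)$ distributions, and uses the standard beta-distribution moment formulae; the values of nlive and q are illustrative:

```python
def beta_moments(a, b):
    # mean and variance of a Beta(a, b) distribution
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

nlive, q = 100, 20
beta_mean, beta_var = beta_moments(nlive - q + 1, q)        # modified-NS (beta) treatment
binom_mean, binom_var = beta_moments(nlive - q + 1, q + 1)  # binomial posterior, flat prior
print(beta_mean, binom_mean)  # the means differ at the 1/nlive level
```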
4.2 Error estimates
The classic NS error estimate,
$\sigma(\ln Z) \approx \sqrt{H / n_\mathrm{live}},$  (18)
where $H$ is the Kullback–Leibler divergence between the posterior and the prior, assumes the ordinary NS compression and a constant number of live points, and so is not applicable to our modified NS. The anesthetic error estimates, however, already account for a dynamic number of live points, and so are applicable to the modified NS algorithm. The anesthetic error estimates are found (as initially suggested by Skilling, 2006) through simulations: sequences of possible compression factors are drawn from beta distributions and used to make a set of evidence estimates. One could alternatively compute an analytic estimate using the equivalents of the arithmetic expressions in section 2 for the variance, which can be found in Appendix B of Handley et al. (2015b). Future versions of anesthetic will also support such analytic estimates.

Clearly, however, compression estimates for plateaus suffer from increased uncertainties, since we dynamically reduce the number of live points during the plateau period of an NS run. In fact, if the plateau at $\mathcal{L}^\star$ makes up a large fraction of the current prior volume, the error in the estimated compression could be substantial, as few points lie outside the plateau. Indeed, we show in fig. 4 that the fractional error in the compression blows up as $q \to n_\mathrm{live}$. In such cases, there may exist efficient schemes for dynamically increasing the number of live points to ensure that sufficient points lie above the plateau. For example, in algorithm 2 we must ultimately replenish the live points such that there are $n_\mathrm{live}$ live points at $\mathcal{L} > \mathcal{L}^\star$. We could instead dynamically increase the number of live points immediately prior to the plateau by sampling them from the prior subject to $\mathcal{L} \ge \mathcal{L}^\star$, stopping once $n_\mathrm{live}$ of the points lie at $\mathcal{L} > \mathcal{L}^\star$. Once we remove the points in the plateau, there would be $n_\mathrm{live}$ live points remaining, and no need to top up the live points.
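The growth of the compression uncertainty through a plateau can be seen from the beta-distribution moments: for a $\mathcal{B}(n, 1)$ compression factor, $-\ln t$ is exponentially distributed with mean $1/n$ and variance $1/n^2$, so the accumulated log-compression over $q$ removals without replacement has variance $\sum_i 1/(n_\mathrm{live} - i + 1)^2$. A minimal sketch (numbers illustrative):

```python
import math

def log_compression_moments(nlive, q):
    # removing q points one by one: the (i+1)-th removal happens with nlive - i live points
    mean = -sum(1.0 / (nlive - i) for i in range(q))
    var = sum(1.0 / (nlive - i) ** 2 for i in range(q))
    return mean, var

for q in (10, 50, 90, 99):
    mean, var = log_compression_moments(100, q)
    print(q, round(mean, 2), round(math.sqrt(var), 3))
```

As q approaches nlive, the final removals contribute 1/2^2, 1/1^2, ... to the variance, so the uncertainty in the estimated compression grows rapidly, in line with the discussion above.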
In the worst-case scenario, no live points land in contours above the plateau, e.g., when the likelihood function shows a tiny peak above a broad plateau. At this point, it would be unclear whether to proceed and, if so, how to do so efficiently, since the live points in the plateau provide no clues about the presence or location of the peak. This problem would affect all NS variants known to us.
5 Examples
We now consider a few examples. First, in section 5.1 we apply our modified NS to examples considered in Schittenhelm & Wacker (2020). Second, in section 5.2 we construct a ‘weddingcake’ function that exhibits a series of plateaus and check that our modified NS correctly computes the evidence.
5.1 Examples from Schittenhelm & Wacker (2020)
For plateaus at the base of the likelihood function at $\mathcal{L} = 0$, we overestimate the evidence by a factor
$e^{\Delta \log Z} = \frac{e^{-f}}{1 - f},$  (19)
where $f$ is the fraction of the prior volume occupied by the plateau. For example, in Scenario 2 of Schittenhelm & Wacker (2020), the plateau occupies a known fraction of the prior volume at the base of the likelihood, and the resulting $\Delta \log Z$ is in good agreement with the difference found numerically in Schittenhelm & Wacker (2020). In fig. 5, we check the results of this problem with modified NS in 1000 repeat runs with 500 live points. We find that the histogram of estimates forms a Gaussian peak (red bars) around the analytic result (dashed blue). The spread was well described by the average uncertainty estimates from single runs of modified NS (green). The original NS results (orange), on the other hand, lie well away from the analytic result, as expected.
For plateaus at the peak of the likelihood function at $\mathcal{L}_\mathrm{max}$, the error depends on the stopping conditions and the treatment of the final live points in NS. In, e.g., the MultiNest implementation of NS, the run converges if all live points share the same likelihood, and the remaining evidence is correctly accounted for. If the run halts but the remaining evidence isn't accounted for, we underestimate the evidence by a term
$\Delta Z = f\,\mathcal{L}_\mathrm{max},$
$\Delta \log Z = \log(Z + \Delta Z) - \log Z,$
where $f$ is the fraction of the prior volume occupied by the plateau. For example, in Scenario 3 of Schittenhelm & Wacker (2020), the plateau occupies a known fraction of the prior volume at the peak of the likelihood, and the resulting $\Delta \log Z$ is in good agreement with the difference found numerically in Schittenhelm & Wacker (2020).
5.2 Wedding cake likelihood
We now construct a semi-analytical example for which we can numerically confirm our approach. Consider, for a flat prior on the unit $D$-dimensional hypercube, an infinite sequence of concentric square plateaus of geometrically decreasing volume $\alpha^i$ for $0 < \alpha < 1$. The edges of the plateaus lie at
$\left\lVert x - \tfrac{1}{2} \right\rVert_\infty = \tfrac{1}{2}\,\alpha^{i/D}$  (20)
for $i = 0, 1, 2, \ldots$, where $\lVert \cdot \rVert_\infty$ denotes the infinity norm. We define the height of each plateau to have a Gaussian profile:
$\mathcal{L}(x) = \sum_{i} \mathbb{1}_i(x)\, \exp\!\left(-\frac{\alpha^{2i/D}}{8\sigma^2}\right),$  (21)
where $\mathbb{1}_i(x)$ is an indicator function that, for any given $x$, selects a single term in the sum. The resulting likelihood is therefore a set of hypercuboids with a hypercubic base of side length $\alpha^{i/D}$ and height $\exp\left(-\alpha^{2i/D}/(8\sigma^2)\right)$. If the base is two-dimensional, this creates a tiered "wedding cake" surface, as can be seen in fig. 6. The $i$ selected by the indicator function is in fact
$i(x) = \left\lfloor D \log_\alpha\!\left(2\,\left\lVert x - \tfrac{1}{2} \right\rVert_\infty\right) \right\rfloor,$  (22)
where $\lfloor \cdot \rfloor$ denotes the floor function (namely, the greatest integer less than or equal to its argument), enabling us to write
$\mathcal{L}(x) = \exp\!\left(-\frac{\alpha^{2 i(x)/D}}{8\sigma^2}\right).$  (23)
Given that the volume of the region $i$ is $\alpha^i (1 - \alpha)$, the evidence can be expressed as:
$Z = (1 - \alpha) \sum_{i=0}^{\infty} \alpha^i \exp\!\left(-\frac{\alpha^{2i/D}}{8\sigma^2}\right),$  (24)
which as an infinite series converges sufficiently rapidly and stably to be evaluated numerically for any number of dimensions $D$, but if speed is a requirement then a Laplace approximation shows that one only needs to consider the terms in the series around
$i_\star \approx \frac{D \ln\!\left(4 D \sigma^2\right)}{2 \ln \alpha}.$  (25)
For reference, putting all of these equations together, the likelihood can be computed as:
$\mathcal{L}(x) = \exp\!\left(-\frac{\alpha^{2\left\lfloor D \log_\alpha\left(2 \lVert x - 1/2 \rVert_\infty\right)\right\rfloor / D}}{8\sigma^2}\right),$  (26)
where $\alpha$ is a hyperparameter controlling the depth of the plateaus, and $\sigma$ controls the width of the overall Gaussian profile. This likelihood forms part of the anesthetic test suite, which confirms that the approach suggested in this paper recovers the true evidence.
The wedding cake likelihood can be very useful for testing nested sampling implementations: unlike a traditional Gaussian, it can be trivially sampled from using a simple random number generator, and it has no unexpected edge effects, as the boundaries of the prior are also a likelihood contour.
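A minimal Python sketch of the wedding cake likelihood and its evidence series, written from the equations of section 5.2 (the default parameter values and function names here are our own choices; anesthetic's test-suite implementation may differ):

```python
import math

def wedding_cake_logL(x, alpha=0.7, sigma=0.2):
    # plateau index i(x) = floor(D * log_alpha(2 * ||x - 1/2||_inf)), eq. (22)
    D = len(x)
    r = max(abs(xi - 0.5) for xi in x)  # infinity norm; undefined at the exact centre
    i = math.floor(D * math.log(2.0 * r) / math.log(alpha))
    return -alpha ** (2.0 * i / D) / (8.0 * sigma ** 2)

def wedding_cake_logZ(D, alpha=0.7, sigma=0.2, terms=1000):
    # Z = (1 - alpha) * sum_i alpha^i * exp(-alpha^(2 i / D) / (8 sigma^2)), eq. (24)
    Z = sum((1.0 - alpha) * alpha ** i
            * math.exp(-alpha ** (2.0 * i / D) / (8.0 * sigma ** 2))
            for i in range(terms))
    return math.log(Z)

# points in the same shell share exactly the same likelihood (a plateau)
print(wedding_cake_logL((0.71, 0.5)) == wedding_cake_logL((0.5, 0.72)))  # True
```

The piecewise-constant structure is what makes the likelihood useful here: every shell is a plateau of known prior volume, so a run can be checked term by term against the analytic series.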
6 Conclusions
Following from Riley (2019); Schittenhelm & Wacker (2020), which showed formally why NS breaks down if there are plateaus in the likelihood, we first presented the problem of plateaus in an informal but accessible way. We then constructed a modified version of the NS algorithm. The simple modification permits it to remove points in a plateau one by one, without replacement. Once all points in a plateau are removed, the live points are finally replenished. This leads to correct compression in the presence of plateaus and ordinary NS in their absence.
We discussed examples from Schittenhelm & Wacker (2020), shedding light on them by finding the impact of plateaus on ordinary NS in simple analytic formulae. The impact was previously shown only numerically. Lastly, our specially constructed wedding-cake problem showed a case with multiple plateaus. The modified NS algorithm successfully dealt with them.
Runs from popular NS software such as PolyChord and MultiNest may be resummed retrospectively via the modified NS algorithm using anesthetic starting from version 2.0.0beta.2. Our modified NS makes a minimal change to ordinary NS, to the extent that we recommend it becomes the canonical version of NS.
Acknowledgements
AF was supported by an NSFC Research Fund for International Young Scientists grant 11950410509. WH was supported by a George Southgate visiting fellowship grant from the University of Adelaide, Gonville & Caius College, and STFC IPS grant number ST/T001054/1. We thank Brendon Brewer for valuable comments and discussion on the manuscript.
Data availability
There is little data associated with this paper, though any data or code will be shared on reasonable request to the corresponding author.
References
 Barbary (2016) Barbary K., 2016, nestle: Pure Python, MIT-licensed implementation of nested sampling algorithms for evaluating Bayesian evidence, https://github.com/kbarbary/nestle
 Berger & Wolpert (1988) Berger J., Wolpert R., 1988, The Likelihood Principle, second edn. Lecture notes – monographs series Vol. 6, Institute of Mathematical Statistics
 Brewer & Foreman-Mackey (2016) Brewer B. J., Foreman-Mackey D., 2016, arXiv e-prints, p. arXiv:1606.03757
 Corsaro & De Ridder (2014) Corsaro E., De Ridder J., 2014, A&A, 571, A71
 Feroz & Hobson (2008) Feroz F., Hobson M. P., 2008, Mon. Not. Roy. Astron. Soc., 384, 449
 Feroz et al. (2009) Feroz F., Hobson M. P., Bridges M., 2009, Mon. Not. Roy. Astron. Soc., 398, 1601
 Feroz et al. (2013) Feroz F., Hobson M. P., Cameron E., Pettitt A. N., 2013, The Open Journal of Astrophysics
 Fowlie et al. (2020) Fowlie A., Handley W., Su L., 2020, Mon. Not. Roy. Astron. Soc., 497, 5256
 Handley (2019) Handley W., 2019, J. Open Source Softw., 4, 1414
 Handley et al. (2015a) Handley W. J., Hobson M. P., Lasenby A. N., 2015a, Mon. Not. Roy. Astron. Soc., 450, L61
 Handley et al. (2015b) Handley W. J., Hobson M. P., Lasenby A. N., 2015b, Mon. Not. Roy. Astron. Soc., 453, 4384
 Henderson & Goggans (2014) Henderson R. W., Goggans P. M., 2014, in Bayesian Inference and Maximum Entropy Methods in Science and Engineering. pp 100–105, doi:10.1063/1.4903717
 Hergt et al. (2020) Hergt L., Agocs F., Handley W., Hobson M., Lasenby A., 2020, (In preparation)
 Higson et al. (2019) Higson E., Handley W., Hobson M., Lasenby A., 2019, Statistics and Computing, 29, 891

 Murray (2007) Murray I., 2007, PhD thesis, Advances in Markov chain Monte Carlo methods, https://homepages.inf.ed.ac.uk/imurray2/pub/07thesis/
 Riley (2019) Riley T., 2019, PhD thesis, Neutron star parameter estimation from a NICER perspective, https://hdl.handle.net/11245.1/aa86fcf324374bc2810ecf9f30a98f7a
 Schittenhelm & Wacker (2020) Schittenhelm D., Wacker P., 2020, arXiv e-prints, p. arXiv:2005.08602
 Skilling (2004a) Skilling J., 2004a, Nested sampling example program (version 1.20), http://www.inference.org.uk/bayesys/nest/nest.tar.gz
 Skilling (2004b) Skilling J., 2004b, in Fischer R., Preuss R., Toussaint U. V., eds, American Institute of Physics Conference Series Vol. 735, American Institute of Physics Conference Series. pp 395–405, doi:10.1063/1.1835238
 Skilling (2006) Skilling J., 2006, Bayesian Analysis, 1, 833
 Speagle (2020) Speagle J. S., 2020, MNRAS, 493, 3132