Nested sampling cross-checks using order statistics

06/05/2020
by Andrew Fowlie, et al.
University of Cambridge

Nested sampling (NS) is an invaluable tool in data analysis in modern astrophysics, cosmology, gravitational wave astronomy and particle physics. We identify a previously unused property of NS related to order statistics: the insertion indexes of new live points into the existing live points should be uniformly distributed. This observation enabled us to create a novel cross-check of single NS runs. The tests can detect when an NS run fails to sample new live points from the constrained prior, as well as plateaus in the likelihood function; both break an assumption of NS and thus lead to unreliable results. We applied our cross-check to NS runs on toy functions with known analytic results in 2-50 dimensions, showing that our approach can detect problematic runs on a variety of likelihoods, settings and dimensions. As an example of a realistic application, we cross-checked NS runs performed in the context of cosmological model selection. Since the cross-check is simple, we recommend that it become a mandatory test for every applicable NS run.


I Introduction

Nested sampling (NS) was introduced by Skilling (2004, 2006) as a novel algorithm for computing Bayesian evidences and posterior distributions. The algorithm requires few tuning parameters and can cope with traditionally challenging multimodal and degenerate functions. As a result, popular implementations such as MultiNest Feroz and Hobson (2008); Feroz et al. (2009, 2013), PolyChord Handley et al. (2015, 2015) and dynesty Speagle (2020) have become invaluable tools in modern cosmology Mukherjee et al. (2006); Easther and Peiris (2012); Martin et al. (2014); Hlozek et al. (2015); Audren et al. (2013); Akrami et al. (2018), astrophysics Trotta et al. (2011); Liddle (2007); Buchner et al. (2014), gravitational wave astronomy Veitch et al. (2015); Abbott et al. (2016a, b); Ashton et al. (2019), and particle physics Trotta et al. (2008); Feroz et al. (2008); Buchmueller et al. (2014); Martinez et al. (2017). Other NS applications include statistical physics Bolhuis and Csányi (2018); Martiniani et al. (2014); Pártay et al. (2010, 2014); Baldock et al. (2017); Nielsen (2013), condensed matter physics Baldock et al. (2016), and biology Russel et al. (2018); Johnson et al. (2014).

In this work, we propose a cross-check of an important assumption in NS that works on single NS runs. This improves upon previous tests of NS that required toy functions with known analytic properties Buchner (2016) or multiple runs Higson et al. (2019). The cross-check detects faults in the compression of the parameter space that lead to biased estimates of the evidence. We demonstrate our method on toy functions and previous NS runs used for model selection in cosmology Handley (2019a). We anticipate that the cross-check could be applied as broadly as NS itself.

The paper is structured as follows. After recapitulating the relevant aspects of NS in section II, we introduce our approach in section III. We apply our methods to toy functions and a cosmological likelihood in section IV. We briefly discuss the possibility of using the insertion indexes to debias NS evidence estimates in section V before concluding in section VI.

II NS algorithm

To establish our notation and explain our cross-check, we briefly summarize the NS algorithm. For more detailed and pedagogical introductions, see e.g., Skilling (2006); Feroz et al. (2009); Handley et al. (2015); Speagle (2020). NS is primarily an algorithm for computing the Bayesian evidence of a model in light of data. Consider a model with parameters $\Theta$. The evidence may be written

$$\mathcal{Z} = \int \mathcal{L}(\Theta)\, \pi(\Theta)\, \mathrm{d}\Theta \qquad (1)$$

where $\pi(\Theta)$ is a prior density for the parameters and $\mathcal{L}(\Theta)$ is a likelihood function describing the probability of the observed experimental data. The evidence is a critical ingredient in Bayesian model selection, in which models are compared by Bayes factors, since Bayes factors are ratios of evidences for two models,

$$B_{10} = \frac{\mathcal{Z}_1}{\mathcal{Z}_0} \qquad (2)$$

The Bayes factor $B_{10}$ tells us how much more we should believe in model 1 relative to model 0 in light of experimental data. For an introduction to Bayes factors, see e.g., Kass and Raftery (1995).

NS works by casting eq. 1 as a one-dimensional integral via the volume variable,

$$X(\lambda) = \int_{\mathcal{L}(\Theta) > \lambda} \pi(\Theta)\, \mathrm{d}\Theta \qquad (3)$$

This is the prior volume enclosed within the iso-likelihood contour defined by $\mathcal{L}(\Theta) = \lambda$. The evidence may then be written as

$$\mathcal{Z} = \int_0^1 \mathcal{L}(X)\, \mathrm{d}X \qquad (4)$$

where in the overloaded notation $\mathcal{L}(X)$ is the inverse of $X(\lambda)$.

The remaining challenge is computing the one-dimensional integral in eq. 4. In NS we begin from $n_\text{live}$ live points drawn from the prior. At each iteration of the NS algorithm, we discard the point with the smallest likelihood, $\mathcal{L}^\star$, and sample a replacement drawn from the constrained prior, that is, drawn from $\pi(\Theta)$ subject to $\mathcal{L}(\Theta) > \mathcal{L}^\star$. By the statistical properties of random samples drawn from the constrained prior, we expect that the volume compresses by a factor $t$ at each iteration, where

$$\langle \ln t \rangle = -\frac{1}{n_\text{live}} \qquad (5)$$

This enables us to estimate the volume at the $i$-th iteration by $X_i \approx e^{-i / n_\text{live}}$ and write the one-dimensional integral using the trapezium rule,

$$\mathcal{Z} \approx \sum_i \tfrac12 \left(X_{i-1} - X_{i+1}\right) \mathcal{L}_i \qquad (6)$$

The algorithm terminates once an estimate of the maximum remaining evidence, $\Delta\mathcal{Z} \approx \mathcal{L}_\text{max} X_i$, is less than a specified fraction, $\epsilon$, of the total evidence found,

$$\Delta\mathcal{Z} < \epsilon\, \mathcal{Z} \qquad (7)$$

The main numerical problem in an implementation of NS is efficiently sampling from the constrained prior.
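
To make the algorithm concrete, the following is a minimal sketch of an NS loop (our own illustration, not the implementation of any particular code). It uses brute-force rejection sampling from the full prior to replace each discarded point, which is only feasible in very low dimensions, neglects the final live-point contribution to the evidence, and assumes an illustrative 2-d Gaussian likelihood with a uniform prior on $[-5, 5]^2$, for which $\log \mathcal{Z} \approx -2 \log 10$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_live, eps = 2, 100, 1e-2
log_z_analytic = -d * np.log(10.0)      # likelihood is a pdf; prior volume 10^d

def log_like(theta):
    # illustrative unit 2-d Gaussian pdf
    return -0.5 * np.sum(theta**2) - 0.5 * d * np.log(2 * np.pi)

def sample_prior():
    return rng.uniform(-5.0, 5.0, size=d)

live = [sample_prior() for _ in range(n_live)]
log_l = np.array([log_like(p) for p in live])

log_z, x_prev = -np.inf, 1.0
for i in range(1, 1_000_000):
    worst = int(np.argmin(log_l))
    x_i = np.exp(-i / n_live)                      # volume estimate from eq. (5)
    log_w = np.log(x_prev - x_i) + log_l[worst]    # rectangle version of eq. (6)
    log_z = np.logaddexp(log_z, log_w)
    x_prev = x_i
    # replace the discarded point by rejection sampling from the constrained prior
    cand = sample_prior()
    while log_like(cand) <= log_l[worst]:
        cand = sample_prior()
    live[worst], log_l[worst] = cand, log_like(cand)
    if np.max(log_l) + np.log(x_i) < np.log(eps) + log_z:   # termination, eq. (7)
        break

print(f"log Z = {log_z:.2f} vs analytic {log_z_analytic:.2f}")
```

The `while` loop above is exactly the step that real implementations replace with cleverer subalgorithms, since its acceptance rate falls in proportion to the volume.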

II.1 Sampling from the constrained prior

Because rejection sampling from the entire prior would be impractically slow as the volume compresses exponentially, implementations of NS typically employ specialised subalgorithms to sample from the constrained prior. When these subalgorithms fail, the evidences may be unreliable. This was considered the most severe drawback of the NS algorithm in Salomone et al. (2018).

One such subalgorithm is ellipsoidal sampling Mukherjee et al. (2006); Feroz and Hobson (2008), a rejection sampling algorithm in which the live points are bounded by a set of ellipsoids. Potential live points are sampled from the ellipsoids and accepted only if $\mathcal{L}(\Theta) > \mathcal{L}^\star$. Ellipsoidal NS is implemented in MultiNest Feroz and Hobson (2008); Feroz et al. (2009, 2013). For this to faithfully sample from the constrained prior, the ellipsoids must completely enclose the iso-likelihood contour defined by $\mathcal{L}^\star$. To ensure this is the case, the ellipsoids are expanded by a factor $1/\mathrm{efr}$, with $\mathrm{efr} = 0.3$ recommended for reliable evidences.

Slice sampling Neal (2003) is an alternative scheme for sampling from the constrained prior Aitken and Akman (2013); Handley et al. (2015). A chord is drawn from a live point across the entire region enclosed by the iso-likelihood contour and a candidate point is drawn uniformly from along the chord. This is repeated $n_\text{rep}$ times to reduce correlations between the new point and the original live point. Slice sampling is implemented in PolyChord Handley et al. (2015, 2015). The recommended number of repeats is $n_\text{rep} = 5d$ for a $d$-dimensional function.
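
The following is a minimal sketch of one constrained-prior slice-sampling update (our own simplified stepping-out and shrinkage scheme along a random direction; PolyChord's actual implementation additionally whitens the space and repeats the update $n_\text{rep}$ times):

```python
import numpy as np

rng = np.random.default_rng(0)

def slice_step(x, log_l, l_star, w=0.1):
    """One slice-sampling update of live point x subject to log_l > l_star.

    Precondition: log_l(x) > l_star, as holds for any live point.
    """
    d = rng.normal(size=x.size)
    d /= np.linalg.norm(d)               # random unit direction
    u = rng.uniform()
    lo, hi = -w * u, w * (1.0 - u)       # randomly placed initial chord
    point = lambda t: x + t * d
    while log_l(point(lo)) > l_star:     # step out the lower end of the chord
        lo -= w
    while log_l(point(hi)) > l_star:     # step out the upper end of the chord
        hi += w
    while True:                          # shrink until a point satisfies the constraint
        t = rng.uniform(lo, hi)
        if log_l(point(t)) > l_star:
            return point(t)
        if t < 0:
            lo = t
        else:
            hi = t
```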

II.2 Plateaus in the likelihood

It was recently discovered in Schittenhelm and Wacker (2020) that plateaus in the likelihood function, i.e., regions of non-zero prior measure on which the likelihood is constant, can lead to faulty estimates of the compression. In such cases, the live points are not uniformly distributed in $X$ (eq. 3), violating the assumptions behind eq. 5.

III Using insertion indexes

Table 1 (columns: efr, analytic $\log\mathcal{Z}$, mean, SEM, inaccuracy, bias, median p-value, median rolling p-value; rows: Gaussian, Rosenbrock, shells and mixture functions): Summary of results of our insertion index cross-check for MultiNest. The numerical results are the average from 100 runs. Large biases and inaccuracies and small p-values are highlighted in red.

By insertion index, we mean the index at which an element must be inserted to maintain order in a sorted list. With a left-sided convention, the insertion index $i$ of a sample $x$ in a sorted list $x_1 \le x_2 \le \cdots \le x_n$ is such that

$$x_i < x \le x_{i+1} \qquad (8)$$

with the conventions $x_0 \equiv -\infty$ and $x_{n+1} \equiv +\infty$.

The key idea in this paper is to use the insertion indexes of new live points relative to the existing live points, sorted by enclosed prior volume, $X$, to detect problems in sampling from the constrained prior. Since the relationship between volume and likelihood is monotonic, we can sort by volume by sorting by likelihood. If new live points are genuinely sampled from the constrained prior, leading to a uniform distribution in $X$, the insertion indexes, $i$, should be uniformly distributed over the discrete values $0$ to $n_\text{live} - 1$,

$$P(i) = \frac{1}{n_\text{live}} \quad \text{for } i = 0, 1, \ldots, n_\text{live} - 1 \qquad (9)$$

This result from order statistics is proven in appendix A. During an NS run of $n_\text{iter}$ iterations we thus find $n_\text{iter}$ insertion indexes that should be uniformly distributed. Imagine, however, that during an NS run using ellipsoidal sampling, the ellipsoids encroached on the true iso-likelihood contour. In that case, the insertion indexes near the lowest-likelihood live points could be disfavoured, and the distribution of insertion indexes would deviate from uniformity. Alternatively, imagine that the likelihood function contains a plateau. Any initial live points that lie in the plateau share the same insertion index, leading to many repeated indexes and a strong deviation from a uniform distribution.
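
As a concrete sketch (a minimal implementation of our own, with hypothetical variable names), the insertion index of each replacement point can be computed by bisection on the live-point likelihoods, which are kept sorted:

```python
import bisect

def insertion_index(sorted_live_log_l, new_log_l):
    """Left-sided insertion index of eq. (8): the position at which
    new_log_l must be inserted to keep the likelihoods sorted."""
    return bisect.bisect_left(sorted_live_log_l, new_log_l)

# Inside an NS loop, after discarding the lowest-likelihood live point
# and drawing a replacement with log-likelihood new_log_l:
#     sorted_live_log_l.pop(0)               # remove the worst point
#     i = insertion_index(sorted_live_log_l, new_log_l)  # i in {0, ..., n_live - 1}
#     sorted_live_log_l.insert(i, new_log_l)
#     indexes.append(i)
```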

Thus, we can perform a statistical test on the insertion indexes to detect deviations from a uniform distribution. The choice of test is not important to our general idea of using the information in the insertion indexes, though in our examples we use a Kolmogorov-Smirnov (KS) test Kolmogorov (1933); Smirnov (1948), which we found to be powerful. We describe the KS test in appendix B.

Excepting plateaus, deviations from uniformity are caused by a change in the distribution of new live points with respect to the existing live points. Since there is no technical challenge in sampling the initial live points from the prior, failures should typically occur during a run and thus be accompanied by a change in the distribution. In runs with many iterations in which a change occurs only once, the power of the test may be diluted by the many iterations before and after the distribution changes, as the insertion indexes before and after the change should each be uniformly distributed. To mitigate this, we perform multiple tests on chunks of iterations and apply a Bonferroni correction for multiple testing. Since the volume compresses by a factor of about $e$ in $n_\text{live}$ iterations, we pick $n_\text{live}$ as a reasonable size for a chunk of iterations. We later refer to the resulting corrected p-value as the rolling p-value.
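
A sketch of both tests (assuming scipy; the function names are ours, and the exact form used for the results below may differ in detail):

```python
import numpy as np
from scipy import stats

def ks_pvalue(indexes, n_live):
    """KS test of insertion indexes against the discrete uniform
    distribution on {0, ..., n_live - 1}, cf. eq. (9)."""
    cdf = lambda x: (np.floor(x) + 1.0).clip(0.0, n_live) / n_live
    return stats.kstest(indexes, cdf).pvalue

def rolling_pvalue(indexes, n_live):
    """Smallest per-chunk p-value from chunks of n_live iterations,
    with a Bonferroni correction for the number of chunks."""
    chunks = [indexes[j:j + n_live] for j in range(0, len(indexes), n_live)]
    p_min = min(ks_pvalue(np.asarray(c), n_live) for c in chunks)
    return min(1.0, p_min * len(chunks))
```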

We furthermore neglect correlations between the insertion indexes. Finally, we stress that the magnitude of the deviation from uniformity, as well as the p-value, should be noted. A small p-value alone is not necessarily cause for concern if the departure from uniformity is negligible.

IV Examples

Table 2 (columns as in table 1, with the number of slice-sampling repeats per dimension, $n_\text{rep}/d$, in place of efr; rows: Gaussian, Rosenbrock, shells and mixture functions): Summary of results of our insertion index cross-check for PolyChord. See table 1 for further details. The quantity $n_\text{rep}/d$ may be thought of as a "PolyChord efficiency" analogue of the MultiNest efficiency efr.

IV.1 Toy functions

We now present detailed numerical examples of our cross-check based on NS runs on toy functions with MultiNest-3.12 Feroz and Hobson (2008); Feroz et al. (2009, 2013) and PolyChord-1.17.1 Handley et al. (2015, 2015). We chose toy functions with known analytic evidences or precisely known numerical estimates of the evidence to demonstrate that biased results from NS are detectable with our approach. The toy functions are described in appendix C.

We performed 100 MultiNest and PolyChord runs on each toy function to study the statistical properties of their outputs. We used the same number of live points and termination criterion throughout. To generate biased NS runs, we used inappropriate settings, e.g., a large efr in MultiNest or few repeats in slice sampling in PolyChord, and difficult toy functions in up to 50 dimensions. We post-processed the results using anesthetic Handley (2019b).

We summarise our results by the average $\langle \log \mathcal{Z} \rangle$ and its error estimate, and by the median p-value from all the insertion indexes and the median rolling p-value. We furthermore report the standard error on the mean, SEM, and the standard deviation, $\sigma$. We use the error estimates to compute the average inaccuracy (10) and the bias (11): the inaccuracy measures the deviation of the single-run estimates of $\log\mathcal{Z}$ from the analytic result in units of the uncertainty reported by the code, while the bias measures the deviation of the mean estimate from the analytic result in units of the SEM. The inaccuracy thus shows whether the uncertainty reported by a code from single runs was reasonable.

We present our numerical results using MultiNest and PolyChord in tables 1 and 2, respectively. First, for the Gaussian function, the MultiNest estimates of $\log\mathcal{Z}$ were significantly biased for several combinations of dimension and efr setting. Our cross-check was successful, as the p-values corresponding to the biased results were tiny.

For the Rosenbrock function, our cross-check detected a problem with some MultiNest settings even though the MultiNest evidence estimate was not biased, and failed to detect a problem with one setting even though the estimate was biased. This was, however, the only MultiNest case in which a biased result escaped detection.

For the shells function, the MultiNest estimates of $\log\mathcal{Z}$ were biased for many combinations of dimension and efr. The biased results were all identified by our cross-check with tiny p-values. Indeed, in the most challenging case we saw a clear bias even with the recommended efr setting, accompanied by a tiny median rolling p-value.

Lastly, the mixture functions are particularly important, as MultiNest was known to produce biased results for them even with recommended settings. Using all the insertion indexes, we find tiny p-values for this function, i.e., our cross-check successfully detects these failures.

In the analogous results for PolyChord in table 2, we see fewer significantly biased estimates throughout, and only three biased results when using the recommended number of repeats, all of which occurred in the mixture function. We note, though, that the error estimates from PolyChord were reasonable even in these cases. The most extremely biased results were detected by our cross-check in the Gaussian, shells and mixture functions.

Our cross-check detected faults in the shells function at the smallest number of repeats, despite no evidence of bias in the PolyChord results. The p-values, however, increased monotonically as the number of repeats was increased, as expected. Lastly, we note that biases went undetected by our cross-check in many more cases than for MultiNest; this may be because the biases were smaller than they were for MultiNest.

In summary, for both MultiNest and PolyChord, we find that our cross-check can detect problematic NS runs in a variety of functions, settings and dimensions, although there is room for refinement. The problems detected by our cross-check usually lead to a faulty estimate of the evidence, though in a few cases the evidence estimate remains reasonable despite the apparent failure to sample correctly from the constrained prior.

IV.2 Cosmological model selection

In Handley (2019a), Handley considered the Bayesian evidence for a spatially closed Universe. Bayesian evidences from combinations of four datasets were computed using PolyChord for a spatially flat Universe and a curved Universe. The resulting Bayes factors showed that a closed Universe was favoured by odds of about 50:1 for a particular set of data. There were 22 NS computations in total. The PolyChord results are publicly archived at Handley (2019c). We ran our cross-check on each of the NS runs in the archived data, finding p-values in the range 0.04 to 0.98. The results do not suggest problems with the NS runs. The smallest p-value, 0.04, is not particularly alarming, especially considering that we conducted 44 tests. The full results are shown in table 3.

Data                    Flat p-value   Flat rolling p-value   Curved p-value   Curved rolling p-value
BAO                     0.89           0.82                   0.07             0.05
lensing+BAO             0.72           0.54                   0.19             0.43
lensing                 0.26           0.14                   0.04             0.64
lensing+SES             0.08           0.08                   0.78             0.04
Planck+BAO              0.39           0.56                   0.14             0.43
Planck+lensing+BAO      0.68           0.69                   0.70             0.27
Planck+lensing          0.94           0.49                   0.89             0.72
Planck+lensing+SES      0.92           0.92                   0.33             0.82
Planck                  0.81           0.69                   0.84             0.88
Planck+SES              0.20           0.48                   0.92             0.97
SES                     0.59           0.59                   0.98             0.98
Table 3: Insertion index cross-check applied to NS results from cosmological model selection in Handley (2019a). We show p-values and rolling p-values for the NS evidence calculations for flat and curved Universe models with 11 dataset combinations. See Handley (2019a) for further description of the datasets and models.

IV.3 Plateaus

Let us consider the one-dimensional function in example 2 from Schittenhelm and Wacker (2020). The likelihood function is defined piece-wise to be a Gaussian at the center and zero in the tails,

$$\mathcal{L}(\theta) = \begin{cases} e^{-\theta^2 / 2} & \text{for } |\theta| \le a \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$

The prior is uniform over an interval wider than the central Gaussian region. We confirm that the NS algorithm produces biased estimates of the evidence in this function. However, since the likelihood is zero in a sizeable fraction of the prior, a corresponding fraction of the initial live points has a likelihood of zero and shares the same insertion index from eq. 8. This results in a tiny p-value in our test.
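
A sketch of how a plateau registers in the test (with purely illustrative numbers, mimicking a run in which a large block of indexes are tied at zero):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_live, n_idx = 500, 2000

# Mimic insertion indexes: a block of repeated indexes from a plateau,
# followed by indexes that are uniform as in eq. (9).
plateau = np.zeros(400, dtype=int)
healthy = rng.integers(0, n_live, size=n_idx - plateau.size)
indexes = np.concatenate([plateau, healthy])

# KS test against the discrete uniform distribution on {0, ..., n_live - 1}
cdf = lambda x: (np.floor(x) + 1.0).clip(0.0, n_live) / n_live
print(stats.kstest(indexes, cdf).pvalue)   # tiny p-value: plateau detected
```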

IV.4 Perfect NS

Lastly, we simulated perfect NS runs that correctly sample from the constrained prior. We simulated them by directly sampling compression factors from uniform distributions and never computing any likelihoods. Of course, with no likelihood we cannot compute an evidence, but we can simulate insertion indexes. We performed 10,000 runs of perfect NS with 10,000 iterations and computed the p-value via our KS test.
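
A minimal sketch of such a simulation (our own implementation: live points are represented only by their enclosed prior volumes, the largest volume is discarded at each iteration, and the replacement volume is drawn uniformly below it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_live, n_iter = 1000, 10_000

# Represent live points only by their enclosed prior volumes X ~ U(0, 1).
# The worst (lowest-likelihood) point is the one with the largest volume.
vols = rng.uniform(size=n_live)

indexes = np.empty(n_iter, dtype=int)
for i in range(n_iter):
    worst = np.argmax(vols)
    x_star = vols[worst]
    rest = np.delete(vols, worst)
    new = x_star * rng.uniform()        # perfect draw from the constrained prior
    # insertion index by likelihood: the number of remaining points that are
    # worse, i.e. have a larger volume than the new point
    indexes[i] = np.sum(rest > new)
    vols = np.append(rest, new)

# KS test against the discrete uniform distribution on {0, ..., n_live - 1}
cdf = lambda x: (np.floor(x) + 1.0).clip(0.0, n_live) / n_live
print(stats.kstest(indexes, cdf).pvalue)
```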

Figure 1: Histogram of p-values from tests of uniformity of insertion indexes from perfect NS (blue), samples from a discrete uniform distribution (orange) and samples from a continuous uniform distribution (green).

We furthermore computed 100,000 p-values from a KS test on 10,000 samples drawn from a continuous uniform distribution and on 10,000 samples drawn from a discrete uniform distribution with 1000 bins. We histogram all p-values in fig. 1. Of course, the KS p-values should be uniformly distributed in the continuous case, and it appears that they are (green). The impact of discretization on the KS test is visible (orange) but small with 1000 live points. The further impact of correlations amongst the samples in perfect NS (blue) is not obvious. This suggests that although the correlations and discretization affect the KS test, the effect is small.

V Future use of insertion indexes

For the purposes of evidence estimation, a nested sampling run is fully encoded by recording the birth contour and death contour of each point Higson et al. (2018). For the purposes of estimating volume in a statistical way, we generally discard the likelihood information, focussing on the ordering of the contours. This makes sense, as barring the stopping criterion in eq. 7, the underlying nested sampling algorithm is athermal and insensitive to monotonic transformations of the likelihood.

Traditional nested sampling uses the fact that the compression factor $t_i$ at an iteration with $n_i$ live points is distributed as

$$P(t_i) = n_i\, t_i^{\,n_i - 1} \qquad (13)$$

In the above, one has essentially marginalised out dependency on everything other than the number of live points, and compressed the birth-death contour information into a vector encoding the number of live points at each iteration, $n_i$. One can then use this recursively (alongside the fact that $X_i = t_i X_{i-1}$) to perform inference on the volumes $X_i$ and therefore the evidence via eqs. 6 and 5.

The critical question therefore is whether this "Skilling compression" from birth-death contours to numbers of live points is lossless or lossy for the purposes of volume estimation (note that it is generically lossy, as it is impossible to go in the reverse direction). The results presented in this paper suggest that it loses some useful information, as insertion indexes do provide further information in the context of a cross-check (and are in fact a lossless compression of the birth and death contours). One possibility is that the Skilling compression is lossless in the context of perfect nested sampling, but that for a biased run one may be able to use the insertion indexes to partially correct the bias.

VI Conclusions

We identified a previously unknown property of the NS algorithm: the insertion indexes of new live points into the existing live points should be uniformly distributed. This observation enabled us to invent a cross-check of single NS runs. The cross-check can detect when an NS run fails to sample new live points from the constrained prior, which is the most challenging aspect of an efficient implementation of NS, as well as plateaus in the likelihood function, recently identified in Schittenhelm and Wacker (2020); both can lead to unreliable estimates of the evidence and posterior.

We applied our cross-check to NS runs on several toy functions with known analytic results in 2-50 dimensions with MultiNest and PolyChord, which sample from the constrained prior using ellipsoidal rejection sampling and slice sampling, respectively. Our numerical results are some of the most detailed checks of MultiNest and PolyChord. We found that our cross-check could detect problematic runs for both codes. Since the idea is relatively simple, we suggest that a cross-check of this kind should become a mandatory test of any NS run. The exact form of the cross-check, however, could be refined. We chose a KS test using either all the iterations or rolling chunks of iterations; both choices could be improved. As an example of a realistic application, we furthermore applied our cross-check to results from NS runs performed in the context of cosmological model selection.

Lastly, we speculated that the information contained in the insertion indexes could be used to debias single NS runs or lead to an improved formula for the evidence summation. We outlined a few difficulties and hope our observations lead to further developments.

Future work will involve extending the method to work in the context of a variable number of live points, as well as exploring the larger possibilities of using order statistics to improve NS accuracy and potentially debias runs.

Acknowledgements.
The authors would like to thank Gregory Martinez for valuable discussions. We thank the organisers of the GAMBIT XI workshop where some of this work was planned and completed. AF was supported by an NSFC Research Fund for International Young Scientists grant 11950410509. WH was supported by a George Southgate visiting fellowship grant from the University of Adelaide, and STFC IPS grant number G102229.

Appendix A Proof of eq. 9

In NS we have $n - 1$ remaining samples after the worst live point was removed. Their associated volumes were drawn from a (continuous) uniform distribution, $x_j \sim U(0, 1)$. If we draw another sample, the distribution of its insertion index, $i$, relative to the other samples depends on the probability contained in the uniform distribution between the ordered samples. In fact, denoting by $x_{(j)}$ the order statistics of the $n - 1$ existing samples, with $x_{(0)} \equiv 0$ and $x_{(n)} \equiv 1$, the probability for each insertion index is

$$P(i) = \left\langle \Pr\left(x_{(i)} < x < x_{(i+1)}\right) \right\rangle = \left\langle x_{(i+1)} - x_{(i)} \right\rangle = \left\langle x_{(i+1)} \right\rangle - \left\langle x_{(i)} \right\rangle \qquad (14\text{--}16)$$

where we completed two trivial integrals and wrote the terms as expectations. To compute the expectations, note that

$$p\!\left(x_{(i)} = x\right) = \frac{(n - 1)!}{(n - 1 - i)!\, (i - 1)!}\, (1 - x)^{n - 1 - i}\, x^{i - 1} \qquad (17)$$

since we need $n - 1 - i$ samples above $x$, $i - 1$ samples below $x$ and one sample at $x$. The first factor is combinatoric; the second accounts for the samples that must lie above $x$; and the third accounts for the samples that must lie below $x$. The factor for a final sample at $x$ is just one. By integration, we quickly find $\langle x_{(i)} \rangle = i / n$, and thus $P(i) = 1/n$. That is, the insertion indexes follow a discrete uniform distribution.

Note that this did not depend especially on the fact that the distribution of the samples was uniform. If the samples had followed a different distribution, we could transform $x \to F(x)$, where $F$ is the cumulative distribution function, such that the transformed samples are uniformly distributed, and the proof goes through just the same.
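
This result is easily verified numerically; a sketch with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(7)
n, trials = 100, 200_000

# n - 1 existing uniform samples plus one new sample per trial;
# the insertion index is the rank of the new sample, in {0, ..., n - 1}
existing = rng.uniform(size=(trials, n - 1))
new = rng.uniform(size=(trials, 1))
indexes = np.sum(existing < new, axis=1)

counts = np.bincount(indexes, minlength=n)
print(counts.min(), counts.max(), trials / n)   # all counts ~ trials / n
```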

Appendix B Kolmogorov-Smirnov test

We use a one-sample Kolmogorov-Smirnov (KS) test Kolmogorov (1933); Smirnov (1948) to compare our set of insertion indexes with a (discrete) uniform distribution. First, we compute the KS test-statistic by comparing the empirical cumulative distribution function, $F_N(x)$, to that of a discrete uniform distribution, $F(x)$,

$$D_N = \sup_x \left| F_N(x) - F(x) \right| \qquad (18)$$

This provides a notion of distance between the observed indexes and a uniform distribution. In the continuous case, the null-distribution of this test-statistic does not depend on the reference distribution. We convert the test-statistic into a p-value using an asymptotic approximation of the Kolmogorov distribution Marsaglia et al. (2003) implemented in scipy Virtanen et al. (2020),

$$p = 2 \sum_{k=1}^{\infty} (-1)^{k - 1} e^{-2 k^2 z^2}, \qquad z = \sqrt{N}\, D_N \qquad (19)$$

where $D_N$ is the observed statistic and $N$ is the number of indexes. This assumes that we are testing samples from a continuous distribution. In our discrete case, the p-values from the Kolmogorov distribution are known to be conservative Arnold and Emerson (2011).
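
A sketch of the whole computation for insertion indexes (our own implementation of eqs. 18 and 19, with the discrete uniform reference distribution of eq. 9):

```python
import numpy as np

def ks_pvalue_uniform(indexes, n_live, k_max=100):
    """KS distance of eq. (18) against the discrete uniform CDF, converted
    to a p-value with the asymptotic Kolmogorov series of eq. (19)."""
    x = np.sort(np.asarray(indexes))
    N = len(x)
    F = (x + 1.0) / n_live                # discrete uniform CDF at the samples
    F_emp = np.arange(1, N + 1) / N       # empirical CDF just after each jump
    # sup distance, checking the empirical CDF on both sides of each jump
    D = max(np.max(np.abs(F_emp - F)), np.max(np.abs(F_emp - 1.0 / N - F)))
    z = np.sqrt(N) * D
    k = np.arange(1, k_max + 1)
    return min(1.0, 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * z**2)))

rng = np.random.default_rng(3)
print(ks_pvalue_uniform(rng.integers(0, 500, size=10_000), 500))
```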

Appendix C Toy functions

C.1 Gaussian

Our first example is a multi-dimensional Gaussian likelihood,

$$\mathcal{L}(\Theta) = \frac{1}{\sqrt{(2\pi)^d \det \Sigma}} \exp\left[-\tfrac12 (\Theta - \mu)^\top \Sigma^{-1} (\Theta - \mu)\right] \qquad (20)$$

with covariance matrix $\Sigma$ and mean $\mu$. We pick a uniform prior for each dimension. The analytic evidence is then simply the constant prior density, since the likelihood is a pdf in $\Theta$ and integrates to one, modulo small errors as the infinite domain is truncated by the prior. We pick a fixed mean and a diagonal covariance matrix with a common standard deviation for each dimension.

C.2 Rosenbrock

This is a two-dimensional function exhibiting a pronounced curved degeneracy Rosenbrock (1960). The likelihood function is

$$\mathcal{L}(x, y) = \exp\left[-(1 - x)^2 - 100 \left(y - x^2\right)^2\right] \qquad (21)$$

We consider uniform priors for each parameter. The evidence can be found semi-analytically from a one-dimensional integral,

$$\mathcal{Z} = \frac{1}{V} \int e^{-(1 - x)^2} \left[\int e^{-100 (y - x^2)^2}\, \mathrm{d}y \right] \mathrm{d}x \qquad (22)$$

where the integrals run over the prior ranges, $V$ is the prior volume, and the inner integral is known in terms of error functions, leaving a one-dimensional integral over $x$ to be evaluated numerically. The analytic approximation, which approximates the domain of integration by the whole real line, leads to

$$\int_{-\infty}^{\infty} e^{-(1 - x)^2}\, \mathrm{d}x \int_{-\infty}^{\infty} e^{-100 u^2}\, \mathrm{d}u = \sqrt{\pi} \cdot \frac{\sqrt{\pi}}{10} = \frac{\pi}{10} \qquad (23)$$

and thus $\mathcal{Z} \approx \pi / (10 V)$.
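
The semi-analytic evaluation of eq. 22 can be sketched as follows (assuming, purely for illustration, uniform priors on $[-5, 5]$ for each parameter; the range used for our runs may differ):

```python
import numpy as np
from scipy import integrate
from scipy.special import erf

a = 5.0                       # illustrative prior half-width
V = (2.0 * a) ** 2            # prior volume

def inner(x, s=10.0):
    # y-integral of eq. (22) over [-a, a] in terms of error functions
    return np.sqrt(np.pi) / (2.0 * s) * (erf(s * (a - x**2)) - erf(s * (-a - x**2)))

z, _ = integrate.quad(lambda x: np.exp(-(1.0 - x) ** 2) * inner(x), -a, a)
print(np.log(z / V), np.log(np.pi / 10.0 / V))   # semi-analytic vs eq. (23)
```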

C.3 Gaussian shells

The multidimensional likelihood is

$$\mathcal{L}(\Theta) = \text{shell}(\Theta;\, c_1, r, w) + \text{shell}(\Theta;\, c_2, r, w) \qquad (24)$$

where the shell function is a Gaussian favouring a radial distance $r$ from the point $c$,

$$\text{shell}(\Theta;\, c, r, w) = \frac{1}{\sqrt{2\pi w^2}} \exp\left[-\frac{\left(\left|\Theta - c\right| - r\right)^2}{2 w^2}\right] \qquad (25)$$

Thus, the highest likelihood region forms a shell of characteristic width $w$ at the surface of a $d$-sphere of radius $r$ centred at $c$. Our likelihood contains two such shells, one at $c_1$ and one at $c_2$. As usual, we take the conventional values of $c_{1,2}$, $r$ and $w$ from Feroz and Hobson (2008).

With uniform priors, the analytic evidence is approximately

$$\mathcal{Z} \approx \frac{2\, S_d\, m_{d-1}}{V} \qquad (26)$$

where $S_d$ is the surface area of a unit $d$-sphere, $m_{d-1}$ is the $(d-1)$-th non-central moment of a Gaussian with mean $r$ and standard deviation $w$, $V$ is the prior volume, and we ignore the truncation of the domain by the finite-sized hypercube.
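
A sketch checking the building blocks of eq. 26 numerically in low dimension (with illustrative values $r = 2$ and $w = 0.1$ for a single shell; the values used for our runs follow Feroz and Hobson (2008)):

```python
import numpy as np
from scipy import integrate
from scipy.special import gamma

d, r, w = 2, 2.0, 0.1

def radial(rho):
    # radial profile of the shell function, eq. (25)
    return np.exp(-(rho - r) ** 2 / (2.0 * w**2)) / np.sqrt(2.0 * np.pi * w**2)

s_d = 2.0 * np.pi ** (d / 2.0) / gamma(d / 2.0)   # surface area of a unit d-sphere
m, _ = integrate.quad(lambda rho: rho ** (d - 1) * radial(rho), r - 10 * w, r + 10 * w)
print(s_d * m)               # integral of one shell over all space
print(s_d * r ** (d - 1))    # leading approximation, m_{d-1} ~ r^{d-1}
```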

C.4 Gaussian-Log-Gamma mixture

This toy function was found in Beaujean and Caldwell (2013); Feroz et al. (2013); Buchner (2016) to be problematic in MultiNest without importance sampling. It is defined in even numbers of dimensions. The likelihood is a product of one-dimensional factors,

$$\mathcal{L}(\Theta) = \prod_{i=1}^{d} f_i(\theta_i) \qquad (27)$$

where the factors are

$$f_i(\theta) = \begin{cases} \tfrac12 \text{LG}(\theta;\, -a, \alpha, \beta) + \tfrac12 \text{LG}(\theta;\, a, \alpha, \beta) & i = 1 \\ \tfrac12 \mathcal{N}(\theta;\, -a, 1) + \tfrac12 \mathcal{N}(\theta;\, a, 1) & i = 2 \\ \text{LG}(\theta;\, a, \alpha, \beta) & 3 \le i \le (d + 2)/2 \\ \mathcal{N}(\theta;\, a, 1) & \text{otherwise} \end{cases} \qquad (28)$$

where e.g., $\text{LG}(\theta;\, \mu, \alpha, \beta)$ denotes a one-dimensional log-Gamma density for $\theta$ with mean $\mu$ and shape parameters $\alpha$ and $\beta$. There are four identical modes at $(\theta_1, \theta_2) = (a, a)$, $(a, -a)$, $(-a, a)$ and $(-a, -a)$.

The prior is uniform in each parameter. Since the likelihood is a pdf in $\Theta$, the analytic evidence is governed by the prior normalization factor, $1/V$, modulo small truncation errors introduced by the prior.

References

  • Skilling (2004) J. Skilling, Nested Sampling, in American Institute of Physics Conference Series, Vol. 735, edited by R. Fischer, R. Preuss, and U. V. Toussaint (2004) pp. 395–405.
  • Skilling (2006) J. Skilling, Nested sampling for general Bayesian computation, Bayesian Analysis 1, 833 (2006).
  • Feroz and Hobson (2008) F. Feroz and M. P. Hobson, Multimodal nested sampling: an efficient and robust alternative to MCMC methods for astronomical data analysis, Mon. Not. Roy. Astron. Soc. 384, 449 (2008)arXiv:0704.3704 [astro-ph] .
  • Feroz et al. (2009) F. Feroz, M. P. Hobson, and M. Bridges, MultiNest: an efficient and robust Bayesian inference tool for cosmology and particle physics, Mon. Not. Roy. Astron. Soc. 398, 1601 (2009), arXiv:0809.3437 [astro-ph] .
  • Feroz et al. (2013) F. Feroz, M. P. Hobson, E. Cameron, and A. N. Pettitt, Importance Nested Sampling and the MultiNest Algorithm, The Open Journal of Astrophysics  (2013)arXiv:1306.2144 [astro-ph.IM] .
  • Handley et al. (2015) W. J. Handley, M. P. Hobson, and A. N. Lasenby, PolyChord: nested sampling for cosmology, Mon. Not. Roy. Astron. Soc. 450, L61 (2015)arXiv:1502.01856 [astro-ph.CO] .
  • Handley et al. (2015) W. J. Handley, M. P. Hobson, and A. N. Lasenby, PolyChord: next-generation nested sampling, Mon. Not. Roy. Astron. Soc. 453, 4384 (2015)arXiv:1506.00171 [astro-ph.IM] .
  • Speagle (2020) J. S. Speagle, dynesty: A Dynamic Nested Sampling Package for Estimating Bayesian Posteriors and Evidences, Mon. Not. Roy. Astron. Soc.  (2020)arXiv:1904.02180 [astro-ph.IM] .
  • Mukherjee et al. (2006) P. Mukherjee, D. Parkinson, and A. R. Liddle, A nested sampling algorithm for cosmological model selection, Astrophys. J. 638, L51 (2006)arXiv:astro-ph/0508461 .
  • Easther and Peiris (2012) R. Easther and H. V. Peiris, Bayesian Analysis of Inflation II: Model Selection and Constraints on Reheating, Phys. Rev. D 85, 103533 (2012)arXiv:1112.0326 [astro-ph.CO] .
  • Martin et al. (2014) J. Martin, C. Ringeval, R. Trotta, and V. Vennin, The Best Inflationary Models After Planck, JCAP 03, 039arXiv:1312.3529 [astro-ph.CO] .
  • Hlozek et al. (2015) R. Hlozek, D. Grin, D. J. E. Marsh, and P. G. Ferreira, A search for ultralight axions using precision cosmological data, Phys. Rev. D 91, 103512 (2015)arXiv:1410.2896 [astro-ph.CO] .
  • Audren et al. (2013) B. Audren, J. Lesgourgues, K. Benabed, and S. Prunet, Conservative constraints on early cosmology with MONTE PYTHON, J. Cosmology Astropart. Phys 2013, 001 (2013)arXiv:1210.7183 [astro-ph.CO] .
  • Akrami et al. (2018) Y. Akrami et al. (Planck), Planck 2018 results. X. Constraints on inflation,  (2018), arXiv:1807.06211 [astro-ph.CO] .
  • Trotta et al. (2011) R. Trotta, G. Jóhannesson, I. V. Moskalenko, T. A. Porter, R. R. de Austri, and A. W. Strong, Constraints on cosmic-ray propagation models from a global Bayesian analysis, Astrophys. J. 729, 106 (2011)arXiv:1011.0037 [astro-ph.HE] .
  • Liddle (2007) A. R. Liddle, Information criteria for astrophysical model selection, MNRAS 377, L74 (2007)arXiv:astro-ph/0701113 [astro-ph] .
  • Buchner et al. (2014) J. Buchner, A. Georgakakis, K. Nandra, L. Hsu, C. Rangel, M. Brightman, A. Merloni, M. Salvato, J. Donley, and D. Kocevski, X-ray spectral modelling of the AGN obscuring region in the CDFS: Bayesian model selection and catalogue, Astron. Astrophys. 564, A125 (2014)arXiv:1402.0004 [astro-ph.HE] .
  • Veitch et al. (2015) J. Veitch et al., Parameter estimation for compact binaries with ground-based gravitational-wave observations using the LALInference software library, Phys. Rev. D91, 042003 (2015)arXiv:1409.7215 [gr-qc] .
  • Abbott et al. (2016a) B. P. Abbott et al. (LIGO Scientific, Virgo), Tests of general relativity with GW150914, Phys. Rev. Lett. 116, 221101 (2016a), [Erratum: Phys. Rev. Lett.121,no.12,129902(2018)], arXiv:1602.03841 [gr-qc] .
  • Abbott et al. (2016b) B. P. Abbott et al. (LIGO Scientific, Virgo), Binary Black Hole Mergers in the first Advanced LIGO Observing Run, Phys. Rev. X6, 041015 (2016b), [erratum: Phys. Rev.X8,no.3,039903(2018)], arXiv:1606.04856 [gr-qc] .
  • Ashton et al. (2019) G. Ashton et al., BILBY: A user-friendly Bayesian inference library for gravitational-wave astronomy, Astrophys. J. Suppl. 241, 27 (2019)arXiv:1811.02042 [astro-ph.IM] .
  • Trotta et al. (2008) R. Trotta, F. Feroz, M. P. Hobson, L. Roszkowski, and R. Ruiz de Austri, The Impact of priors and observables on parameter inferences in the Constrained MSSM, JHEP 12, 024arXiv:0809.3792 [hep-ph] .
  • Feroz et al. (2008) F. Feroz, B. C. Allanach, M. Hobson, S. S. AbdusSalam, R. Trotta, and A. M. Weber, Bayesian Selection of sign(mu) within mSUGRA in Global Fits Including WMAP5 Results, JHEP 10, 064arXiv:0807.4512 [hep-ph] .
  • Buchmueller et al. (2014) O. Buchmueller et al., The CMSSM and NUHM1 after LHC Run 1, Eur. Phys. J. C 74, 2922 (2014)arXiv:1312.5250 [hep-ph] .
  • Martinez et al. (2017) G. D. Martinez, J. McKay, B. Farmer, P. Scott, E. Roebber, A. Putze, and J. Conrad (GAMBIT), Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module, Eur. Phys. J. C77, 761 (2017)arXiv:1705.07959 [hep-ph] .
  • Bolhuis and Csányi (2018) P. G. Bolhuis and G. Csányi, Nested transition path sampling, Phys. Rev. Lett. 120, 250601 (2018).
  • Martiniani et al. (2014) S. Martiniani, J. D. Stevenson, D. J. Wales, and D. Frenkel, Superposition enhanced nested sampling, Phys. Rev. X 4, 031034 (2014).
  • Pártay et al. (2010) L. B. Pártay, A. P. Bartók, and G. Csányi, Efficient sampling of atomic configurational spaces, The Journal of Physical Chemistry B 114, 10502 (2010), pMID: 20701382.
  • Pártay et al. (2014) L. B. Pártay, A. P. Bartók, and G. Csányi, Nested sampling for materials: The case of hard spheres, Phys. Rev. E 89, 022302 (2014).
  • Baldock et al. (2017) R. J. N. Baldock, N. Bernstein, K. M. Salerno, L. B. Pártay, and G. Csányi, Constant-pressure nested sampling with atomistic dynamics, Phys. Rev. E 96, 043311 (2017).
  • Nielsen (2013) S. O. Nielsen, Nested sampling in the canonical ensemble: Direct calculation of the partition function from nvt trajectories, The Journal of Chemical Physics 139, 124104 (2013).
  • Baldock et al. (2016) R. J. N. Baldock, L. B. Pártay, A. P. Bartók, M. C. Payne, and G. Csányi, Determining pressure-temperature phase diagrams of materials, Phys. Rev. B 93, 174108 (2016).
  • Russel et al. (2018) P. M. Russel, B. J. Brewer, S. Klaere, and R. R. Bouckaert, Model Selection and Parameter Inference in Phylogenetics Using Nested Sampling, Systematic Biology 68, 219 (2018).
  • Johnson et al. (2014) R. Johnson, P. Kirk, and M. P. H. Stumpf, SYSBIONS: nested sampling for systems biology, Bioinformatics 31, 604 (2014).
  • Buchner (2016) J. Buchner, A statistical test for Nested Sampling algorithms, Statistics and Computing 26, 383 (2016)arXiv:1407.5459 [stat.CO] .
  • Higson et al. (2019) E. Higson, W. Handley, M. Hobson, and A. Lasenby, Nestcheck: diagnostic tests for nested sampling calculations, Mon. Not. Roy. Astron. Soc. 483, 2044 (2019)arXiv:1804.06406 [stat.CO] .
  • Handley (2019a) W. Handley, Curvature tension: evidence for a closed universe,  (2019a), arXiv:1908.09139 [astro-ph.CO] .
  • Kass and Raftery (1995) R. E. Kass and A. E. Raftery, Bayes Factors, J. Am. Statist. Assoc. 90, 773 (1995).
  • Salomone et al. (2018) R. Salomone, L. F. South, C. C. Drovandi, and D. P. Kroese, Unbiased and Consistent Nested Sampling via Sequential Monte Carlo,  (2018), arXiv:1805.03924 [stat.CO] .
  • Neal (2003) R. M. Neal, Slice sampling, Ann. Statist. 31, 705 (2003)arXiv:physics/0009028 [physics.data-an] .
  • Aitken and Akman (2013) S. Aitken and O. E. Akman, Nested sampling for parameter inference in systems biology: application to an exemplar circadian model, BMC Systems Biology 7, 72 (2013).
  • Schittenhelm and Wacker (2020) D. Schittenhelm and P. Wacker, Nested Sampling And Likelihood Plateaus, arXiv e-prints , arXiv:2005.08602 (2020), arXiv:2005.08602 [math.ST] .
  • Smirnov (1948) N. Smirnov, Table for estimating the goodness of fit of empirical distributions, Ann. Math. Statist. 19, 279 (1948).
  • Kolmogorov (1933) A. Kolmogorov, Sulla determinazione empirica di una legge di distribuzione, Giornale dell’Istituto Italiano degli Attuari 4, 83 (1933).
  • Handley (2019b) W. Handley, anesthetic: nested sampling visualisation, The Journal of Open Source Software 4, 1414 (2019b).
  • Handley (2019c) W. Handley, Curvature tension: evidence for a closed universe (supplementary inference products), 10.5281/zenodo.3371152 (2019c).
  • Higson et al. (2018) E. Higson, W. Handley, M. Hobson, A. Lasenby, et al., Sampling errors in nested sampling parameter estimation, Bayesian Analysis 13, 873 (2018).
  • Marsaglia et al. (2003) G. Marsaglia, W. W. Tsang, and J. Wang, Evaluating Kolmogorov’s Distribution, Journal of Statistical Software, Articles 8, 1 (2003).
  • Virtanen et al. (2020) P. Virtanen et al., SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nature Methods  (2020).
  • Arnold and Emerson (2011) T. B. Arnold and J. W. Emerson, Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions, The R Journal 3, 34 (2011).
  • Rosenbrock (1960) H. H. Rosenbrock, An Automatic Method for Finding the Greatest or Least Value of a Function, The Computer Journal 3, 175 (1960).
  • Beaujean and Caldwell (2013) F. Beaujean and A. Caldwell, Initializing adaptive importance sampling with Markov chains,  (2013), arXiv:1304.7808 [stat.CO] .