Solving high-dimensional parameter inference: marginal posterior densities & Moment Networks

by Niall Jeffrey, et al.

High-dimensional probability density estimation for inference suffers from the "curse of dimensionality". For many physical inference problems, the full posterior distribution is unwieldy and seldom used in practice. Instead, we propose direct estimation of lower-dimensional marginal distributions, bypassing high-dimensional density estimation or high-dimensional Markov chain Monte Carlo (MCMC) sampling. By evaluating the two-dimensional marginal posteriors we can unveil the full-dimensional parameter covariance structure. We additionally propose constructing a simple hierarchy of fast neural regression models, called Moment Networks, that compute increasing moments of any desired lower-dimensional marginal posterior density; these reproduce exact results from analytic posteriors and those obtained from Masked Autoregressive Flows. We demonstrate marginal posterior density estimation using high-dimensional LIGO-like gravitational wave time series and describe applications for problems of fundamental cosmology.









1 Introduction

Estimating the posterior probability density p(θ | d) of a set of parameters θ given some observed data d is often the primary objective of problems of inference, prediction, or generation. The object p(θ | d) encapsulates all belief and uncertainties about the unknown quantities θ. With this aim in mind, recent advances in neural density estimation have improved our ability to estimate the density from a set of training examples.

Estimating such probability densities with neural density methods, such as Mixture Density Networks Bishop (1994) or recent state-of-the-art normalizing flow methods such as Masked Autoregressive Flows (MAF; Papamakarios et al. (2017)), provides an excellent way to quantify uncertainty for predicted or inferred parameters and signals. Used for likelihood-free inference (also known as simulation-based inference; Brehmer et al. (2020); Cranmer et al. (2020)), these density estimation methods can estimate conditional probability densities for parameters θ and data d: either the posterior p(θ | d) or the likelihood p(d | θ) Papamakarios and Murray (2016); Alsing et al. (2019).

For high-dimensional signals, estimation of the full joint density is often not useful and, instead, summaries of lower-dimensional marginal densities are the final goal. For example, the marginal posterior density per pixel, or subsets of pixels, could serve to quantify uncertainty in a reconstructed image.

In this example, the joint posterior marginal for a pair of pixel parameters θ_α, θ_β given some observed data d,

p(θ_α, θ_β | d) = ∫ p(θ | d) ∏_{k ≠ α, β} dθ_k ,   (1)

would marginalize over all possible values of all other parameters (i.e. the other pixels and latent parameters). If this were evaluated for all pairs of parameters, all 2D marginal moments of the high-dimensional posterior distribution would be characterized.

In this contribution we present two complementary approaches to evaluate the two-dimensional marginal posterior distributions, marginal flows and Moment Networks (Sec. 2). In Sec. 3 we demonstrate the two methods in comparison to a known underlying posterior density (sampled with MCMC), and show a simulated gravitational wave data model, where the underlying time-ordered signal values form the high-dimensional parameter space to be inferred. In Sec. 4 we describe seemingly intractable problems in cosmological inference that can be solved using marginal posterior density estimation.

2 Marginal posterior density estimation


In practice, for many physical inference problems with high-dimensional parameter spaces, the full high-dimensional posterior distribution is not necessary or even interpretable. Instead, the inference goals are the marginal one- and two-dimensional posterior distributions of the parameters (e.g. Planck Collaboration, 2020; Joudaki and KiDS Collaboration, 2018; Abbott et al., 2018).

Even if full posterior sampling is possible through sophisticated MCMC techniques, e.g. in hierarchical models with known distributions (rather than in a likelihood-free framework), the number of posterior samples needed to compute an m-dimensional marginal grows exponentially with m. For high-dimensional problems, the limited number of independent samples obtained in practice only allows for the computation of low-dimensional marginals of the posterior density.
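To make this scaling concrete, a back-of-envelope sketch; the bin count and samples-per-bin target below are illustrative assumptions, not figures from this work:

```python
# Illustrative scaling: histogramming an m-dimensional marginal with b bins
# per dimension requires on the order of b**m samples to populate the bins.
# b = 20 and ~100 samples per bin are assumptions chosen purely for illustration.
bins_per_dim = 20
samples_per_bin = 100

def samples_needed(m: int) -> int:
    """Rough posterior-sample count to resolve an m-dimensional marginal."""
    return samples_per_bin * bins_per_dim ** m

for m in (1, 2, 5, 10):
    print(f"{m}-dimensional marginal: ~{samples_needed(m):.1e} samples")
```

Even at this crude level, a 2D marginal is cheap while a 10-dimensional one is hopeless, which motivates targeting low-dimensional marginals directly.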

We therefore make the marginal densities the target of our inference problem. We take inspiration from the simplicity of integration when a Monte Carlo sampled representation of the posterior distribution is available. In this case, marginalization is trivial: it amounts simply to ignoring the parameter dimensions to be marginalized over. In this work we show that we can bring this powerful notion directly to simulation-based inference; it allows us to estimate the marginal posterior density (or its moments) directly. This is a powerful approach whenever we are dealing with a large number of parameters, and it effectively removes the practical limitation of simulation-based inference techniques to applications with a low-dimensional parameter space.
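The "marginalization by ignoring" notion can be sketched in a few lines; the sample array below is a random stand-in for MCMC draws, not a real posterior:

```python
import numpy as np

# Given a Monte Carlo representation of a high-dimensional posterior,
# a 2D marginal is obtained by keeping only the two columns of interest
# and simply ignoring the rest.
rng = np.random.default_rng(0)
samples = rng.standard_normal((5000, 100))   # stand-in draws: (n_samples, n_params)

alpha, beta = 3, 7                           # any chosen parameter pair
marginal_2d = samples[:, [alpha, beta]]      # the 2D marginal: drop everything else

# Marginal moments follow directly from the retained columns.
mean_2d = marginal_2d.mean(axis=0)
cov_2d = np.cov(marginal_2d, rowvar=False)
print(mean_2d.shape, cov_2d.shape)
```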

Figure 1: 100-dimensional data model with known reference distribution evaluated with MCMC samples. Direct 2D marginal posterior estimation using a MAF ensemble (left panel) and representation of 2D Moment Network result (right panel) both trained with simulations.

Marginal flows

Many popular and powerful density estimation methods can be categorized as normalizing flows. These use a series of bijective functions to transform from simple known densities (e.g. unit normal) to the target density Jimenez Rezende and Mohamed (2015); Kingma et al. (2016). MAFs represent the estimated density as a transformation of a unit normal through a series of autoregressive functions Papamakarios et al. (2017, 2019). The networks are trained to give an estimate q(θ | d; φ) of the target distribution p(θ | d) by minimizing a Monte Carlo estimate of the Kullback-Leibler divergence D_KL(p ‖ q) Kullback and Leibler (1951). For a sampling distribution p(θ, d) this would be

−(1/N) ∑_{i=1}^{N} ln q(θ^{(i)} | d^{(i)}; φ) ,

varying the network parameters φ over the forward-modelled mock data {θ^{(i)}, d^{(i)}}. In this same likelihood-free framework, one can directly estimate the posterior distributions for subsets of the large parameter set. For any two parameters θ_α and θ_β of the full θ, one can directly estimate p(θ_α, θ_β | d) by minimizing

−(1/N) ∑_{i=1}^{N} ln q(θ_α^{(i)}, θ_β^{(i)} | d^{(i)}; φ) .

The resulting density will indeed be an estimate of the marginal posterior for the chosen parameter pair (eq. 1) if all parameters of θ (not just the chosen pair) are drawn from the prior to generate the training data. This procedure also avoids the need for data compression steps Charnock et al. (2018); Alsing and Wandelt (2018), as we condition on high-dimensional data d rather than estimating its density, and any nuisance parameters are automatically marginalized away, provided they have been sufficiently sampled in the training data.

Figure 2: Two example simulated gravitational wave time series signals for the strain “+” polarization with realistic LIGO-like noise. The dashed line shows the true strain values over time.

Moment Networks

In practice, posterior estimates often serve principally to compute posterior moments. Moment Networks allow us to side-step the problem of estimating the posterior density and skip directly to estimating the location, scale, and covariance of the parameters (and possibly higher-order moments). When this is sufficient, Moment Networks allow the use of far simpler neural network architectures, which reduces the risk of training failure and boosts inference speed.

We begin by noting that if we find some function μ of our data d that minimizes an L2 loss over the distribution of possible training examples {θ, d},

J[μ] = E_{p(θ, d)} ‖ θ − μ(d) ‖² ,

then μ, which we represent as a neural network, evaluated for the observed data d₀ is the mean of the posterior distribution p(θ | d₀). It is therefore possible to create a hierarchy of networks to generate further moments of the posterior distribution. For example, the function σ² that minimizes

J[σ²] = E_{p(θ, d)} ‖ (θ − μ(d))² − σ²(d) ‖²

for fixed, already trained μ, is such that σ²(d₀) is the set of posterior variances Jaynes (2003); Adler and Öktem (2018). The objective functions for marginal posterior parameter covariances can be similarly constructed.

By sampling the full parameter space θ from the prior distribution p(θ), the functions μ and σ² can be combined to output the posterior means, variances, and covariances for subsets of the full set of parameters; the marginalization over the other parameters is implicitly done during training. This result is exact and independent of the true underlying posterior or prior distributions.

The Moment Network solves for the marginal posterior moments by construction, and therefore does not suffer from the mode collapse that can arise in variational inference with multi-modal posteriors, which can lead to underpredicted uncertainty. Outside the likelihood-free framework, if one does have information about the functional form of the posterior, one can fit the posterior parameters to the marginal moments (see Sec. 4).
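A minimal sketch of the Moment Network hierarchy, assuming a conjugate Gaussian toy model with a known posterior and using closed-form linear least squares in place of neural networks (the argument above only requires some sufficiently flexible regressor):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy model: theta ~ N(0, 1), d = theta + noise with noise variance
# 0.25, so the exact posterior has mean 0.8*d and variance 0.2.
n = 100000
theta = rng.standard_normal(n)
d = theta + 0.5 * rng.standard_normal(n)

X = np.column_stack([d, np.ones(n)])      # linear "network": slope + intercept

# First network: minimize E||theta - mu(d)||^2  ->  mu(d) = posterior mean.
coef_mu, *_ = np.linalg.lstsq(X, theta, rcond=None)
mu = X @ coef_mu

# Second network (mu held fixed): minimize E||(theta - mu(d))^2 - var(d)||^2
# ->  var(d) = posterior variance.  Here the true variance is constant,
# so the fitted intercept recovers it and the slope goes to zero.
coef_var, *_ = np.linalg.lstsq(X, (theta - mu) ** 2, rcond=None)

print(coef_mu, coef_var)   # slope of mu -> ~0.8; variance intercept -> ~0.2
```

The same two-stage least-squares structure carries over when μ and σ² are neural networks trained with the L2 objectives above.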

3 Experiments

High-dimensional inference: We use a 100-dimensional parameter inference toy model to demonstrate marginal posterior estimation for pairs of parameters. The model consists of 100-element data vectors with non-stationary Gaussian noise and a Gaussian prior distribution with non-trivial covariance introducing parameter correlation. We estimate the marginal posterior density for parameter pairs using a MAF (via the pyDELFI package Alsing et al. (2019)) and estimate the 2D marginal mean, variance, and covariance using Moment Networks. The results are represented in Fig. 1.

As a reference, we can directly sample the posterior distribution using high-dimensional MCMC, which required many draws from the likelihood. The normalizing flow result would have been intractable if the density estimation target were the full parameter space or the data space. With a marginal flow, changing the target to pairs of parameters, it is simple (with a basic two-GPU, 12 GB set-up) to evaluate all marginal posterior pairs (often represented as a so-called “corner plot”).
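Assembling such a corner plot amounts to looping over parameter pairs; `estimate_pair_moments` below is a hypothetical placeholder for a trained marginal flow or Moment Network evaluated on the observed data:

```python
from itertools import combinations
import numpy as np

def estimate_pair_moments(a: int, b: int) -> dict:
    # Hypothetical placeholder: a real implementation would evaluate a trained
    # marginal flow or Moment Network for the pair (theta_a, theta_b).
    return {"mean": np.zeros(2), "cov": np.eye(2)}

# 100 parameters -> C(100, 2) = 4950 pairwise 2D marginals, which together
# characterize the full-dimensional parameter covariance structure.
n_params = 100
pair_moments = {
    (a, b): estimate_pair_moments(a, b)
    for a, b in combinations(range(n_params), 2)
}
print(len(pair_moments))
```

Because each pair can be handled independently, the loop is trivially parallelizable across GPUs.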

With the same set-up, the Moment Network hierarchy was able to accurately evaluate the means, variances, and covariances of the marginals (see Fig. 1) with a few seconds of training and evaluation, without requiring any sampling or grid evaluation. For many practical applications of inference in the physical sciences, these marginal joint moments would be the final goal.

Gravitational wave signal demonstration: Fig. 2 shows two example simulated gravitational wave time series. The two signals (dashed orange) are 62.5 ms intervals taken from the period just before a binary black hole merger, using the SEOBNRv4 model Bohé et al. (2017).

We have simplified the problem for this demonstration by removing all “geometric” effects (black hole spin, inclination, detector geometry) and using only the “+” polarization for the detector strain. We do, however, sample the merger events with an independent prior distribution for each mass and the distance. The noise is realistic LIGO-like noise, which, along with the signal, was generated using the pyCBC package for 35000 simulations.

With the time series elements forming a high-dimensional parameter space, the left panel of Fig. 3 shows a representation of the marginal posterior standard deviation for each of 128 parameters for a simulated data set. This result was evaluated using the trained Moment Network. The right panel shows a validation case of similar complexity to the gravitational wave model, but with known likelihood. The Moment Network trained on simulations matches a long-run MCMC chain accurately. This validates our approach.

Figure 3: Left panel: Moment Network (MN) estimate of the 1σ standard deviation per strain parameter given 62.5 ms of data (Fig. 2). Right panel: for each of 64 parameters (c.f. time step), contours show the marginal posterior from the MN (shaded orange) and MCMC (dashed green).

4 Discussion & cosmological applications

Though the direct density estimation of marginal posteriors is much more robust than estimation of the full posterior, it may still suffer from well-known issues of density estimation. Moment Networks optimize a completely different set of objective functions to return estimates of the posterior moments. This affords an opportunity to cross-validate: moments of the estimated marginal posteriors should match those from the Moment Network. If the results are inconsistent for an initial set of simulations, then there may be insufficient network complexity (the network complexity for both methods scales similarly) or an insufficient number of simulations.

Thus far, density estimation likelihood-free inference in cosmology has generally been limited to a few parameters (e.g. Alsing et al., 2019; Taylor et al., 2019; Brehmer et al., 2019; Jeffrey et al., 2020; Ramanah et al., 2020; Lemos et al., 2020). Though simulation-based inference of cosmological fields (including dark matter) can be integral to many analyses (e.g. Caldeira et al., 2019; Shirasaki et al., 2019; Jeffrey et al., 2020; Petroff et al., 2020), it can be intractable to estimate the full posterior due to high-dimensionality. With the approach we propose, joint marginal posteriors (and associated moments) for reconstructed cosmological fields can be directly evaluated.

For cases where it is possible to sample, marginal flows and Moment Networks still provide advantages. One particularly ambitious cosmological sampler, BORG Jasche and Wandelt (2013); Jasche et al. (2015), samples the very high-dimensional posterior density of the initial conditions of the Universe for given galaxy data, using a non-linear forward model including the physics and data effects. Though the full posterior is sampled, the complexity of the sampler and the inherently sequential nature of MCMC limit the number of independent samples, sufficient only to estimate low-dimensional marginal posteriors and their moments. The approach proposed in this work could use similar computational resources to generate simulations, train marginal flows and Moment Networks in parallel (rather than sequentially), and efficiently output low-dimensional marginal posteriors and moments.

Beyond general high-dimensional inference, the principal motivation for this work (Sec. 2), we plan to explore a wide range of further applications of marginal flows and Moment Networks to probe the fundamental physics of the Universe in future studies.

Demonstration code can be found at:

Broader Impact

This work provides a robust approach to quantify uncertainty from high-dimensional parameter spaces by estimating marginal posterior distributions or their associated moments directly. This has immediate application for parameter and model inference in astrophysics and cosmology, and the physical sciences more generally. We note that if the method is misunderstood or misapplied, incorrect uncertainty quantification or risk analysis would follow.

To mitigate this, diagnostic and validation methods can be applied (e.g. ensembles of neural density estimators or quantile tests), or, as proposed in this work, results can be compared between likelihood-free methods (e.g. marginal flows and Moment Networks).

The approach in this work can be applied to signal inference and prediction more generally (including fast image analysis, time series prediction, forecasting, and quantifying uncertainty for decision making).


Software used

pyDELFI for the MAF density estimation implementation Alsing et al. (2019); chainconsumer for Fig. 1 Hinton (2016); emcee for MCMC sampling Foreman-Mackey et al. (2013); pyCBC for gravitational wave data simulation Nitz et al. (2020).

The authors thank Tom Charnock for useful discussions. NJ acknowledges funding from the École Normale Supérieure (ENS). BDW acknowledges support by the ANR BIG4 project, grant ANR-16-CE23-0002 of the French Agence Nationale de la Recherche; and the Labex ILP (reference ANR-10-LABX-63) part of the Idex SUPER, and received financial state aid managed by the Agence Nationale de la Recherche, as part of the programme Investissements d’avenir under the reference ANR-11-IDEX-0004-02. The Flatiron Institute is supported by the Simons Foundation.


  • [1] T. M. C. Abbott, F. B. Abdalla, A. Alarcon, J. Aleksić, S. Allam, S. Allen, A. Amara, J. Annis, J. Asorey, S. Avila, and et al. (2018-08) Dark Energy Survey year 1 results: Cosmological constraints from galaxy clustering and weak lensing. PRD 98 (4), pp. 043526. External Links: Document, 1708.01530 Cited by: §2.
  • [2] J. Adler and O. Öktem (2018-11) Deep Bayesian Inversion. arXiv e-prints, pp. arXiv:1811.05910. External Links: 1811.05910 Cited by: §2.
  • [3] J. Alsing, T. Charnock, S. Feeney, and B. Wandelt (2019-09) Fast likelihood-free cosmology with neural density estimators and active learning. MNRAS 488 (3), pp. 4440–4458. External Links: Document, 1903.00007 Cited by: §1, §3, §4, Software used.
  • [4] J. Alsing and B. Wandelt (2018-05) Generalized massive optimal data compression. MNRAS 476 (1), pp. L60–L64. External Links: Document, 1712.00012 Cited by: §2.
  • [5] C. M. Bishop (1994) Mixture density networks. Working Paper, Aston University (English). External Links: ISBN NCRG/94/004 Cited by: §1.
  • [6] A. Bohé, L. Shao, A. Taracchini, A. Buonanno, S. Babak, I. W. Harry, I. Hinder, S. Ossokine, M. Pürrer, V. Raymond, and et al. (2017-02) Improved effective-one-body model of spinning, nonprecessing binary black holes for the era of gravitational-wave astrophysics with advanced detectors. Physical Review D 95 (4). External Links: ISSN 2470-0029, Link, Document Cited by: §3.
  • [7] J. Brehmer, G. Louppe, J. Pavez, and K. Cranmer (2020) Mining gold from implicit models to improve likelihood-free inference. Proceedings of the National Academy of Sciences 117 (10), pp. 5242–5249. External Links: Document, ISSN 0027-8424, Link, Cited by: §1.
  • [8] J. Brehmer, S. Mishra-Sharma, J. Hermans, G. Louppe, and K. Cranmer (2019-11) Mining for dark matter substructure: inferring subhalo population properties from strong lenses with machine learning. The Astrophysical Journal 886 (1), pp. 49. External Links: ISSN 1538-4357, Link, Document Cited by: §4.
  • [9] J. Caldeira, W.L.K. Wu, B. Nord, C. Avestruz, S. Trivedi, and K.T. Story (2019-07) DeepCMB: lensing reconstruction of the cosmic microwave background with deep neural networks. Astronomy and Computing 28, pp. 100307. External Links: ISSN 2213-1337, Link, Document Cited by: §4.
  • [10] T. Charnock, G. Lavaux, and B. D. Wandelt (2018-04) Automatic physical inference with information maximizing neural networks. PRD 97 (8), pp. 083004. External Links: Document, 1802.03537 Cited by: §2.
  • [11] K. Cranmer, J. Brehmer, and G. Louppe (2020) The frontier of simulation-based inference. Proceedings of the National Academy of Sciences. External Links: Document, ISSN 0027-8424, Link, Cited by: §1.
  • [12] D. Foreman-Mackey, D. W. Hogg, D. Lang, and J. Goodman (2013-03) emcee: The MCMC Hammer. PASP 125 (925), pp. 306. External Links: Document, 1202.3665 Cited by: Software used.
  • [13] S. R. Hinton (2016-08) ChainConsumer. The Journal of Open Source Software 1, pp. 00045. External Links: Document Cited by: Software used.
  • [14] J. Jasche, F. Leclercq, and B.D. Wandelt (2015-01) Past and present cosmic structure in the sdss dr7 main sample. Journal of Cosmology and Astroparticle Physics 2015 (01), pp. 036–036. External Links: ISSN 1475-7516, Link, Document Cited by: §4.
  • [15] J. Jasche and B. D. Wandelt (2013-04) Bayesian physical reconstruction of initial conditions from large-scale structure surveys. Monthly Notices of the Royal Astronomical Society 432 (2), pp. 894–913. External Links: ISSN 1365-2966, Link, Document Cited by: §4.
  • [16] E. T. Jaynes (2003) Probability theory: the logic of science. Cambridge University Press. External Links: ISBN 0521592712, LCCN 2002071486 Cited by: §2.
  • [17] N. Jeffrey, J. Alsing, and F. Lanusse (2020) Likelihood-free inference with neural compression of des sv weak lensing map statistics. External Links: 2009.08459 Cited by: §4.
  • [18] N. Jeffrey, F. Lanusse, O. Lahav, and J. Starck (2020-03) Deep learning dark matter map reconstructions from DES SV weak lensing data. MNRAS 492 (4), pp. 5023–5029. External Links: Document, 1908.00543 Cited by: §4.
  • [19] D. Jimenez Rezende and S. Mohamed (2015-05) Variational Inference with Normalizing Flows. arXiv e-prints, pp. arXiv:1505.05770. External Links: 1505.05770 Cited by: §2.
  • [20] S. Joudaki and KiDS Collaboration (2018-03) KiDS-450 + 2dFLenS: Cosmological parameter constraints from weak gravitational lensing tomography and overlapping redshift-space galaxy clustering. MNRAS 474 (4), pp. 4894–4924. External Links: Document, 1707.06627 Cited by: §2.
  • [21] D. P. Kingma, T. Salimans, R. Jozefowicz, X. Chen, I. Sutskever, and M. Welling (2016) Improved variational inference with inverse autoregressive flow. In Advances in neural information processing systems, pp. 4743–4751. Cited by: §2.
  • [22] S. Kullback and R. A. Leibler (1951-03) On information and sufficiency. Ann. Math. Statist. 22 (1), pp. 79–86. External Links: Document, Link Cited by: §2.
  • [23] P. Lemos, N. Jeffrey, L. Whiteway, O. Lahav, N. I. Libeskind, and Y. Hoffman (2020-10) The sum of the masses of the Milky Way and M31: a likelihood-free inference approach. arXiv e-prints, pp. arXiv:2010.08537. External Links: 2010.08537 Cited by: §4.
  • [24] A. Nitz, I. Harry, D. Brown, C. M. Biwer, J. Willis, T. D. Canton, C. Capano, L. Pekowsky, T. Dent, A. R. Williamson, G. S. Davies, S. De, M. Cabero, B. Machenschalk, P. Kumar, S. Reyes, D. Macleod, dfinstad, F. Pannarale, T. Massinger, M. Tápai, L. Singer, S. Kumar, S. Khan, S. Fairhurst, A. Nielsen, SSingh087, shasvath, B. U. V. Gadre, and I. Dorrington (2020-10) Gwastro/pycbc: pycbc release v1.16.10. Zenodo. External Links: Document, Link Cited by: Software used.
  • [25] G. Papamakarios and I. Murray (2016) Fast ε-free inference of simulation models with Bayesian conditional density estimation. Advances in Neural Information Processing Systems, pp. 1028–1036. External Links: 1605.06376 Cited by: §1.
  • [26] G. Papamakarios, T. Pavlakou, and I. Murray (2017) Masked autoregressive flow for density estimation. Advances in Neural Information Processing Systems, pp. 2338–2347. External Links: 1705.07057 Cited by: §1, §2.
  • [27] G. Papamakarios, D. Sterratt, and I. Murray (2019-16–18 Apr) Sequential neural likelihood: fast likelihood-free inference with autoregressive flows. In Proceedings of Machine Learning Research, K. Chaudhuri and M. Sugiyama (Eds.), Proceedings of Machine Learning Research, Vol. 89, , pp. 837–848. External Links: Link Cited by: §2.
  • [28] M. A. Petroff, G. E. Addison, C. L. Bennett, and J. L. Weiland (2020) Full-sky cosmic microwave background foreground cleaning using machine learning. External Links: 2004.11507 Cited by: §4.
  • [29] PlanckCollaboration (2020-09) Planck 2018 results. VI. Cosmological parameters. AAP 641, pp. A6. External Links: Document, 1807.06209 Cited by: §2.
  • [30] D. K. Ramanah, R. Wojtak, Z. Ansari, C. Gall, and J. Hjorth (2020) Dynamical mass inference of galaxy clusters with neural flows. External Links: 2003.05951 Cited by: §4.
  • [31] M. Shirasaki, N. Yoshida, and S. Ikeda (2019-08) Denoising weak lensing mass maps with deep learning. PRD 100 (4), pp. 043527. External Links: Document, 1812.05781 Cited by: §4.
  • [32] P. L. Taylor, T. D. Kitching, J. Alsing, B. D. Wandelt, S. M. Feeney, and J. D. McEwen (2019-07) Cosmic shear: inference from forward models. Physical Review D 100 (2). External Links: ISSN 2470-0029, Link, Document Cited by: §4.