Demonstration of MomentNetworks for high-dimensional probability density estimation (LFI)
High-dimensional probability density estimation for inference suffers from the "curse of dimensionality". For many physical inference problems, the full posterior distribution is unwieldy and seldom used in practice. Instead, we propose direct estimation of lower-dimensional marginal distributions, bypassing high-dimensional density estimation or high-dimensional Markov chain Monte Carlo (MCMC) sampling. By evaluating the two-dimensional marginal posteriors we can unveil the full-dimensional parameter covariance structure. We additionally propose constructing a simple hierarchy of fast neural regression models, called Moment Networks, that compute increasing moments of any desired lower-dimensional marginal posterior density; these reproduce exact results from analytic posteriors and those obtained from Masked Autoregressive Flows. We demonstrate marginal posterior density estimation using high-dimensional LIGO-like gravitational wave time series and describe applications for problems of fundamental cosmology.
Estimating the posterior probability density p(θ | d) of a set of parameters θ given some observed data d is often the primary objective of problems of inference, prediction, or generation. This object encapsulates all belief and uncertainties about the unknown quantities θ. With this aim in mind, recent advances in neural density estimation have improved our ability to estimate a density from a set of training examples.
Estimating such probability densities with neural density methods, such as Mixture Density Networks Bishop (1994), or recent state-of-the-art normalizing flow methods, such as Masked Autoregressive Flows (MAF; Papamakarios et al. (2017)), provides an excellent way to quantify uncertainty for predicted or inferred parameters and signals. Used for likelihood-free inference (also known as simulation-based inference Brehmer et al. (2020); Cranmer et al. (2020)), these density estimation methods can estimate conditional probability densities for parameters θ and data d: either the posterior p(θ | d) or the likelihood p(d | θ) Papamakarios and Murray (2016); Alsing et al. (2019).
For high-dimensional signals, estimation of the full joint density is often not useful and, instead, summaries of lower-dimensional marginal densities are the final goal. For example, the marginal posterior density per pixel, or subsets of pixels, could serve to quantify uncertainty in a reconstructed image.
In this example, the joint marginal posterior for a pair of pixel parameters (θ_i, θ_j) given some observed data d would marginalize over all possible values of all other parameters (i.e. the other pixels and latent parameters):

p(θ_i, θ_j | d) = ∫ p(θ | d) ∏_{k ≠ i,j} dθ_k .    (1)

If this were evaluated for all pairs of parameters, all two-dimensional marginals of the high-dimensional posterior distribution would be characterized.
In this contribution we present two complementary approaches to evaluate the two-dimensional marginal posterior distributions, marginal flows and Moment Networks (Sec. 2). In Sec. 3 we demonstrate the two methods in comparison to a known underlying posterior density (sampled with MCMC), and show a simulated gravitational wave data model, where the underlying time-ordered signal values form the high-dimensional parameter space to be inferred. In Sec. 4 we describe seemingly intractable problems in cosmological inference that can be solved using marginal posterior density estimation.
In practice, for many physical inference problems with high-dimensional parameter spaces, the full high-dimensional posterior distribution is not necessary or even interpretable. Instead, the inference goal is the set of one- and two-dimensional marginal posterior distributions of the parameters (e.g. Planck Collaboration, 2020; Joudaki and KiDS Collaboration, 2018; Abbott et al., 2018).
Even if full posterior sampling is possible through sophisticated MCMC techniques, e.g. in hierarchical models with known distributions (rather than in a likelihood-free framework), the number of posterior samples needed to compute an n-dimensional marginal grows exponentially with n. For high-dimensional problems, the limited number of independent samples obtained in practice only allows for the computation of low-dimensional marginals of the posterior density.
We therefore make the marginal densities the target of our inference problem. We take inspiration from the simplicity of integration when a Monte Carlo sampled representation of the posterior distribution is available. In this case, marginalization is trivial: it amounts simply to ignoring the parameter dimensions to be marginalized over. In this work we show that we can bring this powerful notion directly to simulation-based inference; it allows us to estimate the marginal posterior density (or its moments) directly. This is a powerful approach whenever we are dealing with a large number of parameters, and it effectively removes the practical limitation of simulation-based inference techniques to applications with a low-dimensional parameter space.
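The sample-based marginalization described above can be sketched in a few lines of NumPy; the correlated high-dimensional "posterior" here is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior represented by Monte Carlo samples:
# N samples of a D-dimensional parameter vector theta.
N, D = 100_000, 100
mean = np.zeros(D)
cov = 0.5 * np.eye(D) + 0.5   # toy correlated posterior (diag 1.0, off-diag 0.5)
samples = rng.multivariate_normal(mean, cov, size=N)

# Marginalizing onto the pair (theta_i, theta_j) amounts to ignoring
# the other dimensions -- i.e. simple column selection:
i, j = 3, 42
marginal_ij = samples[:, [i, j]]          # shape (N, 2)

# 2D marginal moments follow directly from the retained columns.
marginal_mean = marginal_ij.mean(axis=0)
marginal_cov = np.cov(marginal_ij, rowvar=False)
```

Note that no integration is performed: dropping columns of the sample array is exactly the marginalization integral, which is the property this work carries over to simulation-based inference.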
Many popular and powerful density estimation methods can be categorized as normalizing flows. These use a series of bijective functions to transform from simple known densities (e.g. a unit normal) to the target density Jimenez Rezende and Mohamed (2015); Kingma et al. (2016). MAFs represent the estimated density as a transformation of a unit normal through a series of autoregressive functions Papamakarios et al. (2017, 2019). The networks are trained to give an estimate q(θ | d; φ) of the target distribution by minimizing a Monte Carlo estimate of the Kullback-Leibler divergence Kullback and Leibler (1951). For a sampling distribution p(θ, d) this would be

U(φ) = − (1/N) ∑_n ln q(θ_n | d_n; φ) ,
varying the network parameters φ over the forward-modelled mock data {θ_n, d_n}. In this same likelihood-free framework, one can directly estimate the posterior distributions for subsets of the large parameter set. For any two parameters θ_i and θ_j of the full θ, one can directly estimate q(θ_i, θ_j | d; φ) by minimizing

U_{ij}(φ) = − (1/N) ∑_n ln q(θ_i^{(n)}, θ_j^{(n)} | d_n; φ) .
The resulting density will indeed be an estimate of the marginal posterior for the chosen parameter pair (eq. 1) if all parameters of θ (not just the chosen pair) are drawn from the prior to generate the training data. This procedure also avoids the need for data compression steps Charnock et al. (2018); Alsing and Wandelt (2018), as we condition on high-dimensional data rather than estimating its density, and any nuisance parameters are automatically marginalized away, provided they have been sufficiently sampled in the training data.
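As an illustration of this training set-up, the following sketch generates training pairs in which the full parameter vector is drawn from the prior but only the chosen pair is kept as the density-estimation target. The toy simulator here is hypothetical, not the paper's forward model:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulator(theta, rng):
    """Hypothetical forward model: data = signal(theta) + non-stationary noise."""
    noise_std = 1.0 + 0.5 * np.linspace(0.0, 1.0, theta.size)
    return theta + noise_std * rng.standard_normal(theta.size)

D, n_sims = 100, 5000
i, j = 0, 1  # the parameter pair whose marginal posterior we target

# Crucially, ALL parameters are drawn from the prior -- not just (theta_i, theta_j).
# This is what makes q(theta_i, theta_j | d) an estimate of the marginal posterior.
thetas = rng.standard_normal((n_sims, D))            # prior draws of the full theta
data = np.stack([simulator(t, rng) for t in thetas])

# Training set for the marginal flow: conditioned on the full data vector,
# with only the chosen pair as the density-estimation target.
targets = thetas[:, [i, j]]                          # shape (n_sims, 2)
# (targets, data) would then be passed to a conditional density estimator,
# e.g. a MAF, minimizing -sum_n ln q(theta_i^(n), theta_j^(n) | d_n; phi).
```

The nuisance dimensions never appear as targets, so they are marginalized implicitly by the sampling of the training set, exactly as in the argument above.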
In practice, posterior estimates often serve principally to compute posterior moments. Moment Networks allow us to side-step the problem of estimating the posterior density and skip directly to estimating the location, scale, and covariance of the parameters (and possibly higher-order moments). When this is sufficient, Moment Networks allow the use of far simpler neural network architectures, which reduces the risk of training failure and boosts inference speed.
We begin by noting that if we find some function g of our data that minimizes an L2 loss over the distribution of possible training examples p(θ, d),

J = E_{p(θ, d)} ‖θ − g(d)‖² ,

then g, which we represent as a neural network, evaluated for the observed data is the mean of the posterior distribution, g(d) = E[θ | d]. It is therefore possible to create a hierarchy of networks to generate further moments of the posterior distribution. For example, the function s that minimizes

J′ = E_{p(θ, d)} ‖(θ − g(d))² − s(d)‖² ,

with the square taken element-wise and for fixed, already trained g, is such that

s(d) = E[(θ − g(d))² | d] = Var[θ | d] ,

i.e. the set of posterior variances Jaynes (2003); Adler and Öktem (2018). The objective functions for marginal posterior parameter covariances can be similarly constructed.
By sampling the full parameter space from the prior distribution p(θ), the functions g and s can be combined to output the posterior means, variances, and covariances for subsets of the full set of parameters; the marginalization over the other parameters is done implicitly during training. This result is exact and independent of the true underlying posterior or prior distributions.
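The hierarchy can be checked on a one-dimensional conjugate toy model with a known analytic posterior. Here the "networks" are simple linear least-squares fits, a minimal stand-in for the neural regressors; the model and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Conjugate toy model with a known answer: theta ~ N(0, 1), d = theta + noise,
# noise ~ N(0, 1). Analytically, p(theta | d) = N(d / 2, 1 / 2).
n_sims = 200_000
theta = rng.standard_normal(n_sims)
d = theta + rng.standard_normal(n_sims)

# First member of the hierarchy: g(d) minimizing E||theta - g(d)||^2.
# A linear model fit by least squares stands in for the neural network;
# the minimizer is the posterior mean E[theta | d] = d / 2.
A = np.stack([d, np.ones_like(d)], axis=1)
g_coef, *_ = np.linalg.lstsq(A, theta, rcond=None)
g = A @ g_coef

# Second member: s(d) minimizing E||(theta - g(d))^2 - s(d)||^2, with g fixed.
# The minimizer is the posterior variance Var[theta | d] = 1 / 2.
resid_sq = (theta - g) ** 2
s_coef, *_ = np.linalg.lstsq(A, resid_sq, rcond=None)

post_mean_slope = g_coef[0]          # analytic value: 0.5
post_var = (A @ s_coef).mean()       # analytic value: 0.5
```

With enough simulations, both fitted quantities converge to the analytic posterior mean slope and variance, independent of any density-estimation step.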
The Moment Network solves for the marginal posterior moments by construction, and therefore does not suffer from the mode-collapse problem of variational inference with multi-modal posteriors, which can lead to underpredicted uncertainty. Outside the likelihood-free framework, if one does have information about the functional form of the posterior, one can fit the posterior parameters to the marginal moments (see Sec. 4).
Toy model demonstration: We use a 100-dimensional parameter inference toy model to demonstrate marginal posterior estimation for pairs of parameters. The model consists of 100-element data vectors with non-stationary Gaussian noise and a Gaussian prior distribution with non-trivial covariance introducing parameter correlation. We estimate the marginal posterior density for parameter pairs using a MAF (via the pyDELFI package Alsing et al. (2019)) and estimate the 2D marginal mean, variance and covariance using Moment Networks. The results are shown in Fig. 1.
As a reference, we can directly sample the posterior distribution using high-dimensional MCMC, which required a large number of draws from the likelihood. The normalizing flow result would have been intractable if the density estimation target were the full parameter space or the data space. With a marginal flow, changing the target to pairs of parameters, it is simple (with a basic 2-GPU 12GB set-up) to evaluate all marginal posterior pairs (often represented as a so-called "corner plot").
With the same set-up, the Moment Network hierarchy was able to accurately evaluate the means, variances and covariances of the marginals (see Fig. 1) with only a few seconds of training and evaluation, without requiring any sampling or grid evaluation. For many practical applications of inference in the physical sciences, these marginal joint moments would be the final goal.
Gravitational wave signal demonstration: Fig. 2 shows two example simulated gravitational wave time series. The two signals (dashed orange) are short intervals taken from the seconds before a binary black hole merger, generated using the SEOBNRv4 model Bohé et al. (2017).
We have simplified the problem for this demonstration by removing all "geometric" effects (black hole spin, inclination, detector geometry) and use only a single polarization for the detector strain. We do, however, sample the merger events with an independent prior distribution for each black hole mass and for the distance. The noise is LIGO-like and, along with the signal, was generated using the pyCBC package for 35000 simulations.
With the time series elements forming a high-dimensional parameter space, the left panel of Fig. 3 shows a representation of the marginal posterior standard deviation for each of 128 parameters for a simulated data set. This result was evaluated using the trained Moment Network. The right panel shows a validation case of similar complexity to the gravitational wave model, but with a known likelihood. The Moment Network trained on simulations accurately matches a long-run MCMC chain, validating our approach.
Though direct density estimation of marginal posteriors is much more robust than estimation of the full posterior, it may still suffer from well-known issues of density estimation. Moment Networks optimize a completely different set of objective functions to return estimates of the posterior moments. This affords an opportunity to cross-validate: moments of the estimated marginal posteriors should match those from the Moment Network. If the results are inconsistent for an initial set of simulations, the cause may be insufficient network complexity (the network complexity for both methods scales similarly) or an insufficient number of simulations.
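A minimal version of this cross-check simply compares sample moments of draws from the estimated marginal flow against the Moment Network outputs; all inputs and names here are illustrative:

```python
import numpy as np

def moments_consistent(flow_samples, mn_mean, mn_cov, atol=0.1):
    """Compare sample moments of draws from an estimated 2D marginal posterior
    with the mean vector and covariance predicted by a Moment Network."""
    sample_mean = flow_samples.mean(axis=0)
    sample_cov = np.cov(flow_samples, rowvar=False)
    return bool(np.allclose(sample_mean, mn_mean, atol=atol)
                and np.allclose(sample_cov, mn_cov, atol=atol))

# Illustrative usage with synthetic "flow" draws from a known 2D Gaussian:
rng = np.random.default_rng(3)
mn_mean = np.array([0.2, -0.1])
mn_cov = np.array([[1.0, 0.3], [0.3, 0.5]])
draws = rng.multivariate_normal(mn_mean, mn_cov, size=50_000)
consistent = moments_consistent(draws, mn_mean, mn_cov)
```

A failed check on an initial batch of simulations would then prompt increasing the network capacity or the number of simulations, as described above.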
Thus far, density-estimation likelihood-free inference in cosmology has generally been limited to a few parameters (e.g. Alsing et al., 2019; Taylor et al., 2019; Brehmer et al., 2019; Jeffrey et al., 2020; Ramanah et al., 2020; Lemos et al., 2020). Though simulation-based inference of cosmological fields (including dark matter) can be integral to many analyses (e.g. Caldeira et al., 2019; Shirasaki et al., 2019; Jeffrey et al., 2020; Petroff et al., 2020), it can be intractable to estimate the full posterior due to its high dimensionality. With the approach we propose, joint marginal posteriors (and associated moments) for reconstructed cosmological fields can be directly evaluated.
For cases where it is possible to sample, marginal flows and Moment Networks still provide advantages. One particularly ambitious cosmological sampler, BORG Jasche and Wandelt (2013); Jasche et al. (2015), samples the extremely high-dimensional posterior density of the initial conditions of the Universe given galaxy data, using a non-linear forward model including the physics and data effects. Though the full posterior is sampled, the complexity of the sampler and the inherently sequential nature of MCMC severely limit the number of independent samples, which are sufficient only to estimate low-dimensional marginal posteriors and their moments. The approach proposed in this work could use similar computational resources to generate simulations to train marginal flows and Moment Networks in parallel (rather than sequentially) and efficiently output low-dimensional marginal posteriors and moments.
Beyond general high-dimensional inference, the principal motivation for this work (Sec. 2), we plan to explore a wide range of further applications of marginal flows and Moment Networks to probe the fundamental physics of the Universe in future studies.
Demonstration code can be found at: github.com/NiallJeffrey/MomentNetworks
This work provides a robust approach to quantifying uncertainty in high-dimensional parameter spaces by estimating marginal posterior distributions or their associated moments directly. This has immediate application for parameter and model inference in astrophysics and cosmology, and the physical sciences more generally. We note that if the method is misunderstood or misapplied, incorrect uncertainty quantification or risk analysis would follow. To mitigate this, diagnostic and validation methods can be applied (e.g. ensembles of neural density estimators or quantile tests) or, as proposed in this work, results can be cross-checked between likelihood-free methods (e.g. marginal flows and Moment Networks).
The approach in this work can be applied to signal inference and prediction more generally (including fast image analysis, time series prediction, forecasting, and quantifying uncertainty for decision making).
Software used: pyDELFI (github.com/justinalsing/pydelfi) for the MAF density estimation implementation Alsing et al. (2019); chainconsumer (samreay.github.io) for Fig. 1 Hinton (2016); emcee (emcee.readthedocs.io) for MCMC sampling Foreman-Mackey et al. (2013); pyCBC (pycbc.org) for gravitational wave data simulation Nitz et al. (2020).
The authors thank Tom Charnock for useful discussions. NJ acknowledges funding from the École Normale Supérieure (ENS). BDW acknowledges support by the ANR BIG4 project, grant ANR-16-CE23-0002 of the French Agence Nationale de la Recherche; and the Labex ILP (reference ANR-10-LABX-63) part of the Idex SUPER, and received financial state aid managed by the Agence Nationale de la Recherche, as part of the programme Investissements d’avenir under the reference ANR-11-IDEX-0004-02. The Flatiron Institute is supported by the Simons Foundation.
Alsing et al. (2019). Fast likelihood-free cosmology with neural density estimators and active learning. MNRAS 488 (3), pp. 4440-4458.
Brehmer et al. (2019). Mining for dark matter substructure: inferring subhalo population properties from strong lenses with machine learning. The Astrophysical Journal 886 (1), pp. 49.
Hinton (2016). ChainConsumer. The Journal of Open Source Software 1, pp. 00045.