Monte Carlo Techniques for Addressing Large Errors and Missing Data in Simulation-based Inference

11/07/2022
by Bingjie Wang, et al.

Upcoming astronomical surveys will observe billions of galaxies across cosmic time, providing a unique opportunity to map the many pathways of galaxy assembly at very high resolution. However, the sheer volume of data poses an immediate computational challenge: current tools for inferring parameters from the light of galaxies take ≳ 10 hours per fit, which is prohibitively expensive at this scale. Simulation-based inference (SBI) is a promising solution, but it requires simulated data with characteristics identical to those of the observed data, whereas real astronomical surveys are often highly heterogeneous, with missing observations and variable uncertainties determined by sky and telescope conditions. Here we present a Monte Carlo technique for treating out-of-distribution measurement errors and missing data using standard SBI tools. We show that out-of-distribution measurement errors can be approximated using standard SBI evaluations, and that missing data can be marginalized over using SBI evaluations over nearby data realizations in the training set. While these techniques slow the inference process from ∼ 1 sec to ∼ 1.5 min per object, this is still significantly faster than standard approaches, and it dramatically expands the applicability of SBI. This expanded regime has broad implications for future applications to astronomical surveys.
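The two Monte Carlo ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not give the algorithmic details; everything below is an assumption made for illustration. In particular, `toy_posterior_sampler` is a hypothetical stand-in for a trained SBI posterior estimator, and the function names, the Euclidean nearest-neighbour search, and the parameters `n_mc`, `n_per`, and `k` are all invented here. The first function pools posterior samples over noise realizations of the data; the second fills missing entries from nearby training-set realizations and pools the resulting posterior samples.

```python
import numpy as np

rng = np.random.default_rng(0)


def toy_posterior_sampler(x, n, rng):
    """Hypothetical stand-in for a trained SBI posterior p(theta | x).

    Toy model only: theta is drawn around the mean of the data vector.
    """
    return rng.normal(loc=np.mean(x), scale=0.1, size=n)


def posterior_with_noisy_data(x_obs, sigma_obs, sampler, n_mc=50, n_per=20, rng=rng):
    """Pool posterior samples over Monte Carlo noise realizations of the data."""
    draws = []
    for _ in range(n_mc):
        # Perturb the observation according to its reported uncertainty.
        x_mc = rng.normal(x_obs, sigma_obs)
        draws.append(sampler(x_mc, n_per, rng))
    return np.concatenate(draws)


def posterior_with_missing_data(x_obs, missing, train_x, sampler, k=20, n_per=20, rng=rng):
    """Approximately marginalize over missing entries using nearby training data."""
    observed = ~missing
    # Distance to each training realization, using the observed entries only
    # (an assumed metric; other choices are possible).
    dist = np.linalg.norm(train_x[:, observed] - x_obs[observed], axis=1)
    nearest = np.argsort(dist)[:k]
    draws = []
    for idx in nearest:
        # Fill the missing entries with values from a nearby training realization.
        x_fill = x_obs.copy()
        x_fill[missing] = train_x[idx, missing]
        draws.append(sampler(x_fill, n_per, rng))
    return np.concatenate(draws)


# Toy usage with synthetic data (values are arbitrary).
train_x = rng.normal(1.0, 0.5, size=(1000, 5))
x_missing = np.array([1.0, 1.2, np.nan, 0.9, 1.1])
samples_missing = posterior_with_missing_data(
    x_missing, np.isnan(x_missing), train_x, toy_posterior_sampler
)
x_noisy = np.array([1.0, 1.2, 0.8, 0.9, 1.1])
samples_noisy = posterior_with_noisy_data(
    x_noisy, np.full(5, 0.3), toy_posterior_sampler
)
```

In both cases a single posterior evaluation is replaced by many, which is consistent with the abstract's note that the per-object cost rises from about a second to about a minute and a half.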


