Big Data vs. complex physical models: a scalable inference algorithm

07/14/2017
by Johannes Buchner, et al.

The data torrent unleashed by current and upcoming instruments requires scalable analysis methods. Machine Learning approaches scale well. However, separating the instrument measurement from the physical effects of interest, dealing with variable errors, and deriving parameter uncertainties are usually afterthoughts. Classic forward-folding analyses with Markov Chain Monte Carlo or Nested Sampling enable parameter estimation and model comparison, even for complex and slow-to-evaluate physical models. However, these approaches require an independent run for each data set, implying an infeasible number of model evaluations in the Big Data regime. Here we present a new algorithm, collaborative nested sampling, for deriving parameter probability distributions for each observation. Importantly, in our method the number of physical model evaluations scales sub-linearly with the number of data sets, and we make no assumptions about homogeneous errors, Gaussianity, the form of the model, or the heterogeneity/completeness of the observations. Collaborative nested sampling has immediate application in speeding up analyses of large surveys, integral-field-unit observations, and Monte Carlo simulations.
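The saving in model evaluations can be illustrated with a minimal sketch. This is not the collaborative nested sampling algorithm itself, only the underlying observation that a model prediction at a given parameter vector does not depend on the data, so a single evaluation can be scored against many data sets. The function names, the toy line-profile model, and the Gaussian likelihood with per-point errors are all assumptions made for illustration; the method described above does not require Gaussianity.

```python
import math
import random

model_calls = 0  # counts how often the (notionally slow) model is evaluated


def physical_model(theta, xs):
    """Hypothetical stand-in for a slow physical model: a Gaussian line profile."""
    global model_calls
    model_calls += 1
    amplitude, center = theta
    return [amplitude * math.exp(-0.5 * (x - center) ** 2) for x in xs]


def log_likelihood(prediction, data, sigma):
    """Gaussian log-likelihood (up to a constant) with per-point errors.

    The Gaussian form is an assumption of this sketch only.
    """
    return -0.5 * sum(((d - p) / s) ** 2 for p, d, s in zip(prediction, data, sigma))


def shared_evaluation(theta, xs, datasets):
    """Evaluate the model once, then score every data set against that prediction."""
    prediction = physical_model(theta, xs)  # the single expensive call
    return [log_likelihood(prediction, data, sigma) for data, sigma in datasets]


# Simulate 1000 data sets, each with its own (heterogeneous) error bars.
rng = random.Random(0)
xs = [0.5 * i for i in range(-10, 11)]
datasets = []
for _ in range(1000):
    sigma = [rng.uniform(0.1, 0.5) for _ in xs]
    truth = physical_model((1.0, rng.uniform(-2.0, 2.0)), xs)
    data = [t + rng.gauss(0.0, s) for t, s in zip(truth, sigma)]
    datasets.append((data, sigma))

model_calls = 0  # reset after simulation; count only the inference-side calls
logls = shared_evaluation((1.0, 0.0), xs, datasets)
print(model_calls, len(logls))
```

In this sketch the likelihood arithmetic still grows with the number of data sets, but the model call does not. Independent runs would instead repeat the model evaluation for every data set at every proposed parameter vector.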


research
12/16/2017

Parallel Markov Chain Monte Carlo for Bayesian Hierarchical Models with Big Data, in Two Stages

Due to the escalating growth of big data sets in recent years, new paral...
research
11/07/2022

Monte Carlo Techniques for Addressing Large Errors and Missing Data in Simulation-based Inference

Upcoming astronomical surveys will observe billions of galaxies across c...
research
05/11/2015

On Markov chain Monte Carlo methods for tall data

Markov chain Monte Carlo methods are often deemed too computationally in...
research
05/22/2018

Langevin Markov Chain Monte Carlo with stochastic gradients

Monte Carlo sampling techniques have broad applications in machine learn...
research
08/30/2018

An Introduction to Inductive Statistical Inference -- from Parameter Estimation to Decision-Making

These lecture notes aim at a post-Bachelor audience with a background at ...
research
03/19/2021

Scalable computation for Bayesian hierarchical models

The article is about algorithms for learning Bayesian hierarchical model...
research
11/07/2022

Nested sampling statistical errors

Nested sampling (NS) is a popular algorithm for Bayesian computation. We...
