Scalable Statistical Inference of Photometric Redshift via Data Subsampling

03/30/2021
by Arindam Fadikar, et al.

Handling large data sets has long been a major bottleneck for traditional statistical models. Consequently, when accurate point prediction is the primary goal, machine learning models are often preferred over their statistical counterparts for larger problems. Fully probabilistic statistical models, however, often outperform other models in quantifying the uncertainty associated with model predictions. We develop a data-driven statistical modeling framework that combines the uncertainties from an ensemble of statistical models, each learned on a smaller subset of the data carefully chosen to account for imbalances in the input space. We demonstrate this method on a photometric redshift estimation problem in cosmology, which seeks to infer a distribution of the redshift (the stretching effect in observing the light of far-away galaxies) given multivariate color information observed for an object in the sky. Our proposed method performs balanced partitioning, graph-based data subsampling across the partitions, and training of an ensemble of Gaussian process models.
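
The three steps named in the abstract (partition, subsample, train an ensemble of Gaussian processes) can be illustrated with a small sketch. This is not the authors' implementation, and it rests on several assumptions: the partitions are equal-count bins along a single input, which is only one possible reading of "balanced partitioning"; plain random draws within each partition stand in for the graph-based subsampling; the kernel is a fixed RBF with hand-picked hyperparameters; and the ensemble is combined as an equally weighted Gaussian mixture. The helper names (`balanced_partitions`, `subsample`, `gp_predict`) and the synthetic data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    # Squared-exponential kernel with unit signal variance (assumed choice).
    sq = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, length_scale=0.3, noise=0.1):
    # Exact GP regression with fixed hyperparameters and a zero prior mean.
    k = rbf_kernel(x_train, x_train, length_scale)
    k += noise**2 * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_star = rbf_kernel(x_train, x_test, length_scale)
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    # Predictive variance of a noisy observation; the prior variance is 1.
    var = 1.0 - np.sum(v**2, axis=0) + noise**2
    return mean, var

def balanced_partitions(x, n_parts):
    # Assumption: equal-count bins along the first input dimension.
    order = np.argsort(x[:, 0])
    return np.array_split(order, n_parts)

def subsample(partitions, per_part, rng):
    # Stand-in for graph-based subsampling: a uniform draw in each partition.
    picks = [rng.choice(p, size=per_part, replace=False) for p in partitions]
    return np.concatenate(picks)

# Synthetic data with an imbalanced input distribution (most x near 0).
rng = np.random.default_rng(0)
x = rng.beta(1.0, 3.0, size=(2000, 1))
y = np.sin(3.0 * x[:, 0]) + 0.1 * rng.normal(size=2000)

parts = balanced_partitions(x, n_parts=4)
x_test = np.linspace(0.05, 0.5, 20)[:, None]

# Each ensemble member is a GP trained on its own small subsample.
means, variances = [], []
for _ in range(5):
    idx = subsample(parts, per_part=15, rng=rng)
    m, v = gp_predict(x[idx], y[idx], x_test)
    means.append(m)
    variances.append(v)
means = np.array(means)
variances = np.array(variances)

# Equally weighted mixture: the variance adds the spread between members
# to the average within-member variance (law of total variance).
mix_mean = means.mean(axis=0)
mix_var = (variances + means**2).mean(axis=0) - mix_mean**2
```

Each member here solves a 60-point linear system instead of a 2000-point one, which is where the saving comes from for exact Gaussian processes, whose cost grows cubically with the number of training points. The mixture variance is never smaller than the average member variance, so disagreement between members shows up as extra predictive uncertainty.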


