Conditional Density Estimation Tools in Python and R with Applications to Photometric Redshifts and Likelihood-Free Cosmological Inference

by Niccolò Dalmasso, et al.

It is well known in astronomy that propagating non-Gaussian prediction uncertainty in photometric redshift estimates is key to reducing bias in downstream cosmological analyses. Similarly, likelihood-free inference approaches, which are beginning to emerge as a tool for cosmological analysis, require the full uncertainty landscape of the parameters of interest given observed data. However, most machine learning (ML) methods with open-source software target point prediction or classification, and hence fall short in quantifying uncertainty in complex regression and parameter-inference settings such as these. As an alternative to methods that focus on predicting the response (or parameters) y from features x, we provide nonparametric conditional density estimation (CDE) tools for approximating and validating the entire probability density p(y|x) given training data for x and y. Because no single CDE method suits every problem, the goal of this work is to provide a comprehensive range of statistical tools and open-source software for nonparametric CDE and method assessment that accommodate different settings and can easily be adapted to the problem at hand. Specifically, we introduce CDE software packages in Python and R based on four ML prediction methods adapted and optimized for CDE: NNKCDE, RFCDE, FlexCode, and DeepCDE. Furthermore, we present the cdetools package, which includes functions for computing a CDE loss function for model selection and parameter tuning, together with diagnostic functions. We provide sample code in Python and R, as well as examples of applications to photometric redshift estimation and likelihood-free cosmology via CDE.
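To make the two ideas above concrete (estimating the whole density p(y|x) rather than a point prediction, and using a CDE loss to compare fitted models), here is a minimal sketch written from scratch in NumPy. It is not the API of the authors' packages: the function names nnkcde_predict and cde_loss are hypothetical. The estimator illustrates the nearest-neighbour kernel idea suggested by the name NNKCDE (take the k nearest training points to x and fit a Gaussian kernel density estimate to their y values). The loss assumes the common form L = ∫∫ (f̂(y|x) − f(y|x))² dy dP(x), which, after dropping a term that does not depend on f̂, can be estimated on held-out pairs (x_i, y_i) as (1/m) Σ ∫ f̂(y|x_i)² dy − (2/m) Σ f̂(y_i|x_i). Lower is better. The evenly spaced y grid, the synthetic data, and the values of k and bandwidth are all assumptions chosen for illustration.

```python
import numpy as np


def nnkcde_predict(x_train, y_train, x_test, y_grid, k=20, bandwidth=0.1):
    """Hypothetical nearest-neighbour kernel CDE sketch (not the nnkcde API).

    For each test point, find its k nearest training points in feature space
    (Euclidean distance) and average Gaussian kernels of width `bandwidth`
    centred on their responses. Returns an (n_test, n_grid) array of density
    values evaluated on `y_grid`.
    """
    # Squared Euclidean distances between every test and training point.
    dist2 = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k closest training points for each test point.
    nn_idx = np.argsort(dist2, axis=1)[:, :k]
    nn_y = y_train[nn_idx]  # shape (n_test, k)
    # Gaussian kernel density estimate over the neighbours' responses.
    z = (y_grid[None, None, :] - nn_y[:, :, None]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * bandwidth)
    return kernels.mean(axis=1)


def cde_loss(cde_estimates, y_grid, y_test):
    """Hypothetical estimate of the CDE loss (up to an additive constant).

    Assumes `y_grid` is evenly spaced; integrals are Riemann sums, and the
    density at each observed y is read off the nearest grid point.
    """
    dy = y_grid[1] - y_grid[0]
    # Integral of f_hat(y | x_i)^2 over y, for each held-out point.
    sq_term = (cde_estimates**2).sum(axis=1) * dy
    # f_hat(y_i | x_i), taken at the grid point closest to y_i.
    nearest = np.abs(y_grid[None, :] - y_test[:, None]).argmin(axis=1)
    at_obs = cde_estimates[np.arange(len(y_test)), nearest]
    return sq_term.mean() - 2 * at_obs.mean()


# Synthetic example (assumed for illustration): y | x ~ Normal(x, 0.1^2).
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, size=(1000, 1))
y_train = x_train[:, 0] + rng.normal(0, 0.1, size=1000)
x_test = rng.uniform(0, 1, size=(200, 1))
y_test = x_test[:, 0] + rng.normal(0, 0.1, size=200)
y_grid = np.linspace(-1, 2, 301)

cde_nn = nnkcde_predict(x_train, y_train, x_test, y_grid, k=20, bandwidth=0.1)
# Baseline that ignores x entirely: a flat density on [-1, 2].
cde_flat = np.full_like(cde_nn, 1 / 3)

loss_nn = cde_loss(cde_nn, y_grid, y_test)
loss_flat = cde_loss(cde_flat, y_grid, y_test)
print(f"NN kernel CDE loss: {loss_nn:.3f}  flat baseline loss: {loss_flat:.3f}")
```

Because the loss only needs density values on a grid, the same function can score any CDE method, which is what makes it usable for model selection and for tuning parameters such as k and bandwidth on held-out data.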
