A machine learning approach to galaxy properties: Joint redshift - stellar mass probability distributions with Random Forest

12/10/2020 ∙ by S. Mucesh, et al.

We demonstrate that highly accurate joint redshift - stellar mass PDFs can be obtained using the Random Forest (RF) machine learning (ML) algorithm, even with few photometric bands available. As an example, we use the Dark Energy Survey (DES), combined with the COSMOS2015 catalogue for redshifts and stellar masses. We build two ML models: one containing deep photometry in the griz bands, and the second reflecting the photometric scatter present in the main DES survey, with carefully constructed representative training data in each case. We validate our joint PDFs for 10,699 test galaxies by utilising the copula probability integral transform (copPIT) and the Kendall distribution function, and their univariate counterparts to validate the marginals. Benchmarked against a basic set-up of the template-fitting code BAGPIPES, our ML-based method outperforms template fitting on all of our pre-defined performance metrics. In addition to accuracy, the RF is extremely fast, able to compute joint PDFs for a million galaxies in just over 2 hours with consumer computer hardware. Such speed enables PDFs to be derived in real-time within analysis codes, solving potential storage issues. As part of this work we have developed GALPRO, a highly intuitive and efficient Python package to rapidly generate multivariate PDFs on-the-fly. GALPRO is documented and available for researchers to use in their cosmology and galaxy evolution studies at https://galpro.readthedocs.io/.
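The validation idea mentioned in the abstract can be illustrated with a small sketch. It assumes each predicted PDF is available as a set of (redshift, log stellar mass) samples, and it uses the textbook sample-based definitions: the PIT is the predicted CDF evaluated at the true value, and the copPIT is the Kendall distribution function of the joint CDF H evaluated at H(truth). The toy data generator and all function names here are hypothetical illustrations and are not GALPRO's API; for well-calibrated PDFs both statistics should be roughly uniform on [0, 1].

```python
import random


def pit(samples, truth):
    """Univariate PIT: empirical CDF of the predicted samples at the true value."""
    return sum(s <= truth for s in samples) / len(samples)


def empirical_joint_cdf(samples, point):
    """Fraction of 2-D samples at or below `point` in both coordinates."""
    z, m = point
    return sum(sz <= z and sm <= m for sz, sm in samples) / len(samples)


def cop_pit(samples, truth):
    """copPIT: empirical Kendall distribution function evaluated at H(truth)."""
    h_truth = empirical_joint_cdf(samples, truth)
    h_samples = [empirical_joint_cdf(samples, s) for s in samples]
    return sum(h <= h_truth for h in h_samples) / len(samples)


def draw_galaxy(rng, n_samples=100):
    """Toy galaxy (assumption): correlated (z, log M*) samples and a truth
    drawn from the same distribution, i.e. a perfectly calibrated PDF."""
    def draw():
        z = rng.gauss(0.5, 0.1)
        m = 10.0 + 2.0 * (z - 0.5) + rng.gauss(0.0, 0.2)
        return (z, m)
    return [draw() for _ in range(n_samples)], draw()


rng = random.Random(0)
pits, cop_pits = [], []
for _ in range(200):
    samples, truth = draw_galaxy(rng)
    pits.append(pit([s[0] for s in samples], truth[0]))  # redshift marginal
    cop_pits.append(cop_pit(samples, truth))             # joint PDF

mean_pit = sum(pits) / len(pits)
mean_cop_pit = sum(cop_pits) / len(cop_pits)
print(f"mean PIT = {mean_pit:.2f}, mean copPIT = {mean_cop_pit:.2f}")
```

In practice one would inspect histograms of these values over the test set rather than their means: a flat histogram suggests calibrated PDFs, while a U-shaped or peaked one points to PDFs that are too narrow or too broad.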
