Incorporating Measurement Error in Astronomical Object Classification

12/13/2021
by Sarah Shy, et al.

Most general-purpose classification methods, such as support-vector machines (SVM) and random forests (RF), fail to account for an unusual characteristic of astronomical data: the measurement error uncertainties are known. This information is often provided with astronomical data but discarded because popular machine learning classifiers cannot incorporate it. We propose a simulation-based approach that incorporates heteroscedastic measurement error into any existing classification method to better quantify uncertainty in classification. The proposed method first simulates perturbed realizations of the data from a Bayesian posterior predictive distribution of a Gaussian measurement error model. Then, a chosen classifier is fit to each simulation. The variation across the simulations naturally reflects the uncertainty propagated from the measurement errors in both the labeled and unlabeled data sets. We demonstrate the approach in two numerical studies. The first is a thorough simulation study applying the proposed procedure to SVM and RF, which are well-known hard and soft classifiers, respectively. The second is a realistic classification problem: identifying high-z (2.9 ≤ z ≤ 5.1) quasar candidates from photometric data. The data were obtained from merged catalogs of the Sloan Digital Sky Survey, the Spitzer IRAC Equatorial Survey, and the Spitzer-HETDEX Exploratory Large-Area Survey. The proposed approach reveals that, of 11,847 high-z quasar candidates identified by a random forest that ignores measurement error, 3,146 are potential misclassifications. Additionally, of the ∼1.85 million objects not identified as high-z quasars when measurement error is ignored, 936 can be considered candidates once measurement error is taken into account.
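
The simulate-then-refit idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: each perturbed realization is drawn directly as Normal(observed value, reported standard deviation) rather than from a Bayesian posterior predictive distribution, and a hypothetical nearest-centroid classifier stands in for SVM or RF so that the example needs only the Python standard library. All function and parameter names (`fit_centroids`, `perturb`, `class_frequencies`, `n_sims`) are invented for illustration.

```python
import random


def fit_centroids(X, y):
    """Stand-in classifier (assumption): store the per-class feature means."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=sqdist)


def perturb(X, S, rng):
    """Draw one realization: each value is resampled with its own known
    standard deviation (heteroscedastic Gaussian noise). This is a
    simplification of the posterior predictive draw described in the abstract."""
    return [[rng.gauss(x, s) for x, s in zip(x_row, s_row)]
            for x_row, s_row in zip(X, S)]


def class_frequencies(X_train, S_train, y_train, X_new, S_new,
                      n_sims=200, seed=0):
    """Refit the classifier on each perturbed training set, classify each
    perturbed unlabeled object, and return per-object class frequencies."""
    rng = random.Random(seed)
    classes = sorted(set(y_train))
    votes = [{c: 0 for c in classes} for _ in X_new]
    for _ in range(n_sims):
        model = fit_centroids(perturb(X_train, S_train, rng), y_train)
        for i, row in enumerate(perturb(X_new, S_new, rng)):
            votes[i][predict(model, row)] += 1
    return [{c: v[c] / n_sims for c in classes} for v in votes]


# Toy data (made up): one feature, two well-separated classes.
X_train = [[0.0], [0.2], [-0.1], [4.0], [3.9], [4.2]]
S_train = [[0.1]] * 6
y_train = [0, 0, 0, 1, 1, 1]

# Two unlabeled objects: one measured precisely, one with a large error bar.
X_new = [[0.1], [2.0]]
S_new = [[0.1], [2.0]]

for obj, freq in zip(X_new, class_frequencies(X_train, S_train, y_train,
                                              X_new, S_new)):
    print(obj, freq)
```

The spread of class labels across simulations plays the role of the classification uncertainty: an object whose label is stable across realizations is a confident assignment, while one whose label flips frequently is a potential misclassification. In practice the stand-in classifier would be replaced by whichever method is being studied.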

research
11/14/2018

Probabilistic Random Forest: A machine learning algorithm for noisy datasets

Machine learning (ML) algorithms become increasingly important in the an...
research
04/01/2013

An improved quasar detection method in EROS-2 and MACHO LMC datasets

We present a new classification method for quasar identification in the ...
research
04/11/2022

Zero-phase angle asteroid taxonomy classification using unsupervised machine learning algorithms

We are in an era of large catalogs and, thus, statistical analysis tools...
research
12/07/2018

Catalog of quasars from the Kilo-Degree Survey Data Release 3

We present a catalog of quasars selected from broad-band photometric ugr...
research
02/24/2019

On the Use of Emojis to Train Emotion Classifiers

Nowadays, the automatic detection of emotions is employed by many applic...
research
03/30/2021

Single Test Image-Based Automated Machine Learning System for Distinguishing between Trait and Diseased Blood Samples

We introduce a machine learning-based method for fully automated diagnos...
research
07/06/2020

A Novel Random Forest Dissimilarity Measure for Multi-View Learning

Multi-view learning is a learning task in which data is described by sev...