MiraBest: A Dataset of Morphologically Classified Radio Galaxies for Machine Learning

05/18/2023
by Fiona A. M. Porter, et al.

The volume of data from current and future observatories has motivated the increased development and application of automated machine learning methodologies for astronomy. However, less attention has been given to the production of standardised datasets for assessing the performance of different machine learning algorithms within astronomy and astrophysics. Here we describe in detail the MiraBest dataset: a publicly available, batched dataset of 1256 radio-loud AGN from NVSS and FIRST, filtered to 0.03 < z < 0.1 and manually labelled by Miraghaei and Best (2017) according to the Fanaroff-Riley morphological classification. The dataset was created for machine learning applications and is compatible with standard deep learning libraries. We outline the principles underlying its construction, the sample selection and pre-processing methodology, and the structure and composition of the dataset, and we compare MiraBest to other datasets used in the literature. Existing applications that utilise the MiraBest dataset are reviewed, and an extended dataset of 2100 sources is created by cross-matching MiraBest with other catalogues of radio-loud AGN that have been used more widely in the literature for machine learning applications.
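The redshift selection quoted above (0.03 < z < 0.1) can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory catalogue of `Source` records; the record type, its field names, and the toy source names and redshifts are illustrative placeholders, not the actual MiraBest schema or real sources. Only the redshift cut is shown, not the full sample-selection and pre-processing procedure described in the paper.

```python
from dataclasses import dataclass


# Hypothetical record type: field names are illustrative only and do not
# reflect the real MiraBest data format.
@dataclass
class Source:
    name: str
    z: float       # redshift
    fr_class: str  # Fanaroff-Riley label, e.g. "FRI" or "FRII"


# Redshift bounds quoted in the abstract.
Z_MIN, Z_MAX = 0.03, 0.1


def redshift_cut(sources):
    """Keep only sources with Z_MIN < z < Z_MAX (strict inequalities)."""
    return [s for s in sources if Z_MIN < s.z < Z_MAX]


# Toy placeholder catalogue, not real sources.
catalogue = [
    Source("src_a", 0.025, "FRI"),   # below the lower bound: rejected
    Source("src_b", 0.05, "FRII"),   # inside the window: kept
    Source("src_c", 0.099, "FRI"),   # inside the window: kept
    Source("src_d", 0.10, "FRII"),   # equal to the upper bound: rejected
]

selected = redshift_cut(catalogue)
print([s.name for s in selected])  # ['src_b', 'src_c']
```

Because the abstract writes the range with strict inequalities, a source lying exactly on either bound is excluded in this sketch.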


research
03/21/2019

Penobscot Dataset: Fostering Machine Learning Development for Seismic Interpretation

We have seen in the past years the flourishing of machine and deep learn...
research
03/26/2019

Netherlands Dataset: A New Public Dataset for Machine Learning in Seismic Interpretation

Machine learning and, more specifically, deep learning algorithms have s...
research
05/22/2017

Infrastructure for Usable Machine Learning: The Stanford DAWN Project

Despite incredible recent advances in machine learning, building machine...
research
03/03/2020

Benchmark Performance of Machine And Deep Learning Based Methodologies for Urdu Text Document Classification

In order to provide benchmark performance for Urdu text document classif...
research
01/21/2022

ERS: a novel comprehensive endoscopy image dataset for machine learning, compliant with the MST 3.0 specification

The article presents a new multi-label comprehensive image dataset from ...
research
07/14/2020

Bringing the People Back In: Contesting Benchmark Machine Learning Datasets

In response to algorithmic unfairness embedded in sociotechnical systems...
research
06/07/2019

Radio Galaxy Zoo: Unsupervised Clustering of Convolutionally Auto-encoded Radio-astronomical Images

This paper demonstrates a novel and efficient unsupervised clustering me...