A universal synthetic dataset for machine learning on spectroscopic data

06/13/2022
by Jan Schuetzke, et al.

To assist in the development of machine learning methods for automated classification of spectroscopic data, we have generated a universal synthetic dataset that can be used for model validation. This dataset contains artificial spectra designed to represent experimental measurements from techniques including X-ray diffraction, nuclear magnetic resonance, and Raman spectroscopy. The dataset generation process features customizable parameters, such as scan length and peak count, which can be adjusted to fit the problem at hand. As an initial benchmark, we simulated a dataset containing 35,000 spectra based on 500 unique classes. To automate the classification of these data, we evaluated eight different machine learning architectures. The results shed light on which factors are most critical to achieving optimal performance on the classification task. The scripts used to generate synthetic spectra, as well as our benchmark dataset and evaluation routines, are made publicly available to aid in the development of improved machine learning models for spectroscopic analysis.
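The abstract does not reproduce the generation scripts. As a rough illustration of how parameters such as scan length and peak count could drive a generator of this kind, the following is a minimal sketch and not the authors' implementation. The function names (`make_class`, `sample_spectrum`, `make_dataset`), the Gaussian peak shape, and the ranges used for peak shifts and intensity changes are all assumptions made for this example.

```python
import math
import random


def make_class(rng, scan_length=500, min_peaks=2, max_peaks=10):
    """Draw the peak positions and heights that define one hypothetical class."""
    n_peaks = rng.randint(min_peaks, max_peaks)
    # Keep peak centers away from the scan edges (assumed margin of 10%).
    positions = [rng.uniform(0.1, 0.9) * scan_length for _ in range(n_peaks)]
    heights = [rng.uniform(0.2, 1.0) for _ in range(n_peaks)]
    return list(zip(positions, heights))


def sample_spectrum(rng, peaks, scan_length=500, max_shift=3.0, width=2.0):
    """Render one randomly varied spectrum of a class as a list of intensities."""
    spectrum = [0.0] * scan_length
    for position, height in peaks:
        # Assumed variations: small position shift and +/-20% intensity change.
        center = position + rng.uniform(-max_shift, max_shift)
        scale = height * rng.uniform(0.8, 1.2)
        for i in range(scan_length):
            spectrum[i] += scale * math.exp(-0.5 * ((i - center) / width) ** 2)
    # Normalize so the strongest point has intensity 1.
    top = max(spectrum)
    return [value / top for value in spectrum]


def make_dataset(n_classes=5, per_class=4, scan_length=500, seed=0):
    """Build a labeled list of spectra: per_class variations for each class."""
    rng = random.Random(seed)
    data, labels = [], []
    for label in range(n_classes):
        peaks = make_class(rng, scan_length)
        for _ in range(per_class):
            data.append(sample_spectrum(rng, peaks, scan_length))
            labels.append(label)
    return data, labels
```

Calling `make_dataset(n_classes=500, per_class=70)` would produce 35,000 spectra, matching the benchmark's total size if the spectra were spread evenly across the 500 classes. The abstract does not state the per-class split, so that even spread is also an assumption.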
