Robust variable selection in the framework of classification with label noise and outliers: applications to spectroscopic data in agri-food

10/20/2020
by Andrea Cappozzo, et al.

Classification of high-dimensional spectroscopic data is a common task in analytical chemistry. Well-established procedures like support vector machines (SVMs) and partial least squares discriminant analysis (PLS-DA) are the most common methods for tackling this supervised learning problem. Nonetheless, these models can be difficult to interpret, and solutions based on wavelength selection are often preferred because they lead to clearer chemometric interpretation. Unfortunately, in some delicate applications such as food authenticity, mislabeled and adulterated spectra occur in the calibration and/or validation sets, with dramatic effects on model development, prediction accuracy and robustness. Motivated by these issues, we propose employing a robust model-based method that jointly performs variable selection and label noise detection. We demonstrate the effectiveness of our proposal on three agri-food spectroscopic studies, in which several forms of perturbation are considered. Our approach succeeds in reducing problem complexity, identifying anomalous spectra and attaining competitive predictive accuracy with a very small number of selected wavelengths.

