New Metric Formulas that Include Measurement Errors in Machine Learning for Natural Sciences

09/30/2022
by Umberto Michelucci, et al.

The application of machine learning to physics problems is widespread in the scientific literature. Both regression and classification problems are addressed by a large array of techniques that involve learning algorithms. Unfortunately, the measurement errors of the data used to train machine learning models are almost always neglected. This leads to estimates of model performance (and thus of generalisation power) that are too optimistic, since the target variables (what one wants to predict) are always assumed to be correct. In physics, this is a serious deficiency, as it can lead to the belief that theories or patterns exist where, in reality, they do not. This paper addresses this deficiency by deriving formulas for commonly used metrics (for both regression and classification problems) that take into account the measurement errors of the target variables. The new formulas always give a more pessimistic estimate of the metrics than the classical ones, which ignore measurement errors. The formulas given here are of general validity, completely model-independent, and can be applied without limitations. Thus, one can analyze with statistical confidence whether relationships exist when dealing with measurements affected by errors of any kind. The formulas have wide applicability outside physics and can be used in all problems where measurement errors are relevant to the conclusions of a study.
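To make the central claim concrete for the regression case, here is a minimal Python sketch. It rests on an assumption that is ours, not necessarily the formula derived in the paper: each observed target carries independent, zero-mean Gaussian measurement noise with a known standard deviation σ_i. Under that assumption the expected squared error for one sample is (ŷ_i − y_i)² + σ_i², so the expected MSE equals the classical MSE plus the mean of σ_i², which can never be smaller than the classical value. The helper names `mse`, `expected_mse_with_target_noise`, and `monte_carlo_mse` are hypothetical, and the Monte Carlo routine only cross-checks the closed form numerically.

```python
import numpy as np


def mse(y_pred, y_true):
    """Classical mean squared error: treats every target value as exact."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))


def expected_mse_with_target_noise(y_pred, y_obs, sigma):
    """Expected MSE, ASSUMING independent zero-mean Gaussian target errors.

    Per sample: E[(y_pred - y_obs - eps)^2] = (y_pred - y_obs)^2 + sigma^2,
    because E[eps] = 0 and E[eps^2] = sigma^2.
    """
    sigma = np.asarray(sigma, dtype=float)
    return mse(y_pred, y_obs) + float(np.mean(sigma ** 2))


def monte_carlo_mse(y_pred, y_obs, sigma, n_draws=10_000, seed=0):
    """Numerical cross-check: mean and spread of the MSE over plausible targets."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    rng = np.random.default_rng(seed)
    # Draw plausible true targets around each measured value; the Gaussian
    # is symmetric, so adding or subtracting the offset is equivalent.
    draws = y_obs + rng.normal(0.0, sigma, size=(n_draws, y_obs.size))
    # One MSE value per simulated set of targets.
    per_draw = np.mean((y_pred - draws) ** 2, axis=1)
    return float(per_draw.mean()), float(per_draw.std())
```

For example, with `y_obs = [1.0, 2.0, 3.0, 4.0]`, `y_pred = [1.1, 1.9, 3.2, 3.8]` and `sigma = [0.2, 0.2, 0.3, 0.3]`, the classical MSE is 0.025, while the closed-form expected MSE is 0.025 + 0.065 = 0.09, and the Monte Carlo mean should land close to 0.09. The spread returned by `monte_carlo_mse` gives a rough sense of how much the metric itself is uncertain because of the measurement errors.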


