Modeling Generalization in Machine Learning: A Methodological and Computational Study

06/28/2020
by   Pietro Barbiero, et al.

As machine learning becomes increasingly available to the general public, theoretical questions are turning into pressing practical issues. Arguably one of the most relevant concerns is how much confidence we can place in machine learning predictions. In many real-world cases, it is of utmost importance to estimate the capability of a machine learning algorithm to generalize, i.e., to provide accurate predictions on unseen data, depending on the characteristics of the target problem. In this work, we perform a meta-analysis of 109 publicly available classification data sets, modeling machine learning generalization as a function of a variety of data set characteristics, ranging from the number of samples to intrinsic dimensionality, and from class-wise feature skewness to the F1 score evaluated on test samples falling outside the convex hull of the training set. Experimental results demonstrate the relevance of the convex hull of the training data in assessing machine learning generalization, by emphasizing the difference between interpolated and extrapolated predictions. Besides several predictable correlations, we observe unexpectedly weak associations between the generalization ability of machine learning models and all metrics related to dimensionality, thus challenging the common assumption that the curse of dimensionality might impair generalization in machine learning.
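
To make the distinction between interpolated and extrapolated predictions concrete, the following is a minimal sketch of one way to test whether a sample falls inside the convex hull of the training set. The abstract does not say how the authors perform this test, so everything here is an assumption: the function names `in_convex_hull` and `split_by_hull` are hypothetical, and the membership check uses a Frank-Wolfe (Gilbert-style) iteration chosen only because it needs no external libraries and works in any number of dimensions. Points within `tol` of the hull are treated as inside, and an inconclusive run is counted as outside.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dot(a: Point, b: Point) -> float:
    return sum(x * y for x, y in zip(a, b))


def in_convex_hull(query: Point, train: Sequence[Point],
                   tol: float = 1e-6, max_iter: int = 10_000) -> bool:
    """Approximate hull-membership test (hypothetical helper, not the paper's code).

    Frank-Wolfe iteration: `x` is always a convex combination of training
    points and is moved step by step toward `query`.
    """
    x = [float(v) for v in train[0]]
    for _ in range(max_iter):
        # Direction from the current hull point toward the query.
        d = [q - xi for q, xi in zip(query, x)]
        if _dot(d, d) <= tol * tol:
            return True  # the hull reaches the query (within tol)
        # Training point that lies farthest along direction d.
        s = max(train, key=lambda p: _dot(d, p))
        # If even that point stays strictly behind the query along d,
        # a hyperplane separates the query from the whole hull.
        if _dot(d, [q - si for q, si in zip(query, s)]) > 0.0:
            return False
        # Exact line search on the segment from x to s.
        step = [si - xi for si, xi in zip(s, x)]
        denom = _dot(step, step)
        if denom == 0.0:
            break
        gamma = min(1.0, max(0.0, _dot(d, step) / denom))
        x = [xi + gamma * st for xi, st in zip(x, step)]
    return False  # assumption: inconclusive runs are counted as outside


def split_by_hull(train: Sequence[Point],
                  test: Sequence[Point]) -> Tuple[List[int], List[int]]:
    """Return indices of test points inside and outside the training hull."""
    inside: List[int] = []
    outside: List[int] = []
    for i, point in enumerate(test):
        (inside if in_convex_hull(point, train) else outside).append(i)
    return inside, outside


# Toy data: the training set is the unit square.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
test = [(0.5, 0.5), (2.0, 2.0), (0.5, -0.1)]
interpolated, extrapolated = split_by_hull(train, test)
# interpolated == [0], extrapolated == [1, 2]
```

Under these assumptions, a score such as F1 could then be computed separately on the `extrapolated` indices to gauge how a model behaves outside the region covered by its training data. For large data sets, a linear-programming feasibility check would be a common alternative to this iterative sketch.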


